Structured feature extraction from real estate listings

A company specializing in real estate appraisals needed to automate the extraction of structured features (e.g., number of rooms, floors, property type) from a large volume of unstructured real estate listings.
10/06/2022
Natural Language Processing
Real Estate

Context

Real estate listings are a rich source of data for appraisals and price forecasts. However, extracting structured features from these listings can be a tedious and error-prone process, often requiring manual effort. Named Entity Recognition (NER) models can automate this process by identifying and extracting entities such as property types, floors, number of rooms, and more directly from real estate listings. Additionally, text classification models can automate the process of tagging listings under one or more general categories, such as holiday home or ski residence.
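
To make the idea of entity extraction concrete, here is a minimal sketch of the kind of output an NER step produces: labelled spans that are then collapsed into one structured record per listing. The regular expressions are only a stand-in for a trained model, and the label names (ROOMS, FLOOR, PROPERTY_TYPE) and the example listing are illustrative assumptions, not the entities used in the project.

```python
import re

# ASSUMPTION: these regex patterns are a toy stand-in for a trained NER model.
# The label names are hypothetical and chosen only for illustration.
PATTERNS = {
    "ROOMS": re.compile(r"(\d+(?:\.\d+)?)[\s-]*rooms?", re.IGNORECASE),
    "FLOOR": re.compile(r"(\d+)(?:st|nd|rd|th) floor", re.IGNORECASE),
    "PROPERTY_TYPE": re.compile(r"\b(apartment|house|chalet|studio)\b", re.IGNORECASE),
}


def extract_entities(text):
    """Return (label, value, start, end) tuples sorted by position in the text."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group(1), match.start(1), match.end(1)))
    return sorted(entities, key=lambda entity: entity[2])


def to_record(entities):
    """Collapse entity spans into one row, keeping the first value per label."""
    record = {}
    for label, value, _start, _end in entities:
        record.setdefault(label, value.lower())
    return record


listing = "Bright 3.5-room apartment on the 2nd floor, close to the lake."
print(to_record(extract_entities(listing)))
```

A trained model would replace `extract_entities`, while the span-to-record step would stay essentially the same.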

Solution

Our solution used state-of-the-art Natural Language Processing (NLP) techniques to annotate a custom dataset of real estate listings and to train NER and text classification models for structured feature extraction. We targeted over 10 entities for the NER models and two categories for the text classification models. A web app was deployed to host the models and serve as the interface for end users. The application supported testing the models on new real estate listings and could automatically generate structured tables of real estate features directly from unstructured listings. Additionally, a clustering and recommendation engine was built to identify and suggest similar listings.
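
As a rough illustration of how similar listings can be suggested, the sketch below ranks listings by cosine similarity over simple bag-of-words counts. The case study does not describe how the recommendation engine was built, so the representation, the function names, and the sample listings are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Lowercased bag-of-words term counts (an assumed, deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, listings, k=2):
    """Return the k listings most similar to the query text."""
    query_vector = vectorize(query)
    scored = [(cosine(query_vector, vectorize(text)), text) for text in listings]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _score, text in scored[:k]]


# Hypothetical listings used only to exercise the sketch.
listings = [
    "sunny chalet near ski lifts",
    "city apartment near train station",
    "ski chalet with mountain view",
]
print(recommend("chalet near ski slopes", listings, k=2))
```

In practice the vectors could just as well be built from the extracted structured features or from learned text embeddings; the ranking step is unchanged.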

Approach

We started by collecting a dataset of real estate listings in Switzerland and manually labelling thousands of examples across several languages. To accelerate labelling, we used active learning: models were trained during the annotation loop to suggest annotations on unlabelled data. Once enough data had been collected, separate NER and classification models were trained for each language. In parallel, a web app was developed to host the finished models. It supported loading and running any of the models on new real estate listings, which could be copy-pasted directly into the application.
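
One common way to decide which unlabelled examples to annotate next in such a loop is uncertainty sampling: the examples the current model is least sure about are sent to the annotator first. The case study does not state which selection strategy was used, so the sketch below is an assumption; the listing IDs and probabilities are made up for illustration.

```python
def uncertainty(prob):
    """Uncertainty of a binary prediction: 1.0 at p=0.5, falling to 0.0 at p=0 or p=1."""
    return 1.0 - abs(prob - 0.5) * 2.0


def select_for_annotation(predictions, batch_size):
    """Pick the IDs whose predicted probability is closest to 0.5."""
    ranked = sorted(
        predictions.items(),
        key=lambda item: uncertainty(item[1]),
        reverse=True,
    )
    return [listing_id for listing_id, _prob in ranked[:batch_size]]


# ASSUMPTION: hypothetical model outputs, e.g. P(listing is a holiday home).
predictions = {
    "listing_a": 0.97,
    "listing_b": 0.52,
    "listing_c": 0.10,
    "listing_d": 0.41,
}
print(select_for_annotation(predictions, batch_size=2))
```

After each annotated batch, the model is retrained and the remaining pool is scored again, so the suggestions improve as labelled data accumulates.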

Technologies

Python
Natural Language Processing
Docker
Azure Cloud

Challenges

  • Multilingual data: real estate listings in Switzerland may be posted in one of several languages (e.g., English, French, Italian, or German), requiring custom-trained models specialized for each language.
  • Limited training data: little public data exists for training NER or classification models on real estate listings, so data scraping and labelling had to be handled in-house.
  • User-friendly interface: the models must be accessible through a user-friendly UI with low latency.
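
With one model per language, each incoming listing has to be routed to the right model. The sketch below shows one possible dispatch step; the stopword-based language guess and the placeholder models are assumptions for illustration only, and a production system would more likely rely on a dedicated language identification component.

```python
# ASSUMPTION: tiny stopword lists used as a naive language guess, for illustration only.
STOPWORDS = {
    "en": {"the", "and", "with", "in"},
    "fr": {"le", "la", "et", "avec", "dans"},
    "de": {"der", "die", "und", "mit", "im"},
    "it": {"il", "di", "e", "con", "nel"},
}


def detect_language(text):
    """Guess the language by counting overlapping stopwords (ties go to the first entry)."""
    tokens = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))


def route(text, models):
    """Run the listing through the model registered for its detected language."""
    lang = detect_language(text)
    return lang, models[lang](text)


# Hypothetical placeholder models: each one just reports which model handled the text.
models = {lang: (lambda text, lang=lang: f"{lang}-ner-model") for lang in STOPWORDS}

print(route("Schöne Wohnung mit Balkon und Garten im Zentrum", models))
print(route("Bel appartement avec balcon dans le centre", models))
```

Keeping the registry as a plain mapping means a model for a further language can be added without touching the routing logic.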

Similar case studies

10/06/2022
Real Estate

Structured feature extraction from real estate listings

Extracting structured real estate information from text using cutting-edge NLP.
15/01/2023
Financial Services

Advanced analytics in the payment industry

Leveraging analytics and ML on transaction data to better understand consumers.
28/05/2022
Financial Services

Thematic portfolio construction for a private bank

Using NLP to score companies against investment themes.
25/04/2022
Financial Services

Automated construction of a qualitatively diversified portfolio

Mathematical modeling to combine ETFs along multiple objectives.
01/11/2022
FMCG

Supply chain optimization for an FMCG company

Mathematical modeling for manufacturing process optimization.
