Summary
Mosaic helped one of the largest regional homebuilders in the US deploy an AI Homebuilding Engine that recommends optimal community features for different geographic regions. The recommender delivers data-driven insights the company can trust when planning, building, and positioning new communities.

Introduction
In today’s data-driven world, a business without data analytics is operating blind, and real estate is no different. Because so much capital is tied up in every project, artificial intelligence is quickly becoming the backbone of the real estate industry.
Data science techniques can compile statistics in meaningful ways to support industry predictions and decisions on pricing, location, demographic preferences, and community planning. The underlying data can come from consumer and business surveys, government or public databases, census figures, private historical customer data, or information collated online.
In particular, predictive analytics can be a powerful tool for informing real estate planning decisions. According to McKinsey, nearly 60% of the predictive power of AI-driven real estate analytics comes from non-traditional variables, such as the proximity and quality of nearby points of interest. Unsupervised machine learning algorithms, which look for structure in data without predefined labels, can help identify and map new relationships across a wide range of data points.
The Need for an AI Homebuilding Engine
A regional homebuilder with communities in dozens of metropolitan areas across several states wanted an AI Homebuilding Engine to bring a more systematic approach to positioning homes and new communities. The company, whose homebuilding arm sells and constructs homes under three brands, needed a true partner for flexible, long-term, practical analytics support, and reached out to Mosaic Data Science for on-demand access to high-performing data scientists.
The project would help define a Minimum Viable Product (MVP) for community development and lay the foundation for customer and community segmentation. The company also wanted to use AI to improve its existing decision-making around market positioning. Mosaic was an ideal partner to deliver a custom, innovative AI-backed solution to this legacy problem.
Defining the AI Homebuilding Engine
Mosaic applied unsupervised machine learning techniques to historical demographic and community data to better inform community positioning. The goal was to develop clear profiles of customer groups that share preferences for certain product and community attributes, helping guide which community features to include in new developments.
To optimize market positioning, Mosaic developed a predictive analytics model to recommend initial home prices and price changes for a given community, with the goal of maximizing sales while keeping profit within acceptable targets. The homebuilding company believed it had enough historical data on base prices and on full prices paid, including all options. For future phases, it was also prepared to supplement its data with third-party consumer and real estate data sets where necessary.
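The pricing goal described above can be sketched as a small constrained search. This is a minimal sketch under stated assumptions, not Mosaic’s model: the demand function `predict_pace` (homes sold per month at a given price), the gross-margin definition, the candidate price grid, and the step, tolerance, and floor values in `adjust_price` are all hypothetical placeholders for what a fitted predictive model and the company’s profit targets would supply.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PriceRecommendation:
    price: float
    predicted_pace: float  # hypothetical unit: homes sold per month
    margin: float          # gross margin, (price - cost) / price


def recommend_initial_price(
    candidate_prices: Sequence[float],
    unit_cost: float,
    predict_pace: Callable[[float], float],  # hypothetical demand model
    min_margin: float,
) -> Optional[PriceRecommendation]:
    """Return the candidate price with the highest predicted sales pace
    among those that meet the margin target, or None if none qualify."""
    best: Optional[PriceRecommendation] = None
    for price in candidate_prices:
        margin = (price - unit_cost) / price
        if margin < min_margin:
            continue  # this price would miss the profit target
        pace = predict_pace(price)
        if best is None or pace > best.predicted_pace:
            best = PriceRecommendation(price, pace, margin)
    return best


def adjust_price(
    current_price: float,
    observed_pace: float,
    target_pace: float,
    floor_price: float,
    step: float = 0.02,       # hypothetical 2% adjustment
    tolerance: float = 0.10,  # hypothetical +/-10% dead band around the target
) -> float:
    """Raise the price when sales outrun the target pace, lower it (never
    below floor_price) when they lag, and hold it otherwise."""
    if observed_pace > target_pace * (1 + tolerance):
        return current_price * (1 + step)
    if observed_pace < target_pace * (1 - tolerance):
        return max(current_price * (1 - step), floor_price)
    return current_price


# Usage with a made-up linear demand curve (not real market data).
rec = recommend_initial_price(
    candidate_prices=[350_000, 375_000, 400_000, 425_000],
    unit_cost=300_000,
    predict_pace=lambda p: 12.0 - p / 50_000,
    min_margin=0.15,
)
print(rec)
```

Here the 350,000 candidate is skipped because its margin (about 14%) falls below the 15% target, and among the remaining prices the lowest wins because the made-up demand curve falls as price rises. The floor price in `adjust_price` is there so that a price cut can never push the community below its profit target.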
During the first phase, Mosaic worked with the company’s stakeholders to determine the combination of metrics that best aligned with their business objectives and explored the available data. The second phase compared the performance of multiple modeling approaches and selected the one that performed best on the data according to the chosen metrics.
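The two-step process of agreeing on metrics and then comparing modeling approaches against them can be illustrated with k-fold cross-validation. A minimal sketch, assuming mean absolute error (MAE) as the agreed metric and two toy candidates (a mean-only baseline and ordinary least squares); the function names and the 5-fold split are illustrative and are not the metrics or models Mosaic actually used.

```python
import numpy as np


def fit_mean(X, y):
    """Baseline: always predict the training mean."""
    mean = float(np.mean(y))
    return lambda X_new: np.full(len(X_new), mean)


def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def cross_val_score(fit, X, y, metric, k=5, seed=0):
    """Average the metric over k train/test splits of shuffled row indices."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(metric(y[test_idx], model(X[test_idx])))
    return float(np.mean(scores))


def select_model(candidates, X, y, metric=mae, k=5):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cross_val_score(fit, X, y, metric, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Usage on synthetic data (not real sales data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)
best, scores = select_model({"mean": fit_mean, "ols": fit_ols}, X, y)
print(best, scores)
```

Swapping in a different metric only requires passing another `metric(y_true, y_pred)` function, which is where business-aligned metrics chosen with stakeholders would plug in.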
AI Homebuilding Engine Phase I: Data Preparation and Analysis
Mosaic assessed available data sources, artifacts, and information to determine whether incorporating other supporting data, potentially from external sources, would add value. The goal was to identify opportunities where available data could support high-value modeling efforts. The company had a wide array of data from multiple sources, which called for a practical analytics approach. The data analyzed included:
- Community characteristics data (community brands and product mix, major features, community amenities, and nearby attractions).
- Demographic data on customers (3rd-party, mortgage application, and socioeconomic data).
- Historical sales prices (base, options) by home model and community for a minimum of 5 years.
- Historical realized sales prices (total) and volume by home model and community for a minimum of 5 years.
- Historical price changes (initial revisions) by home model and community for a minimum of 5 years.
- Profit targets and costs by home model and community for a minimum of 5 years.
- Community information for all relevant communities (brand, division, location, community market, amenities, competitors where known, etc.).
This data mining phase included:
- Data visualization: Plotting of key price and market data, including time series to identify seasonal patterns; correlation plots to identify correlated variables (including time-lagged correlations).
- Data reduction: Leveraging principal components analysis (PCA) and related techniques to reduce the dimensionality of the data and isolate underlying trends driving multiple key variables.
- Cluster/outlier analysis: Identifying natural clusters of communities, home styles, etc. in terms of pricing and sales; leveraging cluster analysis and outlier/anomaly detection techniques to identify communities, home styles, etc., that exhibit atypical patterns.
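Two of the techniques above, time-lagged correlation and PCA-based data reduction, plus a simple outlier screen, can be sketched with NumPy. This is illustrative only: the functions are hypothetical helpers, the PCA is computed from a singular value decomposition of standardized columns, and flagging rows whose principal-component scores lie more than three standard deviations from the mean is just one simple anomaly rule, not necessarily the one used in the project.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])


def pca(X, n_components):
    """Return component scores and explained-variance ratios for standardized X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


def flag_outliers(scores, z_threshold=3.0):
    """Flag rows whose score on any retained component is extreme."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return np.any(np.abs(z) > z_threshold, axis=1)


# Usage on synthetic data.
t = np.arange(100)
x = np.sin(t / 5)
y = np.roll(x, 3)  # y trails x by three periods
print([round(lagged_correlation(x, y, lag), 3) for lag in range(6)])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] = 2 * X[:, 0] + 0.01 * rng.normal(size=200)  # two nearly redundant columns
X[0] = [50.0, 100.0, 0.0]                             # one planted anomaly
scores, explained = pca(X, n_components=2)
print(explained, np.flatnonzero(flag_outliers(scores)))
```

In this toy data the correlation peaks at lag 3, which is how a time-lagged plot would reveal that one series leads another, and the two redundant columns collapse onto a single dominant component.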
Throughout this phase of the project, Mosaic collaborated with the customer’s IT team to share insights from the analysis and solicit business insights to guide additional review or analysis activities. All initial analyses were performed in Python, with HTML-based reports for stakeholders.
At the conclusion of the first phase, Mosaic used insights from the analysis and stakeholder feedback to refine, together with the company’s IT team, a work plan for developing and deploying the AI Homebuilding Engine. This gave the customer a direct voice in the shape and design of the predictive models to be developed in the second phase, grounded in the initial insights from the exploration phase.

AI Homebuilding Engine Phase II: Model Development, Evaluation and Deployment
After compiling a strong data set and testing its quality, Mosaic used custom feature and model selection techniques to optimize model performance against the community and market positioning metrics. Mosaic used Python to develop and deploy the models, an architecture designed to support future extensions such as additional predictive models and features.
Community Positioning Analysis
Mosaic first focused on extracting patterns in customer demographics and community types, with the goal of developing clear profiles of customer groups that share preferences for certain product and community attributes. Collected customer information was used to define community segments: combinations of product features, community amenities, and nearby attractions that together make up an offering attractive to a certain type of customer.
Mosaic’s data scientists applied clustering algorithms, an unsupervised machine learning technique, to the demographic and community data. Community segments were built from community attributes and customer segments from customer attributes, in both cases using attributes related to home-buying behavior.
Development of the AI Homebuilding Engine also involved feature engineering (combining demographic and product data fields or applying mathematical transformations) to support data visualizations such as heatmaps. These visualizations were used to share insights and findings on how customer segments are distributed across community segments.
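Clustering communities (or customers) on standardized attributes might look like the following. A minimal sketch, assuming k-means as the clustering algorithm (the case study does not name one) with a farthest-point initialization; the attribute columns in the usage example and the choice of k are hypothetical.

```python
import numpy as np


def standardize(X):
    """Put every attribute on a comparable scale (zero mean, unit variance)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def farthest_point_init(X, k, rng):
    """Start from a random row, then repeatedly add the row farthest from all chosen centers."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    rng = np.random.default_rng(seed)
    centers = farthest_point_init(X, k, rng)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Usage: hypothetical community attributes (median lot size in acres,
# distance to downtown in miles) forming two obvious groups.
rng = np.random.default_rng(0)
raw = np.vstack([rng.normal([0.25, 5.0], [0.05, 1.0], size=(20, 2)),
                 rng.normal([0.75, 20.0], [0.05, 1.0], size=(20, 2))])
labels, centers = kmeans(standardize(raw), k=2)
print(labels)
```

Standardizing first matters here: without it, the distance column (measured in miles) would dominate the lot-size column (measured in acres) purely because of its units.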
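The data behind such a heatmap is essentially a cross-tabulation of customer segments against community segments. A minimal sketch, assuming each purchase record carries an integer customer-segment label and an integer community-segment label (for example, from the clustering step); the function name is hypothetical, and the row-normalized shares could then be drawn with any plotting tool, such as Matplotlib’s `imshow`.

```python
import numpy as np


def segment_crosstab(customer_seg, community_seg, n_customer, n_community):
    """Count purchases per (customer segment, community segment) pair and
    convert each row to shares, so row i shows where segment i buys."""
    counts = np.zeros((n_customer, n_community), dtype=int)
    np.add.at(counts, (np.asarray(customer_seg), np.asarray(community_seg)), 1)
    row_totals = counts.sum(axis=1, keepdims=True)
    shares = np.divide(counts, row_totals,
                       out=np.zeros(counts.shape, dtype=float),
                       where=row_totals > 0)  # segments with no purchases stay at 0
    return counts, shares


# Usage with five hypothetical purchases.
counts, shares = segment_crosstab([0, 0, 1, 1, 1], [0, 1, 1, 1, 0],
                                  n_customer=2, n_community=2)
print(counts)
print(shares.round(2))
```

Normalizing by row answers "where does each customer segment buy?"; normalizing by column instead would answer "who buys in each community segment?", so the choice depends on the question the heatmap is meant to address.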
Market Positioning Analysis
Following a successful community and customer segmentation effort, the data science team developed a solution to strengthen the homebuilding company’s ability to analyze location opportunities. The focus was an AI-driven product to help analysts evaluate the best use and positioning for a new land opportunity. Mosaic developed predictive models that recommend initial prices and indicate when to raise or lower prices in response to market acceptance. This work included:
- Testing various modeling approaches, including linear regression, generalized linear models, random forests, and other machine learning techniques.
- Recommending ensemble models, which combine two or more model classes, after the algorithm prototyping phase showed them to be the best performers.
- Incorporating external input data (external market indices, consumer data profiles, and community features) to improve model performance.
- Supporting the deployment of models to production, including integration with production data sources, development of supporting software for scheduled model execution, and integration of model forecasts with front-end systems.
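An ensemble that combines two model classes can be as simple as averaging their predictions. The sketch below is hypothetical: it pairs a linear model with a k-nearest-neighbors regressor, both written in plain NumPy for self-containment and swapped in for brevity (a random forest would be a more realistic second member); the project’s actual ensemble members and weights are not described in the case study, and the equal default weighting is an assumption.

```python
import numpy as np


class LinearModel:
    """Least-squares linear regression with an intercept."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_


class KNNModel:
    """k-nearest-neighbors regression: average the targets of the k closest rows."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d2, axis=1)[:, : self.k]
        return self.y_[nearest].mean(axis=1)


class AveragingEnsemble:
    """Fit every member on the same data and (weighted-)average their predictions."""

    def __init__(self, models, weights=None):
        self.models = models
        self.weights = weights  # None means an equal-weight average

    def fit(self, X, y):
        for model in self.models:
            model.fit(X, y)
        return self

    def predict(self, X):
        preds = np.column_stack([model.predict(X) for model in self.models])
        return np.average(preds, axis=1, weights=self.weights)


# Usage on a tiny synthetic series.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X[:, 0] + 1
ensemble = AveragingEnsemble([LinearModel(), KNNModel(k=3)]).fit(X, y)
print(ensemble.predict(np.array([[0.0], [4.0]])))
```

Averaging tends to help when the members make different kinds of errors: the linear model extrapolates trends, while the neighbor-based model captures local patterns, and the ensemble blends the two.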
Throughout all modeling activities, Mosaic conducted feature engineering and selection in collaboration with the homebuilding company’s subject matter experts, improving model performance while balancing it against model interpretability. Priority was placed on deploying models and model improvements iteratively, so that improved forecasts delivered value to end users as quickly as possible.
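One common way to trade performance against interpretability during feature selection is greedy forward selection with a stopping rule: keep adding the single most helpful feature only while it still improves validation error by a meaningful margin. A minimal sketch, assuming a linear model, MAE on a simple train/holdout split, and a hypothetical 5% relative-improvement threshold; it is not the project’s selection procedure, and in practice subject matter experts would also weigh in on which features make business sense.

```python
import numpy as np


def holdout_mae(X, y, cols, train_frac=0.7):
    """Fit least squares on the first rows and score MAE on the remaining rows."""
    n_train = int(len(y) * train_frac)
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    return float(np.mean(np.abs(A[n_train:] @ coef - y[n_train:])))


def forward_select(X, y, min_improvement=0.05):
    """Greedily add features while each one cuts holdout MAE by at least
    min_improvement (relative); fewer features keeps the model easier to explain."""
    selected, remaining = [], list(range(X.shape[1]))
    best_err = holdout_mae(X, y, selected)  # intercept-only baseline
    while remaining:
        errs = {c: holdout_mae(X, y, selected + [c]) for c in remaining}
        candidate = min(errs, key=errs.get)
        if best_err - errs[candidate] < min_improvement * best_err:
            break  # the gain no longer justifies a larger model
        selected.append(candidate)
        remaining.remove(candidate)
        best_err = errs[candidate]
    return selected, best_err


# Usage on synthetic data where only columns 0 and 2 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=300)
selected, err = forward_select(X, y)
print(selected, err)
```

Splitting on row order rather than at random is deliberate: for time-ordered sales data, training on earlier records and validating on later ones mimics how the model would be used to forecast.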

Conclusion
Ongoing progress in predictive analytics for real estate depends on new innovations, models, and approaches that turn big data and public resources into useful machine learning analytics. Correctly identifying potential home buyers and strong market locations is among the biggest pain points for real estate companies.
This is precisely the problem Mosaic Data Science helped a leading real estate community developer solve. The company came to Mosaic with an innovative idea: pull data from many different sources to inform decisions about which areas would be profitable places to build homes. It needed a practical analytics approach to make that idea work.
Using historical and market data, Mosaic deployed the AI Homebuilding Engine to address two components: an advanced market analysis that built customer clusters to guide community planning, and intelligent positioning and pricing to keep the company competitive while delivering strong ROI. The result shows that with AI, real estate developers can reduce guesswork, better understand which consumers are likely to buy their properties and at what price, and make better-informed decisions about where to build their communities.