Build Your First Model

Nov 20, 2025

Building a machine learning model isn’t rocket science. I want to show you that it’s fundamentally simple math, even if it looks intimidating from the outside. We’ll start by building a basic House Price Predictor entirely from scratch, just math and pure Python, no third-party libraries. After that, we’ll switch to the standard tools, and you’ll actually understand the “magic” happening under the hood because you’ve built it yourself first.

Features (x) and Targets (y)

Before jumping into code, it’s important to understand how a machine “sees” a problem. Think like a real estate agent: if I ask, “How much is this house?” your first question will be, “How big is it?”. In machine learning terms, we call these:

  1. Feature (x): The input data — the information you know. (Example: the size of the house.)

  2. Target (y): The output data — the value you want to predict. (Example: the price of the house.)

The goal of our model is simple: look at x and discover the mathematical relationship that leads to y. To teach a computer anything, we need examples. So let’s imagine we have historical data for three houses:

  1. House A: 600 sq ft → sold for 150,000
  2. House B: 1000 sq ft → sold for 250,000
  3. House C: 1500 sq ft → sold for 375,000

In most programming languages, we can store this data using arrays (lists). For example:
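
```python
# The three houses from above as plain Python lists; the names `sizes`
# and `prices` are just our own choice.
sizes = [600, 1000, 1500]             # square feet (the feature, x)
prices = [150000, 250000, 375000]     # sale prices (the target, y)
```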

The “Magic Number” (Weights)

Now we have data, but no intelligence yet (in our program, that is, not in us). In machine learning, that “intelligence” is simply a mathematical relationship. You might wonder how that’s possible, but hold that thought for a moment. First, focus on finding a relationship of the form:

price = weight * size

This multiplier is called w (the weight). It tells us how strongly the input affects the output. A higher weight means even a small increase in size leads to a large jump in price, and vice versa.

Let’s build a simple prediction function using the sizes and prices from earlier. Look at the numbers: what is the “magic number” w that converts size into price? Multiply each size by the same value and you get the corresponding price. Even a kindergartener would spot it. The number is 250.

How did we find it? Straightforward math:

weight = price / size

Now the model can predict house prices based purely on size. A simple start, but a solid one.
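
Here is a minimal sketch of that in plain Python, reusing the sizes and prices lists from above (the function name predict is our own choice):

```python
def predict(size, weight):
    # Our entire "model": multiply the size by the weight
    return size * weight

# Recover the magic number from any one house
weight = prices[0] / sizes[0]     # 150,000 / 600 = 250.0

print(predict(1000, weight))      # 250,000.0, matching House B
```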

The “Loss” (How wrong are we?)

In the earlier example, everything lined up perfectly. Every house followed the rule size * 250 without a single mistake. Real-world data never behaves this cleanly. Imagine a new house hits the market:

  • Size: 1200 sq ft
  • Actual Price: $330,000 (maybe it has a great view)

Using our model (w = 250), the prediction becomes:

1200 * 250 = 300,000

Our model says 300,000. The real price is 330,000. We’re off by 30,000. This difference is the Loss (or Error). To compute it:

Loss = |Predicted Price - Actual Price|

The absolute value ensures the loss is never negative. The entire purpose of machine learning is essentially: Reduce the loss as close to zero as possible.
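
In code, the check is a couple of lines (all numbers taken from the example above):

```python
predicted = 1200 * 250            # the model's guess: 300,000
actual = 330000                   # what the house really sold for
loss = abs(predicted - actual)    # absolute value keeps the loss non-negative
print(loss)                       # 30,000
```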

Hot or Cold

Now comes the interesting part. We have a model and a way to measure loss, but we still need an optimizer: logic that automatically adjusts the weight to reduce the error. Think of it like tuning an old radio: you twist the knob, hear static (high error), adjust again, and keep going until the signal becomes clear.

The process (see the code sketch after the list):

  1. Start with a random guess for the weight (say, 0).
  2. Make a prediction.
  3. Compute the loss.
  4. Update: If the loss is high, adjust the weight.
  5. Repeat until the mistake shrinks.
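
Here is a minimal sketch of that loop for House A, counting the weight up one unit at a time (it reuses predict from above; the step of 1 is our own choice):

```python
weight = 0                                       # 1. start with a guess
while abs(predict(600, weight) - 150000) > 0:    # 2-3. predict and measure the loss
    weight += 1                                  # 4. still wrong, so adjust the weight
print(weight)                                    # 5. the loop stops at 250, where the loss is 0
```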

Gradient Descent (The “Step”)

Brute force worked only because the correct answer 250 was small and clean. If the correct weight were 250.4217 or 1,000,000, counting one by one would be useless. Real machine learning moves intelligently, following the slope of the error toward the minimum. This method is called Gradient Descent.

Picture yourself on a foggy mountain:

  • If the ground slopes down, you walk that way.
  • If it slopes up, you turn around.

The same idea applies here. The logic (sketched in code after the list):

  1. Make a prediction.
  2. If the prediction is too low, increase the weight (+ step).
  3. If the prediction is too high, decrease the weight (– step).
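
A sketch of that rule applied to the 330,000 house, with a fixed step size we picked ourselves (proper gradient descent also scales each step by how wrong the prediction is, so it takes big steps when far off and tiny steps near the answer):

```python
weight = 0.0
step = 0.1                            # how far to move per update (our own choice)

for _ in range(10000):
    predicted = predict(1200, weight)
    if predicted < 330000:
        weight += step                # too low: nudge the weight up
    elif predicted > 330000:
        weight -= step                # too high: nudge the weight down
    else:
        break                         # spot on: stop early
print(round(weight, 1))               # about 275, since 330,000 / 1,200 = 275
```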

The “Best Fit”

So far, we’ve matched one house perfectly. Real datasets contain thousands of houses, each with noise, quirks, and outliers. Consider these two:

  1. House A: 1000 sq ft → $200,000 (ratio: 200)
  2. House B: 1000 sq ft → $300,000 (ratio: 300)

Maybe House B has a gold-plated bathroom; maybe House A has termites. Either way, no single weight can satisfy both perfectly.

  • If weight is 200, House A is perfect and House B is wrong.
  • If weight is 300, House B is perfect and House A is wrong.

When perfection is impossible, we aim to be “the least wrong.” We search for the line that best threads through the middle. This is the Line of Best Fit. To find it, we compute the Total Error:

Total Error = Error(House A) + Error(House B)

The best weight is the one that produces the smallest total. Data:

  • House A: size = 1000, price = 200,000
  • House B: size = 1000, price = 300,000
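
As a quick sketch, here is that total for one candidate weight, 250, which sits halfway between the two per-house ratios:

```python
houses = [(1000, 200000), (1000, 300000)]    # (size, price) for House A and House B
weight = 250

total_error = sum(abs(size * weight - price) for size, price in houses)
print(total_error)                           # 50,000 + 50,000 = 100,000
```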

The Missing Piece (The Bias)

There’s a flaw in our current formula (Price = Size * Weight). If a house has a size of 0, the predicted price becomes 0. That’s obviously wrong. Even a tiny house sits on land, and land alone has value. In school, you saw this in the familiar line equation:

y = mx + c

Machine learning uses the same structure with different symbols:

y = wx + b

  • w (Weight): How strongly the input affects the output (the slope).
  • b (Bias): The baseline value, the price even when the size is zero (the y-intercept).

Image of linear regression with y-intercept

With this addition, our prediction formula becomes:

Price = (Size * Weight) + Bias
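
For instance, take a 1,000 sq ft house, keep the weight of 250, and assume a bias of 50,000 for the land (the bias value here is purely an illustrative pick):

```python
size, weight, bias = 1000, 250, 50000   # bias chosen only for illustration
price = (size * weight) + bias
print(price)                            # 300,000
```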

The prediction comes out to 300,000.

This piece completes the fundamental unit of modern AI: the linear neuron.

  • Deep Learning is thousands of these (wx + b) units stacked and connected.
  • Models like ChatGPT are billions of them working together.

Image of single neuron diagram vs neural network

At this point, the core idea is clear: Guess a weight, measure the loss, adjust the weight. Manually coding this teaches the fundamentals, but real-world projects rely on optimized libraries that perform these steps instantly for millions of items.

Next, we move to scikit-learn (sklearn), a widely used machine learning toolkit. Instead of writing predict or train functions by hand, we hand the data to a model object that learns the best weight and bias automatically.

  1. model = LinearRegression() — create an empty model.
  2. model.fit(x, y) — run the training loop internally.
  3. model.predict(x) — apply the learned formula (x * w + b).

One detail: libraries expect data in 2D form, not plain lists like [600, 1000]. Why? Because real houses usually have multiple features (size, rooms, age, etc.). So the expected format is:

[[600], [1000], [1500]]

Here’s the full working example. Try it on your own machine, since scikit-learn isn’t installed in our environment:
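
What follows is a minimal sketch of that script; the variable names are our own, but the calls are standard scikit-learn:

```python
from sklearn.linear_model import LinearRegression

# Features must be 2D: one row per house, one column per feature (size)
X = [[600], [1000], [1500]]
y = [150000, 250000, 375000]

model = LinearRegression()      # 1. create an empty model
model.fit(X, y)                 # 2. run the training loop internally

print(model.coef_)              # the learned weight w, roughly 250 for this data
print(model.intercept_)         # the learned bias b, roughly 0 for this data
print(model.predict([[1200]]))  # 3. apply x * w + b, about 300,000
```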