A complete visual lesson — from raw data to prediction
A Decision Tree is just a series of yes/no questions. Like a professor guessing if a student will pass — no complex math, just logic.
The machine learns which question to ask first and in what order — automatically, from data.
5 students, 3 features, 1 outcome. The tree will learn from this.
| Student | Study Hrs/day | Attendance % | Prev. Marks | Result |
|---|---|---|---|---|
| A | 2 | 60 | 45 | Fail |
| B | 5 | 75 | 60 | Pass |
| C | 8 | 90 | 80 | Pass |
| D | 1 | 50 | 35 | Fail |
| E | 6 | 80 | 70 | Pass |
We have 3 features. Which one should we ask about first?
Not randomly — we pick the feature that best separates Pass from Fail students. That's the Root Node.
To measure "best separation" we use a concept called Entropy (impurity).
Entropy tells us how mixed a group is.
Goal: Find splits that make groups as pure as possible (low entropy).
For example, on our table:

- Attendance ≥ 70% → one branch is all Pass, the other all Fail (pure groups)
- Study Hrs ≥ 7 → one branch holds only student C; the other still mixes Pass and Fail (mixed result)
For a node with proportion p of one class and q = 1 − p of the other:

Entropy = −p · log₂(p) − q · log₂(q)

(with 0 · log₂ 0 taken as 0, so a perfectly pure node has entropy 0, and a 50/50 node has entropy 1.)
Don't memorize the formula. Remember the concept: lower entropy = purer group = better split.
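The formula can be checked in a few lines of Python (a minimal sketch using only the standard library):

```python
import math

def entropy(p):
    """Binary entropy of a group where a fraction p belongs to one class.

    0.0 means the group is pure; 1.0 means a perfect 50/50 mix.
    """
    q = 1 - p
    if p == 0 or q == 0:
        return 0.0  # convention: 0 * log2(0) = 0, so pure groups score 0
    return -p * math.log2(p) - q * math.log2(q)

print(entropy(0.5))            # 50/50 mix -> 1.0 (maximum impurity)
print(entropy(1.0))            # all one class -> 0.0 (perfectly pure)
print(round(entropy(0.6), 3))  # our root: 3 Pass, 2 Fail -> 0.971
```

The last line is our 5-student table before any split: 3 Pass out of 5 gives entropy ≈ 0.971, so there is plenty of impurity for a good question to remove.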
Imagine a bag of colored balls.
A good split creates bags where you're not surprised by what you pull out.
After splitting on a feature, how much did entropy drop? That drop is the Information Gain.
We calculate this for every feature, then pick the winner.
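As a sketch, here is that calculation on the five students from the table. The two thresholds tested are illustrative choices, not the only ones an algorithm would try:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels (0 = pure)."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(rows, labels, test):
    """How much entropy drops after splitting the rows with a yes/no test."""
    yes = [l for r, l in zip(rows, labels) if test(r)]
    no = [l for r, l in zip(rows, labels) if not test(r)]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
    return entropy(labels) - weighted

# Students A-E: (study hours, attendance %, previous marks)
rows = [(2, 60, 45), (5, 75, 60), (8, 90, 80), (1, 50, 35), (6, 80, 70)]
labels = ["Fail", "Pass", "Pass", "Fail", "Pass"]

# Attendance >= 70 produces two pure groups: the gain is the full 0.971
print(round(information_gain(rows, labels, lambda r: r[1] >= 70), 3))  # 0.971
# Study hours >= 7 leaves a mixed group behind: a much smaller gain
print(round(information_gain(rows, labels, lambda r: r[0] >= 7), 3))   # 0.171
```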
The algorithm tests all 3 features on our 5 students:
- Attendance → two pure groups ✓ (Root Node)
- Study Hrs → mixed groups
- Prev. Marks → mixed groups
Attendance wins — it creates the two purest groups. So it becomes the Root Node.
After the root node splits the data, each branch gets its own subset of students. The exact same process repeats on that subset:
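That repetition is a recursion, and it can be sketched in plain Python (stdlib only). One caveat: on a table this tiny, more than one feature can separate Pass from Fail perfectly, so the code's tie-breaking may crown a different root than the lesson's diagram:

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def build(rows, labels):
    """Greedy recursion: pick the split with the biggest entropy drop,
    then repeat on each branch until a branch is pure."""
    if len(set(labels)) == 1:
        return labels[0]  # pure branch -> leaf node
    best = None
    for f in range(len(rows[0])):               # try every feature...
        for t in sorted({r[f] for r in rows}):  # ...and every threshold
            yes = [l for r, l in zip(rows, labels) if r[f] >= t]
            no = [l for r, l in zip(rows, labels) if r[f] < t]
            if not yes or not no:
                continue  # this split separates nothing, skip it
            child = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
            gain = entropy(labels) - child
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None:  # no useful split left: vote with the majority
        return max(set(labels), key=labels.count)
    _, f, t = best
    yes = [(r, l) for r, l in zip(rows, labels) if r[f] >= t]
    no = [(r, l) for r, l in zip(rows, labels) if r[f] < t]
    return (f, t,
            build([r for r, _ in yes], [l for _, l in yes]),
            build([r for r, _ in no], [l for _, l in no]))

def predict(node, row):
    if isinstance(node, str):
        return node  # reached a leaf
    f, t, yes_branch, no_branch = node
    return predict(yes_branch if row[f] >= t else no_branch, row)

rows = [(2, 60, 45), (5, 75, 60), (8, 90, 80), (1, 50, 35), (6, 80, 70)]
labels = ["Fail", "Pass", "Pass", "Fail", "Pass"]
tree = build(rows, labels)
print(tree)  # one question already separates this tiny table perfectly
```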
The whole tree is literally just if-else:
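For our table, the learned tree collapses to a single question. The 70% threshold is the attendance split from earlier; with these 5 students, both branches are already pure:

```python
def predict(study_hours, attendance, prev_marks):
    # Root node: the attendance question the tree learned from the data
    if attendance >= 70:
        return "Pass"  # this branch holds only passing students -> leaf
    else:
        return "Fail"  # this branch holds only failing students -> leaf

# Replay the whole table through the tree: (hours, attendance, marks, result)
students = [(2, 60, 45, "Fail"), (5, 75, 60, "Pass"), (8, 90, 80, "Pass"),
            (1, 50, 35, "Fail"), (6, 80, 70, "Pass")]
print(all(predict(h, a, m) == result for h, a, m, result in students))  # True
```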
No magic. Just structured questions learned from data. Scores 100% on training data.
100% on training data sounds great — but the tree might have memorized these 5 students instead of learning general rules. This is Overfitting.
Solution: Limit tree depth, require minimum samples per split, or use a Random Forest (many trees vote together).
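A sketch of those fixes with scikit-learn (assuming it is installed): `max_depth` and `min_samples_split` are the depth and sample limits mentioned above, and `RandomForestClassifier` is the voting ensemble.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# The same 5 students: (study hours, attendance %, previous marks)
X = [[2, 60, 45], [5, 75, 60], [8, 90, 80], [1, 50, 35], [6, 80, 70]]
y = ["Fail", "Pass", "Pass", "Fail", "Pass"]

# A depth-limited tree: it cannot grow elaborate, memorized rules
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2,
                              min_samples_split=2, random_state=0)
tree.fit(X, y)
print(tree.predict([[7, 85, 75]])[0])  # a well-prepared new student -> "Pass"

# A random forest: 10 shallow trees trained on resampled data vote together
forest = RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
forest.fit(X, y)
print(forest.predict([[7, 85, 75]])[0])
```

With 5 rows these limits change nothing visible, but on real data they are the difference between learning rules and memorizing students.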