Decision Tree

A complete visual lesson — from raw data to prediction

1 · The Idea
2 · Impurity
3 · Info Gain
4 · Build the Tree
5 · Try It!

🎓 The Big Idea

A Decision Tree is just a series of yes/no questions. Like a professor guessing if a student will pass — no complex math, just logic.


The machine learns which questions to ask, and in what order, automatically from the data.

📊 Our Training Data

5 students, 3 features, 1 outcome. The tree will learn from this.

Student   Study Hrs/day   Attendance %   Prev. Marks   Result
A         2               60             45             Fail
B         5               75             60             Pass
C         8               90             80             Pass
D         1               50             35             Fail
E         6               80             70             Pass
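
For readers who want to follow along in code, here is the same table as a small Python list (the key names are my own choice, not anything the lesson prescribes):

    students = [
        {"name": "A", "hours": 2, "attendance": 60, "marks": 45, "result": "Fail"},
        {"name": "B", "hours": 5, "attendance": 75, "marks": 60, "result": "Pass"},
        {"name": "C", "hours": 8, "attendance": 90, "marks": 80, "result": "Pass"},
        {"name": "D", "hours": 1, "attendance": 50, "marks": 35, "result": "Fail"},
        {"name": "E", "hours": 6, "attendance": 80, "marks": 70, "result": "Pass"},
    ]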

❓ The Core Question

We have 3 features. Which one should we ask about first?


Not randomly — we pick the feature that best separates Pass from Fail students. That's the Root Node.


To measure "best separation" we use a concept called Entropy (Impurity).

🔀 What is Impurity (Entropy)?

Entropy tells us how mixed a group is.


Goal: Find splits that make groups as pure as possible (low entropy).

Pure node · Attendance ≥ 70% · all students PASS · H = 0

Impure node · Hours ≥ 4 · mixed result · H = high

🧮 The Formula

For a node with proportion p of one class and q = 1 − p of the other:

Entropy(node) = −p·log₂(p) − q·log₂(q)

Example: 3 Pass, 2 Fail (5 students total)
  p(Pass) = 3/5 = 0.6
  p(Fail) = 2/5 = 0.4
  H = −0.6·log₂(0.6) − 0.4·log₂(0.4) ≈ 0.971 ← quite mixed

Don't memorize the formula. Remember the concept: lower entropy = purer group = better split.
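
The formula in runnable form, as a minimal Python sketch (the function name is mine):

    import math

    def entropy(p_pass, p_fail):
        # Two-class entropy; zero-probability terms contribute nothing (x·log₂x → 0)
        h = 0.0
        for p in (p_pass, p_fail):
            if p > 0:
                h -= p * math.log2(p)
        return h

    print(round(entropy(3/5, 2/5), 3))  # 0.971, the mixed node from the example
    print(entropy(1.0, 0.0))            # 0.0, a pure node: all one class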

💡 Analogy

Imagine a bag of colored balls. If every ball is the same color, you know exactly what you will pull out: zero surprise, zero entropy. A 50/50 mix makes every draw a coin flip: maximum surprise, maximum entropy.


A good split creates bags where you're not surprised by what you pull out.

📈 Information Gain — Picking the Root Node

After splitting on a feature, how much did entropy drop? That drop is the Information Gain.

Info Gain = Entropy(before split) − Weighted Entropy(after split)

(each child group's entropy is weighted by its share of the students)

Higher Info Gain = better feature to split on first.

We calculate this for every feature, then pick the winner.
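
The gain calculation as a self-contained Python sketch (function names are my own):

    import math

    def entropy_of(labels):
        # Two-class entropy of a list of "Pass"/"Fail" labels
        if not labels:
            return 0.0
        p = labels.count("Pass") / len(labels)
        return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

    def information_gain(parent, children):
        # Entropy drop from splitting `parent` into the label lists in `children`
        n = len(parent)
        weighted = sum(len(g) / n * entropy_of(g) for g in children)
        return entropy_of(parent) - weighted

    # Splitting our 5 students on Attendance ≥ 70:
    #   YES → B, C, E (all Pass); NO → A, D (all Fail)
    print(information_gain(
        ["Fail", "Pass", "Pass", "Fail", "Pass"],       # A..E
        [["Pass", "Pass", "Pass"], ["Fail", "Fail"]],   # the two pure children
    ))  # ≈ 0.971: the entire parent entropy, since both children are pure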

🏆 Race: Which Feature Wins?

The algorithm tests all 3 features on our 5 students:

Candidate question     Groups created   Info Gain
Attendance ≥ 70?       Pure groups!     HIGH ⭐ → ROOT NODE
Hours ≥ 4?             Mixed groups     MED
Prev. Marks ≥ 55?      Mixed groups     MED

Attendance wins — it creates the two purest groups. So it becomes the Root Node.
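
The race itself fits in a few lines: compute the gain of every candidate question and keep the best. A sketch with my own naming (thresholds are the ones the lesson tests):

    import math

    # training data: (hours, attendance, marks, result) per student A..E
    students = [(2, 60, 45, "Fail"), (5, 75, 60, "Pass"), (8, 90, 80, "Pass"),
                (1, 50, 35, "Fail"), (6, 80, 70, "Pass")]

    def entropy_of(labels):
        if not labels:
            return 0.0
        p = labels.count("Pass") / len(labels)
        return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

    def gain(feature_index, threshold):
        # Info gain of splitting the students on feature >= threshold
        labels = [s[3] for s in students]
        yes = [s[3] for s in students if s[feature_index] >= threshold]
        no = [s[3] for s in students if s[feature_index] < threshold]
        weighted = sum(len(g) / len(labels) * entropy_of(g) for g in (yes, no))
        return entropy_of(labels) - weighted

    # the three candidate questions from the race above
    candidates = [("Attendance >= 70", 1, 70), ("Hours >= 4", 0, 4),
                  ("Prev. Marks >= 55", 2, 55)]
    best = max(candidates, key=lambda c: gain(c[1], c[2]))
    print(best[0])  # the root question (ties, if any, go to the first candidate tested)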

🔄 The Loop — How We Move to the Next Node

After the root node splits the data, each branch gets its own subset of students, and the exact same process repeats on that subset: measure impurity, test the remaining questions, pick the one with the highest Info Gain. (A code sketch of this loop follows the tree below.)

🌳 Building the Tree — Level by Level

Level 1 · Root Node (the question with the highest Info Gain across all features):

Attendance ≥ 70?
  ↙ NO: ❌ FAIL · Students A, D · Entropy = 0 · Pure! Stop here.
  ↘ YES: Students B, C, E · repeat the loop on this subset ↓

Level 2 · Hours ≥ 4?
  ↙ NO: ❌ FAIL
  ↘ YES: ✅ PASS · B, C, E · Entropy = 0 · Pure! Stop.

Note: on our 5 training students, B, C and E all study ≥ 4 hrs, so the YES branch is already pure after Level 1; the Hours question still matters for new students who attend well but study little (see the Decision Map below).
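
That loop, written out as a minimal recursive sketch (my own helpers and simplifications, not any library's exact algorithm; running it on our 5 students stops at depth 1, matching the note above):

    import math

    def entropy_of(labels):
        if not labels:
            return 0.0
        p = labels.count("Pass") / len(labels)
        return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

    def build_tree(rows, questions):
        # rows: (features_dict, label) pairs; questions: (name, feature_key, threshold)
        labels = [label for _, label in rows]
        majority = max(set(labels), key=labels.count)
        if entropy_of(labels) == 0 or not questions:
            return majority                              # leaf: pure, or nothing left to ask

        def gain(question):
            _, key, thr = question
            yes = [l for f, l in rows if f[key] >= thr]
            no = [l for f, l in rows if f[key] < thr]
            after = sum(len(g) / len(labels) * entropy_of(g) for g in (yes, no))
            return entropy_of(labels) - after

        name, key, thr = max(questions, key=gain)        # highest info gain wins
        yes_rows = [(f, l) for f, l in rows if f[key] >= thr]
        no_rows = [(f, l) for f, l in rows if f[key] < thr]
        if not yes_rows or not no_rows:
            return majority                              # split separated nothing: stop
        rest = [q for q in questions if q[0] != name]    # (a real tree may reuse features)
        return {"question": name,
                "yes": build_tree(yes_rows, rest),
                "no": build_tree(no_rows, rest)}

    students = [({"hours": 2, "attendance": 60}, "Fail"),
                ({"hours": 5, "attendance": 75}, "Pass"),
                ({"hours": 8, "attendance": 90}, "Pass"),
                ({"hours": 1, "attendance": 50}, "Fail"),
                ({"hours": 6, "attendance": 80}, "Pass")]
    questions = [("Attendance >= 70", "attendance", 70), ("Hours >= 4", "hours", 4)]
    print(build_tree(students, questions))
    # {'question': 'Attendance >= 70', 'yes': 'Pass', 'no': 'Fail'}
    # depth 1: on this tiny dataset the first split already makes both branches pure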

📋 The Tree as Code

The whole tree is literally just if-else:

    def predict(attendance, hours):
        if attendance >= 70:
            if hours >= 4:
                return "PASS"  # ✅
            else:
                return "FAIL"  # ❌
        else:
            return "FAIL"  # ❌

No magic. Just structured questions learned from data. Scores 100% on training data.
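
To check that claim, push all 5 training students through predict() from the block above:

    # every training student, as (name, attendance, hours, expected result)
    training = [("A", 60, 2, "FAIL"), ("B", 75, 5, "PASS"), ("C", 90, 8, "PASS"),
                ("D", 50, 1, "FAIL"), ("E", 80, 6, "PASS")]
    correct = sum(predict(att, hrs) == expected for _, att, hrs, expected in training)
    print(f"{correct}/5 correct")  # 5/5 correct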

⚠️ But Is It Too Perfect?

100% on training data sounds great — but the tree might have memorized these 5 students instead of learning general rules. This is Overfitting.


Solution: Limit tree depth, require minimum samples per split, or use a Random Forest (many trees vote together).
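
In practice those knobs are constructor parameters in standard libraries. A sketch with scikit-learn (assuming it is installed, and treating our 5-row toy table as if it were real data):

    from sklearn.tree import DecisionTreeClassifier

    # features: [study hrs/day, attendance %, prev. marks] for students A..E
    X = [[2, 60, 45], [5, 75, 60], [8, 90, 80], [1, 50, 35], [6, 80, 70]]
    y = ["Fail", "Pass", "Pass", "Fail", "Pass"]

    # cap the depth and demand at least 3 samples before a node may split,
    # so the tree cannot carve out a leaf for every individual student
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=2, min_samples_split=3)
    clf.fit(X, y)
    print(clf.predict([[3, 65, 50]]))  # classify one new student

For the "many trees vote together" option, sklearn.ensemble.RandomForestClassifier drops in the same way.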

🔮 Try the Tree — Predict a New Student

Adjust the sliders and see the tree's decision step by step. The interactive demo starts from Attendance = 65%, Study = 3 hrs/day.
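
A few lines of Python reproduce the same step-by-step trace for that default student (the logic mirrors the Decision Map below):

    # step-by-step trace of the tree for one new student (the demo's defaults)
    attendance, hours = 65, 3
    print("Q1: attendance >= 70?", "YES" if attendance >= 70 else "NO")
    if attendance >= 70:
        print("Q2: hours >= 4?", "YES" if hours >= 4 else "NO")
        print("✅ PASS" if hours >= 4 else "❌ FAIL")
    else:
        print("❌ FAIL")  # attendance < 70 → always Fail, no second question needed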

🗺️ Decision Map

❌ FAIL if:
  • Attendance < 70% → always Fail
  • Attendance ≥ 70% BUT Hours < 4
✅ PASS if:
  • Attendance ≥ 70% AND Hours ≥ 4

🧠 Key Takeaways