K-Nearest Neighbors

Same students. Different algorithm. No training — just memory.

1 · The Idea
2 · Distance
3 · K & Voting
4 · Walkthrough
5 · Try It!

🧠 The Big Idea

KNN asks one simple question: "Who are your closest neighbors, and what are they?"


No formulas during training. No tree to build. KNN just memorizes all the data, and when a new student appears, it finds the K most similar students it has already seen and takes a majority vote.


If most of your K neighbors passed → you pass.
If most failed → you fail.

👥 Analogy — Birds of a Feather

You've just moved to a new city and want to know whether you've landed in a nice neighborhood. Instead of checking a map, you just look at the 3 nearest houses.


[Figure: a ❓ new house surrounded by nearby houses: 🏠 🏠 🏚 🏠 🏚 🏠]
The ❓ new house looks at its 3 nearest neighbors: 2 nice 🏠, 1 rundown 🏚.
Majority says → nice neighborhood!


That's exactly how KNN classifies students.

📊 Same 5 Students — Same Data

Student | Study Hrs | Attendance % | Prev. Marks | Result
A       | 2         | 60           | 45          | Fail
B       | 5         | 75           | 60          | Pass
C       | 8         | 90           | 80          | Pass
D       | 1         | 50           | 35          | Fail
E       | 6         | 80           | 70          | Pass

KNN stores this entire table. At prediction time, a new student walks in and the algorithm finds their K closest matches from this table.

⚔️ KNN vs Decision Tree — Quick Difference

🌳 Decision Tree

  • Builds rules during training
  • Learns: "If attendance ≥ 70..."
  • Fast at prediction time
  • Explains its reasoning

🔵 KNN

  • No training phase — just stores data
  • Looks up neighbors at prediction time
  • Slower for large datasets
  • Simple but powerful

📏 How Do We Measure "Closeness"?

We need a number to express how similar two students are. The most common way: Euclidean Distance — the straight-line distance between two points in feature space.

Distance(A, B) = √[(x₁−x₂)² + (y₁−y₂)² + (z₁−z₂)² + ...]

Each feature becomes one dimension. Closer distance = more similar students.

Think of it as the ruler between two dots on a graph. Shorter ruler = more similar.
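Here's a minimal Python sketch of that formula; the name euclidean_distance is our own choice, and we reuse it in the code later in this section:

import math

def euclidean_distance(a, b):
    # a and b are equal-length feature tuples, e.g. (study_hrs, attendance, prev_marks)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(round(euclidean_distance((4, 70, 55), (5, 75, 60)), 2))  # 7.14 (X to B, see below)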

🆕 New Student X Arrives

A new student walks in:

Student | Study Hrs | Attendance % | Prev. Marks | Result
X (new) | 4         | 70           | 55          | ???

We calculate the distance from X to every student in our data. Closest ones become the neighbors.

🧮 Calculate Distance: X to Each Student

Using all 3 features (hours, attendance, marks):

X = (4, 70, 55)

Distance(X → A) = √[(4−2)² + (70−60)² + (55−45)²] = √[4 + 100 + 100]   = √204  ≈ 14.28
Distance(X → B) = √[(4−5)² + (70−75)² + (55−60)²] = √[1 + 25 + 25]     = √51   ≈ 7.14  ← closest!
Distance(X → C) = √[(4−8)² + (70−90)² + (55−80)²] = √[16 + 400 + 625]  = √1041 ≈ 32.26
Distance(X → D) = √[(4−1)² + (70−50)² + (55−35)²] = √[9 + 400 + 400]   = √809  ≈ 28.44
Distance(X → E) = √[(4−6)² + (70−80)² + (55−70)²] = √[4 + 100 + 225]   = √329  ≈ 18.14
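If you want to reproduce these numbers, a few lines of Python will do it (the students dict and x are illustration names of our own, and euclidean_distance is the helper sketched above):

students = {
    "A": ((2, 60, 45), "Fail"),
    "B": ((5, 75, 60), "Pass"),
    "C": ((8, 90, 80), "Pass"),
    "D": ((1, 50, 35), "Fail"),
    "E": ((6, 80, 70), "Pass"),
}
x = (4, 70, 55)

# Distance from X to every stored student, sorted nearest first
ranked = sorted((euclidean_distance(x, feats), name, result)
                for name, (feats, result) in students.items())
for d, name, result in ranked:
    print(f"{name} ({result}): {d:.2f}")
# B (Pass): 7.14, A (Fail): 14.28, E (Pass): 18.14, D (Fail): 28.44, C (Pass): 32.26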

🏅 Ranked by Closeness

Rank          | Student | Result | Distance from X
#1 (nearest)  | B       | Pass   | 7.14
#2            | A       | Fail   | 14.28
#3            | E       | Pass   | 18.14
#4            | D       | Fail   | 28.44
#5 (farthest) | C       | Pass   | 32.26

🗳️ What is K?

K is the number of nearest neighbors that vote on the prediction. You choose K before running the algorithm.



Odd K is preferred to avoid tie votes (e.g. 2 vs 2).

🗳️ Voting with K=3 — For Student X

We pick the 3 nearest neighbors: B, A, E

✅ PASS votes: 2 (B at dist 7.14, E at dist 18.14)
❌ FAIL votes: 1 (A at dist 14.28)

Majority = PASS → Student X is predicted to Pass! 🎉
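In code, the vote is just a tally of the K nearest labels. A minimal sketch using Python's collections.Counter, continuing from the ranked list computed in the distance sketch above:

from collections import Counter

k = 3
nearest = ranked[:k]   # three (distance, name, result) tuples: B, A, E
votes = Counter(result for _, _, result in nearest)
prediction = votes.most_common(1)[0][0]
print(votes)        # Counter({'Pass': 2, 'Fail': 1})
print(prediction)   # Pass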

⚠️ How K Changes the Answer

With our 5 students, let's see what happens at different K values for Student X:


K   | Neighbors Used  | Pass Votes | Fail Votes | Prediction
K=1 | B               | 1          | 0          | Pass ✅
K=3 | B, A, E         | 2          | 1          | Pass ✅
K=5 | B, A, E, D, C   | 3          | 2          | Pass ✅

In this case, all K values agree. But with different data, the wrong K can flip the result, so choosing K matters. A common approach: try several K values and pick the one with the best accuracy on held-out test data.
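To check every row of the table above, loop over the candidate K values (again reusing the ranked list and Counter from the earlier sketches):

for k in (1, 3, 5):
    votes = Counter(result for _, _, result in ranked[:k])
    print(f"K={k}: {dict(votes)} → {votes.most_common(1)[0][0]}")
# K=1: {'Pass': 1} → Pass
# K=3: {'Pass': 2, 'Fail': 1} → Pass
# K=5: {'Pass': 3, 'Fail': 2} → Pass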

📐 Feature Scaling — A Hidden Danger

Look at our features: Hours (1–8), Attendance (50–90), Marks (35–80).


Attendance and previous marks sit on much bigger numeric scales than study hours, so they dominate the distance calculation and drown out study hours, even if hours matter just as much.


Solution: Normalize all features to the same scale (0 to 1) before computing distances. This gives each feature an equal voice.

Normalized value = (value − min) / (max − min)

Hours:      min=1,  max=8  → X's 4 hrs = (4−1)/(8−1)     ≈ 0.43
Attendance: min=50, max=90 → X's 70%   = (70−50)/(90−50) = 0.50
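Here's a minimal sketch of that formula in Python (the min/max values come straight from the 5-student table; in practice a library scaler such as scikit-learn's MinMaxScaler does the same job):

def min_max_normalize(value, lo, hi):
    # Squash a raw value from the range [lo, hi] into [0, 1]
    return (value - lo) / (hi - lo)

print(round(min_max_normalize(4, 1, 8), 2))     # 0.43 (X's study hours)
print(round(min_max_normalize(70, 50, 90), 2))  # 0.5  (X's attendance)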

🚶 Full Walkthrough — Step by Step

New student X = (4 hrs, 70%, 55 marks). We use K = 3.

1
Store all training data (no model built)

📦 Training Phase

KNN has the simplest training phase of any algorithm — there is no training. You just store the 5 rows of student data in memory. That's it.

Training KNN = storing the data. Training cost: essentially zero, because no model is fit.

The real work happens at prediction time.

2
New student arrives — compute all distances

📏 Compute Distances

Student X arrives. Calculate distance from X to every stored student:

Student | Result | Distance from X | Rank
B       | Pass   | 7.14            | 🥇 #1
A       | Fail   | 14.28           | 🥈 #2
E       | Pass   | 18.14           | 🥉 #3
D       | Fail   | 28.44           | #4
C       | Pass   | 32.26           | #5

3
Pick K=3 nearest neighbors

🎯 Select the Neighbors

Take the top 3: B, A, E. Ignore D and C — they're too far away.


[Figure: X plotted with its 3 nearest neighbors B, A, E highlighted; green = Pass, red = Fail]

4
Majority vote → final prediction

🗳️ Vote & Decide

B → PASS ✅
A → FAIL ❌
E → PASS ✅

PASS: 2 votes, FAIL: 1 vote
Majority = PASS → Student X is predicted: PASS ✅

The decision is made. No formula was "learned" — we just asked: who are the most similar people I've seen before?

🧮 KNN as Code

def predict_knn(new_student, data, k=3):
    # data: list of (features, result) pairs, e.g. ((2, 60, 45), "Fail")

    # Step 1: calculate the distance from the new student to every stored student
    distances = []
    for features, result in data:
        d = euclidean_distance(new_student, features)   # helper sketched earlier
        distances.append((d, result))

    # Step 2: sort by distance and keep the K nearest
    distances.sort()
    neighbors = distances[:k]

    # Step 3: majority vote among the K neighbors
    pass_votes = sum(1 for _, r in neighbors if r == "Pass")
    fail_votes = k - pass_votes
    return "Pass" if pass_votes > fail_votes else "Fail"
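And a quick usage check with the 5-student table stored as (features, result) pairs, matching the representation used in the sketches above:

data = [
    ((2, 60, 45), "Fail"),  # A
    ((5, 75, 60), "Pass"),  # B
    ((8, 90, 80), "Pass"),  # C
    ((1, 50, 35), "Fail"),  # D
    ((6, 80, 70), "Pass"),  # E
]

print(predict_knn((4, 70, 55), data, k=3))  # Pass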

🔮 Try KNN — Predict a New Student

Set the student's features and K value. See which neighbors are chosen and how they vote.

(Defaults: 4 hrs studied, 70% attendance, 55 previous marks)

🧠 Key Takeaways

⚔️ Decision Tree vs KNN — When to Use Which?

🌳 Use Decision Tree when...

  • You need to explain the decision
  • Fast predictions on large data
  • Features have clear thresholds

🔵 Use KNN when...

  • Data has complex boundaries
  • Dataset is small
  • You want a quick baseline