K-Nearest Neighbors

Same students. Different algorithm. No training — just memory.

1 · The Idea
2 · Distance
3 · K & Voting
4 · Walkthrough
5 · Try It!

🧠 The Big Idea

KNN asks one simple question: "Who are your closest neighbors, and what are they?"


No formulas during training. No tree to build. KNN just memorizes all the data, and when a new student appears, it finds the K most similar students it has already seen and takes a majority vote.


If most of your K neighbors passed → you pass.
If most failed → you fail.

👥 Analogy — Birds of a Feather

You've just moved to a new city and want to know whether you've landed in a nice neighborhood. Instead of checking a map, you just look at the 3 nearest houses.


[Figure: a ❓ new house surrounded by nearby houses: 🏠 🏠 🏚 🏠 🏚 🏠]
The ❓ new house looks at its 3 nearest neighbors: 2 nice 🏠, 1 rundown 🏚.
Majority says → nice neighborhood!


That's exactly how KNN classifies students.

📊 Same 5 Students — Same Data

Student | Study Hrs | Attendance % | Prev. Marks | Result
A       | 2         | 60           | 45          | Fail
B       | 5         | 75           | 60          | Pass
C       | 8         | 90           | 80          | Pass
D       | 1         | 50           | 35          | Fail
E       | 6         | 80           | 70          | Pass

KNN stores this entire table. At prediction time, a new student walks in and the algorithm finds their K closest matches from this table.

⚔️ KNN vs Decision Tree — Quick Difference

🌳 Decision Tree

  • Builds rules during training
  • Learns: "If attendance ≥ 70..."
  • Fast at prediction time
  • Explains its reasoning

🔵 KNN

  • No training phase — just stores data
  • Looks up neighbors at prediction time
  • Slower for large datasets
  • Simple but powerful

📏 How Do We Measure "Closeness"?

We need a number to express how similar two students are. The most common way: Euclidean Distance — the straight-line distance between two points in feature space.

Distance(A, B) = √[(x₁−x₂)² + (y₁−y₂)² + (z₁−z₂)² + ...]

Each feature becomes one dimension. Closer distance = more similar students.

Think of it as the ruler between two dots on a graph. Shorter ruler = more similar.
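Here's a minimal Python sketch of that formula; the name euclidean_distance is our own choice, and we reuse it in the code later in this section:

import math

def euclidean_distance(a, b):
    # a and b are equal-length feature tuples, e.g. (study_hrs, attendance, prev_marks)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(round(euclidean_distance((4, 70, 55), (5, 75, 60)), 2))  # 7.14 (X to B, see below)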

🆕 New Student X Arrives

A new student walks in:

Student | Study Hrs | Attendance % | Prev. Marks | Result
X (new) | 4         | 70           | 55          | ???

We calculate the distance from X to every student in our data. Closest ones become the neighbors.

🧮 Calculate Distance: X to Each Student

Using all 3 features (hours, attendance, marks):

X = (4, 70, 55)

Distance(X → A) = √[(4−2)² + (70−60)² + (55−45)²] = √[4 + 100 + 100]   = √204  ≈ 14.28
Distance(X → B) = √[(4−5)² + (70−75)² + (55−60)²] = √[1 + 25 + 25]     = √51   ≈ 7.14  ← closest!
Distance(X → C) = √[(4−8)² + (70−90)² + (55−80)²] = √[16 + 400 + 625]  = √1041 ≈ 32.26
Distance(X → D) = √[(4−1)² + (70−50)² + (55−35)²] = √[9 + 400 + 400]   = √809  ≈ 28.44
Distance(X → E) = √[(4−6)² + (70−80)² + (55−70)²] = √[4 + 100 + 225]   = √329  ≈ 18.14
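If you want to reproduce these numbers, a few lines of Python will do it (the students dict and x are illustration names of our own, and euclidean_distance is the helper sketched above):

students = {
    "A": ((2, 60, 45), "Fail"),
    "B": ((5, 75, 60), "Pass"),
    "C": ((8, 90, 80), "Pass"),
    "D": ((1, 50, 35), "Fail"),
    "E": ((6, 80, 70), "Pass"),
}
x = (4, 70, 55)

# Distance from X to every stored student, sorted nearest first
ranked = sorted((euclidean_distance(x, feats), name, result)
                for name, (feats, result) in students.items())
for d, name, result in ranked:
    print(f"{name} ({result}): {d:.2f}")
# B (Pass): 7.14, A (Fail): 14.28, E (Pass): 18.14, D (Fail): 28.44, C (Pass): 32.26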

🏅 Ranked by Closeness

Rank          | Student | Result | Distance from X
#1 (nearest)  | B       | Pass   | 7.14
#2            | A       | Fail   | 14.28
#3            | E       | Pass   | 18.14
#4            | D       | Fail   | 28.44
#5 (farthest) | C       | Pass   | 32.26

🗳️ What is K?

K is the number of nearest neighbors that vote on the prediction. You choose K before running the algorithm.



Odd K is preferred to avoid tie votes (e.g. 2 vs 2).

🗳️ Voting with K=3 — For Student X

We pick the 3 nearest neighbors: B, A, E

✅ PASS votes: 2 (B at dist 7.14, E at dist 18.14)
❌ FAIL votes: 1 (A at dist 14.28)

Majority = PASS → Student X is predicted to Pass! 🎉
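In code, the vote is just a tally of the K nearest labels. A minimal sketch using Python's collections.Counter, continuing from the ranked list computed in the distance sketch above:

from collections import Counter

k = 3
nearest = ranked[:k]   # three (distance, name, result) tuples: B, A, E
votes = Counter(result for _, _, result in nearest)
prediction = votes.most_common(1)[0][0]
print(votes)        # Counter({'Pass': 2, 'Fail': 1})
print(prediction)   # Pass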

⚠️ How K Changes the Answer

With our 5 students, let's see what happens at different K values for Student X:


K   | Neighbors Used  | Pass Votes | Fail Votes | Prediction
K=1 | B               | 1          | 0          | Pass ✅
K=3 | B, A, E         | 2          | 1          | Pass ✅
K=5 | B, A, E, D, C   | 3          | 2          | Pass ✅

In this case, all K values agree. But with different data, the wrong K can flip the result, so choosing K matters. A common approach: try several K values and pick the one with the best accuracy on held-out test data.
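To check every row of the table above, loop over the candidate K values (again reusing the ranked list and Counter from the earlier sketches):

for k in (1, 3, 5):
    votes = Counter(result for _, _, result in ranked[:k])
    print(f"K={k}: {dict(votes)} → {votes.most_common(1)[0][0]}")
# K=1: {'Pass': 1} → Pass
# K=3: {'Pass': 2, 'Fail': 1} → Pass
# K=5: {'Pass': 3, 'Fail': 2} → Pass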

📐 Feature Scaling — A Hidden Danger

Look at our features: Hours (1–8), Attendance (50–90), Marks (35–80).


Attendance and previous marks sit on much bigger numeric scales than study hours, so they dominate the distance calculation and drown out study hours, even if hours matter just as much.


Solution: Normalize all features to the same scale (0 to 1) before computing distances. This gives each feature an equal voice.

Normalized value = (value − min) / (max − min)

Hours:      min=1,  max=8  → X's 4 hrs = (4−1)/(8−1)     ≈ 0.43
Attendance: min=50, max=90 → X's 70%   = (70−50)/(90−50) = 0.50
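Here's a minimal sketch of that formula in Python (the min/max values come straight from the 5-student table; in practice a library scaler such as scikit-learn's MinMaxScaler does the same job):

def min_max_normalize(value, lo, hi):
    # Squash a raw value from the range [lo, hi] into [0, 1]
    return (value - lo) / (hi - lo)

print(round(min_max_normalize(4, 1, 8), 2))     # 0.43 (X's study hours)
print(round(min_max_normalize(70, 50, 90), 2))  # 0.5  (X's attendance)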

🚶 Full Walkthrough — Step by Step

New student X = (4 hrs, 70%, 55 marks). We use K = 3.

1
Store all training data (no model built)

📦 Training Phase

KNN has the simplest training phase of any algorithm — there is no training. You just store the 5 rows of student data in memory. That's it.

Training KNN = storing the data. Training cost: essentially zero, because no model is fit.

The real work happens at prediction time.

2
New student arrives — compute all distances

📏 Compute Distances

Student X arrives. Calculate distance from X to every stored student:

Student | Result | Distance from X | Rank
B       | Pass   | 7.14            | 🥇 #1
A       | Fail   | 14.28           | 🥈 #2
E       | Pass   | 18.14           | 🥉 #3
D       | Fail   | 28.44           | #4
C       | Pass   | 32.26           | #5

3
Pick K=3 nearest neighbors

🎯 Select the Neighbors

Take the top 3: B, A, E. Ignore D and C — they're too far away.


[Figure: X plotted with its 3 nearest neighbors B, A, E highlighted; green = Pass, red = Fail]

4
Majority vote → final prediction

🗳️ Vote & Decide

B → PASS ✅
A → FAIL ❌
E → PASS ✅

PASS: 2 votes, FAIL: 1 vote
Majority = PASS → Student X is predicted: PASS ✅

The decision is made. No formula was "learned" — we just asked: who are the most similar people I've seen before?

🧮 KNN as Code

def predict_knn(new_student, data, k=3):
    # data: list of (features, result) pairs, e.g. ((2, 60, 45), "Fail")

    # Step 1: calculate the distance from the new student to every stored student
    distances = []
    for features, result in data:
        d = euclidean_distance(new_student, features)   # helper sketched earlier
        distances.append((d, result))

    # Step 2: sort by distance and keep the K nearest
    distances.sort()
    neighbors = distances[:k]

    # Step 3: majority vote among the K neighbors
    pass_votes = sum(1 for _, r in neighbors if r == "Pass")
    fail_votes = k - pass_votes
    return "Pass" if pass_votes > fail_votes else "Fail"
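And a quick usage check with the 5-student table stored as (features, result) pairs, matching the representation used in the sketches above:

data = [
    ((2, 60, 45), "Fail"),  # A
    ((5, 75, 60), "Pass"),  # B
    ((8, 90, 80), "Pass"),  # C
    ((1, 50, 35), "Fail"),  # D
    ((6, 80, 70), "Pass"),  # E
]

print(predict_knn((4, 70, 55), data, k=3))  # Pass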

🔮 Try KNN — Predict a New Student

Set the student's features and K value. See which neighbors are chosen and how they vote.

(Defaults: 4 hrs studied, 70% attendance, 55 previous marks)

🧠 Key Takeaways

⚔️ Decision Tree vs KNN — When to Use Which?

🌳 Use Decision Tree when...

  • You need to explain the decision
  • Fast predictions on large data
  • Features have clear thresholds

🔵 Use KNN when...

  • Data has complex boundaries
  • Dataset is small
  • You want a quick baseline