CNN vs ANN: Interactive Architecture Analysis

Draw digits → Watch real-time processing → Understand mathematical foundations

🎨 Interactive Input

Draw & Process
Draw a digit on the canvas (adjustable brush size and eraser); the drawing is downsampled to a 28×28 pixel image that both networks receive as input. Quick-digit buttons provide preset examples.

📊 Performance Comparison

Key metrics compared side by side: total parameter counts for the ANN and the CNN, plus a translation stability test showing how each network responds when the drawn digit is shifted.

🎯 Key Takeaway

CNNs achieve 82% parameter reduction while being more robust to image translations. Perfect for vision tasks!

🧠 Neural Network Architecture Analysis

See how each network processes your digit drawings

🧠 Artificial Neural Network (ANN)

Input Layer: 28×28 = 784 neurons (flattened)
Hidden Layer: 64 fully connected neurons
Output Layer: 10 neurons (digits 0-9)
ANN Mathematics
h = ReLU(W₁x + b₁)
y = W₂h + b₂
Parameters: (784×64 + 64) + (64×10 + 10) = 50,890 (weights + biases)
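The two equations and the parameter count above can be sketched directly in numpy (a minimal sketch with random, untrained weights; shapes are the point here):

```python
import numpy as np

# Minimal sketch of the ANN above: random, untrained weights, shapes only.
rng = np.random.default_rng(0)

x = rng.random(784)                        # flattened 28x28 drawing
W1 = rng.standard_normal((64, 784)) * 0.01
b1 = np.zeros(64)
W2 = rng.standard_normal((10, 64)) * 0.01
b2 = np.zeros(10)

h = np.maximum(0.0, W1 @ x + b1)           # h = ReLU(W1 x + b1)
y = W2 @ h + b2                            # y = W2 h + b2

# Weights plus biases for both layers:
params = W1.size + b1.size + W2.size + b2.size
print(params)  # 50890
```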

🔍 Convolutional Neural Network (CNN)

Conv1 Layer: 3×3 filters, 8 feature maps
MaxPool1 Layer: 2×2 pooling, 14×14 output
Conv2 Layer: 3×3 filters, 16 feature maps
MaxPool2 Layer: 2×2 pooling, 7×7 output
Dense Output: 10 neurons (digits 0-9)
CNN Mathematics
Output[i,j] = Σₘ Σₙ Input[i+m,j+n] × Kernel[m,n]
Size = (Input − Kernel + 2×Pad) / Stride + 1
Parameters: (3×3×1×8 + 8) + (3×3×8×16 + 16) + (784×10 + 10) = 9,098 (weights + biases)
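The size formula and the parameter total can be checked in a few lines (a sketch; channel counts are taken from the layer list above, and pad=1 is assumed since the conv layers keep their spatial size):

```python
# Sketch: output-size formula and CNN parameter count from the text.
def conv_out(size, kernel, pad=0, stride=1):
    """Size = (Input - Kernel + 2*Pad) / Stride + 1"""
    return (size - kernel + 2 * pad) // stride + 1

# 3x3 conv with pad 1 preserves 28x28; 2x2 stride-2 pooling halves it.
assert conv_out(28, 3, pad=1) == 28
assert conv_out(28, 2, stride=2) == 14

# Weights + biases per layer (pooling layers have no parameters):
conv1 = 3 * 3 * 1 * 8 + 8       # 8 filters over 1 input channel
conv2 = 3 * 3 * 8 * 16 + 16     # 16 filters over 8 channels
dense = 7 * 7 * 16 * 10 + 10    # 7*7*16 = 784 features -> 10 classes
print(conv1 + conv2 + dense)    # 9098
```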

Step-by-Step Convolution Operation

Input (5×5)

Kernel (3×3)

=

Output (3×3)

Step 1 of 9: each of the 9 output cells is computed in turn by multiplying the 3×3 kernel against the overlapping 3×3 input patch and summing the 9 products.
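Those nine steps amount to a "valid" cross-correlation of the 5×5 input with the 3×3 kernel. A minimal sketch (the ramp input and Laplacian kernel here are illustrative choices, not the demo's actual values):

```python
import numpy as np

# Naive "valid" cross-correlation: one output cell per step, 9 steps total.
def conv2d_valid(inp, kernel):
    kh, kw = kernel.shape
    oh, ow = inp.shape[0] - kh + 1, inp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Output[i,j] = sum over m,n of Input[i+m, j+n] * Kernel[m,n]
            out[i, j] = np.sum(inp[i:i + kh, j:j + kw] * kernel)
    return out

inp = np.arange(25, dtype=float).reshape(5, 5)      # 5x5 linear ramp
kernel = np.array([[0, 1, 0],
                   [1, -4, 1],
                   [0, 1, 0]], dtype=float)         # Laplacian edge filter
out = conv2d_valid(inp, kernel)
print(out.shape)  # (3, 3)
```

On a linear ramp the Laplacian response is zero everywhere, which is a handy sanity check that the sliding-window sums are right.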
ANN Input (Flattened): 784 pixel values
CNN Conv1 Features: 8 feature maps (28×28)
CNN Conv2 Features: 16 feature maps (14×14)
Final Features: 16 feature maps (7×7)
Receptive Field Growth
RF_layer = RF_prev + (Kernel_size − 1) × Jump_prev, where Jump_prev is the product of all earlier layers' strides

Conv1: 1×1 → 3×3 pixels
Pool1: 3×3 → 4×4 pixels
Conv2: 4×4 → 8×8 pixels (jump 2 after pooling)
Pool2: 8×8 → 10×10 pixels
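Applying that recurrence to this architecture (a sketch; strides assumed to be 1 for the convolutions and 2 for the pooling layers):

```python
# Receptive-field recurrence: rf += (kernel - 1) * jump, where "jump"
# is the accumulated stride of all earlier layers.
layers = [("conv1", 3, 1), ("pool1", 2, 2), ("conv2", 3, 1), ("pool2", 2, 2)]

rfs, rf, jump = {}, 1, 1
for name, kernel, stride in layers:
    rf += (kernel - 1) * jump
    jump *= stride
    rfs[name] = rf
print(rfs)  # {'conv1': 3, 'pool1': 4, 'conv2': 8, 'pool2': 10}
```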
ANN Parameters: 50,890
CNN Parameters: 9,098
Parameter Reduction: 82%
Memory Efficiency: ~5x better

Why ANNs Struggle with Images

No Spatial Structure: Treats images as flat vectors
Too Many Parameters: 784×64 = 50,176 weights in the first layer alone
Position Dependent: Moving digit changes all connections
No Translation Invariance: Same digit in different positions = different patterns
Overfitting Prone: High parameter count leads to memorization

Why CNNs Excel at Images

Parameter Sharing: Same 3×3 filter used everywhere
Spatial Awareness: Preserves 2D structure
Translation Equivariant: the same filter detects edges/shapes anywhere; pooling adds local invariance
Hierarchical Learning: Edges → Shapes → Objects
Efficient: 82% fewer parameters for same task
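The contrast between the two lists can be made concrete with a toy example: shifting an image shifts a convolution's feature map by the same amount, while the flattened vector the ANN sees barely overlaps with the original. A sketch using an 8×8 image and an averaging filter (both illustrative choices):

```python
import numpy as np

def conv_valid(x, k):
    # Naive "valid" cross-correlation with a 3x3 kernel.
    out = np.zeros((x.shape[0] - 2, x.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

img = np.zeros((8, 8))
img[2:5, 2:5] = 1.0                # a small square "digit"
shifted = np.roll(img, 2, axis=1)  # moved 2 px right (no wraparound here)
kernel = np.ones((3, 3)) / 9.0     # simple averaging filter

# CNN view: features of the shifted image equal the original features
# shifted by the same 2 pixels (away from the borders).
f0, f1 = conv_valid(img, kernel), conv_valid(shifted, kernel)
print(np.allclose(f1[:, 2:], f0[:, :-2]))  # True

# ANN view: the flattened vectors share only 3 of 9 active pixels.
print(np.sum(img.flatten() * shifted.flatten()))  # 3.0
```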
Mathematical Comparison
ANN Computation:
Dense Layer: h = ReLU(Wx + b)
Complexity: O(input_size × output_size)
First Layer: 784 × 64 = 50,176 multiplications

CNN Computation:
Convolution: Output[i,j] = Σₘ Σₙ Input[i+m,j+n] × Kernel[m,n]
Complexity: O(output_size × kernel_size²) per feature map
First Layer: 28×28 × 3×3 = 7,056 multiplications per feature map

Per feature map, the CNN's first layer is ~7x cheaper than the ANN's (7,056 vs 50,176 multiplications), and it reuses the same 9 weights at every position!
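The first-layer counts above, as a quick arithmetic check:

```python
# First-layer multiplication counts from the comparison above.
ann_mults = 784 * 64                  # dense: every pixel to every neuron
cnn_mults_per_map = 28 * 28 * 3 * 3   # one 3x3 filter over a padded 28x28
print(ann_mults)                      # 50176
print(cnn_mults_per_map)              # 7056
print(round(ann_mults / cnn_mults_per_map, 1))  # 7.1
```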