❌ No Spatial Structure: Treats images as flat vectors
❌ Too Many Parameters: 784×64 = 50,240 weights in first layer
❌ Position Dependent: Moving digit changes all connections
❌ No Translation Invariance: Same digit in different positions = different patterns
❌ Overfitting Prone: High parameter count leads to memorization
Why CNNs Excel at Images
✅ Parameter Sharing: Same 3×3 filter used everywhere
✅ Spatial Awareness: Preserves 2D structure
✅ Translation Invariant: Detects edges/shapes anywhere
✅ Hierarchical Learning: Edges → Shapes → Objects
✅ Efficient: 82% fewer parameters for same task
Mathematical Comparison
ANN Computation:
Dense Layer: h = ReLU(Wx + b)
Complexity: O(input_size × output_size)
First Layer: 784 × 64 = 50,176 multiplications