Convolutional Neural Networks (CNNs) – Complete Guide with Examples
Deep Learning Series

A complete visual guide to CNNs — from pixel to prediction, with architecture diagrams, math, examples, and real-world applications.

The Brain Behind Computer Vision

A Convolutional Neural Network (CNN) is a specialized deep learning architecture designed to process grid-like data, most famously images. Unlike traditional neural networks that flatten all pixels into a single vector, CNNs exploit the spatial structure of images — understanding that nearby pixels are more closely related than distant ones.

CNNs were inspired by the visual cortex of animals, where neurons respond to stimuli in specific, localized regions of the visual field. They achieve state-of-the-art performance on tasks like image classification, object detection, face recognition, and medical diagnosis.

Key Insight: A CNN learns feature detectors (filters) automatically from data — from simple edges and colors in early layers, to complex shapes and objects in deeper layers.

Layer-by-Layer Breakdown

A CNN processes input through a pipeline of specialized layers. Each layer learns to detect increasingly complex features.

INPUT 32×32×3 → CONV+ReLU 30×30×32 → CONV+ReLU 28×28×32 → MAX POOL 14×14×32 → CONV+ReLU 12×12×64 → MAX POOL 6×6×64 → Flatten → FC 512 → OUT 10 (Softmax)

🔍 Convolutional Layer

Slides learned filters across the input to create feature maps. Detects edges, textures, patterns.

📉 Pooling Layer

Reduces spatial dimensions (downsampling), making the network more efficient and translation-invariant.

🔗 Fully Connected Layer

Maps extracted features to output classes. Traditional feedforward neurons that produce final predictions.
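
To make the shapes concrete, here is a minimal Keras sketch of the stack in the diagram above (filter counts taken from the diagram, everything else plain defaults); running model.summary() should reproduce the 30×30×32 through 6×6×64 output shapes.

from tensorflow.keras import layers, models

# The stack from the diagram: conv, conv, pool, conv, pool, then the classifier head
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),  # -> 30×30×32
    layers.Conv2D(32, (3, 3), activation='relu'),                           # -> 28×28×32
    layers.MaxPooling2D((2, 2)),                                            # -> 14×14×32
    layers.Conv2D(64, (3, 3), activation='relu'),                           # -> 12×12×64
    layers.MaxPooling2D((2, 2)),                                            # -> 6×6×64
    layers.Flatten(),                                                       # -> 2304 values
    layers.Dense(512, activation='relu'),                                   # FC  -> 512
    layers.Dense(10, activation='softmax'),                                 # OUT -> 10 class probabilities
])

model.summary()  # prints every layer's output shape, matching the diagram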

How Filters Extract Features

The core operation in a CNN is the convolution. A small matrix called a kernel (or filter) slides across the input image, computing element-wise multiplications and summing them to produce a single output value per position.

Input (5×5)
1 0 1 0 1
0 1 1 0 0
1 0 1 1 0
0 1 0 1 1
1 0 0 1 0

Kernel (3×3)
1 0 1
0 1 0
1 0 1

Output value (kernel placed at the top-left corner)
(1×1 + 0×0 + 1×1 + 0×0 + 1×1 + 1×0 + 1×1 + 0×0 + 1×1) = 5

Sliding the kernel over every valid position of this input produces a 3×3 feature map.
Stride & Padding: Stride controls how many pixels the filter moves at a time. Padding adds zeros around the border to control output size. Output size = ⌊(N − F + 2P) / S⌋ + 1
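
To make this concrete, here is a short NumPy sketch of the same slide-multiply-sum operation and the output-size formula (conv2d and out_size are illustrative helpers, not library functions):

import numpy as np

def out_size(n, f, p=0, s=1):
    """Output size = floor((N - F + 2P) / S) + 1."""
    return (n - f + 2 * p) // s + 1

def conv2d(image, kernel, stride=1):
    """Valid (no padding) convolution as used in CNNs: slide, multiply element-wise, sum."""
    n, f = image.shape[0], kernel.shape[0]
    m = out_size(n, f, p=0, s=stride)
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel)   # element-wise multiply, then sum
    return out

image = np.array([[1,0,1,0,1],
                  [0,1,1,0,0],
                  [1,0,1,1,0],
                  [0,1,0,1,1],
                  [1,0,0,1,0]])
kernel = np.array([[1,0,1],
                   [0,1,0],
                   [1,0,1]])

print(out_size(5, 3))         # 3 -> a 3×3 feature map
print(conv2d(image, kernel))  # top-left value is 5, as computed above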

🔲 Edge Detection Kernel

// Sobel-X: detect vertical edges
[[-1,  0,  1],
 [-2,  0,  2],
 [-1,  0,  1]]

🔵 Blur Kernel

// Gaussian Blur 3×3
[[1/16, 2/16, 1/16],
 [2/16, 4/16, 2/16],
 [1/16, 2/16, 1/16]]
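
As a quick sanity check (a sketch assuming SciPy is available): scipy.signal.correlate2d slides a kernel without flipping it, which is what CNN "convolution" actually does, so the Sobel-X kernel responds only where pixel values change horizontally.

import numpy as np
from scipy.signal import correlate2d

# A synthetic image: dark left half, bright right half -> one vertical edge
image = np.zeros((6, 6))
image[:, 3:] = 1.0

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

response = correlate2d(image, sobel_x, mode='valid')
print(response)  # nonzero only in the columns that straddle the edge, zero elsewhere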

Reducing Spatial Dimensions

Pooling layers downsample feature maps while retaining the most important information. The most common type is Max Pooling, which takes the maximum value from each region.

Input (4×4)
3 1 2 5
7 2 8 1
4 6 9 3
1 2 5 7

Max Pool (2×2, stride = 2)
7 8
6 9
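
A minimal NumPy sketch of the 2×2, stride-2 max pool shown above (max_pool2d is an illustrative helper, not a library function):

import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Take the maximum of each size×size window, moving stride pixels at a time."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

x = np.array([[3, 1, 2, 5],
              [7, 2, 8, 1],
              [4, 6, 9, 3],
              [1, 2, 5, 7]])

print(max_pool2d(x))  # [[7. 8.]
                      #  [6. 9.]]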

Why Pooling?

  • Reduces computation
  • Controls overfitting
  • Provides spatial invariance
  • Retains dominant features

Introducing Non-Linearity

Without activation functions, a CNN would just be a series of linear transformations. ReLU (Rectified Linear Unit) is the most widely used activation function in CNNs.

ReLU: f(x) = max(0, x)

Common Activation Functions

  • ReLU: f(x) = max(0, x). Fast, sparse, the default choice.
  • Leaky ReLU: f(x) = max(0.01x, x). Fixes the “dying ReLU” problem.
  • Sigmoid: f(x) = 1/(1+e⁻ˣ). Squashes values into (0, 1); used for binary outputs.
  • Softmax: converts a vector of scores into multi-class output probabilities.
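
The same four functions as NumPy one-liners (a sketch for intuition; in practice you use the framework's built-in versions):

import numpy as np

def relu(x):        return np.maximum(0, x)
def leaky_relu(x):  return np.maximum(0.01 * x, x)
def sigmoid(x):     return 1 / (1 + np.exp(-x))
def softmax(x):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))        # negatives clipped to 0
print(leaky_relu(x))  # negatives scaled by 0.01 instead of clipped
print(sigmoid(x))     # values squashed into (0, 1)
print(softmax(x))     # non-negative values that sum to 1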

CNN in Python (TensorFlow/Keras)

Below is a complete CNN implementation for CIFAR-10 image classification — 10 classes, 32×32 RGB images.

# Import required libraries
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and preprocess the CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build the CNN model
model = models.Sequential([

    # Block 1: Convolution → ReLU → MaxPool
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),

    # Block 2: Deeper convolutions
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Block 3: More filters
    layers.Conv2D(128, (3, 3), activation='relu'),

    # Classifier head
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.4),
    layers.Dense(10, activation='softmax')  # 10 classes
])

# Compile: optimizer + loss function
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.summary()

# Train the model
history = model.fit(
    x_train, y_train,
    epochs=20,
    batch_size=64,
    validation_data=(x_test, y_test)
)
PyTorch alternative: You can also build the same CNN using torch.nn.Conv2d, torch.nn.MaxPool2d, and torch.nn.Linear layers within a custom nn.Module class.
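
For reference, here is a minimal sketch of that PyTorch version, mirroring the layer sizes of the Keras model above. Note that torch.nn.CrossEntropyLoss applies softmax internally, so the final layer outputs raw logits.

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3), nn.ReLU(),   # 32×32×3 -> 30×30×32
            nn.MaxPool2d(2),                              # -> 15×15×32
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),  # -> 13×13×64
            nn.MaxPool2d(2),                              # -> 6×6×64
            nn.Conv2d(64, 128, kernel_size=3), nn.ReLU(), # -> 4×4×128
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
            nn.Dropout(0.4),
            nn.Linear(256, num_classes),  # raw logits; CrossEntropyLoss adds softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
logits = model(torch.randn(1, 3, 32, 32))  # PyTorch uses channels-first (N, C, H, W)
print(logits.shape)                        # torch.Size([1, 10])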

Where CNNs Excel

🩺 Medical Imaging

Detecting tumors in X-rays and MRIs, diagnosing diabetic retinopathy from eye scans with over 90% accuracy.

🚗 Autonomous Vehicles

Real-time detection of pedestrians, traffic signs, lane boundaries, and obstacles for self-driving systems.

📱 Face Recognition

iPhone Face ID, social media auto-tagging, and security systems rely on deep CNN face embeddings.

🛍️ Visual Search

Google Lens, Pinterest visual search, and Amazon’s product search use CNNs to match images.

🎮 Game AI

DeepMind’s AlphaGo and Atari-playing agents use CNNs to process game screens as raw pixel input.

🌍 Satellite Analysis

Detecting deforestation, mapping roads, and monitoring climate change using aerial and satellite imagery.

Landmark Models in History

LeNet-5 (1998)

The original CNN by Yann LeCun. Used for handwritten digit recognition (MNIST). Introduced the core conv-pool-FC pattern that all modern CNNs follow.

AlexNet (2012)

Won ImageNet 2012 by a massive margin. Popularized ReLU activations, Dropout, and large-scale GPU training. Sparked the deep learning revolution.

VGGNet (2014)

Showed that depth matters — 16–19 layers of 3×3 convolutions outperformed shallower architectures with larger filters.

GoogLeNet (2014)

Introduced the Inception module — multiple filter sizes in parallel. Achieved better accuracy with fewer parameters than VGG.

ResNet (2015)

Introduced skip connections (residual blocks) enabling 100–1000+ layer networks. Solved vanishing gradients. Still used as backbone today.

EfficientNet (2019)

Google’s compound scaling method — scales depth, width, and resolution simultaneously. Best accuracy-to-compute ratio on ImageNet.
