
How Are Filters Selected in a CNN?

Published in CNN Filters Learning · 4 min read

In Convolutional Neural Networks (CNNs), filters are not explicitly "selected" by a human designer in the sense of choosing specific patterns. Instead, the values within the filters are automatically learned from the data during the training process.

What Are CNN Filters?

Filters, also known as kernels, are small matrices of numbers that slide over the input image (or over feature maps from previous layers) during the convolution operation. Each filter comes to detect a specific feature, such as an edge, curve, corner, or texture, at different locations in the image.
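To make the sliding-window idea concrete, here is a minimal sketch of a single filter convolving over a tiny grayscale image. The vertical-edge kernel is hand-crafted purely for illustration; in a real CNN these nine numbers would be learned, not chosen by hand.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image (valid convolution, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the patch by the kernel and sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-crafted vertical-edge filter (illustration only)
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

# Image: left half dark (0), right half bright (1) -> vertical edge in the middle
image = np.zeros((5, 5))
image[:, 3:] = 1.0

response = convolve2d(image, vertical_edge)
```

The resulting feature map `response` is near zero in flat regions and has a strong (here, negative) response where the dark-to-bright edge sits, which is exactly what "detecting a feature at different locations" means.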

The Learning Process: From Initialization to Optimization

The "selection" of effective filters in a CNN is fundamentally driven by the learning algorithm, primarily backpropagation.

  1. Filter Initialization

    Initially, the values within these filter matrices must be set to something. One simple approach is to initialize a filter with random 1s and 0s in a matrix. More common strategies draw random values from a specific distribution (such as Gaussian or uniform), often scaled appropriately (e.g., Xavier/Glorot initialization, He initialization). The key point is that the filters start as arbitrary values.

  2. Forward Pass and Prediction

    During training, an input image passes through the CNN. The filters perform convolutions, creating feature maps. These feature maps are then processed by subsequent layers (e.g., pooling, activation functions, fully connected layers) to produce a final output prediction (e.g., image class).

  3. Calculating Loss

    The CNN's prediction is compared to the true label (the correct answer) using a loss function. This function quantifies how wrong the prediction was.
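As a small illustration of "quantifying how wrong the prediction was", here is cross-entropy, a loss function commonly used for classification (the logits below are made-up network outputs for three classes):

```python
import numpy as np

def cross_entropy(logits, true_class):
    """Cross-entropy loss for one example: -log(probability of the true class)."""
    exp = np.exp(logits - np.max(logits))  # shift for numerical stability
    probs = exp / exp.sum()                # softmax: scores -> probabilities
    return -np.log(probs[true_class])

logits = np.array([2.0, 0.5, -1.0])      # hypothetical raw scores for 3 classes
loss_correct = cross_entropy(logits, 0)  # network favors class 0 and it is right
loss_wrong = cross_entropy(logits, 2)    # same scores, but the true class is 2
```

A confident, correct prediction yields a small loss; a confident, wrong one yields a large loss, and it is this number that backpropagation works to reduce.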

  4. Backpropagation and Gradient Descent

    The calculated loss is then propagated backward through the network. Using an optimization algorithm like gradient descent, the network calculates the gradient of the loss with respect to each weight in the network – including the values inside the filters.

  5. Updating Filter Values

    Based on these gradients, the values within the filters are adjusted slightly. The goal is to change the filter values in a direction that reduces the loss for the given input and its corresponding label. The filter values are learned during training (i.e., via backpropagation); this is why the individual values of the filters are often called the weights of a CNN. Through many iterations over a large dataset, the filters gradually evolve into detectors for the features that are most relevant to the task (e.g., recognizing objects, classifying images).
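Steps 1-5 can be compressed into a toy training loop. As a stand-in for backpropagation through a full CNN, this sketch uses gradient descent on a squared-error loss to nudge a randomly initialized 3x3 filter toward a target filter; the target and learning rate are illustrative choices, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(42)

# The "ideal" filter the network should discover (a vertical-edge detector)
target = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

filt = rng.normal(size=(3, 3))  # step 1: arbitrary initial filter values
lr = 0.02                       # learning rate

for _ in range(200):
    patch = rng.normal(size=(3, 3))   # a random input patch
    pred = np.sum(patch * filt)       # steps 2: forward pass (one conv step)
    true = np.sum(patch * target)     # the "correct answer" for this patch
    grad = 2 * (pred - true) * patch  # steps 3-4: gradient of squared error
    filt -= lr * grad                 # step 5: adjust the filter slightly
```

After enough iterations the learned filter sits close to the target, which is the sense in which filters are "selected" by training rather than by a designer.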

Analogy

Think of it like teaching a child to recognize shapes. You don't tell them the exact numerical pattern for a circle. You show them many examples of circles and non-circles. They initially guess randomly, you provide feedback ("Yes, that's a circle," or "No, that's a square"), and over time, they learn the features that define a circle. The filters are like the child's internal "feature detectors" that learn based on the provided examples and feedback (loss).

Choosing Filter Parameters

While the values within the filters are learned, the structure and number of filters are hyperparameters that are chosen by the network designer.

Common Filter Parameters:

  • Filter Size (Kernel Size): Typically small (e.g., 3x3, 5x5). Smaller filters can capture local patterns, while larger filters can cover a wider receptive field.
  • Number of Filters: More filters in a layer allow the network to learn a greater variety of features. The number often increases in deeper layers.
  • Stride: The step size the filter takes as it slides across the input. A larger stride reduces the size of the output feature map.
  • Padding: Adding extra pixels (often zeros) around the input boundary. This helps control the spatial size of the output and allows filters to process pixels at the edges.
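Kernel size, stride, and padding together determine the spatial size of the output feature map via the standard formula `floor((n + 2p - k) / s) + 1`. A small helper makes the relationship explicit (the 224x224 input size is just a common example):

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

# 3x3 filter, stride 1, padding 1 preserves the input size ("same" padding)
size_same = conv_output_size(224, kernel=3, stride=1, padding=1)

# Stride 2 roughly halves the feature map
size_half = conv_output_size(224, kernel=3, stride=2, padding=1)
```

This is why padding of 1 is so often paired with 3x3 filters, and why strided convolutions are used to downsample.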

These parameters are usually determined through experimentation and validation on a separate dataset (hyperparameter tuning) or by following common practices from successful CNN architectures (like LeNet, AlexNet, VGG, ResNet).

In summary, CNN filters are not pre-selected patterns; their specific values (weights) are automatically learned from data during the training process using backpropagation, starting from an initial state (e.g., random values).
