
PyTorch: Review of Conv Layer & Pooling Layer

by JK from Korea 2023. 7. 8.


 

Date: 2023.05.21

 

* The PyTorch series will mainly touch on the problems I faced. For the actual code, check out my GitHub repository.

 

[Data Size depending on Layer]

In a convolutional neural network (CNN), when the initial input goes through a convolutional layer, the data matrix size can shrink depending on the configuration of the convolutional layer.

 

A convolutional layer applies a set of filters or kernels to the input data. Each filter scans across the input data, performing a dot product operation between its weights and a local receptive field of the input. The output of this operation is typically referred to as a feature map.

 

When the convolutional layer is applied to the input, the size of the feature map can change due to several factors, such as the size of the input, the size of the filters, and the stride of the convolution operation.

 

Let's consider a simple example to illustrate this. Suppose we have an input image of size 5x5 pixels, and we apply a 3x3 filter with a stride of 1 (meaning the filter moves one pixel at a time). In this case, the filter will scan across the input image, performing the dot product operation and generating a feature map. Since the filter only fits in a limited number of positions and no padding is added around the input, the feature map will be smaller than the input.

 

In this example, the input image is 5x5 pixels and the filter is 3x3. With stride 1 and no padding, the output size is (5 - 3) / 1 + 1 = 3, so the filter generates a 3x3 feature map. The data matrix size has indeed shrunk.
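
Below is a minimal PyTorch sketch of this example (the layer and tensor names are illustrative): a single 3x3 filter with stride 1 and no padding applied to a 5x5 input produces a 3x3 feature map.

import torch
import torch.nn as nn

# Minimal sketch of the 5x5 example above: one input channel, one 3x3 filter,
# stride 1, no padding. Output size = (5 - 3) / 1 + 1 = 3.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)

x = torch.randn(1, 1, 5, 5)      # (batch, channels, height, width)
feature_map = conv(x)
print(feature_map.shape)         # torch.Size([1, 1, 3, 3])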

 

Important Note: The shrinking of the data matrix size in the convolutional layer is different from the pooling layer. In the pooling layer, the objective is to downsample the feature maps, reducing their spatial dimensions while retaining important features. Pooling operations like max pooling or average pooling typically divide the input feature map into non-overlapping regions and perform an operation (e.g., taking the maximum value) to downsample the data. This downsampling process explicitly reduces the size of the feature maps.

 

In summary, in a convolutional layer, the data matrix size can shrink depending on factors like input size, filter size, and stride. This is distinct from the downsampling performed by pooling layers, which intentionally reduce the spatial dimensions of the feature maps. The pooling layer is indeed used to deliberately shrink the data matrix size in order to extract important features from the previous layer's output.

[Food for Thought]

Q1. Why is the downsampling performed in the pooling layer effective at extracting important information?

 

Downsampling performed in the pooling layer is effective in extracting important information due to its ability to reduce spatial dimensions while preserving relevant features. Pooling layers typically operate on individual regions of the input feature map and perform a downsampling operation (such as taking the maximum value or computing the average). This downsampling helps in achieving translation invariance and reducing the sensitivity of the model to small spatial variations in the input.

 

By downsampling, pooling layers help to reduce the computational complexity of subsequent layers, reduce overfitting, and increase the receptive field of the network. The pooling operation effectively summarizes the information in each region by capturing the most prominent or representative features.

 

Input Feature Map (4x4):

 

2  1  5  3

6  4  9  8

7  2  3  1

5  4  2  6

 

Max Pooling (2x2, stride 2):

 

6  9

7  6

 

In this example, the input feature map is 4x4, and we apply max pooling with a 2x2 window and a stride of 2. The max pooling operation divides the input into non-overlapping regions of size 2x2 and selects the maximum value within each region. As a result, the output feature map size is reduced to 2x2, effectively downsampling the information while retaining the most salient features.

 

Mathematically, the downsampling operation in max pooling can be represented as follows:

 

Output[i, j] = max(Input[2i, 2j], Input[2i, 2j+1], Input[2i+1, 2j], Input[2i+1, 2j+1])
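
As a quick sanity check, here is a minimal PyTorch sketch that reproduces the 4x4 example above with nn.MaxPool2d (the tensor names are illustrative):

import torch
import torch.nn as nn

# Minimal sketch reproducing the 4x4 example above.
x = torch.tensor([[2., 1., 5., 3.],
                  [6., 4., 9., 8.],
                  [7., 2., 3., 1.],
                  [5., 4., 2., 6.]]).reshape(1, 1, 4, 4)   # (batch, channels, H, W)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).squeeze())
# tensor([[6., 9.],
#         [7., 6.]])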

 

Q2. Is it a common practice to let the data size shrink in the process of convolutional layers? Also, what are some common practices to prevent size shrinking?

 

Yes, it is a common practice to let the data size shrink in the process of convolutional layers. The shrinking of the data matrix size is often intentional and beneficial for several reasons:

 

a. Reduced computational complexity: Smaller feature maps require fewer computations in subsequent layers, leading to faster training and inference.

 

b. Increased receptive field: By reducing the spatial dimensions, convolutional layers with smaller feature maps allow the model to capture larger contextual information. This enlarged receptive field helps the network learn spatial relationships and global patterns (see the sketch after this list).

 

c. Feature extraction: The shrinking of data size can enhance the network's ability to extract important and discriminative features. By downsampling, the pooling layer retains the most prominent features while discarding less relevant details.
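
To illustrate point b, here is a minimal sketch (the layer names and sizes are chosen purely for illustration): stacking two 3x3 convolutions without padding shrinks a 7x7 input to 3x3, and each value in the final feature map summarizes a 5x5 region of the original input.

import torch
import torch.nn as nn

# Illustrative sketch: two stacked 3x3 convolutions (stride 1, no padding).
x = torch.randn(1, 1, 7, 7)              # 7x7 input
conv1 = nn.Conv2d(1, 8, kernel_size=3)   # 7x7 -> 5x5
conv2 = nn.Conv2d(8, 8, kernel_size=3)   # 5x5 -> 3x3

out = conv2(conv1(x))
print(out.shape)   # torch.Size([1, 8, 3, 3]); each output value sees a 5x5 input region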

 

To control how much the data size shrinks, other techniques are employed, such as:

 

a. Padding: Padding involves adding extra pixels or values around the borders of the input feature map. Padding helps to preserve the spatial dimensions and avoid significant size reduction during convolutional operations. With appropriate padding configurations (e.g., "same" padding), the output size matches the input size (see the sketch after this list).

 

b. Strided convolutions: In contrast, strided convolutions perform the convolution with a larger stride, which reduces the spatial dimensions of the output feature map. They do not prevent shrinking; they are used selectively when intentional downsampling is desired, often as an alternative to pooling.

 

c. Skip connections: Skip connections, like those used in residual networks (ResNet), allow information to bypass certain layers. This helps to preserve information and gradients during training, mitigating the negative effects of excessive downsampling.
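
The sketch below (with illustrative layer names) compares the padding and stride options from points a and b on an 8x8 input: padding=1 with a 3x3 kernel preserves the spatial size, no padding shrinks it, and a stride of 2 downsamples it further.

import torch
import torch.nn as nn

# Illustrative sketch on an 8x8 input.
x = torch.randn(1, 1, 8, 8)

same_conv    = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1)   # 8x8 -> 8x8
valid_conv   = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=0)   # 8x8 -> 6x6
strided_conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)   # 8x8 -> 4x4

print(same_conv(x).shape)      # torch.Size([1, 1, 8, 8])
print(valid_conv(x).shape)     # torch.Size([1, 1, 6, 6])
print(strided_conv(x).shape)   # torch.Size([1, 1, 4, 4])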

 

Overall, allowing the data size to shrink in convolutional layers is a common practice, but techniques like padding, strided convolutions, and skip connections are employed to control the extent of size reduction and ensure the effective functioning of the network.
