
PyTorch: Avoiding Dimension Problems through Basic Operations

by JK from Korea 2023. 4. 23.


 

Date: 2023.01.18                

 

* The PyTorch series will mainly touch on the problems I faced. For the actual code, check out my GitHub repository.

[A Few Techniques to Avoid Error]

A crucial part of deep learning is turning data into numerical representations. For example, an image can be represented as a 3-dimensional tensor of shape [224, 224, 3]. Since tensors can have many dimensions, we use PyTorch’s built-in functions to modify a tensor’s shape, size, and dimension order. These techniques are critical because the most basic errors usually come from incompatible shapes.
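A quick illustration of that representation (my own minimal example, not from the original screenshots):

import torch

# A random "image": 224 x 224 pixels, 3 color channels
image = torch.rand(224, 224, 3)

print(image.ndim)    # 3
print(image.shape)   # torch.Size([224, 224, 3])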

 

[Reshaping]

Reshaping changes the shape of a tensor. However, it requires shape compatibility: the new shape must hold exactly the same number of elements as the original tensor, without omitting any values. Let’s see an example.

[Create a tensor of 9 elements, spaced 1 apart]
[We added a dimension by reshaping the original tensor into a 2-dim tensor]

In the figure above, either the row or the column count corresponds to the original size of the tensor, 9, so the sizes are compatible. The interesting reshape is on line 7: each value becomes a row of its own.

[(row, col) = (9, 1)]
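Since the actual code lives in the screenshots, here is a minimal sketch of what they likely show (the variable names are my assumption):

import torch

# Create a tensor of 9 values, spaced 1 apart
x = torch.arange(1., 10.)       # tensor([1., 2., ..., 9.]), shape [9]

# Add a dimension: one row of nine columns
x_reshaped_1 = x.reshape(1, 9)  # shape [1, 9]

# Each value becomes a row of its own
x_reshaped_2 = x.reshape(9, 1)  # shape [9, 1]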

Let’s see another example with 10 elements in the original tensor.

[What will x_reshaped_3 look like?]
[(row, col) = (5, 2)]
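A sketch of what x_reshaped_3 likely looks like, assuming the tensor holds the values 1 through 10:

import torch

x = torch.arange(1., 11.)        # 10 elements
x_reshaped_3 = x.reshape(5, 2)   # 5 rows x 2 columns = 10 elements
print(x_reshaped_3)
# tensor([[ 1.,  2.],
#         [ 3.,  4.],
#         [ 5.,  6.],
#         [ 7.,  8.],
#         [ 9., 10.]])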

Reshaping is quite intuitive. However, be careful of size compatibility: PyTorch does not silently drop or pad elements. Reshape keeps every element in order, and if the requested shape cannot hold exactly the same number of elements, it raises an error.
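For instance (my own example):

import torch

x = torch.arange(1., 11.)  # 10 elements
x.reshape(3, 3)            # RuntimeError: shape '[3, 3]' is invalid for input of size 10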

[View]

View is confusing at first. A view points a new variable at the same memory address as an existing one, so variables that share the same memory address read and write the same stored values. Keep in mind that a value is stored in memory, not in the variable: the variable is simply a reference name used to access the memory space allocated when it was declared.

[var1 and var2 share the same memory address. 0xABCDEF is where the actual data is stored.]

By applying PyTorch’s view function, we can access the same data under a different shape or size without copying it. The shape and size compatibility rules still apply. Typically, view is used when you want a different perspective on some data without modifying the original or duplicating it.
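A minimal sketch of view sharing memory (my own example, not the post’s code):

import torch

x = torch.arange(1., 10.)   # 9 elements
z = x.view(3, 3)            # same underlying storage, viewed as 3 x 3

z[0, 0] = 99.               # writing through the view...
print(x[0])                 # ...changes the original: tensor(99.)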

 

[Stacking (aka Concatenation)]

Stacking joins multiple tensors together, much like concatenating tables in pandas or NumPy. Strictly speaking, torch.stack joins tensors along a new dimension, while torch.cat concatenates them along an existing one. Since we are dealing with tensors, we can stack them from different dimension perspectives.

[‘x’ is a one-dimensional tensor. We can stack it in either the 0th dim or 1st dim.]
[The first is stacking with dim=0, and the second is stacking with dim=1.]
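A sketch of what the figures likely show (the element values and the number of copies are my assumption):

import torch

x = torch.arange(1., 5.)                     # 1-D tensor: [1., 2., 3., 4.]

x_stacked_0 = torch.stack([x, x, x], dim=0)  # shape [3, 4]: each copy becomes a row
x_stacked_1 = torch.stack([x, x, x], dim=1)  # shape [4, 3]: each copy becomes a column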

[Squeezing]

Squeeze removes dimensions of size 1 from a tensor.

[‘z’ is initially dim 2. What will happen if we squeeze?]
[Dimension 2 -> 1.]
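A minimal sketch, assuming z starts as a [1, 9] tensor as in the figure:

import torch

z = torch.zeros(1, 9)       # 2 dimensions, shape [1, 9]
z_squeezed = z.squeeze()    # size-1 dims removed -> shape [9], 1 dimension
print(z_squeezed.shape)     # torch.Size([9])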

Unsqueeze does the opposite: it adds a dimension of size 1 at a given position.

[Unsqueeze Operation.]
[Additional Dimension.]
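And a sketch of the unsqueeze operation (the shapes are my assumption):

import torch

z = torch.arange(1., 10.)           # shape [9]
z_unsqueezed = z.unsqueeze(dim=0)   # new size-1 dim at position 0 -> shape [1, 9]
print(z_unsqueezed.shape)           # torch.Size([1, 9])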

Squeeze and unsqueeze are straightforward. The best way to see their practical applications is to use them while building a CNN.

 

[Permute]

Permute re-orders the dimensions of a tensor. It returns a view of the same underlying data, so nothing is copied; only the dimension order changes.

[This will come in handy for organizing the input data.]
[The order of the color channel data changed from 2 -> 0.]
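A sketch of the operation in the figure, reusing the [224, 224, 3] image shape from earlier:

import torch

image = torch.rand(224, 224, 3)      # [height, width, channels]

# PyTorch conv layers expect [channels, height, width]
image_chw = image.permute(2, 0, 1)   # color channel moved from dim 2 to dim 0
print(image_chw.shape)               # torch.Size([3, 224, 224])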

These are essential techniques for preprocessing data, since sizes, shapes, and indices must match for a neural network to run successfully.

 

Side Note: Working with PyTorch and reviewing neural network models from a mathematical perspective is much more intuitive and easier now than when I first started building them before this Spring semester back at school. Taking linear algebra and program design classes really does help with the overall programming experience!
