
PyTorch: Loss Functions & Optimizers

by JK from Korea 2023. 4. 23.

<PyTorch: Loss Functions & Optimizers>

Date: 2023.03.04                

 

* The PyTorch series will mainly touch on the problems I faced. For the actual code, check out my GitHub repository.

[What Loss Function & Optimizer should I use?]

This question is problem-specific, and I will only briefly touch on the topic since it’s been a while since I studied the conceptual ideas behind neural networks.

 

* I will be uploading a review of a research paper covering a conceptual breakdown of CNNs.

[MNIST ‘Classification’ Problem]

The neural network project using a CNN on MNIST images is a classification problem: the goal is to classify each image into one of 10 classes (0-9).

In classification problems, the goal is to predict the class label of an input example from a finite set of possible class labels.

Classification examples:

  • Spam Email Detection: The goal is to classify an email as either spam or not spam.
  • Image Classification: The goal is to classify an image into one of several categories, such as dogs, cats, and birds.
  • Sentiment Analysis: The goal is to classify the sentiment of a given text as positive, negative, or neutral.

Several layer modules provided in PyTorch (torch.nn), put together in the sketch after this list:

  • nn.Conv2d: This module creates a convolutional layer, the core building block of CNNs for image classification tasks.
  • nn.MaxPool2d: This module creates a pooling layer that is commonly used to downsample the output of a convolutional layer.
  • nn.Linear: This module creates a fully connected layer that is commonly used to perform the final classification of the input image.
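
As a rough sketch of how these modules fit together for MNIST, here is a minimal CNN. The layer sizes and the SimpleCNN name are my own illustrative choices, not the ones from my repository:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """A minimal CNN for 28x28 grayscale MNIST digits (illustrative sizes)."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # 1x28x28 -> 16x28x28
        self.pool = nn.MaxPool2d(kernel_size=2)                   # halves the spatial size
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)  # 16x14x14 -> 32x14x14
        self.fc = nn.Linear(32 * 7 * 7, num_classes)              # final classification layer

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))  # -> 16x14x14
        x = self.pool(torch.relu(self.conv2(x)))  # -> 32x7x7
        x = x.flatten(1)                          # flatten everything except the batch dim
        return self.fc(x)                         # raw logits, one per class
```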

Loss Functions associated with Classification Problems (a short usage sketch follows this list):

  • Cross-Entropy Loss: This is the most common loss function used in classification problems. It measures the difference between the predicted probabilities and the true class labels. It is defined as the negative log-likelihood of the true class, given the predicted class probabilities. Cross-entropy loss is used when the output of the neural network is a probability distribution over multiple classes.
  • Binary Cross-Entropy Loss: This is a variation of cross-entropy loss used when there are only two classes. It is commonly used in binary classification problems.
  • Hinge Loss: This loss function is used in maximum-margin classification, where the goal is to maximize the margin between classes. It is most closely associated with support vector machines (SVMs) and other linear classifiers.
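
A minimal usage sketch of the first two losses in PyTorch; the tensors below are dummy data, and I use nn.BCEWithLogitsLoss for the binary case because it combines the sigmoid with the BCE computation in a numerically stable way:

```python
import torch
import torch.nn as nn

# Multi-class case: logits for a batch of 4 examples over 10 classes (e.g. MNIST).
criterion = nn.CrossEntropyLoss()        # expects raw logits; applies log-softmax internally
logits = torch.randn(4, 10)              # unnormalized model outputs
targets = torch.tensor([3, 7, 0, 9])     # true class indices
loss = criterion(logits, targets)

# Binary case: one logit per example, float targets in {0, 1}.
bce = nn.BCEWithLogitsLoss()
binary_logits = torch.randn(4)
binary_targets = torch.tensor([1.0, 0.0, 0.0, 1.0])
binary_loss = bce(binary_logits, binary_targets)
```

Note that nn.CrossEntropyLoss takes the raw logits directly; the softmax that turns them into a probability distribution over the classes happens inside the loss.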

[Stochastic Gradient Descent]

Let’s talk about SGD specifically. SGD is a variant of gradient descent that updates the model parameters using gradients computed from a randomly selected subset of the training data, known as a mini-batch.
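
A minimal sketch of one SGD update on a mini-batch, reusing the illustrative SimpleCNN from earlier; the hyperparameters and the random batch are placeholders:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = SimpleCNN()                       # the illustrative model defined above
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # placeholder hyperparameters

# One mini-batch of 32 fake MNIST images and labels.
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()                     # clear gradients from the previous step
outputs = model(images)                   # forward pass
loss = criterion(outputs, labels)         # loss on this mini-batch
loss.backward()                           # gradients w.r.t. the mini-batch
optimizer.step()                          # parameter update using those gradients
```

In a full training loop, this block runs once per mini-batch for every epoch.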

 

SGD is appropriate for both classification and regression problems, since it can be paired with a variety of loss functions, and it remains a common choice for image classification problems like this one.

[Some Other Optimizers besides SGD]

Commonly used optimizers:

  • Stochastic Gradient Descent (SGD): SGD updates the weights of the neural network using the gradients of the loss function computed on a mini-batch of training examples. It is a good choice for simpler models and datasets.
  • Adaptive Moment Estimation (Adam): Adam often works better on complex datasets. It uses a combination of first and second-order moments of the gradients to update the model parameters, making it suitable for datasets with varying gradients.
  • Root Mean Square Propagation (RMSprop): RMSprop uses a moving average of squared gradients to scale the learning rate of the model parameters. It works well on datasets with high variance in the gradients.

Optimizers differ in the way they update the model parameters based on the gradients of the loss function. The selection depends on the problem and dataset, but the goal of optimization is always the same: “to find the global minimum of the loss function.”
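
Switching between these optimizers only changes the constructor call, since they all share the same zero_grad() / step() interface; the learning rates below are just illustrative defaults:

```python
import torch.optim as optim

model = SimpleCNN()  # the illustrative model from earlier

# In practice you construct exactly one of these for a given training run.
sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = optim.Adam(model.parameters(), lr=1e-3)                     # adaptive first/second moments
rmsprop = optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.99)   # moving average of squared gradients
```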
