
SGD, Epochs, and Accuracy Testing

by JK from Korea 2022. 10. 22.

Date : 2022.09.28

 

*The contents of this book are heavily based on Stanford University’s CS231n course.

 

[Implementing SGD]

The overall procedure is as follows:

 

  1. Create a batch of randomly selected data for training.
  2. Use the gradient of the loss function to find weight values that reduce the loss.
  3. Repeat the two steps above to minimize the prediction error.

 

In step 2, we will be applying the gradient descent method to a random batch of data. This method is called Stochastic Gradient Descent (SGD).
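To make the update in step 2 concrete, here is a minimal sketch of the rule SGD applies to each parameter. The sgd_update helper is hypothetical, and the learning rate of 0.1 is an arbitrary illustrative choice.

    import numpy as np

    def sgd_update(param, grad, lr=0.1):
        # Move the parameter a small step against its gradient, i.e. in the
        # direction that locally decreases the loss.
        return param - lr * grad

    # Hypothetical example: one weight matrix and a stand-in gradient.
    W = np.random.randn(784, 50)
    dW = np.random.randn(784, 50)
    W = sgd_update(W, dW)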

 

Following the steps above, let’s program a network with one hidden layer.

 

[Two Layer Net Test]

*The numerical_gradient method in the TwoLayerNet class is different from the numerical_gradient function that we’ve imported from common.functions: the method is a thin wrapper that calls the imported function once for each weight and bias parameter.
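Below is a condensed sketch in the spirit of the book’s TwoLayerNet, not the exact code from this post. The imports assume the post’s common.functions module holds sigmoid, softmax, cross_entropy_error, and the generic numerical_gradient; in the book’s reference code the latter lives in common.gradient instead, so adjust the path to your layout.

    import numpy as np
    from common.functions import (sigmoid, softmax, cross_entropy_error,
                                  numerical_gradient)  # the generic, imported version

    class TwoLayerNet:
        def __init__(self, input_size, hidden_size, output_size,
                     weight_init_std=0.01):
            # Initialize weights with small random values and biases with zeros.
            self.params = {
                'W1': weight_init_std * np.random.randn(input_size, hidden_size),
                'b1': np.zeros(hidden_size),
                'W2': weight_init_std * np.random.randn(hidden_size, output_size),
                'b2': np.zeros(output_size),
            }

        def predict(self, x):
            # Forward pass: input -> hidden (sigmoid) -> output (softmax).
            z1 = sigmoid(np.dot(x, self.params['W1']) + self.params['b1'])
            return softmax(np.dot(z1, self.params['W2']) + self.params['b2'])

        def loss(self, x, t):
            # Cross Entropy Error between the predictions and the labels t.
            return cross_entropy_error(self.predict(x), t)

        def accuracy(self, x, t):
            # Fraction of inputs whose argmax prediction matches the label.
            y = np.argmax(self.predict(x), axis=1)
            return np.mean(y == np.argmax(t, axis=1))

        def numerical_gradient(self, x, t):
            # This METHOD shadows the imported FUNCTION of the same name:
            # it calls the generic version once per parameter array.
            loss_W = lambda W: self.loss(x, t)
            return {key: numerical_gradient(loss_W, self.params[key])
                    for key in ('W1', 'b1', 'W2', 'b2')}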

 

Before we go any further, let’s get one thing straight.

 

When and why do we use gradients?

 

First, we need to know when gradients are used. The following steps show which variables and functions come into play, and when.

 

  1. Initialize the weights and biases.
  2. Use the initialized variables and x (the training input data) to predict the outcomes for x.
  3. Use the outcomes from step 2 and t (the training answer labels) to calculate the Cross Entropy Error.
  4. Get the gradients (slopes) of the CEE loss function with respect to each weight and bias.
  5. Use the gradients of the weights to update them in the direction that lowers the loss, for higher accuracy.
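Mapping the five steps onto the TwoLayerNet sketch above gives the following hedged usage example. The batch of 3 random "images" and one-hot labels are stand-ins for real MNIST data, and the 0.1 learning rate is illustrative.

    import numpy as np

    # Step 1: initialize weights and biases (done inside the constructor).
    net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

    # Stand-in data: 3 fake MNIST images and one-hot labels.
    x = np.random.rand(3, 784)
    t = np.eye(10)[np.random.randint(0, 10, size=3)]

    y = net.predict(x)                    # step 2: predicted probabilities
    loss = net.loss(x, t)                 # step 3: Cross Entropy Error
    grads = net.numerical_gradient(x, t)  # step 4: slopes of the loss

    for key in ('W1', 'b1', 'W2', 'b2'):  # step 5: nudge weights downhill
        net.params[key] -= 0.1 * grads[key]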

 

Now that we’ve fully broken down the TwoLayerNet and cleared up the potential points of confusion, let’s implement the mini-batch method for final testing.

 

[Loss Function Tracking (10000 Iterations)]
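Here is a sketch of the training loop whose loss this section tracks. It assumes the book’s MNIST loader (dataset.mnist.load_mnist) and the book’s usual hyperparameters (10,000 iterations, batch size 100, learning rate 0.1). Note that the numerical gradient is far too slow to actually run for 10,000 iterations; the book later swaps in a backpropagation-based gradient for exactly this reason.

    import numpy as np
    from dataset.mnist import load_mnist  # assumed: the book's MNIST loader

    (x_train, t_train), (x_test, t_test) = \
        load_mnist(normalize=True, one_hot_label=True)

    net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)
    iters_num, batch_size, learning_rate = 10000, 100, 0.1
    train_loss_list = []

    for i in range(iters_num):
        # Step 1 of SGD: draw a random mini-batch from the training set.
        batch_mask = np.random.choice(x_train.shape[0], batch_size)
        x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

        # Steps 2-4: forward pass, loss, and gradients for this batch.
        grads = net.numerical_gradient(x_batch, t_batch)

        # Step 5: update every parameter against its gradient.
        for key in ('W1', 'b1', 'W2', 'b2'):
            net.params[key] -= learning_rate * grads[key]

        # Record the loss so we can track it over all 10,000 iterations.
        train_loss_list.append(net.loss(x_batch, t_batch))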

 

After iterating over the training data, we need to see whether the model generalizes to new data, the test data. The steadily decreasing loss shows that our weight variables are gradually adapting to the MNIST data. Time for real experimentation.

 

[Train and Test]
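To compare train and test behavior, the same loop can evaluate accuracy once per epoch (one epoch being train_size / batch_size iterations, following the book’s convention). A sketch, reusing the names defined in the loop above:

    train_acc_list, test_acc_list = [], []
    iter_per_epoch = max(x_train.shape[0] // batch_size, 1)

    for i in range(iters_num):
        batch_mask = np.random.choice(x_train.shape[0], batch_size)
        x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

        grads = net.numerical_gradient(x_batch, t_batch)
        for key in ('W1', 'b1', 'W2', 'b2'):
            net.params[key] -= learning_rate * grads[key]

        # Once per epoch, evaluate on the FULL train and test sets so the
        # two accuracy curves can be compared for signs of overfitting.
        if i % iter_per_epoch == 0:
            train_acc_list.append(net.accuracy(x_train, t_train))
            test_acc_list.append(net.accuracy(x_test, t_test))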

 

[Accuracy Results (Replit cannot render the plot)]
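Since Replit could not render the figure, here is a minimal matplotlib sketch for drawing the two curves locally. It assumes the train_acc_list and test_acc_list recorded in the loop above.

    import matplotlib.pyplot as plt

    epochs = range(len(train_acc_list))
    plt.plot(epochs, train_acc_list, label='train acc')
    plt.plot(epochs, test_acc_list, linestyle='--', label='test acc')
    plt.xlabel('epochs')
    plt.ylabel('accuracy')
    plt.ylim(0, 1.0)
    plt.legend(loc='lower right')
    plt.show()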

 

There is a negligible difference between the train accuracy and the test accuracy, which means overfitting has not occurred. If the model starts to over-adapt to the training data, the two accuracy curves will part ways.

 
