Date : 2022.09.28
*The contents of this book are heavily based on Stanford University’s CS231n course.
[Implementing SGD]
The overall procedure is as follows:
- Create a batch of randomly selected data for training.
- Compute the gradient of the loss function with respect to the weights, and update the weights in the direction that reduces the loss.
- Repeat the above to minimize prediction error.
In step 2, we apply gradient descent not to the whole training set but to a randomly selected batch of data. This method is called Stochastic Gradient Descent (SGD).
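As a quick sketch of step 1, the mini-batch can be drawn with np.random.choice. The arrays below are toy stand-ins so the snippet runs on its own; in the actual experiment they would hold the MNIST images and one-hot labels.

```python
import numpy as np

# Toy stand-ins for the real training arrays (e.g. 60,000 flattened 28x28 images
# and their one-hot labels); replace these with the loaded MNIST data.
x_train = np.random.rand(60000, 784)
t_train = np.eye(10)[np.random.randint(0, 10, 60000)]

batch_size = 100
batch_mask = np.random.choice(x_train.shape[0], batch_size)  # 100 random row indices
x_batch = x_train[batch_mask]   # inputs used for one SGD step
t_batch = t_train[batch_mask]   # matching labels
```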
Following the steps above, let’s program a network with one hidden layer.
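Below is a minimal sketch of such a network. The helpers sigmoid, softmax, and cross_entropy_error are written out inline here so the snippet is self-contained; in the actual code they are imported from the common module instead.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)   # shift for numerical stability
    return np.exp(x) / np.sum(np.exp(x), axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    # t is assumed to be one-hot; the small constant avoids log(0)
    return -np.sum(t * np.log(y + 1e-7)) / y.shape[0]

class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # Initialize weights with small random values and biases with zeros
        self.params = {
            'W1': weight_init_std * np.random.randn(input_size, hidden_size),
            'b1': np.zeros(hidden_size),
            'W2': weight_init_std * np.random.randn(hidden_size, output_size),
            'b2': np.zeros(output_size),
        }

    def predict(self, x):
        # Forward pass: input -> hidden (sigmoid) -> output (softmax)
        a1 = np.dot(x, self.params['W1']) + self.params['b1']
        z1 = sigmoid(a1)
        a2 = np.dot(z1, self.params['W2']) + self.params['b2']
        return softmax(a2)

    def loss(self, x, t):
        # Cross entropy error between the predictions for x and the labels t
        return cross_entropy_error(self.predict(x), t)

    def accuracy(self, x, t):
        # Fraction of samples whose highest-probability class matches the label
        y = np.argmax(self.predict(x), axis=1)
        return np.mean(y == np.argmax(t, axis=1))
```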
*The numerical_gradient function in the TwoLayerNet class is different from the numerical_gradient function that we’ve imported from common.function.functions.
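To make that difference concrete, here is a sketch of both versions. The standalone helper (the imported one) numerically differentiates an arbitrary function with respect to an array, while the class method of the same name fixes one batch (x, t) and calls that helper once per parameter. The method is attached to the class sketch above only so the snippet stays runnable; in the real code it simply lives inside TwoLayerNet.

```python
def numerical_gradient(f, x):
    # Standalone helper (the imported version): central-difference gradient
    # of a scalar function f with respect to the array x.
    h = 1e-4
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp = x[idx]
        x[idx] = tmp + h
        fxh1 = f(x)                      # f(x + h)
        x[idx] = tmp - h
        fxh2 = f(x)                      # f(x - h)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp                     # restore the original value
        it.iternext()
    return grad

def _numerical_gradient_method(self, x, t):
    # Class version: fix the batch (x, t), then differentiate the loss on that
    # batch with respect to each parameter array in turn.
    loss_w = lambda _: self.loss(x, t)   # argument is ignored; params are read from self
    return {key: numerical_gradient(loss_w, self.params[key])
            for key in ('W1', 'b1', 'W2', 'b2')}

# Attached here only so net.numerical_gradient(x_batch, t_batch) works with the sketch above
TwoLayerNet.numerical_gradient = _numerical_gradient_method
```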
Before we go any further, let’s get one thing straight.
When and why do we use gradients?
First, we need to know when gradients are used. The following steps show which variables and functions come into play, and at which point.
- Initialize weights and biases.
- Use the initialized weights and biases together with x (the training input data) to predict outputs for x.
- Use the predictions from step 2 and t (the training answer labels) to calculate the Cross Entropy Error (CEE).
- Compute the gradients (slopes) of the CEE loss with respect to each weight and bias.
- Use those gradients to update the weights in the direction that lowers the loss, and therefore raises accuracy (see the sketch right after this list).
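Putting those five steps together, a single SGD update looks like the sketch below (continuing from the snippets above; the learning rate of 0.1 and the tiny batch are just illustrative values).

```python
# One complete update, following steps 1-5 above
net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)   # step 1: initialize W, b

batch_mask = np.random.choice(x_train.shape[0], 3)                  # tiny batch, since numerical
x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]         # gradients are slow

grads = net.numerical_gradient(x_batch, t_batch)     # steps 2-4: predict, CEE, gradients
learning_rate = 0.1
for key in ('W1', 'b1', 'W2', 'b2'):
    net.params[key] -= learning_rate * grads[key]     # step 5: move each parameter downhill
```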
Now that we’ve broken down the TwoLayerNet and cleared up the potential points of confusion, let’s implement mini-batch training and put the network through a final test.
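A sketch of that training loop is below. It assumes x_train, t_train, x_test, and t_test now hold the real MNIST data (for example, loaded with the load_mnist helper that accompanies the common module, with one-hot labels); the iteration count, batch size, and learning rate are illustrative, and because the gradients are computed numerically the loop is very slow in practice.

```python
iters_num = 10000        # illustrative; numerical gradients make this painfully slow
batch_size = 100
learning_rate = 0.1

train_loss_list, train_acc_list, test_acc_list = [], [], []
iter_per_epoch = max(x_train.shape[0] // batch_size, 1)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

for i in range(iters_num):
    # Step 1: draw a random mini-batch
    batch_mask = np.random.choice(x_train.shape[0], batch_size)
    x_batch, t_batch = x_train[batch_mask], t_train[batch_mask]

    # Steps 2-4: gradients of the loss on this batch
    grads = network.numerical_gradient(x_batch, t_batch)

    # Step 5: gradient-descent update
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grads[key]

    train_loss_list.append(network.loss(x_batch, t_batch))

    # Once per epoch, check how well the current weights generalize
    if i % iter_per_epoch == 0:
        train_acc_list.append(network.accuracy(x_train, t_train))
        test_acc_list.append(network.accuracy(x_test, t_test))
        print(f"train acc: {train_acc_list[-1]:.3f}, test acc: {test_acc_list[-1]:.3f}")
```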
After iterating over the training data, we need to check whether the network also works on new data, namely the test data. The steadily decreasing loss shows that the weights are gradually adapting to the MNIST training data; now it’s time to see how they hold up on data they have never seen.
There seems to be a negligible difference between the train accuracy and the test accuracy, which means overfitting has not occurred. If the model started to over-adapt to the training data, the two accuracy curves would part ways.
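To see the two curves side by side, the per-epoch accuracy lists collected in the loop above can be plotted (matplotlib is assumed to be installed):

```python
import matplotlib.pyplot as plt

epochs = range(len(train_acc_list))
plt.plot(epochs, train_acc_list, label='train acc')
plt.plot(epochs, test_acc_list, linestyle='--', label='test acc')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()
```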