
CNN: The Afterwork

by JK from Korea 2022. 12. 29.

The Afterwork

 

Date: 2022.12.25

 

*The original content of this post is from Andrej Karpathy’s blog.

 

[Additional Thoughts After Completing Deep CNN]

After completing a convolutional neural network, I couldn’t help but read additional blogs and research related to computer vision. It is truly amazing that we can program computers to understand, categorize, and produce images as humans do.

 

Andrej Karpathy’s blog is one of the best sources I have come across in my search for ‘fun’ reading material. Here are some helpful facts that I found. The following contents are from this link.

 

[About the Post]

The content of this post is Andrej’s re-implementation of Yann LeCun et al.’s (1989) paper “Backpropagation Applied to Handwritten Zip Code Recognition.” I believe this is the origin of MNIST research. Considering the paper was written 33 years ago, it’s quite astonishing that LeCun implemented a deep neural network back then.

 

Andrej follows the paper in his own style, using PyTorch as the main framework. Since I have no experience working with such frameworks, I decided to write a quick summary of this post. I believe the process Andrej documented will help me in the near future when I start using PyTorch as one of my main machine learning frameworks.
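Since the post centers on that 1989 network, here is a minimal PyTorch sketch of a net in its spirit. This is my own approximation, not Karpathy’s exact code: the original H2 layer used sparse connectivity, so this dense version lands slightly above the ~9,760 parameters quoted later in this post.

```python
import torch
import torch.nn as nn

# A rough sketch of the 1989-style net: a 16x16 greyscale digit in,
# two small strided conv layers, two fully connected layers, 10 logits out.
class Net1989(nn.Module):
    def __init__(self):
        super().__init__()
        self.h1 = nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2)   # 16x16 -> 8x8
        self.h2 = nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2)  # 8x8 -> 4x4
        self.h3 = nn.Linear(12 * 4 * 4, 30)
        self.out = nn.Linear(30, 10)

    def forward(self, x):
        x = torch.tanh(self.h1(x))
        x = torch.tanh(self.h2(x))
        x = torch.tanh(self.h3(x.flatten(1)))
        return self.out(x)

model = Net1989()
print(model(torch.randn(1, 1, 16, 16)).shape)      # torch.Size([1, 10])
print(sum(p.numel() for p in model.parameters()))  # ~10k params (dense H2)
```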

 

You can specify neural nets using a higher-level API, similar to what you’d do in something like PyTorch today. A quick note on the software design of modern libraries (see the sketch after this list):
1. A fast, general Tensor library that implements basic mathematical operations over multi-dimensional tensors
2. An autograd engine that tracks the forward compute graph and can generate operations for the backward pass
3. A high-level API of common deep learning operations: layers, architectures, optimizers, loss functions, etc.
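Here is a minimal PyTorch illustration of those three layers of the stack. This is my own sketch of the breakdown above, not code from Andrej’s post:

```python
import torch

# (1) Tensor library: fast math over multi-dimensional tensors.
a = torch.randn(3, 4)
b = torch.randn(4, 2)
c = a @ b                                  # matrix multiply

# (2) Autograd engine: record the forward graph, generate the backward pass.
w = torch.randn(4, 2, requires_grad=True)
loss = ((a @ w) ** 2).mean()
loss.backward()                            # fills w.grad via the recorded graph

# (3) High-level API: layers, losses, optimizers built on (1) and (2).
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
out = model(a)
torch.nn.functional.mse_loss(out, torch.zeros(3, 2)).backward()
opt.step()                                 # one SGD update of the layer's params
```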

 

So far I’ve built networks in pure Python without any frameworks. Modern machine learning frameworks seem to have everything I wrote in pure Python implemented as separate packages. After building an RNN in pure Python (here), my next project will be a computer vision topic focused mainly on using PyTorch.
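To make that concrete, here is a toy comparison (my own example, not from the post) of the hand-derived gradient I would have written in pure Python versus the same gradient produced by PyTorch’s autograd:

```python
import torch

# Pure-Python style: for L = (w*x - y)^2, dL/dw = 2*(w*x - y)*x, derived by hand.
x, y, w = 3.0, 6.0, 1.5
grad_manual = 2 * (w * x - y) * x

# Framework style: the identical gradient, produced by autograd.
wt = torch.tensor(w, requires_grad=True)
((wt * x - y) ** 2).backward()
assert abs(wt.grad.item() - grad_manual) < 1e-6
```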

 

[Deep Learning in 1989 versus 2022]

I love the “Reflections” section written by Andrej.

 

Let’s summarize what we’ve learned as a 2022 time traveler examining state of the art 1989 deep learning tech:
  • First of all, not much has changed in 33 years on the macro level. We’re still setting up differentiable neural net architectures made of layers of neurons and optimizing them end-to-end with backpropagation and stochastic gradient descent. Everything reads remarkably familiar, except it is smaller.
  • The dataset is a baby by today’s standards: The training set is just 7,291 16x16 greyscale images. Today’s vision datasets typically contain a few hundred million high-resolution color images from the web (e.g. Google has JFT-300M, OpenAI CLIP was trained on 400M images) and grow as large as a small few billion. This is approx. ~1,000X the pixel information per image (384*384*3/(16*16)) times 100,000X the number of images (1e9/1e4), for roughly 100,000,000X more pixel data at the input (a rough check of this arithmetic follows the list).
  • The neural net is also a baby: This 1989 net has approx. 9,760 params, 64K MACs, and 1K activations. Modern (vision) neural nets are on the scale of a small few billion parameters (1,000,000X) and O(~1e12) MACs (~10,000,000X). Natural language models can reach into trillions of parameters.
  • A state of the art classifier that took 3 days to train on a workstation now trains in 90 seconds on my fanless laptop (3,000X naive speedup), and further ~100X gains are very likely possible by switching to full-batch optimization and utilizing a GPU.
  • I was, in fact, able to tune the model, augmentation, loss function, and the optimization based on modern R&D innovations to cut down the error rate by 60%, while keeping the dataset and the test-time latency of the model unchanged.
  • Modest gains were attainable just by scaling up the dataset alone.
  • Further significant gains would likely have to come from a larger model, which would require more computation and additional R&D to help stabilize the training at increasing scales. In particular, if I were transported to 1989, I would ultimately have been upper-bounded in my ability to further improve the system without a bigger computer.
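As a sanity check on the scaling numbers quoted in the list above (my own arithmetic, not code from the post):

```python
# Data scaling: pixels per image and image count, 1989 vs. today.
pixels_per_image = 384 * 384 * 3 / (16 * 16)   # ~1728X, quoted as ~1,000X
num_images = 1e9 / 1e4                          # 100,000X
print(f"{pixels_per_image:.0f}X pixels * {num_images:.0f}X images "
      f"= {pixels_per_image * num_images:.1e}X data")  # ~1.7e8X, quoted ~1e8X

# Training time: 3 days on a 1989 workstation vs. 90 seconds today.
print(f"{3 * 24 * 3600 / 90:.0f}X speedup")     # 2880X, quoted as ~3,000X
```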
Suppose that the lessons of this exercise remain invariant in time. What does that imply about deep learning in 2022? What would a time traveler from 2055 think about the performance of current networks?
  • 2055 neural nets are basically the same as 2022 neural nets on the macro level, except bigger.
  • Our datasets and models today look like a joke. Both are somewhere around 10,000,000X larger.
  • One can train 2022 state of the art models in ~1 minute by training naively on their personal computing device as a weekend fun project.
  • Today’s models are not optimally formulated; just by changing some details of the model, loss function, augmentation, or optimizer, we can roughly halve the error.
  • Our datasets are too small, and modest gains would come from scaling up the dataset alone.
  • Further gains are actually not possible without expanding the computing infrastructure and investing into some R&D on effectively training models on that scale.