What is Machine Learning?
Data prep
The first step in this process is data prep, a step that is both difficult and expensive. The output of this step is a dataset that is ready to be fed into a framework (e.g., TensorFlow or PyTorch) and used to train the model. The process involves removing extraneous information (data that carries no useful training signal), normalizing (e.g., cropping images to a standard size), batching, and transforming the data into a numeric representation (tensors) that the framework can understand.
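As a rough illustration of those steps, the sketch below runs a few hypothetical records through cleaning, normalization, numeric conversion, and batching. It uses plain Python lists as stand-ins for tensors. The record fields (`id`, `notes`, `pixels`, `label`), the label mapping, and the 4-value "image" size are all assumptions made for the example. A real pipeline would typically delegate this work to framework utilities such as PyTorch's `DataLoader` or TensorFlow's `tf.data`.

```python
# Minimal data-prep sketch; plain lists stand in for tensors.

# Hypothetical raw records: "id" and "notes" are extraneous fields with no
# training signal; "pixels" is a tiny stand-in for an image (0-255 values).
RAW_RECORDS = [
    {"id": "a1", "notes": "scan 1", "pixels": [0, 255, 51, 102, 204], "label": "cat"},
    {"id": "a2", "notes": "scan 2", "pixels": [255, 0], "label": "dog"},
    {"id": "a3", "notes": "scan 3", "pixels": [51, 51, 51, 51], "label": "dog"},
]
LABEL_IDS = {"cat": 0, "dog": 1}  # hypothetical label vocabulary
IMAGE_SIZE = 4  # hypothetical standard "image" size


def clean(record):
    """Remove extraneous fields, keeping only the model input and target."""
    return {"pixels": record["pixels"], "label": record["label"]}


def normalize(pixels, size=IMAGE_SIZE):
    """Crop or zero-pad to a fixed size, then scale 0-255 values into [0, 1]."""
    fixed = (pixels + [0] * size)[:size]
    return [p / 255.0 for p in fixed]


def to_numeric(record):
    """Turn one cleaned record into an (input vector, integer label) pair."""
    return normalize(record["pixels"]), LABEL_IDS[record["label"]]


def make_batches(examples, batch_size):
    """Group examples into (inputs, labels) batches; the last may be smaller."""
    batches = []
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        batches.append(([x for x, _ in chunk], [y for _, y in chunk]))
    return batches


def prepare(records, batch_size=2):
    """Full pipeline: clean -> normalize/convert -> batch."""
    examples = [to_numeric(clean(r)) for r in records]
    return make_batches(examples, batch_size)


for inputs, labels in prepare(RAW_RECORDS):
    print(labels, inputs)
```

Each step is a separate function so that individual stages (for example, a different cropping rule) can be swapped without touching the rest of the pipeline.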
Realtime versus Batch Training
While many practitioners believe that “online” or realtime Deep Learning pipelines (where the model is continually trained and updated) will eventually become standard, what most companies do today is “offline” or batch training (where training is performed manually in discrete batches). In this scenario, models are updated periodically: a training job is initiated manually, and the resulting model is carefully evaluated and tested for accuracy before it enters production. The data pipeline feeds these model training jobs.
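The offline pattern can be sketched as a single manually triggered job that trains a candidate model, scores it on held-out data, and promotes it only if it clears an accuracy bar. The toy threshold classifier, the `train`/`accuracy`/`run_batch_job` helpers, and the 0.9 cutoff are hypothetical choices for illustration. An online pipeline would instead update the model incrementally as each new example arrives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Example = Tuple[float, int]  # (feature, label)


@dataclass
class ThresholdModel:
    """Toy classifier (hypothetical): predicts 1 when x >= threshold."""
    threshold: float

    def predict(self, x: float) -> int:
        return 1 if x >= self.threshold else 0


def train(examples: List[Example]) -> ThresholdModel:
    """Stand-in training job: threshold halfway between the two class means."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def accuracy(model: ThresholdModel, examples: List[Example]) -> float:
    """Fraction of examples the model labels correctly."""
    correct = sum(model.predict(x) == y for x, y in examples)
    return correct / len(examples)


def run_batch_job(production: Optional[ThresholdModel],
                  training_data: List[Example],
                  holdout: List[Example],
                  min_accuracy: float = 0.9) -> Optional[ThresholdModel]:
    """One manually triggered offline job: train, evaluate, then promote or reject."""
    candidate = train(training_data)
    candidate_acc = accuracy(candidate, holdout)
    production_acc = accuracy(production, holdout) if production is not None else 0.0
    if candidate_acc >= min_accuracy and candidate_acc >= production_acc:
        return candidate  # passes the evaluation gate: enters production
    return production  # rejected: keep serving the current model


holdout = [(0.5, 0), (3.0, 0), (7.0, 1), (10.0, 1)]
model = run_batch_job(None, [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)], holdout)
print(model)
```

The evaluation gate is what distinguishes this from a naive retrain: a job trained on poor data simply leaves the current production model in place.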
Feature Engineering is no longer a bottleneck
Rather than placing the burden on feature engineering, Deep Learning algorithms learn to identify useful features from the data without guidance from the developer. Because there is no need for handcrafted rules to describe the data (which are both time-consuming and error-prone to write), large structured and unstructured datasets can be fed into the algorithms with relative ease.
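To make the contrast concrete, here is a minimal sketch using hypothetical text data. The handcrafted route depends on rules the developer must write and maintain (the keyword lists are invented for illustration). The Deep Learning route only maps raw tokens to integer ids and leaves it to the network, typically starting with an embedding layer, to learn which patterns matter.

```python
# Handcrafted route: the developer decides which signals matter.
POSITIVE_WORDS = {"great", "excellent", "love"}  # hypothetical hand-written rule
NEGATIVE_WORDS = {"bad", "awful", "hate"}        # hypothetical hand-written rule


def handcrafted_features(text):
    """[positive-word count, negative-word count, length] from fixed rules."""
    words = text.lower().split()
    return [
        sum(w in POSITIVE_WORDS for w in words),
        sum(w in NEGATIVE_WORDS for w in words),
        len(words),
    ]


# Deep Learning route: no rules, just a numeric encoding of the raw input.
def build_vocab(texts):
    """Assign each distinct lowercase token an integer id; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for text in texts:
        for w in text.lower().split():
            vocab.setdefault(w, len(vocab))
    return vocab


def raw_token_ids(text, vocab):
    """Map tokens to ids; the model, not the developer, learns what they mean."""
    return [vocab.get(w, 0) for w in text.lower().split()]


vocab = build_vocab(["I love it", "I hate it"])
print(handcrafted_features("I love this great phone"))
print(raw_token_ids("I hate phones", vocab))
```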
Public and Private Datasets
Because high-quality data is so critical to the training process, a massive amount of energy has gone into developing large, publicly and freely available labeled training datasets. These datasets are designed to showcase advances in algorithms and frameworks, to benchmark chips, and so on. Public datasets are mentioned here because they are often used as a starting point when developing a new model.
GPUs & CPUs
GPUs have been widely used for Deep Learning because they deliver significant performance improvements over CPUs. The bottom line: GPUs will almost always be chosen for training and are often used for inference. For inference, the decision comes down to whether your prediction engine (serving endpoint) is running at scale. A single modern GPU, for instance, can process over 3,000 images/second, compared to fewer than 500 images/second on a CPU. This may sound like a no-brainer, but the speedup only matters if your request volume exceeds what a CPU can process. When training the model, on the other hand, you will always want to process the maximum amount of data in the shortest possible time.
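The inference trade-off comes down to simple arithmetic. The sketch below reuses the illustrative throughput figures quoted above and assumes a hypothetical rule: serve from a single CPU until demand exceeds its capacity, then switch to GPUs. Real capacity planning would also weigh latency, request batching, and cost per instance.

```python
import math

# Illustrative throughput figures from the text (images/second).
GPU_THROUGHPUT = 3000
CPU_THROUGHPUT = 500


def instances_needed(requests_per_second, throughput):
    """Minimum number of devices needed to keep up with the request rate."""
    return math.ceil(requests_per_second / throughput)


def choose_inference_hardware(requests_per_second, cpu_throughput=CPU_THROUGHPUT):
    """Hypothetical rule: a GPU only pays off once one CPU can no longer keep up."""
    if requests_per_second <= cpu_throughput:
        return "cpu"
    return "gpu"


for rps in (200, 6000):
    print(rps, choose_inference_hardware(rps),
          instances_needed(rps, CPU_THROUGHPUT), instances_needed(rps, GPU_THROUGHPUT))
```

At 6,000 requests/second, for example, this arithmetic calls for 12 CPU instances versus 2 GPU instances, while at 200 requests/second a single CPU is enough.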
Deep Learning vs. Classical Machine Learning
Classical Machine Learning techniques (such as decision trees, linear regression, logistic regression, clustering, random forests, and Bayesian networks) have achieved incremental improvements over time but are fundamentally limited in certain areas. One example is the diminishing returns that additional data has on the accuracy of these models: beyond a certain threshold, accuracy begins to plateau even as more data is fed into the model. In contrast, neural network-based Deep Learning techniques such as CNNs and RNNs are unusual in that their accuracy keeps improving as the amount of available data grows, well past the point where classical models level off. This approach enables the learning of higher-level features, which has yielded unprecedented levels of accuracy on traditional problems and, perhaps more importantly, has unlocked entirely new areas of application. One of the best-known applications of Deep Learning, autonomous vehicles, would not be practically feasible without modern Deep Learning techniques.