Thanks for attending
CVPR 2021 Tutorial
Saturday, June 19, 2021, 11 AM PST
Distributed Deep Learning on HPC servers for Large Scale Computer Vision Applications
Reduce Your Time To Solution
Check out our full tutorial videos below!
Why Distributed Deep Learning
Andrew Ng's take on HPC
HPC is becoming extremely critical in solving industrial-scale machine learning problems. First, datasets are becoming larger, higher resolution, and more complex. Second, model sizes are increasing in the search for higher accuracy. Under these circumstances, training times become prohibitively high, rendering experimentation during network design or hyperparameter tuning unproductive and even intractable.
Tutorial Overview
This is a half-day tutorial covering critical aspects of distributed deep learning. Deep learning is an example of a tightly coupled HPC workload, and as with any tightly coupled HPC workload, achieving strong scaling is challenging. This tutorial will provide the necessary background for understanding the different tasks and associated challenges of distributed deep learning.
Many practical machine learning applications, such as medical imaging, seismic image analysis in energy, and autonomous driving, depend on achieving the highest accuracy possible. This often requires building larger and ever more complex models.
In this tutorial, we will show how to alleviate GPU memory limitations and enable training in a reasonable turnaround time. Specifically, we will:
1) Present a comprehensive overview of distributed deep learning,
2) Discuss important tenets of data-parallel training (illustrated by the sketch following this list), including
(a) loss integrity independent of the number of processes,
(b) synchronized batch normalization,
(c) large-batch training using higher-order optimization methods, and
(d) performance evaluation by measuring speedup w.r.t. the number of processors, and
3) Show applications in agriculture, energy, transportation, and manufacturing using state-of-the-art model architectures with hundreds of millions of parameters on CPU/GPU clusters.
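As a minimal sketch of the data-parallel tenets in item 2 (not the tutorial's exact training code), the example below uses PyTorch DistributedDataParallel with SyncBatchNorm so that batch statistics and averaged gradients do not depend on the number of processes. The launch method, toy model, and random data are assumptions made for illustration.

```python
# Illustrative sketch only: PyTorch data-parallel training on a GPU cluster
# with synchronized batch normalization. Assumes one GPU per process and a
# launch such as `torchrun --nproc_per_node=<gpus> train.py`; the toy model
# and random data are placeholders.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                          nn.BatchNorm2d(8), nn.ReLU(),
                          nn.Flatten(), nn.Linear(8 * 32 * 32, 10))
    # SyncBatchNorm computes batch statistics over the global batch,
    # so normalization does not depend on the number of processes.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # Mean-reduced loss per local shard; DDP averages gradients across ranks,
    # so the effective update matches single-process training on the full batch.
    loss_fn = nn.CrossEntropyLoss()

    x = torch.rand(16, 3, 32, 32, device=device)      # this rank's data shard
    y = torch.randint(0, 10, (16,), device=device)

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()               # gradients are all-reduced (averaged) here
    optimizer.step()
    if dist.get_rank() == 0:
        print("local loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

For tenet (d), speedup is then measured as the single-process epoch time divided by the N-process epoch time, plotted against N.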
Schedule for Tutorial
- Kick off & introductions (20 min)
- Nuts and bolts of distributed deep learning (90 min)
- Break (5 min)
- Application 1: Image-based plant phenotyping (30 min)
- Application 2: 3D seismic facies classification (30 min)
- Application 3: Topology optimization (30 min)
- Application 4: Driver behavior from data (30 min)
- Conclusions (5 min)
Distributed Deep Learning Technical Seminar
The seminar covers the fundamentals of deep learning training, the different steps involved, and their respective memory footprints and time complexities. We will provide an overview of the I/O, memory, and network architectures of CPU and GPU HPC clusters, and describe different modes of parallelism on different hardware infrastructures (a minimal code sketch of model parallelism follows the list):
a) Data parallelism on CPU and GPU HPC clusters,
b) Model parallelism on CPU and GPU HPC clusters,
c) Pipeline parallelism, and
d) Federated learning on IoT devices.
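As a minimal sketch of the idea behind model parallelism (item b), the example below splits a toy network's layers across two GPUs and moves activations between them; the two-GPU split and layer sizes are assumptions made for illustration, not the tutorial's configuration.

```python
# Illustrative sketch only: naive model parallelism, with the first half of a
# toy network on GPU 0 and the second half on GPU 1. Activations are handed
# between devices in forward(); sizes and the two-GPU split are assumptions.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        return self.stage2(x.to("cuda:1"))   # hand activations to the second GPU

model = TwoStageModel()
x = torch.rand(32, 1024)
logits = model(x)       # computed partly on cuda:0, partly on cuda:1
loss = logits.sum()
loss.backward()         # autograd routes gradients back across both devices
```

Pipeline parallelism (item c) builds on such a split by streaming micro-batches through the stages so that both devices stay busy at the same time.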
Plant Phenotyping
Recent advancements in computation and sensor technology have enabled inexpensive collection of high-resolution phenotype data across large geographical areas with high temporal resolution. The continuous increase in the amount of data collected and annotated has made it possible to apply deep learning algorithms successfully to a wide variety of challenging plant phenotyping tasks, such as in-field plant segmentation. The size of an individual image and the number of such images necessitate large deep learning models. Naturally, distributed deep learning becomes an essential tool for training and deploying such models. We show examples of using distributed training for instance segmentation on large-scale in-field maize data, and for anomaly detection using a low-dimensional representation of in-field maize data learned by convolutional autoencoders.
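As an illustrative sketch of the anomaly detection idea (not the tutorial's exact model), a small convolutional autoencoder can compress image patches into a low-dimensional representation; patches that reconstruct poorly are flagged as anomalous. The layer sizes and patch size below are assumptions made for the example.

```python
# Illustrative sketch only: a small convolutional autoencoder for anomaly
# detection on image patches. Layer sizes and the 256x256 patch size are
# assumptions, not the tutorial's exact configuration.
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_channels=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),     # 256 -> 128
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(64, latent_channels, 3, stride=2, padding=1),  # 64 -> 32
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoencoder()
x = torch.rand(8, 3, 256, 256)              # a batch of image patches
recon = model(x)
# Per-sample reconstruction error; unusually large values suggest anomalies.
error = ((recon - x) ** 2).mean(dim=(1, 2, 3))
```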
Seismic Facies Classification
Identification of different geological features in seismic data by expert seismic interpreters for exploration is often referred to as seismic facies classification. Most prior work performs seismic facies classification on 2D seismic cross-sections. When these 2D classifications are stitched together and analyzed in a holistic 3D view, they show abrupt, unrealistic discontinuities in geological features. Depending on the direction in which the 2D cross-sections are taken, some features might not be fully visible in those sections, which can lead to incorrect interpretations. Here, we use 3D image segmentation models to solve the 3D seismic facies classification problem. This introduces two challenges: 1) the memory requirements of the computational framework, and 2) neural network design, given the large compute time needed to train each of these models. We will present distributed deep learning applications for 3D seismic facies classification problems.
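To make the memory challenge concrete, a hypothetical sketch of a very small 3D segmentation network is shown below; even for this toy model, activation memory grows cubically with the sub-volume size, which is what motivates distributed training for realistic volumes. The channel widths, class count, and sub-volume size are assumptions for the example.

```python
# Illustrative sketch only: a very small 3D encoder-decoder for voxel-wise
# facies classification on seismic sub-volumes. The number of classes,
# channel widths, and the 64x64x64 sub-volume size are assumptions, not the
# tutorial's exact architecture.
import torch
import torch.nn as nn

class Tiny3DSegNet(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                                      # 64 -> 32
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                                      # 32 -> 16
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, 2, stride=2), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose3d(16, 16, 2, stride=2), nn.ReLU(),   # 32 -> 64
            nn.Conv3d(16, num_classes, 1),                        # per-voxel logits
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Tiny3DSegNet()
sub_volume = torch.rand(1, 1, 64, 64, 64)    # one seismic amplitude sub-volume
logits = model(sub_volume)                    # (1, num_classes, 64, 64, 64)
```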
Topology Optimization
Over the past few decades, there has been much emphasis on designing components with optimal performance to adapt to increasingly competitive markets. Coupled with advances in additive manufacturing and other advanced manufacturing processes, the scope for improving components' performance through design optimization has increased drastically. However, state-of-the-art design optimization frameworks are compute-intensive because they require many iterations of finite element analysis. This part of the tutorial will explore a deep learning-based framework for performing design topology optimization faster and with less computation. We will draw parallels between this problem and the image segmentation task, and then present the application of distributed deep learning to 3D topology optimization at different voxel resolutions, using different model architectures and different optimizers, including higher-order methods.
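As one hedged illustration of moving beyond first-order optimizers, the sketch below uses PyTorch's L-BFGS (a quasi-Newton method) on a toy regression problem; the tutorial's specific higher-order methods and models may differ.

```python
# Illustrative sketch only: swapping a first-order optimizer for a
# quasi-Newton one (L-BFGS) on a toy regression model. This is just one
# example of a beyond-first-order method, not necessarily the optimizer
# used in the tutorial's topology optimization experiments.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
x, y = torch.rand(512, 64), torch.rand(512, 1)
loss_fn = nn.MSELoss()

optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1, max_iter=20)

def closure():
    # L-BFGS may re-evaluate the model several times per step,
    # so the forward/backward pass lives in a closure.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    return loss

for step in range(10):
    loss = optimizer.step(closure)
    print(f"step {step}: loss {loss.item():.4f}")
```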
Transportation Engineering
Driver behavior has long been studied to improve driver safety and to develop intelligent driver-assist systems. Naturalistic driving studies (NDS) are the most sought-after method for gaining insight into drivers' everyday behavior. In these studies, vehicles are fitted with multiple sensors to record the driver's actions, such as speed, acceleration, and braking, and with cameras to record events inside and outside the vehicle in real time. For a human, it is easy to watch these videos and accurately identify cues, for example, that the driver was distracted by a phone call, which led to a lane departure in heavy traffic. The challenge is to automate this processing using deep models. We will present how distributed training is used to simultaneously process short video clips from multiple cameras for traffic conflict classification.
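As a hypothetical sketch of the multi-camera setup, the example below encodes each camera's short clip with a shared 3D-convolutional backbone and concatenates the per-camera features before a small classification head; the clip shape, camera count, and conflict classes are assumptions made for illustration, not the tutorial's model.

```python
# Illustrative sketch only: a simple multi-camera clip classifier. Each
# camera's short clip is encoded with a shared 3D-conv backbone, the
# per-camera features are concatenated, and a small head predicts the
# conflict class. Clip shape, number of cameras, and classes are assumptions.
import torch
import torch.nn as nn

class MultiCameraClipClassifier(nn.Module):
    def __init__(self, num_cameras=2, num_classes=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),       # one 16-d vector per clip
        )
        self.head = nn.Linear(16 * num_cameras, num_classes)

    def forward(self, clips):
        # clips: list of (batch, channels, frames, height, width) tensors,
        # one per camera, all encoded with the same shared backbone.
        features = [self.backbone(c) for c in clips]
        return self.head(torch.cat(features, dim=1))

model = MultiCameraClipClassifier()
cabin = torch.rand(4, 3, 16, 112, 112)     # in-cabin camera clips
road = torch.rand(4, 3, 16, 112, 112)      # forward road camera clips
logits = model([cabin, road])               # (4, num_classes)
```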