Thanks for attending

CVPR 2021 Tutorial

Saturday, June 19, 2021, 11 AM PST

Distributed Deep Learning on HPC servers for Large Scale Computer Vision Applications

Reduce Your Time To Solution

Check out our full tutorial videos below!

Why Distributed Deep Learning

Andrew Ng's take on HPC

HPC is becoming critical to solving industrial-scale machine learning problems. First, datasets are growing larger, higher in resolution, and more complex. Second, model sizes are increasing in the search for higher accuracy. Under these circumstances, training times become prohibitively long, making experimentation during network design or parameter tuning unproductive or even intractable.

Tutorial Overview

This is a half-day tutorial covering critical aspects of Distributed Deep Learning. Deep Learning is an example of a “tightly coupled” HPC workload, and as with any such workload, achieving strong scaling is challenging. This tutorial will provide the necessary background for understanding the different tasks and associated challenges in Distributed Deep Learning.

Many practical machine learning applications, such as medical imaging, seismic image analysis in energy, and autonomous driving, depend on achieving the highest accuracy possible. This often requires building larger and more complex models.

In this tutorial, we will show how to alleviate GPU memory limitations and enable training with a reasonable turnaround time. Specifically, we will:

1) Present a comprehensive overview of distributed deep learning, 

2) Discuss important tenets of data parallel training, including:

(a) loss integrity independent of number of processes,

(b) synchronized batch normalization,

(c) large batch training using higher-order optimization methods, and

(d) performance evaluation by measuring speedup with respect to the number of processors.

3) Show applications in agriculture, energy, transportation, and manufacturing using state-of-the-art model architectures with hundreds of millions of parameters on CPU/GPU clusters.
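As a rough illustration of tenets (a) and (d) above, the following Python sketch uses a toy one-parameter least-squares model; it is not the tutorial's own code. It assumes equal-sized shards and a mean-reduced loss, in which case averaging the per-worker gradients reproduces the full-batch gradient for any number of workers. The function names, the toy data, and the helper definitions of speedup and efficiency are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: toy data and names are assumptions, not tutorial code.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Split the batch into equal shards, compute one gradient per shard,
    then average the shard gradients (the job an all-reduce does on a cluster)."""
    assert len(xs) % num_workers == 0, "sketch assumes equal-sized shards"
    shard = len(xs) // num_workers
    grads = []
    for i in range(num_workers):
        lo, hi = i * shard, (i + 1) * shard
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    return sum(grads) / num_workers


def speedup(time_one_worker, time_p_workers):
    """Speedup relative to a single worker: T(1) / T(p)."""
    return time_one_worker / time_p_workers


def efficiency(time_one_worker, time_p_workers, num_workers):
    """Parallel efficiency: speedup divided by the number of workers."""
    return speedup(time_one_worker, time_p_workers) / num_workers


# Toy batch of 8 samples (made-up values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full_batch = grad_mse(w, xs, ys)
for p in (1, 2, 4, 8):
    sharded = data_parallel_grad(w, xs, ys, p)
    # The averaged shard gradient should match the full-batch gradient
    # up to floating-point rounding, regardless of p.
    print(p, abs(sharded - full_batch) < 1e-9)
```

If the shards were unequal in size, or the loss were summed rather than averaged, a plain average of shard gradients would no longer match the single-process result, which is one way loss integrity can break as the process count changes.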

Distributed Deep Learning Technical Seminar

The seminar covers the fundamentals of deep learning training, the steps involved, and their respective memory footprints and time complexities. We will provide an overview of the I/O, memory, and network architectures of CPU and GPU HPC clusters, and describe different modes of parallelism on different hardware infrastructures:

a) Data parallelism on CPU and GPU HPC clusters,

b) Model parallelism on CPU and GPU HPC clusters,

c) Pipeline parallelism, and

d) Federated learning on IoT devices.
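To make the memory footprint discussion concrete, here is a back-of-the-envelope Python sketch, not a measurement from the seminar. It assumes 32-bit floating-point values and an Adam-style optimizer that keeps two extra buffers per parameter, and it ignores activations, which often dominate for high-resolution images. The function names and the idealized even split across model shards are hypothetical.

```python
# Illustrative estimate only: the precision, optimizer, and even-split
# assumptions are stated in the text above and are not from the seminar.

def training_state_bytes(num_params, bytes_per_value=4, optimizer_buffers=2):
    """Bytes needed for weights + gradients + optimizer buffers.

    Activations and framework overhead are deliberately excluded.
    """
    copies = 1 + 1 + optimizer_buffers  # weights, gradients, optimizer state
    return num_params * bytes_per_value * copies


def per_device_bytes(num_params, num_model_shards, **kwargs):
    """Idealized model-parallel split: training state divided evenly
    across shards, rounded up to a whole number of bytes."""
    total = training_state_bytes(num_params, **kwargs)
    return -(-total // num_model_shards)  # ceiling division


params = 100_000_000  # "hundreds of millions of parameters" scale

# Plain data parallelism replicates the full training state on every worker,
# so the per-worker figure does not shrink as workers are added.
print("replicated state per worker (bytes):", training_state_bytes(params))

# Model parallelism splits the state; here across 4 hypothetical shards.
print("state per device with 4 model shards (bytes):", per_device_bytes(params, 4))
```

Under these assumptions, 100 million parameters need 1.6 GB for training state alone, which is why activation memory and batch size, rather than parameter count, are often what push a model past a single device.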

Plant Phenotyping

Recent advancements in computation and sensor technology have enabled the cheap collection of high-resolution phenotype data across large geographical areas with high temporal resolution. The continuous increase in the amount of data collected and annotated has made it possible to apply deep learning algorithms successfully to a wide variety of challenging plant phenotyping tasks, such as in-field plant segmentation. The size of individual images and the number of such images necessitate large deep learning models, so distributed deep learning becomes an essential tool for training and deploying them. We show examples of using distributed training for instance segmentation on large-scale in-field maize data, and for anomaly detection using a low-dimensional representation of in-field maize data learned with convolutional autoencoders.

Seismic Facies Classification

The identification of different geological features from seismic data by expert seismic interpreters for exploration is often referred to as seismic facies classification. Most work on this problem classifies facies on 2D seismic cross-sections. When these 2D classifications are stitched together and analyzed in a holistic 3D view, they show abrupt, unrealistic discontinuities in geological features. Depending on the direction in which the 2D cross-sections are taken, some features might not be fully visible in those sections, which leads to incorrect interpretations. Here, we use 3D image segmentation models to solve the problem of 3D seismic facies classification. This introduces two challenges: 1) memory requirements in our computational framework and 2) neural network design, given the large compute time needed to train each of these models. We will present distributed deep learning applications for 3D seismic facies classification problems.

Topology Optimization

Over the past few decades, there has been much emphasis on designing components with optimal performance to adapt to increasingly competitive markets. Further, with advances in additive manufacturing and other advanced manufacturing processes, the scope for improving component performance through design optimization has increased drastically. However, state-of-the-art design optimization frameworks are compute-intensive because they require several iterations of finite element analysis. This part of the tutorial will explore a deep learning-based framework for performing topology optimization faster and with less computation. We will draw parallels between this problem and the image segmentation task, and then present the application of distributed deep learning to 3D topology optimization at different 3D voxel resolutions using different model architectures and different optimizers, including higher-order methods.

Transportation Engineering

Driver behavior has long been studied to improve driver safety and develop intelligent driver-assist systems. Naturalistic driving studies (NDS) are the most sought-after method for gaining insight into the driver’s everyday behavior. In these studies, vehicles are fitted with multiple sensors to record the driver’s actions, such as speed, acceleration, and braking, and with cameras to record events inside and outside the vehicle in real time. It is easy for a human to watch these videos and accurately identify cues, for example that the driver was distracted by a call, which led to a lane departure in heavy traffic. The challenge is to automate this processing using deep models. We will present how distributed training is used to simultaneously process short video clips from multiple cameras for traffic conflict classification.

Teaching Team


CTO, RocketML

Data Scientist, ML engineer



CEO, RocketML

Data Scientist


Professor, ISU

Data Scientist


Research lead, Shell

Data Scientist


Professor, ISU

Data Scientist, ML engineer


Professor, ISU

Data Scientist, ML engineer


CSA, RocketML

Data Scientist, ML engineer


PhD Student, ISU

Data Scientist, ML engineer


GeoPhysicist, Shell

Data Scientist


Post Doc, ISU

Data Scientist, ML engineer


PhD Student, ISU

Data Scientist

