NVIDIA DeepStream Platform

The essence of the DeepStream platform is to turn video into actionable insights. Cameras are now present almost everywhere, and the pixels they capture hold a large amount of data. Tapping that data to generate intelligent, actionable insights is what we call “intelligent video analytics” (IVA). IVA applies across industries, e.g., access control in offices and buildings, public transport, industrial inspection, traffic engineering, retail analytics, and logistics. Any domain that deploys cameras is a candidate for IVA.




All these settings, whether security, retail, construction, or manufacturing, have one thing in common: they all have lots of pixels to analyze. The platform supports processing data both at the edge and in the cloud. Pre-trained models are available from a repository named “NGC,” and the Transfer Learning Toolkit (TLT) can be used to adapt them into a model for real-time use. Developers can use the output of the process to create alerts, deploy analytics, or build visualizations.




The platform as a whole has three layers. The first layer is the “SDK,” which has four components:

  • Hardware-accelerated plugins: A collection of plugins that make use of the underlying hardware and that one can integrate at various stages of model building.
  • Docker containers: They package the tools and services needed to use the GPU as required.
  • Reference applications: They serve as a starting point for developers to kick off their own projects.
  • Azure IoT runtime: It ensures seamless integration with Microsoft Azure.



The second layer is “CUDA-X,” which lists the software technologies used by the DeepStream plugins. The third and last layer is the deployment layer.


What’s NEW

The new launches enable:

  • Developers to start coding and then change to other models or services with the same base code.
  • Support for “Monochrome” and “JPEG,” which facilitates industrial inspection.
  • Support for image segmentation, which helps developers build retail and supply chain solutions.
  • Developers to view the source code of three plugins, namely “inference,” “decoder,” and “messaging.”


Pixels to Insights: Workflow

The process of getting from pixels, i.e., the base information, to insights is described below. The core workflow remains constant, while a few additional techniques vary with industry requirements and settings.



  • Capturing data: Developers can source data from files, cameras, and several other sources.
  • Stream decoding: The streams are decoded on the GPU with hardware acceleration through NVDEC.
  • Pre-processing: Developers can apply various pre-processing tasks such as image conversion, scaling, and cropping.
  • Drawing inference: Developers can deploy object detection, classification, and segmentation algorithms, which can run on the GPU or the DLA (deep learning accelerator), to draw meaningful insights.
  • Tracking the object
  • Viewing it onscreen using metadata or sending it to the data center
  • Capturing the insights
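
Because DeepStream plugins are GStreamer elements, the steps above can be sketched as a pipeline description string. This is only a minimal sketch under stated assumptions: the element names (nvv4l2decoder, nvstreammux, nvinfer, nvtracker, nvvideoconvert, nvdsosd, nveglglessink) follow DeepStream reference pipelines and may differ between SDK versions, while the helper build_pipeline and all file paths are hypothetical.

```python
# Sketch only: element names follow DeepStream reference pipelines and may
# vary by SDK version. All file paths passed in are hypothetical.
STAGES = [
    ("capture", "filesrc location={video}"),
    ("parse", "h264parse"),
    ("decode", "nvv4l2decoder"),
    # "m.sink_0" links the decoder to the first input pad of the muxer "m".
    ("batch", "m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720"),
    ("infer", "nvinfer config-file-path={infer_config}"),
    ("track", "nvtracker ll-lib-file={tracker_lib}"),
    ("convert", "nvvideoconvert"),
    ("overlay", "nvdsosd"),
    ("display", "nveglglessink"),
]


def build_pipeline(video, infer_config, tracker_lib):
    """Join the stage templates into one gst-launch style description."""
    parts = [
        template.format(
            video=video, infer_config=infer_config, tracker_lib=tracker_lib
        )
        for _stage, template in STAGES
    ]
    return " ! ".join(parts)


if __name__ == "__main__":
    print(build_pipeline("sample.h264", "detector_config.txt", "libtracker.so"))
```

On a machine with DeepStream installed, a string like this could in principle be handed to gst-launch-1.0. A real deployment would also need a tracker configuration and a sink suited to the device.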


Case Study – Retail analytics

The retail setting has several key areas where the scope for video analytics, or analytics in general, is enormous. E.g., stores such as “Amazon Go” require accurate real-time analytics to function efficiently.

The deployed models need to assess the number of people in the store at any given point in time, the amount of time each person dwells around an aisle, and the basket or range of products each customer seeks, all in real time. The use case we will discuss below is “accurately tracking people and objects in the store.”
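
To make the two metrics concrete, here is a minimal sketch of how occupancy and dwell time could be computed once every person carries a unique ID. The observation format (frame index, person ID, zone name) and both function names are assumptions made for illustration and are not part of DeepStream.

```python
from collections import defaultdict


def dwell_times(observations, frame_interval_s):
    """Sum the time each person spends in each zone.

    observations: iterable of (frame_index, person_id, zone) tuples, one per
    person per frame in which that person was seen (hypothetical format).
    """
    totals = defaultdict(float)
    for _frame, person_id, zone in observations:
        totals[(person_id, zone)] += frame_interval_s
    return dict(totals)


def occupancy(observations, frame_index):
    """Count the distinct people seen in one frame."""
    return len({pid for frame, pid, _zone in observations if frame == frame_index})


if __name__ == "__main__":
    obs = [
        (0, 1, "aisle-3"), (0, 2, "entrance"),
        (1, 1, "aisle-3"), (1, 2, "aisle-3"),
        (2, 1, "aisle-3"),
    ]
    print(dwell_times(obs, frame_interval_s=0.5))
    print(occupancy(obs, frame_index=1))
```

Both metrics depend on IDs that stay stable from frame to frame, which is the job of the tracking step.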



The DeepStream platform has a plugin named “nvtracker.” It helps determine the number of people in a store, and the number of people beside an aisle, at any given point in time. It does so by assigning each tracked object a unique ID. The input to the plugin is “input buffer and metadata,” while the generated output is “output buffer and modified metadata,” i.e., the metadata with IDs attached to identify and differentiate objects. The trackers available under “nvtracker” are NVDCF, KLT, IOU, and a custom tracker, which a developer can design as required. The parameters required by the plugin include “tracker width & height,” “tracker type,” “GPU-ID,” etc.
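
The idea of attaching IDs can be illustrated with a small IOU-style tracker in plain Python. This is a simplified sketch and not the implementation inside “nvtracker”: it assumes boxes given as (x1, y1, x2, y2), matches greedily against the previous frame only, and drops a track as soon as it goes unmatched. The class name IouTracker and the 0.3 threshold are hypothetical.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy frame-to-frame matcher that keeps IDs stable across frames."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.tracks = {}        # track_id -> box from the previous frame
        self._ids = count(1)    # source of fresh IDs

    def update(self, boxes):
        """Return [(track_id, box), ...] for the detections of one frame."""
        unmatched = dict(self.tracks)
        current, result = {}, []
        for box in boxes:
            best_id, best_score = None, self.threshold
            for track_id, previous in unmatched.items():
                score = iou(box, previous)
                if score >= best_score:
                    best_id, best_score = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # new object enters the scene
            else:
                del unmatched[best_id]      # each track matches at most once
            current[best_id] = box
            result.append((best_id, box))
        self.tracks = current
        return result


if __name__ == "__main__":
    tracker = IouTracker()
    print(tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)]))
    print(tracker.update([(51, 50, 61, 60), (1, 0, 11, 10)]))
```

In the second frame the boxes arrive in a different order, yet each one keeps the ID it was given in the first frame because it overlaps most with its own previous position.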



Case Study – Traffic analytics

The roads in a city have many cameras to assess traffic congestion, anomalies on the street, and accidents. Analyzing this information helps the traffic department create real-time alerts for abnormalities and build intelligent models at the backend to direct action plans.

When capturing and analyzing the data, analysts need to understand which data needs analysis in real time and how much data they need to send to the data centers. Not all data can be processed and analyzed in the data centers because of significant bandwidth constraints. Thus, the use case we’ll discuss is “generation of actionable insights from thousands of cameras.”



There is a plugin named “message broker,” which one can use for this problem. It connects the edge to the cloud in real time and transfers only relevant data to the cloud for analysis. The plugin’s input is the metadata, while the output is a message sent over supported protocols. The parameters required by this plugin include “path to the shared library,” “protocol, URL, port, a topic for message destination,” and others.
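
The principle of sending only relevant data can be sketched in plain Python. The example below is an illustration under stated assumptions, not the plugin’s actual message schema or API: the metadata layout, the function names build_messages and publish, the topic name, and the send callback standing in for a protocol adapter are all hypothetical.

```python
import json


def build_messages(frame_metadata, camera_id, alert_labels, min_confidence=0.5):
    """Turn one frame's detections into JSON messages, dropping irrelevant ones."""
    messages = []
    for obj in frame_metadata["objects"]:
        if obj["label"] in alert_labels and obj["confidence"] >= min_confidence:
            messages.append(json.dumps({
                "camera": camera_id,
                "frame": frame_metadata["frame"],
                "label": obj["label"],
                "confidence": obj["confidence"],
            }, sort_keys=True))
    return messages


def publish(messages, topic, send):
    """Hand each message to a transport callback; return the number sent."""
    for message in messages:
        send(topic, message)
    return len(messages)


if __name__ == "__main__":
    metadata = {
        "frame": 7,
        "objects": [
            {"label": "car", "confidence": 0.9},
            {"label": "accident", "confidence": 0.8},
        ],
    }
    outgoing = build_messages(metadata, "cam-42", alert_labels={"accident"})
    # print stands in for a real transport such as a message queue client.
    publish(outgoing, "traffic/alerts", send=lambda topic, msg: print(topic, msg))
```

Only the accident detection leaves the edge device here, while routine detections stay local. That filtering is what keeps bandwidth manageable when thousands of cameras feed one data center.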


About the author: I am currently working as a data scientist with around 2-3 years of experience in analytics. Alongside my job, I own a blog and also write for business websites. I like to read books, listen to some good music, explore places to travel, and, most importantly, “dream” in my free time.