video - Smells Like ML

Cheaper to Fly

In recent experiments, we’ve generated high quality reconstructions of our apartment from video. Learning the failure modes of these methods, you will move the camera smoothly, avoid bright lights, and focus on textured regions of the FOV. If it all works out, you might spend more time recording video than processing it! Automating the data collection can really reduce the cost of mapping and reconstruction. Compared to recording from a phone/tablet, drones move more smoothly and swiftly....

Real-Time Reconstructions

Modern archeologists have been able to survey larger structures more precisely using remote sensing and photogrammetry. More recently, researchers demonstrate applications of multi view stereo with networked embedded cameras to track geological disturbances. In scenarios where visibility comes with high cost or saftey risk, the ability to quickly render high-fidelity reconstructions for offline analysis & review can be a powerful tool. Advances in techniques like multi-view stereo and structure from motion have reduced the cost by alleviating dependence on more expensive sensors like lidar....

Detect-Track-Localize

In our latest experiment with Depthai’s cameras, we consider visual localization. This relates to the simultaneous localization and mapping (SLAM) problem that robots use to consistently localize in a known environment. However, instead of feature matching with algorithms like ORB, we can try to directly regress the pose of a known object. This approach uses object detection, which is more robust to changes in illumination and perspective than classical techniques. And without the need to generate a textured map of a new environment, this technique can be quickly adapted to new scenes....

Pointcloud Video

Lately, we’ve come to enjoy using the DepthAI OAK-D, which features an RGB camera with stereo depth, IMU, and Intel’s MyriadX VPU. Along with this powerful hardware combination, DepthAI provides a rich SDK to build your own embedded vision pipelines. Many projects are included to get you started. These specs could help bring spatial AI to the SpecMirror where we can test representing human activities with pointcloud video. The Data First, we will generate training samples for activity recognition models like P4Transformer....

Next Steps for ActionAI

Some of our earliest work applying ML to video was done in the context of prototyping IoT products like YogAI. A couple years ago, we described a more generalized pipeline called ActionAI. ActionAI was designed to streamline prototyping IoT products using lightweight activity recognition pipelines on devices like NVIDIA’s Jetson Nano or the Coral Dev Board. Since then, NVIDIA has introduced action recognition modules into their Deepstream SDK. They model a classifier using 3D convolutional kernels over the space-time volume of normalized regions of interest, batched over a k-window in time....

Next Steps for ActionAI

Some of our earliest work applying ML to video was done in the context of prototyping IoT products like YogAI. A couple years ago, we described a more generalized pipeline called ActionAI. ActionAI was designed to streamline prototyping IoT products using lightweight activity recognition pipelines on devices like NVIDIA’s Jetson Nano or the Coral Dev Board. Since then, NVIDIA has introduced action recognition modules into their Deepstream SDK. They model a classifier using 3D convolutional kernels over the space-time volume of normalized regions of interest, batched over a k-window in time....

Machine Learning on Video

Factors like cheaper bandwidth and storage, expanded remote work, streaming entertainment, social media, robotics and autonomous vehicles, all contribute to the rapidly increasing volume of video data. Nonetheless, performance in benchmark ML video tasks in perception, activity recognition, and video understanding lag behind the image counterpart. In this post, we consider the challenges in applying ML to video while surveying some of the techniques en vogue to address them. The Time Dimension Treating video analytics as a search over space and time, the dimensionality begets additional hurdles to statistical and computational efficiency....

Bitrate Optimization using Spark and FFmpeg

Check out this part 1 notebook and this part 2 notebook and part 3 notebook which accompany this post! Streaming video is quickly occupying the lion’s share of digital content consumed by users of many applications. At the same time more users are streaming from mobile devices, screen sizes are also increasing while consumers expect high-quality video without lag or distortion artifacts. This frames an engineering challenge to optimize the way video is streamed for consumers across a multitude of hardware platforms....

Deepfake Detection With NVIDIA TLT 3.0 and DeepStream SDK

Last year, over 2 thousand teams participated in Kaggle’s Deepfake detection video classification challenge. For this task, contestants were provided 470 GB of high resolution video and required to submit a notebook which predicts whether each sample video file has been deepfaked with a 9 hour run-time limit. Since most deepfake technology performs a faceswap, contestants concentrated around face detection and analysis. Beginning with face detection, contestants could develop an image classifier using the provided labels....

Movie Trailer Similarity for Recommendation

Intro In a previous post, we discussed scraping a movie poster image corpus with genre labels from imdb and learning image similarity models using tensorflow. In this post, we extend this idea to recommend movie trailers based on audio-visual similarity. Data We started by scraping IMDB for movie trailers and their genre tags as labels. Using Scrapy, it is easy to build a text file of video links to then download with youtube-dl....