computer vision - Smells Like ML

Pointcloud Video

Lately, we’ve come to enjoy using the DepthAI OAK-D, which features an RGB camera with stereo depth, an IMU, and Intel’s Myriad X VPU. Along with this powerful hardware combination, DepthAI provides a rich SDK for building your own embedded vision pipelines, and many example projects are included to get you started. These specs could help bring spatial AI to the SpecMirror, where we can test representing human activities with pointcloud video. The Data: First, we will generate training samples for activity recognition models like P4Transformer....

Efficient Transformers

Convolutional Neural Networks have been a boon to the computer vision community. Deep learning from high-bandwidth image/video datasets can be computationally and statistically much more efficient when it uses the inductive bias of strong locality. This streamlines inference over big datasets or on resource-limited hardware. To model sequential dependence in short sequences of low-dimensional data, we have often used LSTMs. However, researchers have recently found success adapting Transformer architectures to learn from images and video, both applications traditionally dominated by CNNs....

TF Microcontroller Challenge: Droop, There It Is

Repo for this project here! A seasoned gardener can diagnose plant stress by visual inspection. For our entry to the TensorFlow Microcontroller Challenge, we chose to highlight the issue of water conservation while pushing the limits of computer vision applications. Our submission, dubbed “Droop, There It Is,” builds on previous work to identify droopy, wilted plants. Drought stress in plants typically manifests as visually discernible drooping and wilting, also known as plasmolysis, indicating low turgidity or water pressure....

Scalable Image Deduplication With Spark

Make sure to check out the Databricks notebook which complements this post! Modern internet companies maintain many image/video assets rendered at various resolutions to optimize content delivery. This demand gives rise to very interesting optimization problems. Groups like Netflix have even taken steps to personalize the images presented to each user, but as they describe, this involves subproblems in organizing the collection of images. In particular, researchers described extracting image metadata to help cluster near-duplicate images so they could more efficiently apply techniques like contextual bandits for image personalization....

Image Inpainting for Content Localization

In our last post, we trained StyleGAN2 over a corpus of hundreds of thousands of theatrical posters we scraped from sites like IMDb. Then we explored image retrieval applications of StyleGAN2 after extracting embeddings by projecting our image corpus onto the learned latent factor space. Image retrieval techniques can form the basis of personalized image recommendations, since we can use content similarity to generate new recommendations. Netflix engineers posted about testing the impact on user engagement from artwork produced by their content creation team....

Deepfake Detection With NVIDIA TLT 3.0 and DeepStream SDK

Last year, more than 2,000 teams participated in Kaggle’s Deepfake Detection Challenge, a video classification competition. For this task, contestants were provided 470 GB of high-resolution video and required to submit a notebook that predicts, within a 9-hour run-time limit, whether each sample video file has been deepfaked. Since most deepfake technology performs a face swap, contestants concentrated on face detection and analysis. Beginning with face detection, contestants could develop an image classifier using the provided labels....

Movie Trailer Similarity for Recommendation

Intro: In a previous post, we discussed scraping a movie poster image corpus with genre labels from IMDb and learning image similarity models using TensorFlow. In this post, we extend this idea to recommend movie trailers based on audio-visual similarity. Data: We started by scraping IMDb for movie trailers and their genre tags as labels. Using Scrapy, it is easy to build a text file of video links to then download with youtube-dl....

Scraping Smarter with Content Filtering

Scrapy is a powerful web scraping framework and an essential tool for building machine learning datasets. For sites with simple structure, Scrapy makes it easy to curate a dataset after launching a spider. Check out the tutorials in Scrapy’s documentation. To train a poster similarity model, we first gathered hundreds of thousands of movie posters. More concretely, when scraping IMDb.com, we may be interested in gathering posters from <img> tags under <div> tags with the class "poster"....
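
To make the selection rule in this excerpt concrete, here is a minimal sketch that collects the `src` of every <img> nested under a <div> with the class "poster". It is an illustration only: it uses Python's standard-library `html.parser` instead of Scrapy so it runs without extra dependencies, and the names `PosterImageParser` and `extract_poster_urls`, the sample markup, and the example.com URLs are all hypothetical, not taken from the original post or from IMDb's real page structure.

```python
from html.parser import HTMLParser


class PosterImageParser(HTMLParser):
    """Collect src attributes of <img> tags nested inside <div class="poster">."""

    def __init__(self):
        super().__init__()
        # One boolean per currently open <div>: True if it carries class "poster".
        self.div_stack = []
        self.poster_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append("poster" in classes)
        elif tag == "img" and any(self.div_stack) and attrs.get("src"):
            # We are somewhere below at least one open "poster" div.
            self.poster_urls.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def extract_poster_urls(html):
    """Return poster image URLs found in an HTML string (hypothetical helper)."""
    parser = PosterImageParser()
    parser.feed(html)
    return parser.poster_urls


# Hypothetical markup: two poster blocks and one unrelated image.
sample = """
<div class="poster"><a href="/title/1"><img src="https://example.com/a.jpg"></a></div>
<div class="sidebar"><img src="https://example.com/ad.jpg"></div>
<div class="poster large"><div><img src="https://example.com/b.jpg"/></div></div>
"""

print(extract_poster_urls(sample))
```

Inside a Scrapy spider, the same rule would more likely be written as a CSS selector such as `response.css('div.poster img::attr(src)').getall()`; the parser above only spells out what that selector matches.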

Movie Poster Similarity for Recommendation

The use of streaming services has sharply increased over this past year. Many video streaming platforms prominently feature theatrical posters in content representation. As movie posters are designed to signal theme, genre and era, this representation strongly influences a user’s propensity to watch the title. Domain experts have remarked on how poster elements can convey an emotion or capture attention. Exploring this thesis, Netflix conducted a UX study, using eye tracking to find that 91% of titles are rejected after roughly 1 second of view time....

IVA Pipelines with NVIDIA TLT and DeepStream SDK 5.0

We have seen applications in industries like retail, telemedicine, and robotics enabled by video analytics with machine learning. ML practitioners often leverage transfer learning with pretrained models to expedite development. Computer vision applications can benefit from using video analytics frameworks to facilitate faster iteration and experimentation. NVIDIA’s Transfer Learning Toolkit (TLT) and the DeepStream SDK 5.0 have made it easy to experiment with various network architectures and quickly deploy them on an NVIDIA-powered device for optimized inference....