Make Some Noise for Score Based Models

Blob Pitt's next big blockbuster. We consider generative models among the most exciting applications of machine learning. This technology has reached a remarkable capacity to synthesize original multimedia content after learning a data distribution. In this arena, the state of the art has been dominated by a family of models called generative adversarial networks, or GANs. However, GANs are challenged by training instabilities. The latest StyleGAN2-ADA mitigates mode collapse arising from overfit discriminators using adaptive data augmentation during training....

 · 4 min · Terry Rodriguez & Salma Mayorquin

Machine Learning on Video

Factors like cheaper bandwidth and storage, expanded remote work, streaming entertainment, social media, robotics, and autonomous vehicles all contribute to the rapidly increasing volume of video data. Nonetheless, performance on benchmark ML video tasks in perception, activity recognition, and video understanding lags behind that of their image counterparts. In this post, we consider the challenges in applying ML to video while surveying some of the techniques in vogue to address them. The Time Dimension: When we treat video analytics as a search over space and time, the added dimensionality begets additional hurdles to statistical and computational efficiency....

 · 5 min · Terry Rodriguez & Salma Mayorquin

Jacked About Jax

Like others, we’ve noted a recent uptick in research implemented using Jax. You might chalk it up as yet another ML platform, but Jax is emerging as the tool of choice for faster research iteration at DeepMind. After exploring for ourselves, we’re excited to find Jax is principally designed for fast differentiation. Excited because differentiation is foundational to gradient-based learning strategies supported in many ML algorithms. Moreover, the derivative is also ubiquitous in scientific computing, making Jax one powerful hammer!...

 · 4 min · Terry Rodriguez & Salma Mayorquin

NLS on the Lumpy Torus

Recently, I’ve found a fascinating line of work directed at advancing computational fluid dynamics using machine-learned preconditioners to speed up convergence in linear iterative solvers. In fact, the number of steps until convergence influences the performance bound of many classical optimization algorithms. Machine learning helps us to trade a cheap, data-driven approximation for fewer, costly optimization steps in the endgame of convergence. Given this context, I’ve been revisiting my studies on numerical PDEs like the nonlinear Schrödinger equation (NLS), and here I’ll share some of the background work I took part in during the summer of 2012....

 · 5 min · Terry Rodriguez

Bitrate Optimization using Spark and FFmpeg

Check out the part 1, part 2, and part 3 notebooks which accompany this post! Streaming video is quickly occupying the lion’s share of digital content consumed by users of many applications. As more users stream from mobile devices, screen sizes are also increasing, and consumers expect high-quality video without lag or distortion artifacts. This frames an engineering challenge to optimize the way video is streamed for consumers across a multitude of hardware platforms....

 · 5 min · Terry Rodriguez & Salma Mayorquin

Scalable Image Deduplication With Spark

Make sure to check out the Databricks notebook which complements this post! Modern internet companies maintain many image/video assets rendered at various resolutions to optimize content delivery. This demand gives rise to very interesting optimization problems. Groups like Netflix have even taken steps to personalize the images presented to each user, but as they describe, this involves subproblems in organizing the collection of images. In particular, researchers described extracting image metadata to help cluster near-duplicate images so they could more efficiently apply techniques like contextual bandits for image personalization....

 · 2 min · Terry Rodriguez & Salma Mayorquin

Image Inpainting for Content Localization

In our last post, we trained StyleGAN2 over a corpus of hundreds of thousands of theatrical posters we scraped from sites like IMDb. Then we explored image retrieval applications of StyleGAN2 after extracting embeddings by projecting our image corpus onto the learned latent factor space. Image retrieval techniques can form the basis of personalized image recommendations as we use content similarity to generate new recommendations. Netflix engineers posted about testing the impact on user engagement from artwork produced by their content creation team....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Applying GAN Latent Factors for Image Retrieval

GANs consistently achieve state-of-the-art performance in image generation by learning the distribution of an image corpus. The newest models often use explicit mechanisms to learn factored representations for images, which can help provide faceted image retrieval, capable of conditioning output on key attributes. In this post, we explore applying StyleGAN2 embeddings in image retrieval tasks. StyleGAN2: To begin, we train a StyleGAN2 model to generate theatrical posters from our image corpus....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Deepfake Detection With NVIDIA TLT 3.0 and DeepStream SDK

Last year, over 2 thousand teams participated in Kaggle’s Deepfake Detection Challenge for video classification. For this task, contestants were provided 470 GB of high-resolution video and required to submit a notebook that predicts, within a 9-hour run-time limit, whether each sample video file has been deepfaked. Since most deepfake technology performs a face swap, contestants concentrated on face detection and analysis. Beginning with face detection, contestants could develop an image classifier using the provided labels....

 · 5 min · Terry Rodriguez & Salma Mayorquin

Movie Trailer Similarity for Recommendation

Intro: In a previous post, we discussed scraping a movie poster image corpus with genre labels from IMDb and learning image similarity models using TensorFlow. In this post, we extend this idea to recommend movie trailers based on audio-visual similarity. Data: We started by scraping IMDb for movie trailers and their genre tags as labels. Using Scrapy, it is easy to build a text file of video links to then download with youtube-dl....

 · 4 min · Terry Rodriguez & Salma Mayorquin