Framing SSL

Many recent successes in computer vision have been powered by the extension of BERTology beyond text-based data to images and video. Without a doubt, efficient Transformers which patchify input images à la ViT have initiated much of this progress. But in this post, we are interested in pretraining with self-supervised learning to develop compact representations we might use in various downstream tasks. Datasets of real-world interest often exhibit structures which are not exploited in research on benchmark datasets....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Make Some Noise for Score Based Models

Blob Pitt's next big blockbuster. We consider generative models among the most exciting applications of machine learning. This tech has reached a remarkable capacity to synthesize original multimedia content after learning a data distribution. In this arena, the state of the art has been dominated by a family of models called generative adversarial networks, or GANs. However, GANs are challenged by training instabilities. The latest StyleGAN2-ADA mitigates mode collapse arising from overfit discriminators using adaptive data augmentation during training....

 · 4 min · Terry Rodriguez & Salma Mayorquin

Image Inpainting for Content Localization

In our last post, we trained StyleGAN2 over a corpus of hundreds of thousands of theatrical posters we scraped from sites like IMDb. Then we explored image retrieval applications of StyleGAN2 after extracting embeddings by projecting our image corpus onto the learned latent factor space. Image retrieval techniques can form the basis of personalized image recommendations, as we use content similarity to generate new recommendations. Netflix engineers posted about testing the impact on user engagement from artwork produced by their content creation team....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Applying GAN Latent Factors for Image Retrieval

GANs consistently achieve state-of-the-art performance in image generation by learning the distribution of an image corpus. The newest models often use explicit mechanisms to learn factored representations for images, which can help provide faceted image retrieval capable of conditioning output on key attributes. In this post, we explore applying StyleGAN2 embeddings in image retrieval tasks. StyleGAN2: To begin, we train a StyleGAN2 model to generate theatrical posters from our image corpus....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Deepfake Detection With NVIDIA TLT 3.0 and DeepStream SDK

Last year, over 2,000 teams participated in Kaggle’s Deepfake Detection Challenge, a video classification contest. For this task, contestants were provided 470 GB of high-resolution video and required to submit a notebook which, within a 9-hour run-time limit, predicts whether each sample video file has been deepfaked. Since most deepfake technology performs a faceswap, contestants concentrated on face detection and analysis. Beginning with face detection, contestants could develop an image classifier using the provided labels....

 · 5 min · Terry Rodriguez & Salma Mayorquin

Movie Trailer Similarity for Recommendation

Intro: In a previous post, we discussed scraping a movie poster image corpus with genre labels from IMDb and learning image similarity models using TensorFlow. In this post, we extend this idea to recommend movie trailers based on audio-visual similarity. Data: We started by scraping IMDb for movie trailers and their genre tags as labels. Using Scrapy, it is easy to build a text file of video links to then download with youtube-dl....

 · 4 min · Terry Rodriguez & Salma Mayorquin

Movie Poster Similarity for Recommendation

The use of streaming services has sharply increased over this past year. Many video streaming platforms prominently feature theatrical posters in content representation. As movie posters are designed to signal theme, genre and era, this representation strongly influences a user’s propensity to watch the title. Domain experts have remarked on how poster elements can convey an emotion or capture attention. Exploring this thesis, Netflix conducted a UX study, using eye tracking to find that 91% of titles are rejected after roughly 1 second of view time....

 · 4 min · Terry Rodriguez & Salma Mayorquin

IVA Pipelines with NVIDIA TLT and DeepStream SDK 5.0

We have seen applications in industries like retail, telemedicine, and robotics enabled by video analytics with machine learning. ML practitioners often leverage transfer learning with pretrained models to expedite development. Computer vision applications can benefit from using video analytics frameworks to facilitate faster iteration and experimentation. NVIDIA’s Transfer Learning Toolkit (TLT) and the DeepStream SDK 5.0 have made it easy to experiment with various network architectures and quickly deploy them on an NVIDIA-powered device for optimized inference....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Population Health Modeling

In a matter of months, the COVID-19 pandemic has besieged humanity, and now the world wrestles to manage the population health challenges of a novel coronavirus with remarkable infectivity. Organizing an effective response to blunt the impact of such a large, complex challenge demands a principled and scientific approach. Better Planning by Forecasting Infections: Reliable forecasting is crucial for planning and allocating limited resources efficiently and minimizing casualties....

 · 6 min · Terry Rodriguez & Salma Mayorquin

Deepfake Detection: Challenge Accepted

Advances in methods to generate photorealistic but synthetic images have prompted concerns about abusing the technology to spread misinformation. In response, major tech companies like Facebook, Amazon, and Microsoft partnered to sponsor a contest hosted by Kaggle to mobilize machine learning talent to tackle the challenge. With $1 million in prizes and nearly half a terabyte of samples to train on, this contest requires the development of models that can be deployed to combat deepfakes....

 · 2 min · Terry Rodriguez & Salma Mayorquin