Efficient Transformers

Convolutional Neural Networks have been a boon to the computer vision community. Deep learning from high-bandwidth image/video datasets can be computationally and statistically much more efficient using the inductive bias of strong locality. This streamlines inference over big datasets or on resource-limited hardware. To model sequential dependence in short sequences of low-dimensional data, we have often used LSTMs. However, researchers have recently found success adapting Transformer architectures to learn from image and video, both applications traditionally dominated by CNNs....

 · 7 min · Terry Rodriguez & Salma Mayorquin

TF Microcontroller Challenge: Droop, There It Is

Repo for this project here! A seasoned gardener can diagnose plant stress by visual inspection. For our entry to the Tensorflow Microcontroller Challenge, we chose to highlight the issue of water conservation while pushing the limits of computer vision applications. Our submission, dubbed “Droop, There It Is” builds on previous work to identify droopy, wilted plants. Drought stress in plants typically manifests as visually discernible drooping and wilting, also known as plasmolysis, indicating low turgidity or water pressure....

 · 5 min · Terry Rodriguez & Salma Mayorquin

Scalable Image Deduplication With Spark

Make sure to check out the databricks notebook which complements this post! Modern internet companies maintain many image/video assets rendered at various resolutions to optimize content delivery. This demand gives rise to very interesting optimization problems. Groups like Netflix have even taken steps to personalize the images presented to each user, but as they describe, this involves subproblems in organizing the collection of images. In particular, researchers described extracting image metadata to help cluster near duplicate images so they could more efficiently apply techniques like contextual bandits for image personalization....

 · 2 min · Terry Rodriguez & Salma Mayorquin

Image Inpainting for Content Localization

In our last post, we trained StyleGAN2 over a corpus of hundreds of thousands theatrical posters we scraped from sites like IMDb. Then we explored image retrieval applications of StyleGAN2 after extracting embeddings by projecting our image corpus onto the learned latent factor space. Image retrieval techniques can form the basis of personalized image recommendations as we use content similarity to generate new recommendations. Netflix engineers posted about testing the impact on user engagement from artwork produced by their content creation team....

 · 3 min · Terry Rodriguez & Salma Mayorquin

Deepfake Detection With NVIDIA TLT 3.0 and DeepStream SDK

Last year, over 2 thousand teams participated in Kaggle’s Deepfake detection video classification challenge. For this task, contestants were provided 470 GB of high resolution video and required to submit a notebook which predicts whether each sample video file has been deepfaked with a 9 hour run-time limit. Since most deepfake technology performs a faceswap, contestants concentrated around face detection and analysis. Beginning with face detection, contestants could develop an image classifier using the provided labels....

 · 5 min · Terry Rodriguez & Salma Mayorquin