Smells Like ML

Framing SSL

Many recent successes in computer vision have been powered by the extension of BERTology beyond text to image and video data. Without a doubt, efficient Transformers which patchify input images à la ViT have initiated much of this progress. But in this post, we are interested in pretraining with self-supervised learning to develop compact representations we might use in various downstream tasks. Datasets of real-world interest often exhibit structures which are not exploited in research on benchmark datasets....

Learning on Synthetic Data

Sometimes, we succeed in applying transfer learning with relatively few labeled samples to develop custom models. However, there are times when the cost of acquisition is so great that even a few examples to learn from are hard to come by. Scientists curate databases like FathomNet to share expertise about the ocean’s wildlife. Applying machine learning to classify marine species is quite challenging in practice, due in part to the rarity of encounters and difficult photographic environments....

Next Steps for ActionAI

Some of our earliest work applying ML to video was done in the context of prototyping IoT products like YogAI. A couple of years ago, we described a more generalized pipeline called ActionAI. ActionAI was designed to streamline prototyping IoT products using lightweight activity recognition pipelines on devices like NVIDIA’s Jetson Nano or the Coral Dev Board. Since then, NVIDIA has introduced action recognition modules into their DeepStream SDK. They model a classifier using 3D convolutional kernels over the space-time volume of normalized regions of interest, batched over a k-window in time....

Becoming CUDA Capable

ML on GPUs: Generally speaking, machine learning model training and inference are computationally expensive, so most practitioners know to try GPU acceleration, if available. Historically, these optimizations required expertise in GPU programming, especially using NVIDIA’s CUDA framework for parallel programming. Recently, emergent best practices in model selection and transfer learning have been abstracted into high-level APIs, shifting the practitioner’s productivity bottlenecks from training models to getting data. Assuming the upfront cost of developing a model is amortized over the lifetime of its deployment, it becomes especially important to optimize runtime performance for your target hardware....

Model Explainability With GradCAM

Though we are accustomed to evaluating ML models with respect to performance statistics like accuracy, real-world deployment scenarios often require weighing multiple models that perform comparably. Deciding which to launch in an A/B experiment can be challenging when the offline metrics are just a proxy for the online metrics core to business decisions. Experiment time is precious, and for large experiments on foundational models the tolerance for error is limited, so it is critical to base experiment launch decisions on a collection of diverse metrics....

Data Sketching

In applied machine learning, engineers may spend considerable effort optimizing model performance with respect to metrics like accuracy on hold-out data. At times, more nuanced engineering decisions may weigh additional factors, such as latency or algorithmic simplicity. Across many problem domains, approximate algorithmic solutions are preferred to more accurate techniques with poor scalability. It’s said that “what’s past is prologue”, an idea which manifests in the most foundational of problem-solving methods: use prior information....

Efficient Transformers

Convolutional Neural Networks have been a boon to the computer vision community. Deep learning from high-bandwidth image/video datasets can be computationally and statistically much more efficient using the inductive bias of strong locality. This streamlines inference over big datasets or on resource-limited hardware. To model sequential dependence in short sequences of low-dimensional data, we have often used LSTMs. However, researchers have recently found success adapting Transformer architectures to learn from image and video, both applications traditionally dominated by CNNs....

TF Microcontroller Challenge: Droop, There It Is

A seasoned gardener can diagnose plant stress by visual inspection. For our entry to the TensorFlow Microcontroller Challenge, we chose to highlight the issue of water conservation while pushing the limits of computer vision applications. Our submission, dubbed “Droop, There It Is”, builds on previous work to identify droopy, wilted plants. Drought stress in plants typically manifests as visually discernible drooping and wilting, also known as plasmolysis, indicating low turgidity or water pressure....

Make Some Noise for Score Based Models

We consider generative models among the most exciting applications of machine learning. This tech has reached a remarkable capacity to synthesize original multimedia content after learning a data distribution. In this arena, the state of the art has been dominated by a family of models called generative adversarial networks, or GANs. However, GANs are challenged by training instabilities. The latest StyleGAN2-ADA mitigates mode collapse arising from overfit discriminators using adaptive discriminator augmentation during training....