Hey, we're Terry & Salma 👋

The technical tag team behind this blog. We aim to showcase experiments and innovations in Artificial Intelligence and applied Machine Learning.

Framing SSL

Many recent successes in computer vision have been powered by the extension of BERTology beyond text-based data to image & video. Without a doubt, efficient Transformers which patchify input images à la ViT have initiated much of this progress. But in this post, we are interested in pretraining with self-supervised learning to develop compact representations we might use in various downstream tasks. Datasets of real-world interest often exhibit structures which are not exploited in research on benchmark datasets....

3 min · Terry Rodriguez & Salma Mayorquin

Learning on Synthetic Data

Sometimes we succeed in applying transfer learning with relatively few labeled samples to develop custom models. However, there are times when the cost of acquisition is so great that even having a few examples to learn from is difficult. Scientists curate databases like FathomNet to share expertise about the ocean’s wildlife. Applying machine learning to classify marine species is quite challenging in practice, due in part to the rarity of encounters and challenging photographic environments....

3 min · Terry Rodriguez & Salma Mayorquin

Next Steps for ActionAI

Some of our earliest work applying ML to video was done in the context of prototyping IoT products like YogAI. A couple of years ago, we described a more generalized pipeline called ActionAI. ActionAI was designed to streamline prototyping IoT products using lightweight activity recognition pipelines on devices like NVIDIA’s Jetson Nano or the Coral Dev Board. Since then, NVIDIA has introduced action recognition modules into their DeepStream SDK. They model a classifier using 3D convolutional kernels over the space-time volume of normalized regions of interest, batched over a k-window in time....

4 min · Terry Rodriguez & Salma Mayorquin

Becoming CUDA Capable

ML on GPUs: Generally speaking, machine learning model training & inference are computationally expensive, so most practitioners know to try using GPU acceleration, if available. Historically, these optimizations required expertise in GPU programming, especially using NVIDIA’s CUDA framework for parallel programming. Recently, emergent best practices in model selection and transfer learning have been abstracted into high-level APIs, shifting the practitioner’s productivity bottlenecks from training models to getting data. Assuming the upfront cost of developing a model is amortized over the lifetime of its deployment, it becomes especially important to optimize runtime performance for your target hardware....

4 min · Terry Rodriguez & Salma Mayorquin

Model Explainability With GradCAM

Though we are accustomed to evaluating ML models with respect to performance statistics like accuracy, real-world deployment scenarios must weigh multiple models performing comparably. Deciding which to launch in an A/B experiment can be challenging when the offline metrics are just a proxy for online metrics core to business decisions. Experiment time is precious, and for large experiments on foundational models the tolerance for error is limited, so it is critical to base experiment launch decisions on a collection of diverse metrics....

3 min · Terry Rodriguez & Salma Mayorquin

Data Sketching

In applied machine learning, engineers may spend considerable effort optimizing model performance with respect to metrics like accuracy on hold-out data. At times, more nuanced engineering decisions may weigh additional factors, such as latency or algorithmic simplicity. Across many problem domains, approximate algorithmic solutions are preferred to more accurate techniques with poor scalability. It’s said that “what’s past is prologue”, an idea which manifests in the most foundational of problem solving methods: use prior information....

5 min · Terry Rodriguez & Salma Mayorquin

Efficient Transformers

Convolutional Neural Networks have been a boon to the computer vision community. Deep learning from high-bandwidth image/video datasets can be computationally and statistically much more efficient using the inductive bias of strong locality. This streamlines inference over big datasets or on resource-limited hardware. To model sequential dependence in short sequences of low-dimensional data, we have often used LSTMs. However, researchers have recently found success adapting Transformer architectures to learn from image and video, both applications traditionally dominated by CNNs....

7 min · Terry Rodriguez & Salma Mayorquin

TF Microcontroller Challenge: Droop, There It Is

Repo for this project here! A seasoned gardener can diagnose plant stress by visual inspection. For our entry to the TensorFlow Microcontroller Challenge, we chose to highlight the issue of water conservation while pushing the limits of computer vision applications. Our submission, dubbed “Droop, There It Is”, builds on previous work to identify droopy, wilted plants. Drought stress in plants typically manifests as visually discernible drooping and wilting, also known as plasmolysis, indicating low turgidity or water pressure....

5 min · Terry Rodriguez & Salma Mayorquin

Make Some Noise for Score Based Models

[Figure: Blob Pitt's next big blockbuster] We consider generative models among the most exciting applications of machine learning. This tech has reached a remarkable capacity to synthesize original multimedia content after learning a data distribution. In this arena, the state-of-the-art has been dominated by a family of models called generative adversarial networks or GANs. However, GANs are challenged by training instabilities. The latest StyleGAN2-ADA mitigates mode collapse arising from overfit discriminators using adaptive data augmentation during training....

4 min · Terry Rodriguez & Salma Mayorquin

Machine Learning on Video

Factors like cheaper bandwidth and storage, expanded remote work, streaming entertainment, social media, robotics and autonomous vehicles all contribute to the rapidly increasing volume of video data. Nonetheless, performance in benchmark ML video tasks in perception, activity recognition, and video understanding lags behind that of their image counterparts. In this post, we consider the challenges in applying ML to video while surveying some of the techniques en vogue to address them. The Time Dimension: Treating video analytics as a search over space and time, the dimensionality begets additional hurdles to statistical and computational efficiency....

5 min · Terry Rodriguez & Salma Mayorquin