Smells Like ML

This is the Remyx

Check out remyx.ai. Why do we care about ML in product? As you’re reading this, we might assume that you are familiar with machine learning (ML) concepts. But since we aim to reach a broad, ML-curious developer audience with this post, let us simply frame our interest in ML-based product features that help us make decisions using cheaper data. The case for custom ML: Many practical machine learning tasks come down to differentiating between only a few object categories in a constrained environment....

FilmGeeks 3

Check out the FilmGeeks3 Collection. In last year’s post on generative models, we showcased theatrical posters synthesized with GANs and diffusion models. Since that time, Latent Diffusion has gained popularity for faster training while allowing the output to be conditioned on text and images. This variant performs diffusion in the embedding space after encoding text or image inputs with CLIP. Because the generated output can be conditioned on text and/or image inputs, the user has much more influence on the results....

ROS NeuralRecon

Check out the ros_neuralrecon repo here! About a month ago, we remarked on some exciting modifications to NeuralRecon, which can generate dense scene reconstructions in real time using TSDF fusion on posed monocular video. Specifically, we noted that the image feature extraction could be shifted to the edge using DepthAI cameras after converting the MnasMulti backbone to .blob. The model is trained on ScanNet, and the researchers recommend custom data capture with ARKit using ios_logger. We found this works well if you have Xcode on a Mac and an iPhone....

Go Nerf Yourself

While prototyping YogAI, our smart mirror fitness application, we dreamed of using generative models like GANs to render realistic avatars. For the TFWorld 2.0 Challenge, we came a bit closer to that vision by demonstrating a pipeline that quickly creates motion transfer videos. More recently, we have been learning about reconstruction techniques and have been excited about the work around Neural Radiance Fields (NeRF). With this method, one learns an implicit representation of a scene from posed monocular videos....

Cheaper to Fly

In recent experiments, we’ve generated high-quality reconstructions of our apartment from video. Once you learn the failure modes of these methods, you move the camera smoothly, avoid bright lights, and focus on textured regions of the field of view (FOV). If it all works out, you might spend more time recording video than processing it! Automating the data collection can really reduce the cost of mapping and reconstruction. Compared to recording from a phone or tablet, drones move more smoothly and swiftly....

Meet the Flockers

In this post, we share one of our favorite “pet projects”. We first heard about the “parrots of Telegraph Hill” while looking for things to see in the city. But after a couple of years, we had never managed to run into one of these accidentally re-wilded parrots. Eventually, we moved to a new apartment where we could hear their distant squawks and occasionally see a small flock of the cherry-headed conures....

Real-Time Reconstructions

Modern archeologists have been able to survey larger structures more precisely using remote sensing and photogrammetry. More recently, researchers have demonstrated applications of multi-view stereo with networked embedded cameras to track geological disturbances. In scenarios where visibility comes with high cost or safety risk, the ability to quickly render high-fidelity reconstructions for offline analysis and review can be a powerful tool. Advances in techniques like multi-view stereo and structure from motion have reduced the cost by alleviating dependence on more expensive sensors like lidar....

Detect-Track-Localize

In our latest experiment with DepthAI’s cameras, we consider visual localization. This relates to the simultaneous localization and mapping (SLAM) problem, which robots solve to localize consistently in a known environment. However, instead of feature matching with algorithms like ORB, we can try to directly regress the pose of a known object. This approach uses object detection, which is more robust to changes in illumination and perspective than classical techniques. And without the need to generate a textured map of a new environment, this technique can be quickly adapted to new scenes....

Pointcloud Video

Lately, we’ve come to enjoy using the DepthAI OAK-D, which features an RGB camera with stereo depth, IMU, and Intel’s MyriadX VPU. Along with this powerful hardware combination, DepthAI provides a rich SDK to build your own embedded vision pipelines. Many projects are included to get you started. These specs could help bring spatial AI to the SpecMirror, where we can test representing human activities with pointcloud video. The Data: First, we will generate training samples for activity recognition models like P4Transformer....

Adding Vision to the Turtlebot3

TurtleBot has open-sourced designs for many low-cost personal robot kits, and the Burger is a great platform for learning ROS and mobile robotics. The documentation progresses from setup to more advanced robot behavior including SLAM, navigation, autonomous driving, and more! Our Build: Since the TurtleBot3 Burger kit doesn’t include a camera, we added the OAK-FFC-3P camera made by Luxonis with a 100-degree Arducam to our build. This device includes deep learning accelerators on the camera, so we can reduce the compute needs on the Raspberry Pi host....