Becoming CUDA Capable
ML on GPUs Generally speaking, machine learning model training & inference is computationally expensive, so most practitioners know to try using GPU acceleration, if available. Historically, these optimizations required expertise in GPU programming, especially using NVIDIA’s CUDA framework for parallel programming. Recently, emergent best practices in model selection and transfer learning are abstracted into high-level apis, shifting the practitioner’s productivity bottlenecks from training models to getting data. Assuming the upfront cost of developing a model to be amortized over the lifetime of it’s deployment, it becomes especially important to optimize runtime performance for your target hardware....