Data Science Programming February 2021 Newsletter

Packaging, MLOps and making fancy models

Hello, fellow datanistas!

We’re back with another edition of the newsletter! For all of our friends in Texas, there’s little I can do in person from out here in Boston, but I have been paying attention to the situation and have made a donation to help out. For readers of the newsletter who have the means, I’d like to encourage you to find your preferred way of pitching in.

In this edition, I’d like to share three topics with everybody: software packaging (especially in the Python/conda world), MLOps, and building fancy models. Let’s get started!

Conda, pip and more Python packaging

Ralf Gommers of Quansight put up a blog post towards the end of January on one of their Q-share sessions, in which the discussion topic was Python packaging. The blog post is a fairly raw and unedited look at the questions surrounding packaging in the Python world, including (but not limited to) compatibility between conda and pip, support for multiple hardware platforms, the lack of certain packages, and, most pressing of all, organizing principles and funding for maintainers. For me, it provided deeper insight into the growing pains that come with a growing community of package maintainers and users. Check out the post here.

MLOps

A topic that came up at work recently was how we could operationalize the ML models we build, especially once a team has finished a minimally viable model. I found a well-organized website dedicated to this topic, which organizes the main concepts we need to know and provides a current listing of tools. For those of us data practitioners who have a slight engineering bent, this seems to be a great resource for learning! (Related to the website is the `awesome-mlops` GitHub repository, which I’d encourage you to check out too!)

As a side topic, I organized a study session at work on a classic 2015 paper by Googlers on the hidden technical debt behind machine learning systems. The paper illuminated the blind spots we may fall prey to when we hack on our machine learning models in Jupyter notebooks. It’s an eye-opening read.

Transformers

Another topic that came up at work recently was Transformer models. For the longest time, Transformers existed to me as J-A-R-G-O-N, as I possessed neither an understanding of the math equations underneath the model nor an understanding of the name's etymology. (For funsies, I am curating a list of etymologies behind words that the deep learning world has co-opted in weird ways… if you have some favourites, please send them my way!)

Rants aside, I went digging out of curiosity and found two great resources: one by Jay Alammar titled “The Illustrated Transformer” and one by The AI Summer titled “How Transformers work in deep learning and NLP: an intuitive introduction,” both of which use tons of pictures to illustrate the ideas. At the core of the Transformer is the Attention mechanism, which Jay explains with illustrations. After doing an implementation myself, I finally figured out that Attention is nothing more than using dot products in fancy ways: compute a vector of similarity scores between one vector and a set of others, then use those scores as weights. It was satisfying to have implemented the math in NumPy, but I was also left with a lingering sense of “but that was it???” In any case, I hope you find the two resources educational in understanding Transformer models and the backing core concept, Attention.
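For the curious, here is a minimal NumPy sketch of scaled dot-product attention, the variant usually described in Transformer tutorials. It is not the implementation mentioned above, only an illustration of the "dot products in fancy ways" idea. The function names, array shapes, and random inputs are all made up for the example, and it leaves out the learned projection matrices, multiple heads, and masking that a full Transformer layer would have.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(queries, keys, values):
    """Illustrative single-head attention without learned projections or masking.

    queries: (n_queries, d_k), keys: (n_keys, d_k), values: (n_keys, d_v)
    """
    d_k = keys.shape[-1]
    # Dot product of every query with every key: one similarity score per pair.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Turn each query's scores into weights that sum to 1.
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of the value vectors.
    return weights @ values, weights


# Hypothetical toy inputs: 2 queries, 3 key/value pairs.
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 4))
keys = rng.normal(size=(3, 4))
values = rng.normal(size=(3, 5))

output, weights = scaled_dot_product_attention(queries, keys, values)
print(output.shape)   # (2, 5)
print(weights.shape)  # (2, 3)
```

Each row of `weights` is the "vector of similarity scores" for one query after normalization, and that really is most of the mechanism.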

Model Search

The final thing I’d like to share is a new open source library from the Google AI team called “model_search”, just released last month. It claims to automatically run a model architecture search for us to find the model that optimizes performance. If the point of your model-building exercise is gaining specific leverage over a problem, then hand-crafting a model might be a better bet. But if all you want is a predictive model, I’d say this is a wonderful tool to use. Check out the repository here.

From my collection

This month I have been exploring infinitely wide neural networks (IWNN), also a project by Google. Neural networks of infinite width share some form of equivalence with Gaussian Process models; my intuition surrounding this, however, is kind of lacking. As part of my explorations, I put together an essay in my essay collection in which I run computational experiments to gain an intuition for how an infinitely wide neural network’s equivalent Gaussian Process varies with the IWNN architecture. With some level of trepidation and excitement, I’m sharing it after giving my Patreon supporters access to an early draft.
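To give a flavour of that equivalence, here is a small NumPy sketch. It is not taken from the essay and does not use Google's library. It assumes a one-hidden-layer ReLU network with standard normal weights, no biases, and a 1/sqrt(width) output scaling, all of which are choices made for this illustration. Under those assumptions, the covariance of the network's outputs over random initializations matches a known closed-form kernel (the degree-1 arc-cosine kernel, halved). That covariance match already holds at finite width; what the infinite-width limit adds is that the outputs become jointly Gaussian, so the kernel fully describes them as a Gaussian Process.

```python
import numpy as np


def random_relu_network_outputs(inputs, width, n_networks, rng):
    """Outputs of many randomly initialized one-hidden-layer ReLU networks.

    inputs: (n_points, n_features). Returns an array of shape (n_networks, n_points).
    """
    n_features = inputs.shape[1]
    w = rng.normal(size=(n_networks, width, n_features))  # hidden-layer weights
    v = rng.normal(size=(n_networks, width))              # output-layer weights
    hidden = np.maximum(w @ inputs.T, 0.0)                # (n_networks, width, n_points)
    return np.einsum("nw,nwp->np", v, hidden) / np.sqrt(width)


def relu_gp_kernel(inputs):
    """Closed-form E[relu(w.x) * relu(w.y)] for w ~ N(0, I)."""
    norms = np.linalg.norm(inputs, axis=1)
    norm_products = np.outer(norms, norms)
    cos_theta = np.clip(inputs @ inputs.T / norm_products, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    return norm_products * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2 * np.pi)


rng = np.random.default_rng(0)
# Hypothetical toy data: 3 points in 3 dimensions, scaled to unit length.
inputs = rng.normal(size=(3, 3))
inputs /= np.linalg.norm(inputs, axis=1, keepdims=True)

outputs = random_relu_network_outputs(inputs, width=64, n_networks=10_000, rng=rng)
empirical = outputs.T @ outputs / outputs.shape[0]  # Monte Carlo covariance (mean is zero)
analytic = relu_gp_kernel(inputs)

print(np.round(empirical, 3))
print(np.round(analytic, 3))
```

The two printed matrices should agree up to Monte Carlo noise. Changing the architecture (a different activation, more layers) changes the kernel, which is the kind of dependence the essay explores.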

Thank you for reading

I hope you enjoyed this edition of the Data Science Programming Newsletter! If you did, please do share the link to the newsletter subscribe page with those who you think might benefit from it.

As always, let me know on Twitter if you've enjoyed the newsletter, and I'm always open to hearing about the new things you've learned from it. Meanwhile, if you'd like to get early access to new content I make, I'd appreciate your support on Patreon!

Stay safe, stay indoors, and keep hacking!