Data Science Programming Newsletter April 2021

The Social Media Edition!

Hello, fellow datanistas!

If you are wondering what happened to the March edition, I was a bit overloaded at work in the lead-up to parental leave, so I intentionally took some time off from all things data-related. To make up for it, there will be two editions of the newsletter this month, this one being a special edition on Awesome Social Media Posts! (It was what I had planned for March, and I'm still excited to share it with you all.)

When to use what model?

Isabelle Ghement has a great tweet that lists the factors that influence the choice of a statistical model. One thing I learned there is that practical matters, such as the skill level of an individual, are real constraints on whether a model can be used or not. Models are tools and require skill to wield!

Sponsor the people who make your tools

Samuel Colvin, who makes the awesome tool Pydantic, sponsored Ned Batchelder, who makes coverage.py, a tool for measuring how much of your code is exercised by tests. Financial support, even if just the price of a cup of coffee a month (or a latte, if you're feeling fancy), can make maintaining these tools financially viable for the maintainers of your favourite tools!

Data curation is a worthwhile infrastructural investment

The Protein Data Bank was instrumental in efforts to build vaccines and treatments against COVID-19. The fact that over 1000 COVID-19-related structures have now been deposited highlights for me how focused data curation, sustained over a long period of time and targeting one data modality, can be a worthwhile investment that pays dividends many times over.

How good are machine learning paper publication practices?

There's no doubt right now that machine learning, as a discipline, has intersected with many other disciplines. How do people outside of machine learning view ML? David Ha (@hardmaru) tweeted a Reddit thread that spells out some views.

Are two brains better than one in pair programming?

Those who have worked with me know that I like to work in pairs, solving problems together. It makes for more robust projects; creativity is also sharpened by having pairs work together. Does this hold all the time? Jacqueline Smith shares her take on her blog.

Why I'm lukewarm on Graph Neural Networks

In this post shared by Andrew Fairless on LinkedIn, Matt Ranger talks about why the research on graph neural networks appears to be "more of the same" from the academy. It's a simultaneously entertaining and sobering read :).

No COVID-19 models are clinic-ready!

On Twitter, Eric Topol shared a link to a publication in Nature Machine Intelligence. The authors found that none of the published models for using chest radiographs and CT scans to predict COVID-19 progression were ready for the clinic. Why? I won't spill the beans here; check out the paper linked in the tweet!

Berkson's Paradox

Also known as "how observational biases give rise to spurious correlations". Lionel Page tweeted this out, and there's a whole thread! Mathematician Hannah Fry explains further with more examples of Berkson's paradox in her Numberphile video (linked in the tweet).
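
For those who like to see it in code, here is a minimal simulation sketch of the effect. It is my own illustration, not taken from the thread or the video: the two traits (`talent` and `looks`), the selection threshold, and the function names are all made-up assumptions. The traits are drawn independently, then we only "observe" individuals whose combined score clears the threshold, and a negative correlation shows up in the selected group.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_berkson(n=20_000, threshold=1.0, seed=42):
    """Return (population correlation, correlation among selected individuals).

    The trait names and the threshold are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    # Two traits drawn independently, so they are uncorrelated by construction.
    talent = [rng.gauss(0, 1) for _ in range(n)]
    looks = [rng.gauss(0, 1) for _ in range(n)]
    # Selection step: keep only individuals whose combined score is high.
    selected = [(t, l) for t, l in zip(talent, looks) if t + l > threshold]
    sel_talent, sel_looks = zip(*selected)
    return pearson(talent, looks), pearson(sel_talent, sel_looks)


full_corr, selected_corr = simulate_berkson()
print(f"whole population: {full_corr:+.3f}")
print(f"selected only:    {selected_corr:+.3f}")
```

The whole-population correlation should sit near zero, while the selected-only correlation should come out clearly negative, even though nothing links the two traits except the selection rule itself.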


That ends this special social media edition of the Data Science Programming Newsletter. At the end of the month, we'll resume regular, ahem, programming. While on paternity leave, apart from cooking and wiping baby bums, I’ve had some extra time to work on some network science and art projects that have been on my mind for a long time. While I can’t wait to share them with the world, my Patreon supporters have had early access to that work!

Stay safe, stay sane, stay hacking and have a ton of fun!

Cheers,
Eric