Bacteria x deep learning, CNNs for sequences, and UMATO
Cool applications of deep learning, a potentially better version of UMAP, and a career Twitter thread sprinkled in!
Hello, fellow datanistas!
Welcome back to another edition of the data science programming newsletter! This month’s edition is a smattering of cool projects, a career thread, and awesome tooling.
Cool project: Controlling gene expression in bacteria using deep learning models
The Dunlop lab at Boston University developed a method to control bacterial gene expression using deep learning models. The core idea is to control the production of green fluorescent protein (GFP) in bacteria engineered to respond to light inputs. A deep learning model (an encoder-decoder network) predicts how the bacteria's gene expression level will evolve over time, and the system uses that prediction to decide what light intensity to apply next. Read more about it in this Twitter thread authored by Mary Dunlop herself:
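To make the closed-loop idea concrete, here's a hypothetical sketch (not the Dunlop lab's actual model): a stand-in predictor estimates the next expression level given a light input, and a simple proportional controller picks the light intensity that nudges predicted expression toward a target. All names, dynamics, and numbers here are illustrative assumptions.

```python
# Toy closed-loop control of gene expression via light inputs.
# The real system uses an encoder-decoder network as the predictor;
# here a one-line toy dynamics model stands in for it.

TARGET = 1.0  # desired GFP expression level (arbitrary units)
GAIN = 0.5    # proportional controller gain

def predict_expression(current, light):
    """Stand-in for the deep learning predictor: expression relaxes
    toward the applied light intensity."""
    return current + 0.3 * (light - current)

def next_light(current):
    """Proportional control: push expression toward TARGET."""
    return max(0.0, current + GAIN * (TARGET - current))

expression = 0.0
history = []
for _ in range(20):
    light = next_light(expression)           # decide the next input
    expression = predict_expression(expression, light)  # system responds
    history.append(expression)

print(round(history[-1], 3))  # approaches TARGET over the run
```

The loop structure, predict then act then observe, is the part that carries over to the real system; everything quantitative is made up for illustration.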
Literature: Are convolutional neural networks better than transformers for long sequence modelling tasks?
This paper, and its accompanying Twitter thread, were intriguing to me. That's because I usually think of sequence models as suited to sequence problems and convolutional networks as suited to imaging problems; after all, those are the respective inductive biases of those model classes. However, this paper attempts to explain how some components of a special class of convolutional neural networks give that model class the ability to handle long-range sequence modelling tasks.
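One intuition for why convolutions can handle long sequences at all (a general illustration, not the specific architecture from the paper): stacking dilated 1D convolutions grows the receptive field exponentially with depth. A minimal pure-Python sketch:

```python
# Illustrative: stacked dilated 1D convolutions with kernel size 2
# and doubling dilations cover exponentially more input positions.

def dilated_conv1d(x, kernel, dilation):
    """Valid 1D convolution with the given dilation (pure Python)."""
    k = len(kernel)
    span = (k - 1) * dilation  # extra input positions one output sees
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span)
    ]

def receptive_field(kernel_size, dilations):
    """Input positions visible to one output after stacking layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

x = [float(i) for i in range(32)]
out = x
for d in (1, 2, 4, 8):  # four layers, dilation doubling each time
    out = dilated_conv1d(out, [0.5, 0.5], d)

print(receptive_field(2, [1, 2, 4, 8]))  # -> 16
print(len(out))  # 32 minus total span of 15 -> 17
```

Four tiny layers already see 16 consecutive inputs; doubling the dilation schedule keeps doubling that reach, which is one reason convolutional architectures scale to long contexts.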
Career: How should you evaluate a company's job offer?
I've included this Twitter thread for those switching to a new role or looking for a new job. I learned a lot from it, and I suggest you check it out. It covers levelling, base salary, equity, benefits, and more. Check out the thread by Alex Cohen:
Tooling: An improvement to the UMAP algorithm: UMATO
UMAP is a popular dimensionality reduction method made by one of my conference friends, Leland McInnes. UMATO is an algorithm that builds on top of UMAP. Because most dimensionality reduction methods optimize for either global or local structure, we often lose information that may be relevant when interpreting the visual output of dimensionality reduction on our data sets. UMATO attempts to optimize both global and local structure together. Check out the GitHub repo over here.
Social Media: How to pick a Mastodon server instance?
Who would've thought that one of the wealthiest men in the world buying Twitter would have resulted in great promotional material for a free and open-source alternative called Mastodon? So many data science personalities on Twitter have taken the leap and created Mastodon accounts that you can follow. But with so many Mastodon instances out there, how do you pick which server to set up an account on? My latest blog post addresses that question: How to pick a Mastodon instance?
By the way, I've also moved over to Mastodon. Follow me at @ericmjl@octodon.social! (link to my profile here)
I have been doing a bit of thinking recently about career trajectories and such, motivated by my experience as a data science team lead at Moderna. Over the next few editions, I'll begin sharing more content along those lines. I hope you find it useful!