ChatGPT3 and Moderna Jobs!

A curated look at opinions about large language models and their impact on our world and our work. Plus, Moderna Digital is hiring!

Jan 13, 2023

Hello, fellow datanistas,

Welcome to 2023! I hope you had a restful Christmas and New Year's break.

In this edition, I want to reflect on ChatGPT3, GitHub Copilot, and Midjourney. Now that the hype following their releases last year has passed, the start of this year is a good time to look at how we can use these models as tools in our toolbox.

Tangential to Reality

In this blog post, Cassie Kozyrkov (Chief Decision Scientist at Google) gives a demystifying introduction to ChatGPT3. She outlines where it succeeds (creatively "generating bullshit," to quote her), where it fails (its complete lack of concern for the truth), and how users can still benefit from it as a tool. Check it out on Medium.

Ways for coders to be productive using ChatGPT3

On LinkedIn, Santiago Valdarrama shared this list of ways ChatGPT3 saves him hours. The "keep in mind" section of his post struck me:

I have 2+ decades of programming experience. I like to think I know what I'm doing. I don't trust people's code (especially mine,) and I surely don't trust ChatGPT's output.
This is not about letting ChatGPT do my work. This is about using it to 10x my output.
ChatGPT is flawed. I find it makes mistakes when dealing with code, but that's why I'm here: to supervise it. Together we form a more perfect Union. (Sorry, couldn't help it)
Developers who shit on this are missing the point. The story is not about ChatGPT taking programmers' jobs. It's not about a missing import here or a subtle mistake there.
The story is how, overnight, AI gives programmers a 100x boost.

AI Business Ideas with ChatGPT3

Many Twitter threads outline potential business ideas one could build on ChatGPT3. Here are two that came up on my timeline.

One by Alex Banks here, and
One by Ben Tossell here.

I wrote a blog post commenting on those business ideas. The common challenge all of these businesses will need to overcome is verifying the generated text. Check it out here.

Awesome ChatGPT3 Prompts

I have struggled, and still struggle, to design prompts that get the best out of ChatGPT3. Today, however, I found an awesome collection of ChatGPT3 prompts. What's remarkable is the level of detail in the prompt templates, which anyone can use as a starting point for generating text with ChatGPT3.

An AI-assisted paper on arXiv

Yes, it actually happened :). Gonzalez et al. released a paper on arXiv that was written with ChatGPT3's assistance. Even the first illustrative figure was generated with DALL-E! What's cool here is that the version the authors released was largely rewritten from what ChatGPT3 initially spat out. The authors also measured how much of the original generated text was retained: only 6%! As I mention in the post-script, this matches my experience with Copilot.

Elicit: Summarize technical research with GPT3

Another really cool application built on GPT3 is Elicit, an AI research assistant. I've used it to quickly learn topics whose literature I've had little exposure to. Its most useful feature is its ability to summarize research literature in plain language while linking to and highlighting the supporting statements in the original papers. A friend of mine, also in the drug discovery space, commented that if he had had access to Elicit during his Ph.D., he would have graduated one to two years earlier. Check it out here!

Reverse Outlining with Generative AI

Maggie Appleton, a designer at Ought, the non-profit behind Elicit (mentioned above), described how ChatGPT3 could be combined with text editor UIs to help writers take an unorganized train of thought and give it structure. It is a creative take on applying ChatGPT3. Check out her essay here.

But it's not all roses

AI ethics researcher Margaret Mitchell highlighted on Twitter a notable report finding that cybercriminals were exploiting ChatGPT3 to write malicious code for identity theft and code for encrypting messages (not wrong in itself, but harmful in the wrong hands), and were selling AI art while fraudulently passing it off as original human work. Here is her tweet, and here is the original report.

The Verdict?

What's the final verdict on AI generators and their use in the data science field?

On this, I have two short thoughts.

The first is that the value of expertise has gone up in this era of machine content blabberers. Only skilled practitioners with domain expertise can adequately verify the integrity of generated content.

The second is that the field is wide open for creative new applications. In the last decade, data scientists creatively used data to build products that unlocked new capabilities. The same will happen in this decade, but now with interactive, realistic data generators as a primary tool. As these tools continue to be released and improved, I see the creative playing field expanding!

From my collection

After test-driving GitHub Copilot, ChatGPT3, DALL-E, and Midjourney, I concluded that skilled practitioners are the ones who will benefit most from AI coding tools (and other domain-specific generative AI tools). My most productive moments came when I treated Copilot as a muse for writing and coding inspiration. Also, it has never been more important to be able to wisely discern truth from fiction in generated content. You can read the blog post here.

Additionally, as promised in the last edition, I wrote up everything I've learned about hiring data scientists in 2021. Now that it has incubated for a while, I'm sharing it in my Essays collection. I hope it's useful; if you have feedback, please share it with me!

I also started uploading audiogram summaries of the essay to YouTube! There will be one per week in January, with more to come. Please check them out on this playlist!

Hiring at Moderna!

Moderna Digital is hiring!

My fellow data science lead, Rebecca Vislay-Wade (we call her RVW), is hiring data scientists! She has two open positions on her clinical & regulatory data science team, both at the PhD level (or equivalent industry experience). If you're interested in accelerating the delivery of the next billion doses of mRNA medicines, check out my LinkedIn post here!

Additionally, we are hiring a counterpart to me (Principal Data Scientist) in the Technical Development and Manufacturing space. Specifically, we're looking for someone with experience in discrete optimization and constraint programming, excellent communication skills, and the ability to lead and grow a team. Check out the listing here.

Also, one of our Digital Business Partners, David Cascio, is hiring a Principal Data Architect. David and I work closely together, and the person in this role will get to architect how all the /cool/ data flows through our systems: think 4D microscopy, massively parallel sequencing assays, genetic screens, and more! For this nerd, it's this kind of data that makes my brain tick; marketing and finance data just can't do that… though I recognize that not everyone thinks the same way 😂. That said, if your brain ticks like mine and you've got the right technical skills, check out his listing here.

Finally, my team, the Data Science and Artificial Intelligence (Research) team, will have an opening for a Master's-level Research Associate to help accelerate our protein engineering efforts. (Scientific training, or evidence of productive use of protein language models, is required!) In the next edition, I'll share the listing and a blog post outlining what we're looking for!

Post-script

You may wonder how much of this newsletter edition was written by ChatGPT3 or Copilot. I used ChatGPT3 as a muse to generate ideas, but I composed all of the newsletter text myself in the Ulysses writing app. Before sending it out, I always run a Grammarly check. I wrote the essay on interviewing with Copilot in VS Code, and I accepted about 10% of its suggested completions after verifying that they contributed productively to the essay.

Eric's Data Science Newsletter
