A practical guide to securing secrets in data science projects

Alternatively titled: Stop Committing Your API Keys! (I Did It So You Don't Have To)

Jan 29, 2025

I spend a lot of time thinking about making good data science practices easy to implement, and today I want to focus on something that's bitten me more times than I care to admit: managing secrets in data science projects.

I've made all the classic mistakes - accidentally committing API keys in Jupyter notebooks, storing passwords in plain text, even pushing sensitive credentials to public repositories. After years of learning better tooling and practices, I'm excited to share what works.

Here's what I've found works in practice:

The foundation is environment variables. I've found that direnv combined with .env files gives the best balance of security and convenience. Unlike global environment variables that live in your .bashrc, direnv loads variables when you enter a directory and unloads them when you leave. That isolation matters: a project's credentials are present in your shell only while you are working in that project, rather than in every session you open.
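To make the consuming side concrete, here is a minimal sketch of how Python code might read a secret once something like direnv has exported it. The helper name `require_secret` and the variable name `EXAMPLE_API_KEY` are hypothetical, chosen only for illustration. The idea is to fail loudly when the variable is missing rather than fall back to a hard-coded value.

```python
import os


def require_secret(name: str) -> str:
    """Return the value of environment variable `name`, or raise if unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Are you inside the project directory where it gets loaded?"
        )
    return value


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a hypothetical variable name used for illustration.
    try:
        api_key = require_secret("EXAMPLE_API_KEY")
    except RuntimeError as err:
        print(err)
    else:
        # Never print the secret itself; its length is enough to confirm it loaded.
        print(f"Loaded a key of length {len(api_key)}")
```

Because the key never appears in the notebook or script, there is nothing sensitive in the file to commit by accident.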

But environment variables alone aren't enough. You need multiple layers of protection. I use pre-commit hooks to catch potential API keys before they ever make it into a commit. The patterns I look for catch everything from OpenAI keys to GitHub tokens. This has saved me countless times from accidentally exposing sensitive information.
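As a rough sketch of what such a check can look like, here is a small Python scanner. It is not the configuration from the blog post. The two regular expressions are my own illustrative assumptions based on the publicly visible `sk-` and `ghp_` prefixes, and the names `SECRET_PATTERNS`, `find_secrets`, and `main` are hypothetical.

```python
import re

# Illustrative patterns only (assumptions, not an exhaustive or official list).
SECRET_PATTERNS = {
    "OpenAI-style key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "GitHub personal access token": re.compile(r"\bghp_[A-Za-z0-9]{36}"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def main(paths: list[str]) -> int:
    """Scan each file; return 1 if anything looks like a secret, else 0."""
    found = False
    for path in paths:
        with open(path, encoding="utf-8", errors="ignore") as handle:
            for lineno, name in find_secrets(handle.read()):
                # Report the location and pattern name, never the matched text.
                print(f"{path}:{lineno}: possible {name}")
                found = True
    return 1 if found else 0
```

To use something like this as a local pre-commit hook, the script would call `main` on the file names it receives as arguments and exit with the returned status. A non-zero exit code blocks the commit. In practice a maintained scanner will cover far more key formats than these two patterns.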

When it comes to team collaboration, I've learned never to trust third-party services I don't control for sharing secrets. Instead, I recommend using self-destroying notes through open source tools you control. For the secrets your code and pipelines need, most development platforms (GitHub, AWS, etc.) provide built-in secrets management, so take the time to learn these features.

In the blog post I wrote about this topic, I go into the technical details of setting up each of these protections. I share the exact pre-commit hook configurations I use, how to properly scope your environment variables, and what to do if you accidentally commit a secret.

The most important thing I've learned? Security isn't about being perfect - it's about having robust systems that catch your mistakes before they become problems.

Want to dive deeper into practical security for data scientists? Check out the full blog post here.

Happy Coding!
Eric

P.S. If you found this helpful, consider subscribing to get more practical data science tips directly in your inbox.

Eric's Data Science Newsletter
