Opinionated advice for the rest of us. Love of math, optional.
Since my article about my journey to data science, I’ve had a lot of people ask me for advice regarding their own journey towards becoming a data scientist. A common theme started to emerge: aspiring data scientists are confused about how to start, and some are drowning because of the overwhelming amount of information available in the wild. So, what’s another, right?
Well, let’s see.
I urge aspiring data scientists to slow it down a bit and take a step back. Before we get to learning, let’s take care of some business first: the fine art of reinventing yourself. Reinventing yourself takes time, so we better get started early on in the game.
In this post, I will share a very opinionated approach to do-it-yourself rebranding as a data scientist. I will assume three things about you:
- You’re broke, but you’ve got grit.
- You’re willing to sacrifice and learn.
- You’ve made a conscious decision to become a data scientist.
Let’s get started!
First Things First
I’m a strong believer in Yoda’s wisdom: “Do or do not, there is no try.” For me, either you do something or you don’t. Failure was not an option for me, and I took comfort in knowing that I wouldn’t really fail unless I quit entirely. So, first bit of advice: don’t quit. Ever.
Begin with the End in Mind
Let’s get our online affairs in order and start thinking about SEO. SEO stands for search engine optimization. The simplest way to think about it is as the fine art of putting as much “stuff” as you can on the internet under your real professional name, so that when somebody searches for you, all they find is the stuff that you want them to find.
In our case, we want the words “data science” or “data scientist” to appear whenever your name appears in the search results.
So let’s start littering the interweb!
- Create a professional Gmail account if you don’t already have one. Don’t make your username be firstname.lastname@example.org. Play it safe: the more boring, the better. Start with email@example.com, or if your name is a common one, append “data” to it, like firstname.lastname@example.org. Avoid numbers at all costs. If you already have an account but it doesn’t follow these guidelines, create another one!
- Create a LinkedIn account and use your professional email address. Put “Data Scientist in Training” in the headline. “Data Science Enthusiast” is too weak. We’ve made a conscious decision and committed to the mission, remember? While we’re at it, let’s put the app on our phone too.
- If you don’t have a Facebook account yet, create one just so you can claim your name. If you already have one, put that thing on private pronto! Go the extra mile and also delete the app from your phone so you won’t get distracted. Do the same for other social networks like Twitter, Instagram, and Pinterest. Set them to private for now; we’ll worry about cleaning them up later.
- Create a Twitter account if you don’t already have one. We can take a little bit of leeway in the username. Make it short and memorable but still professional, so you don’t offend anybody’s sensibilities. If you already have one, decide if you want to keep it or start all over. The main thing to ask yourself: is there any content in your history that can be construed as unprofessional or mildly controversial? Err on the side of caution.
- Start following the top voices in data science on LinkedIn and Twitter. Here are a few suggestions: Cassie Kozyrkov, Angela Baltes, Sarah N., Kate Strachnyi, Kristen Kehrer, Favio Vazquez, and of course, my all-time favorite: Eric Weber.
- Create a Hootsuite account and connect your LinkedIn and Twitter accounts. Start scheduling data science-related posts. You can share interesting articles from other people about data science or post about your own data science adventures! If you do share other people’s posts, please make sure you give the appropriate credit. Simply adding a URL is lazy and no bueno. Thanks to Eric Weber for this pro-tip!
- Take a professional picture and put it as your profile picture in all of your social media accounts. Aim for a neutral background, if possible. Make sure it’s only you in the picture unless you’re Eric (he’s earned his chops so don’t question him! LOL.)
- Create a Github account if you don’t have one already. You’re going to need this as you start doing data science projects.
- BONUS: if you can spare a few dollars, go to wordpress.org and get yourself a domain that has your professional name on it. I was fortunate enough to have an uncommon name, so I have ednalyn.com, but if your name is common, be creative and make one up that’s recognizably yours. Maybe something like janesmithdoesdatascience.com. Then you can start planning to put your resumé online or maybe even write a blog post or two about data science. As for me, I started by writing about my experience when I first started to learn data science.
- Clean-up: when time permits, start auditing your social media posts for offensive, scandalous, or unflattering content. If you’re looking to save time, try a service like brandyourself.com. Warning! It can get expensive, so watch where you click.
Do Your Chores
No kidding! When you’re doing household chores, taking a walk, or maybe even driving, listen to podcasts that talk about data science topics, like Linear Digressions and TWIML. Don’t get too bogged down in committing what they say to memory. Just go with the flow, and sooner or later, the terminology and concepts that they discuss will start to sound familiar. Just remember not to get so caught up in the discussions that you start burning whatever you’re cooking or miss your exit, like I have many times in the past.
Meat and Potatoes
Now that we’ve taken care of the preliminaries of living and breathing data science, it’s time to take care of the meat and potatoes: actually learning about data science.
There’s no shortage of opinions about how to learn data science. There are so many of them that it can overwhelm you, especially when they start talking about learning the foundational math and statistics first.
While theory is important, I don’t see the point of studying it first when I may soon fall asleep or, worse, get so intimidated by the onslaught of mathematical formulas that I become exasperated and end up quitting!
What I humbly propose, rather, is to employ the idea of “minimum viable knowledge,” or MVK, as described by Ken Jee in his article How I Would Learn Data Science (If I Had to Start Over). Ken Jee describes minimum viable knowledge as learning “just enough to be able to learn through doing.”² I suggest checking it out.
My approach to MVK is pretty straightforward: learn just enough SQL to be able to get the data out of a database, learn enough Python to handle program control (loops, conditionals, functions) and to use the pandas library, and then do end-to-end projects, from simple ones to increasingly more challenging ones. Along the way, you’ll learn about data wrangling, exploratory data analysis, and modeling. Other techniques like cross-validation and grid search will surely be a part of your journey as well. The trick is never to get too comfortable and always to push yourself, slowly.
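To give a feel for how small “just enough” can be, here is a minimal sketch that uses only Python’s built-in sqlite3 module. The orders table, its columns, and the spending threshold are all made up for illustration; a real project would connect to whatever database you actually have.

```python
import sqlite3

# Hypothetical in-memory database standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.5), ("ana", 14.5), ("cy", 8.0)],
)

# "Just enough SQL": aggregate, filter the groups, and sort.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

# "Just enough Python": a loop and a conditional over the result.
for customer, total in rows:
    label = "big spender" if total >= 30 else "regular"
    print(f"{customer}: {total:.2f} ({label})")
```

Once this feels comfortable, a natural next step is to load the same query result into a pandas DataFrame (for example with pandas.read_sql_query, called while the connection is still open) and continue the exploration there.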
To the list-oriented, here is my process:
- Learn enough SQL and Python to be able to do end-to-end projects with increasing complexity.
- For each project, go through the steps of the data science pipeline: planning, acquisition, preparation, exploration, modeling, delivery (story-telling/presentation). Be sure to document your efforts on your Github account.
- Rinse and repeat (iterate).
For a more in-depth discussion of the data science pipeline, I recommend the following article: PAPEM-DM: 7 Steps Towards a Data Science Win.
For each iteration, I suggest doing an end-to-end project that practices one of the following data science methodologies:
- time-series analysis
- anomaly detection
- natural language processing
- distributed ML
- deep learning
And for each methodology, practice its different algorithms, models, or techniques. For example, for natural language processing, you might want to practice the following techniques:
- n-gram ranking
- named-entity recognition
- sentiment analysis
- topic modeling
- text classification
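To show how approachable the first of these can be, here is a rough sketch of n-gram ranking using only the standard library. The function name rank_ngrams, the whitespace tokenization, and the sample sentence are my own simplifications for illustration; a real project would likely add proper tokenization and stop-word handling.

```python
from collections import Counter


def rank_ngrams(text, n=2, top=3):
    """Return the `top` most frequent n-grams in `text` as (ngram, count) pairs."""
    # Naive tokenization: lowercase and split on whitespace.
    tokens = text.lower().split()
    # Slide a window of size n across the tokens to build the n-grams.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(ngrams).most_common(top)


sample = "data science is fun and data science is hard but data science pays"
top_bigrams = rank_ngrams(sample, n=2, top=2)
print(top_bigrams)
```

Changing n to 1 or 3 ranks single words or trigrams instead, which makes this a handy first experiment on any text dataset.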
Just Push It
As you do end-to-end projects, it’s a good practice to push your work publicly to Github. Not only does it track your progress, but it also backs up your work in case your local machine breaks down. Not to mention, it’s a great way to showcase your progress. Note that I said progress, not perfection. Generally, people understand if our Github repositories are a little bit messy. In fact, most expect it. At a minimum, just make sure that you have a great README.md file for each repo.
What to put on a Github Repo README.md:
- Project name
- Goal or purpose of the project
- Background on the project
- How to use the project (if somebody wants to try it for themselves)
- Mention your keywords: “data science,” “data scientist,” “machine learning,” et cetera.
Don’t ignore this note: don’t make the big mistake of hard-coding your credentials or any passwords in your public code. Put them in a .env file and add that file to your .gitignore. For reference, check out this documentation from Github.
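As a rough illustration of keeping secrets out of your code, the sketch below reads KEY=VALUE pairs from a .env-style file into environment variables using only the standard library. The helper names load_env_file and get_db_password and the variable name DB_PASSWORD are hypothetical; in practice, many people use an existing package such as python-dotenv for this instead of writing their own loader.

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE lines from a .env-style file into os.environ."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and anything without "=".
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Variables already set in the real environment take precedence.
            os.environ.setdefault(key.strip(), value.strip().strip('"'))


def get_db_password():
    """Read the secret at run time instead of hard-coding it."""
    return os.environ["DB_PASSWORD"]  # hypothetical variable name
```

A script would call load_env_file() once at startup and then get_db_password() wherever the secret is needed. With a line containing just .env in the repo’s .gitignore, the file stays on your machine and never reaches Github.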
For a great in-depth tutorial on how to use Git and Github, check out Anne Bonner’s guide: Getting Started with Git and Github: the complete beginner’s guide.
For the Love of Math
And finally, as you get better at employing different techniques and you begin to do hyper-parameter tuning, I believe that you’re ready to face the necessary evil that is math. And more than likely, the more you understand and develop intuition, the less you’ll hate it. And maybe, just maybe, you’ll even grow to love it.
I have one general recommendation when it comes to learning the math behind data science: take it slow. Be gentle on yourself and don’t set deadlines. Again, there’s no sense in being ambitious and tackling something monumental if it ends up driving you insane. There’s just no fun in it.
There are generally two approaches to learning math.
One is to take the structured approach, which starts with learning the basics and then incrementally takes on the more challenging parts. For this, I recommend Khan Academy. Personalize your learning towards calculus, linear algebra, and statistics. Take small steps and celebrate small wins.
The other approach is geared more toward hands-on involvement and takes a little bit of reverse engineering. I call it learning backward. You start by finding out what math concept is involved in a project, break that concept down into more basic ideas, and go from there. This approach is better suited for those who prefer to learn by doing.
A good example of learning by doing is illustrated by a post on Analytics Vidhya, supplemented by this article.
Take a Break
Well, learning math sure is hard! It’s so powerful and intense that you’d better take a break often or risk overheating your brain. On the other hand, taking a break does not necessarily mean taking a day off. After all, there is no rest for the weary!
Every once in a while, I strongly recommend supplementing your technical studies with a little bit of understanding of the business side of things. For this, I suggest the classic book: Thinking with Data by Max Shron. You can also find a lot of articles here on Medium.
For example, check out Eric Kleppen’s article.
Talk to People
Taking a break can be lonely sometimes, and being alone with only your thoughts can be exhausting. So you may decide to finally talk with your family. The problem is that you’re so motivated and gung-ho about data science that it’s all you can talk about. Sooner or later, you’re going to annoy your loved ones.
It happened to me.
This is why I decided to talk to other people with similar interests. I went to Meetups and started networking with people who are either already practicing data science or, like you, aspiring to be data scientists as well. In this post-COVID (hopefully) age that we’re in, group video calls are more prevalent. This is actually more beneficial because now, geography isn’t an issue anymore.
A good place to start is LinkedIn. You can use the social network to find others with similar interests or even find local data scientists who can spare an hour or two every month to mentor motivated learners. Start with companies in your local municipality. Find out if they have a data scientist working there, and if you do find one, kindly send them a personalized message with a request to connect. Give them the option to refuse gracefully, and just ask them to point you to another person who does have the time to mentor.
The worst that can happen is they say no. No hard feelings, eh?
Thanks for reading! This concludes my very opinionated advice on rebranding yourself as a data scientist. I hope you got something out of it. I welcome any feedback. If you have something you’d like to add, please post it in the comments or responses.
Let’s continue this discussion!
If you’d like to connect with me, you can reach me on Twitter or LinkedIn. I love to connect, and I do my best to respond to inquiries as they come.
Stay tuned, and see you in the next post!
If you want to learn more about my journey from slacker to data scientist, check out this article.
[1] Quote Investigator. (June 10, 2020). Tell Me and I Forget; Teach Me and I May Remember; Involve Me and I Learn. https://quoteinvestigator.com/2019/02/27/tell/
[2] Towards Data Science. (June 11, 2020). How I Would Learn Data Science (If I Had to Start Over). https://towardsdatascience.com/how-i-would-learn-data-science-if-i-had-to-start-over-f3bf0d27ca87
This article was first published in Towards Data Science’s Medium publication.