A few things no course will teach you about Data Science

Yukio
Jul 25, 2022

  • There are many jobs in the field, but there are also many applicants, and recruitment is a game with huge information asymmetry. Be patient: the opportunity will arise, but it will take longer than you think.
  • If you are curious and love to learn, you should know that you will never know everything you would like to know in Data Science. It is a very extensive field, so don’t get frustrated by that.
  • Every data scientist has their own story, so don’t compare yourself to others. Your colleague knows more about time series (or any other topic, this is just a random example) than you because they have always worked with it, or their studies were related to it. If you take a closer look, you’ll see that you are better than them on other topics (perhaps you are a better programmer or know more statistics).
  • Learn about the product and the business. No C-level executive really cares about algorithms, p-values, accuracy, AUC or F1-score. They want to know which problem is being solved and how much revenue (or savings) your project is expected to bring. The best model is the model that solves your problem!
  • Machine Learning models are just a very small part of our daily tasks. You will spend much more of your time cleaning the data and gathering information.
  • Some techniques work well in tutorials but are nearly useless on real-life data. I like to use SMOTE as an example. On Medium, this technique seems impeccable, while in the real world it might be quite ineffective (although not entirely, of course).
  • You probably missed some concepts of data leakage and validation during your studies. This means that you will see some models fail only once they are in production. Be sure to invest some time in monitoring your models in order to avoid major disasters!
  • All companies like to say they are data-driven, but very few really embrace the culture. What’s worse, you won’t always have the leaders on your side in this fight.
  • Know the difference between inference and prediction. The same algorithm may carry different assumptions and risks depending on the situation.
  • TensorFlow, PyCaret, CatBoost and other libraries will blow your mind. I know, I’ve been there too. But never underestimate the power of NumPy, pandas and scikit-learn. You will build unbelievable solutions with these three libraries together.
  • It’s awful to spend hours (or even days) on a study and only later find out that there was a misunderstanding about the data or some error you didn’t catch. It’s terrible to experience this, but it happens to everyone. Don’t blame yourself and don’t lose your patience with your colleagues. Learn what you have to learn there and move forward!
  • Companies used to pay a lot to anyone who managed to run some kind of pd.merge(), or anything with Python. This has changed. Data Science is evolving and the demands are higher nowadays. Cloud, some knowledge of data engineering, causality and other skills are already among the prerequisites you might find in job postings today.
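
To make the data leakage point concrete, here is a minimal sketch in plain Python (standard library only, so it should run anywhere). It assumes a toy setup that is not taken from any real project: pure random features, arbitrary labels, and a simple nearest-centroid classifier. The function names (`select_features`, `cross_validate`, `run_demo`) are made up for this illustration. The idea: picking the "best" features on the full dataset before cross-validating lets the test folds influence the model, so the score looks great even though there is nothing to learn.

```python
import random


def mean_gap(X, y, rows, j):
    """Absolute difference between the class means of feature j on the given rows."""
    pos = [X[i][j] for i in rows if y[i] == 1]
    neg = [X[i][j] for i in rows if y[i] == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def select_features(X, y, rows, k):
    """Return the k features whose class means differ most on the given rows."""
    ranked = sorted(range(len(X[0])),
                    key=lambda j: mean_gap(X, y, rows, j),
                    reverse=True)
    return ranked[:k]


def centroid(X, rows, feats):
    """Mean vector of the given rows, restricted to the selected features."""
    return [sum(X[i][j] for i in rows) / len(rows) for j in feats]


def nearest_centroid_accuracy(X, y, train, test, feats):
    """Fit class centroids on the training rows, score them on the test rows."""
    c1 = centroid(X, [i for i in train if y[i] == 1], feats)
    c0 = centroid(X, [i for i in train if y[i] == 0], feats)
    hits = 0
    for i in test:
        d1 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c1))
        d0 = sum((X[i][j] - c) ** 2 for j, c in zip(feats, c0))
        predicted_positive = d1 < d0
        hits += int(predicted_positive == (y[i] == 1))
    return hits / len(test)


def cross_validate(X, y, k_feats, n_folds, leaky):
    """k-fold CV; if leaky, features are chosen once using ALL rows (the mistake)."""
    rows = list(range(len(X)))
    leaked_feats = select_features(X, y, rows, k_feats) if leaky else None
    scores = []
    for f in range(n_folds):
        test = rows[f::n_folds]                       # rows with i % n_folds == f
        train = [i for i in rows if i % n_folds != f]
        # Honest version: selection only ever sees the training rows of this fold.
        feats = leaked_feats if leaky else select_features(X, y, train, k_feats)
        scores.append(nearest_centroid_accuracy(X, y, train, test, feats))
    return sum(scores) / n_folds


def run_demo(seed=0, n_samples=60, n_features=1000, k_feats=20, n_folds=5):
    """Noise-only data: any accuracy well above 0.5 is an artifact of leakage."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]             # labels unrelated to X
    leaky = cross_validate(X, y, k_feats, n_folds, leaky=True)
    honest = cross_validate(X, y, k_feats, n_folds, leaky=False)
    return leaky, honest


if __name__ == "__main__":
    leaky, honest = run_demo()
    print(f"leaky CV accuracy:  {leaky:.2f}")
    print(f"honest CV accuracy: {honest:.2f}")
```

The leaky score should sit far above the honest one, which should hover around chance. With scikit-learn, the usual way to get the honest behaviour is to put the preprocessing steps in a `Pipeline` and cross-validate the whole pipeline, so every step is refit on the training part of each fold.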

Yukio

Mathematician with a master's degree in Economics. Working as a Data Scientist for the last 10 years.