Most companies' data is terrible beyond belief (for data science purposes). For years, these companies have run on IT systems set up by people who didn't think about data from a long-term perspective. Nobody is following database 101.
A friend of mine likes to say that every database starts perfect. But as business requirements pile in, the database gets stranger and stranger. More and more of our job focuses on data cleansing. So get good at that.
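To make "data cleansing" concrete, here is a minimal sketch in plain Python of the kind of mess that piles up: stray whitespace in keys, several spellings of the same value, numbers stored as text, and duplicate records. The rows, the `COUNTRY_ALIASES` mapping, and the `MISSING_MARKERS` set are all made up for illustration; real data will need its own rules.

```python
from typing import Optional

# Hypothetical raw rows, invented for illustration only.
RAW_ROWS = [
    {"customer_id": " 001", "country": "usa", "revenue": "1,200.50"},
    {"customer_id": "001", "country": "USA ", "revenue": "1200.50"},
    {"customer_id": "002", "country": "U.S.A.", "revenue": "N/A"},
    {"customer_id": "003", "country": "Germany", "revenue": ""},
]

# Assumed lookup tables; a real project would build these from the data.
COUNTRY_ALIASES = {"usa": "US", "u.s.a.": "US", "germany": "DE"}
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}


def parse_revenue(text: str) -> Optional[float]:
    """Turn a messy revenue string into a float, or None if it is missing."""
    cleaned = text.strip().replace(",", "")
    if cleaned.lower() in MISSING_MARKERS:
        return None
    return float(cleaned)


def clean_rows(rows):
    """Trim keys, normalize country names, parse numbers, drop duplicate IDs."""
    seen = set()
    cleaned = []
    for row in rows:
        customer_id = row["customer_id"].strip()
        if customer_id in seen:
            continue  # keep only the first record per customer
        seen.add(customer_id)
        country = row["country"].strip()
        cleaned.append({
            "customer_id": customer_id,
            "country": COUNTRY_ALIASES.get(country.lower(), country),
            "revenue": parse_revenue(row["revenue"]),
        })
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

Keeping the first record per ID is only one possible rule. Which duplicate to keep, and what counts as "missing", are business decisions as much as technical ones.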
My favorite tool for data cleansing is Alteryx. It's drag-and-drop GUI software, and it's the best tool out there for drag-and-drop data engineering.
Get good at data engineering in addition to the algorithms. A lot of problems in data science aren't so much about the algorithm as about the data ingestion part.
Get good at using AutoML and also HyperDrive. Any hyperparameter tuning tool, just not an open-source one. If you want to focus on a single company, study its cloud platform. For example, if you want to work at:
- Amazon – study AWS
- Google – study GCP
- Microsoft – study Azure Machine Learning service
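On the hyperparameter tuning point: whatever tool you pick, the core idea it automates is a search over settings. Here is a toy random-search sketch in plain Python, just to show the idea; it is not a stand-in for any of the managed services. The `validation_loss` function is a made-up placeholder for "train a model and score it", and the parameter names and ranges are assumptions.

```python
import random


def validation_loss(learning_rate: float, depth: int) -> float:
    """Toy stand-in for training and scoring a model (lower is better)."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def random_search(n_trials: int, seed: int = 0):
    """Sample settings at random and keep the best one seen so far."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform sample between 0.001 and 1 (assumed range).
            "learning_rate": 10 ** rng.uniform(-3, 0),
            # Integer depth between 2 and 10 inclusive (assumed range).
            "depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


if __name__ == "__main__":
    params, loss = random_search(n_trials=50)
    print(params, loss)
```

Managed tuning services add the parts this sketch leaves out, such as running trials in parallel, stopping bad trials early, and smarter sampling than pure random.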