Question 1

Eugene:

Hi Dennis, it's a pleasure to have you here!
Most of my fellow MITx learners are eager to land a job at a tech giant like Microsoft. We have so many questions for you! What do you do there?

Dennis Sawyers,
MICROSOFT:

#hide

Question 2

Eugene:

#hide

Answer

My position right now is called Solution Architect in Data and AI. I'm doing data science pretty much every single day across a vast array of customers, starting from data ingestion and going through the whole pipeline.

Question 3

Eugene:

#hide

Answer

Basically, I do the following:

help people use the Azure Machine Learning suite to do data science
work constantly with customers to get them started on Azure
show them how to do machine learning on Azure
help them get access to their data
advise them on how to implement those projects end to end

That covers everything from data ingestion through the whole pipeline.

Question 4

Eugene:

What skill set would be enough to be selected for a data science position at Microsoft?

Answer

I was a data scientist at Ford Motor Company for four years. Don't get too attached to top tech. You can get into another big-name company too, and that can be even better for you. Google, Microsoft, and Amazon in particular all have major customers, including all of the FORTUNE 500. When I got a job at Ford Motor, it was FORTUNE 10.

There's Chrysler, GM, and there are big companies all over the place: Walmart, Lowe's, grocery chains, restaurants, and fast-food chains like McDonald's. Just try to get into a big-name company, because all of those companies are starving for data scientists right now.

Question 5

Eugene:

So – Data Science isn't all around?

Answer

Yeah. There's a significant problem with data science. The field is highly concentrated on the West Coast of the United States and maybe a bit in New York. Everywhere else in the country, there just aren't enough people.

Question 6

Eugene:

#hide

Answer

Don't be afraid to start at a manufacturing company or something like that. Everyone's trying to get into Google, Amazon, Microsoft. Even if you do get in at a junior position, it is hard to move up. If you start at a non-tech FORTUNE 500 company, you're going to be a much bigger fish in a smaller pond. You'll have a lot more responsibility right off the bat. You're going to be doing some fantastic projects. By the time you're a couple of years out of college, you'll already be doing these super big-ticket projects that make a serious contribution to your company's business.

At Ford Motor Company, I had an analytics project where I increased the share of activated vehicles and modems from about 15% to 80% over a two-and-a-half-year period. Why was it an important metric? The business wanted customers to use their technology. Activating the modem involved downloading and using an app that Ford Motor Company had invested tens of millions of dollars in.

I did another AI project at Ford. I was predicting auto vehicle sales and market share. And that was super impactful. Executives at Ford used that to determine how to allocate billions of dollars of incentive spending.

Those were just two projects. Had I joined a more prominent tech company at the beginning, I wouldn't have had the seniority to tackle those problems by myself. With that ability to generate value, many companies will welcome you to positions with even more responsibility, including top tech.

Question 7

Eugene:

#hide

Answer

Try to get on the most impactful projects that you can. Use your spare time to do volunteer work. Tech competitions are good, but what's better is finding some charity/volunteer organizations and doing some stuff for them. That's another area where they don't have too many data scientists, but they have massive projects that make a difference in the world. Once you have done a few of those, you get a lot of credibility, and you learn a lot along the way too.

Question 8

Eugene:

Why would anyone dreaming of big tech seriously consider anything else?

Answer

Once you learn how to contribute there, it's straightforward. Nothing can prevent you from being just as productive at a tech company or any other company.

I'm doing a cloud solution architect role right now in data and AI. I could very easily switch to AI research at Microsoft or AI product development. Those are the natural roles for me. It's all about getting an understanding of what end-to-end data science really is.

Question 9

Eugene:

#hide

Answer

During my studies, I focused too much on algorithms and the like. Students tend to do that: they focus excessively on accuracy metrics and forget about how the model plugs into a business ecosystem.

Question 10

Eugene:

I once spent two days trying to improve accuracy from 97.96% to 98% on a lab project... So where does the student's approach end and grown-up data science begin?

Answer

Most companies' data is terrible beyond belief (for data science purposes). For years, these companies have had IT systems set up by people who don't think about data from a long-term perspective. Nobody is following database 101.

A friend of mine likes to say that every database starts perfect. But as business requirements pile in, the database gets stranger and stranger. More and more of our job focuses on data cleansing. So get good at that.

My favorite tool for data cleansing is Alteryx. It's drag-and-drop GUI software, and it's the best tool for drag-and-drop data engineering.

Get good at data engineering in addition to the algorithms. A lot of problems in data science aren't so much in the algorithm as in the data ingestion part.

Get good at using auto ML and also HyperDrive. Any hyper-parameter tuning tool, but not the open-source one. If you want to focus on a single company, study its platform. If you want to work at:

Amazon – study AWS
Google – study GCP
Microsoft – study Azure Machine Learning service

Question 11

Eugene:

To learn any of these tools is quite a scope. What part is the most valuable to begin with?

Answer

All of these platforms have some version of hyper-parameter tuning. Know how to use those things. In terms of when to use those things – always.

The first thing that you should do when you start a new machine learning project is to transform your data and run it through auto ML. That will give you a baseline. Oftentimes people can beat it! But the automated machine learning algorithms are getting better and better and better. As a data scientist, you're spending less and less time on things like algorithm selection and algorithm tuning.

There's still a lot of room for creativity in taking a business problem and knowing which metric to optimize. Take accuracy, for example. False positives and false negatives are rarely equally bad.

Question 12

Eugene:

For example?

Answer

Say you're a restaurant. You have a takeout place and items on hand that you sell to customers as they walk in. It's usually more acceptable to throw out extra items than to make customers wait. You will lose less money that way than if a customer walks in and walks out without buying anything.

In a problem like that, you're looking for precision or recall rather than accuracy. You're trying to assign a weight that says throwing out food is only 10% as bad as losing a customer.
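The weighting described in this answer can be made concrete with a small sketch. The labels, the two prediction sets, and the cost values below are made up for illustration; only the 10% ratio comes from the interview. It shows two sets of predictions with the same accuracy but very different business cost.

```python
# Assumed costs: throwing out food is 10% as bad as losing a customer.
COST_WASTE = 1.0           # false positive: item prepared, nobody bought it
COST_LOST_CUSTOMER = 10.0  # false negative: customer came, nothing was ready


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_cost(y_true, y_pred):
    """Business cost: each error type is weighted by its assumed cost."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    return false_pos * COST_WASTE + false_neg * COST_LOST_CUSTOMER


# Made-up data: 1 means a customer wanted the item in that time slot.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # under-prepares: 2 lost customers
generous = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # over-prepares: 2 wasted items

for name, pred in [("cautious", cautious), ("generous", generous)]:
    print(name, accuracy(y_true, pred), total_cost(y_true, pred))
```

Both prediction sets make two mistakes out of ten, so accuracy cannot tell them apart, while the weighted cost clearly favors over-preparing.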

Question 13

Eugene:

So you are talking about a false-positive/false-negative trade-off that I tie to the actual business problem I'm solving.

Answer

Right! And automated ML cannot do that and never will.

As a data scientist, you want to get to know well:

traditional things like algorithms
how to work with auto ML because you'll be using that a lot
what the business needs.

Question 14

Eugene:

#hide

Answer

There's something that I like to call the data science productivity problem. Many companies hire a ####-ton of data scientists, get a lot of data, and don't get the return on investment from it.

There are a few reasons for that. The biggest one is that their data is terrible, and they don't do enough legwork on cleansing it. So their data scientists have to do all the data cleansing. But data scientists are not data engineers: they are slow at it, and they're bad at it.

Question 15

Eugene:

#hide

Answer

Take my previous job at Ford Motor Company, for example. We had some of the strangest data you have ever seen in your life: a field entirely populated with Greek letters, always three of them, like ⍺β𝜋. And there was no translation table because it was all in everybody's heads. Moreover, Greek letters turn into gibberish, since they aren't supported by many databases, including Hadoop. You lose the information when you move them.
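One common way this kind of loss happens is a round trip through an encoding that cannot represent the characters. The sketch below is only an illustration: Latin-1 is an assumed stand-in for a legacy encoding, and the three-letter code is made up, since the interview does not say which encoding or system was involved.

```python
# Illustrative three-letter code: Greek alpha, beta, pi (written as escapes).
code = "\u03b1\u03b2\u03c0"

# Assumption: the target system only handles Latin-1. Characters it cannot
# represent are replaced with "?", so the original value is unrecoverable.
lossy = code.encode("latin-1", errors="replace").decode("latin-1")

# A UTF-8 round trip keeps every character intact.
safe = code.encode("utf-8").decode("utf-8")

print(lossy)
print(safe == code)
```

After the lossy round trip every code collapses to the same three question marks, so distinct values can no longer be told apart, which is why the information is gone for good.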

Question 16

Eugene:

#hide

Answer

Problems like that slow everything down. Once you have solved that, you should focus on scoring and retraining. In most companies, it's done manually:

Question 17

Eugene:

#hide

Answer

Once you have a model that you like, the next step is to deploy it to a pipeline. And by pipeline, I mean there's this concept in Azure called an Azure Machine Learning pipeline. You can set up scripts that automatically score data and automatically retrain the model, and you can schedule all of that automatically. So you never have to touch it again.

Master software engineering skills and make pipelines. If you can build automated scoring and retraining pipelines, you are productive. You won't be doing data cleansing all day, and you'll be able to do exciting work.

Think about it.
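The scoring-and-retraining loop described in this answer can be sketched in plain Python. This is not the Azure Machine Learning pipeline API; the `MeanModel` class, the `run_pipeline` function, the error threshold, and the data are all hypothetical, chosen only to show the idea of scoring each new batch and retraining when the error grows too large.

```python
from statistics import mean


class MeanModel:
    """Toy model (hypothetical): predicts the mean of its training targets."""

    def fit(self, targets):
        self.value = mean(targets)
        return self

    def predict(self, n):
        return [self.value] * n


def mean_abs_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def run_pipeline(model, batches, error_threshold):
    """Score every batch; retrain on a batch whose error exceeds the threshold."""
    retrain_count = 0
    for batch in batches:
        predictions = model.predict(len(batch))   # automated scoring step
        error = mean_abs_error(batch, predictions)
        if error > error_threshold:               # automated retraining step
            model.fit(batch)
            retrain_count += 1
    return retrain_count


# Made-up data: the second batch shifts, which should trigger one retrain.
model = MeanModel().fit([10, 11, 9])
batches = [[10, 10, 11], [20, 21, 19], [20, 20, 20]]
retrains = run_pipeline(model, batches, error_threshold=2.0)
print(retrains, model.value)
```

In a real setup a scheduler would call something like `run_pipeline` on each new batch, so nobody has to score or retrain by hand.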

Question 18

Eugene:

Which areas of DS/ML are most in demand?

DENNIS SAWYERS,
MICROSOFT:

#hide

Question 19

Eugene:

#hide

Answer

Stay away from projects and examples that are a little bit too esoteric. For example, I worked in a Carnegie Mellon research lab. I built an anomaly detection algorithm around radiation portal monitors there. We ended up using a self-scoring multivariate anomaly detection algorithm. The paper was cited about forty times, and the method is very little known.

During an interview, it's a lot more challenging to talk about such a rare topic than about a well-known classification algorithm that you applied to a novel problem.

I also worked for an NBA team when I was a student. I built a classification model using logistic regression and naive Bayes to predict which European basketball players to draft into the NBA.

Everybody knows basketball, and everybody knows naive Bayes and logistic regression. It's a lot easier to have that conversation.

Question 20

Eugene:

What is your vision of Data Science development for the next ten years?

Answer

That is a great question. I will answer it with a PowerPoint:

Question 21

Eugene:

Thank you, Dennis! That was great!

DENNIS SAWYERS,
MICROSOFT:

#hide