Isn’t that a great question? It's easy to get confused when it comes to parsing the difference between a data scientist and a data analyst. This is in part due to the general public’s misunderstanding of both of these professions and the misinformation that is spread as a result. The most obvious difference between these two fields is where their data comes from.

## Comparing data analysts and data scientists

Data analysts are going to be dealing with data that you might remember from your statistics classes. This data might come from survey results, lab experiments of various sorts, longitudinal studies, or another form of social observation. Data may also come from observation of natural or created phenomenons, but the data’s form would still be similar.

Data scientists on the other hand, are going to looking at things like metadata from billions of phone calls, data used to forecast Bitcoin prices that have been scraped from various places around the Internet, or maybe data related to Internet searches before and after some important event. So their data is often different, but is that all? The tools and skillset required for each is actually quite different as well.

Data science is much more entwined with the field of computer science than data analysis. A good data analyst should have working knowledge of how computers, networks, and the Internet function, but they don’t need to be an expert in any of these things. Data analyst really just need to know a good scripting language that is used to handle data, like Python or R, and maybe a more mathematically advanced tool like MatLab or Mathematica for more advanced modeling procedures. A data analyst could have a fruitful career knowing only about that much in the realm of technology. Data scientists, however, need to know a lot about how networks and the Internet work. Most data scientists will need to have mastered HTTP, HTML, XML and SQL as well as scripting languages like Ruby or Python, and also object-oriented languages like Java or C. This is because data scientists spend a lot more time capturing, manipulating, storing and moving around data than a data analyst would. These tasks require a different skillset.

## Data analysts and data scientists have different forms of conceptual understanding

There will also likely be a difference in the conceptual understanding of a data analyst versus a data scientist. If you were to ask both a data scientist and a data analyst to derive and twice differentiate the log likelihood function of the binomial logistic regression model, it is more likely the data analyst would be able to do it. I would expect data analysts to have a better theoretical understanding of statistics than a data scientist. This is because data scientists don’t really need much theoretical understanding in order to be effective. A data scientist would be better served by learning more about capturing data and analyzing streams of data than theoretical statistics. Differences are not limited to knowledge or skillset, how data scientists and data analysts approach their work is also different.

Data analysts generally know what they are looking for as they begin their analysis. By this I mean, a data analyst may be given the results of a study of a new drug, and the researcher may ask the analyst to explore and hopefully quantify the impact of a new drug. A data analyst would have no problem performing this task. A data scientist on the other hand, could be given the task of analyzing locations of phone calls and finding any patterns that might exist. For the data scientist, the goal is often less defined than it is for a data analyst. In fact, I think this is the crux of the entire difference. Data scientists perform far more exploratory data analysis than their data analyst cousins. This difference in approach really explains the difference in skill sets. Data scientists have skill sets that are primarily geared toward extracting, storing and finding uses for data. The skill set to perform these tasks is the skill set of a data scientist. Data analysts primarily analyze data and their skill set reflects this.

Just to add one more little wrinkle, while calling a data scientist a data analyst is basically correct, calling a data analyst a data scientist is probably not correct. This is because the data scientist is going to have a handle on more of the skills required of a data analyst than a data analyst would of a data scientist. This is another reason there is so much confusion around this subject.

So now, hopefully, you can tell the difference between a data scientist and a data analyst. I don’t believe either field is superior to the other. If you are choosing between which field you would like to pursue, what’s important is that you choose the field that best compliments your skillset. Luckily it's hard to go wrong because both data scientists and analysts usually have interesting and rewarding careers.

*Erik Kappelman wears many hats including blogger, developer, data consultant, economist, and transportation planner. He lives in Helena, Montana and works for the Department of Transportation as a transportation demand modeler.*