Explaining the differences between data science, machine learning, and artificial intelligence

Editor's note: If you find yourself writing the same code three times, you should write a function; if you are asked the same question in person three times, you should write an article. As the terms data science, machine learning, and artificial intelligence appear more and more often in the public eye, misunderstandings on the order of "programmer = computer repairman" have become commonplace. So what is the difference between the three? David Robinson, a data scientist at Stack Overflow, ran into this confusion so often that he wrote an article explaining the difference between the three terms. Here is his take.

When I introduce myself as a data scientist, I often get asked "What is the difference between machine learning and data science?" or "Does that mean you work on artificial intelligence?" I have answered these questions over and over again. As the saying goes, nothing should happen more than three times, so I am finally writing the answer down.

Admittedly, these fields overlap a great deal, and with the media constantly bundling them together in marketing hype, it is easy to mistake them for the same thing. But data science, machine learning, and artificial intelligence are not interchangeable. Most professionals in these fields have an intuitive understanding of the differences, even though those differences are hard to put into words.

So in this article, I want to talk about a simple definition of the differences between these three areas:

Data science produces insights;

Machine learning produces predictions;

Artificial intelligence produces behavior.

It's important to note that these definitions are only rough guides. Not everything that fits one of them belongs to the corresponding field (a fortune teller makes predictions every day, but I would not say they are doing machine learning). Nor are these definitions a good way to determine someone's role or job title ("Am I a data scientist?"), which is a matter of focus and experience. This is true of any job: writing articles is part of my job, but I am not a professional writer.

Although they are not rigorous, I still think these definitions are a useful way to distinguish data science, machine learning, and artificial intelligence. At the very least they will keep you from sounding like a layman in conversation. Note that my approach in this article is descriptive rather than prescriptive: I am not interested in telling you what these terms "should mean," only in how people in the field typically use them.

Data science produces insights

Data science differs from machine learning and artificial intelligence in that its goal is an especially human one: gaining insight and understanding. Jeff Leek gives a good definition in Types of Data Science Questions. He holds that the insights data science produces can be descriptive (such as "the average customer has a 70% chance of renewing"), exploratory (such as "customers of different salespeople renew at different rates"), or causal (such as "research shows that customers assigned to Xiao Ming renew at a higher rate than those assigned to Xiaohong").

Of course, not everything that produces insights belongs to data science. As a discipline, data science is usually described as a combination of statistics, software engineering, and related fields, but the definition above lets us clearly differentiate it from machine learning and AI. One of the main differences is that in data science a person is an indispensable part of the loop: the algorithm produces numbers and results, and a person draws insights from them and works out the causes. By contrast, in machine learning, DeepMind's Go algorithm does not rely on a person to choose its next move, and in AI, Google Maps does not need a person's help to recommend driving directions.

Therefore, data science emphasizes:

Statistical reasoning;

Data visualization;

Experimental design;

Domain knowledge;

Communication.

Data scientists may use simple tools: calculating percentages and drawing line graphs from SQL queries. They may also use very sophisticated methods: analyzing trillions of records in distributed data stores, developing cutting-edge statistical techniques, and building interactive visualizations. Whatever they use and however they use it, the goal is to understand the data better.
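As a rough illustration of the "simple tools" end of that range, here is a minimal Python sketch of the descriptive and exploratory questions mentioned earlier. The customer records are made up for the example, and the function names are my own assumptions rather than anything from a real codebase.

```python
from collections import defaultdict

# Hypothetical records: (salesperson, renewed). Made-up data for illustration.
customers = [
    ("Xiao Ming", True), ("Xiao Ming", True), ("Xiao Ming", True),
    ("Xiao Ming", False), ("Xiao Ming", True),
    ("Xiaohong", True), ("Xiaohong", False), ("Xiaohong", False),
    ("Xiaohong", True), ("Xiaohong", False),
]

def renewal_rate(records):
    """Descriptive: the fraction of customers who renewed."""
    return sum(renewed for _, renewed in records) / len(records)

def renewal_rate_by_salesperson(records):
    """Exploratory: the renewal rate broken down by salesperson."""
    groups = defaultdict(list)
    for salesperson, renewed in records:
        groups[salesperson].append((salesperson, renewed))
    return {name: renewal_rate(rows) for name, rows in groups.items()}

overall = renewal_rate(customers)
by_person = renewal_rate_by_salesperson(customers)

print(f"Overall renewal rate: {overall:.0%}")
for name, rate in sorted(by_person.items()):
    print(f"{name}: {rate:.0%}")
```

A breakdown like this is only exploratory. It shows that the rates differ, but a person still has to interpret the numbers, and a causal claim about the salespeople would need a designed experiment.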

Machine learning produces predictions

I think of machine learning as the field of prediction: given a sample X with particular features, predict Y. These predictions may be about the future (such as predicting how a patient's disease will progress), or about qualities that are not immediately obvious to a computer (such as predicting whether an image contains a bird). Almost all Kaggle competitions qualify as machine learning problems: they provide some training data and then see whether competitors' models can accurately predict new samples.
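The "train on some data, then predict new samples" setup can be sketched in a few lines of Python. This is a toy example under my own assumptions: the data points are invented, and a one-nearest-neighbor rule stands in for whatever model a real competition entry would use.

```python
import math

def predict_1nn(train_X, train_y, x):
    """Predict the label of x as the label of its nearest training sample."""
    distances = [math.dist(x, t) for t in train_X]
    nearest = distances.index(min(distances))
    return train_y[nearest]

def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of held-out samples whose predicted label matches the true one."""
    hits = sum(
        predict_1nn(train_X, train_y, x) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)

# Hypothetical toy data: two numeric features per sample, label 0 or 1.
train_X = [(1.0, 1.2), (0.8, 0.9), (1.1, 0.7), (3.0, 3.2), (3.3, 2.9), (2.8, 3.1)]
train_y = [0, 0, 0, 1, 1, 1]

# Held-out samples the model has never seen, with their true labels.
test_X = [(0.9, 1.0), (3.1, 3.0), (1.2, 1.1), (2.9, 2.7)]
test_y = [0, 1, 0, 1]

score = accuracy(train_X, train_y, test_X, test_y)
print(f"Accuracy on new samples: {score:.0%}")
```

The point of the sketch is the evaluation, not the model: the only question asked is how well the predictions hold up on samples outside the training data.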

Data science and machine learning overlap in many places, and logistic regression is one of them. We can use logistic regression to draw an insight about a relationship ("the richer a customer is, the more likely they are to buy our product, so we should change our marketing strategy accordingly"). We can also use the same model to make a prediction that guides that strategy ("this customer has a 53% chance of buying, so we should offer the product to them").
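A small Python sketch can show both readings of the same logistic regression model. The intercept and coefficient below are hypothetical values picked for illustration, not fitted to any real data, and the income unit is likewise an assumption.

```python
import math

# Hypothetical fitted coefficients, chosen for illustration only.
INTERCEPT = -2.0
COEF_INCOME = 0.04  # change in log-odds per 1,000 of annual income

def purchase_probability(income_thousands):
    """Prediction: probability that a customer with this income buys."""
    log_odds = INTERCEPT + COEF_INCOME * income_thousands
    return 1.0 / (1.0 + math.exp(-log_odds))

# Insight: a positive coefficient means richer customers are more likely
# to buy. exp(coefficient) is the factor by which the odds of buying
# are multiplied for each extra 1,000 of income.
odds_ratio = math.exp(COEF_INCOME)

# Prediction: the purchase probability for one specific customer.
p = purchase_probability(53)

print(f"Odds multiplier per 1,000 of income: {odds_ratio:.3f}")
print(f"Purchase probability at an income of 53,000: {p:.2f}")
```

Reading the coefficient is the data science use of the model, and calling `purchase_probability` on a new customer is the machine learning use.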

But data science and machine learning are different after all. Models like random forests are somewhat harder to interpret and fit more naturally under machine learning, and deep learning models are harder still to explain. If your goal is to extract insights rather than make predictions, such models may not be the right tool. So we can draw a rough boundary between the two: data science leans toward interpretable models, and machine learning leans toward "black box" models.

In practice, most practitioners switch easily back and forth between the two. I use both data science and machine learning in my own work. I might build a model on Stack Overflow's business data to predict which users are likely to be looking for a job (machine learning), and then produce summaries and visualizations that examine why the model works (data science). This is an important way to find flaws in a model and to detect algorithmic bias, and it is one reason data scientists are often responsible for developing the machine learning components of a product.

Artificial intelligence produces behavior

Artificial intelligence is by far the oldest and most widely recognized of the three fields, which makes it the hardest to define. The term is surrounded by hype from researchers, the media, and startups alike: AI is a hotbed of hype, and with it you can gain fame, attention, and money.

If you are fundraising, call it AI;

If you are hiring, call it ML;

If you are implementing, it's linear regression;

If you are debugging, it's printf().

This has left me pessimistic about the term, because the "everything is AI" hype leaves some basic work that really should count as AI "homeless." Some researchers have even complained about the AI effect: "AI is whatever we can't do yet." So what work can be considered part of AI?

The definitions of artificial intelligence in "Computational Intelligence" (Poole, Mackworth, and Goebel, 1998) and in "Artificial Intelligence: A Modern Approach" (Stuart Russell and Peter Norvig, 2003) have one thing in common: an agent that mimics human intelligence and that carries out tasks and chooses its actions autonomously. So here are the things I think should be described as AI:

Game-playing algorithms, such as AlphaGo;

Robotics and control theory;

Optimization, such as Google Maps choosing driving directions;

Natural language processing;

Reinforcement learning.

Again, artificial intelligence overlaps heavily with the other two fields, and deep learning is especially notable for straddling machine learning and AI. The typical use of deep learning is to train on data and then make predictions, which is machine learning, but the same models have also been very successful in game playing. The supercomputer Deep Blue relied on raw computing power and brute-force calculation. AlphaGo's computing requirements are not low either, but it no longer searches exhaustively. Instead it focuses on exploring and optimizing the space of future solutions.

But artificial intelligence is also distinct from data science. If I use a model to analyze some sales data and find that customers in a particular industry renew at a higher rate than customers in other industries, my output is a set of numbers and charts, not an action. (My supervisor may well need that conclusion and adjust the sales strategy based on it, but that series of actions is not autonomous.) In this case, what I am doing is data science.

Please, please, please never say: "I am using AI to increase sales!" (Advertising agencies for certain financial institutions: you know who you are.)

There are also subtle differences between artificial intelligence and machine learning. Historically, machine learning was treated as a subfield of artificial intelligence, and computer vision in particular was a classic AI problem. But I think machine learning has now largely split off from artificial intelligence. One reason is the backlash among practitioners: most people who work on machine learning are reluctant to describe themselves as AI researchers (many machine learning breakthroughs came out of statistics). Independence means you can describe a problem as "predicting X from Y" rather than reaching for the metaphor-laden vocabulary of AI.

According to today's definition, y=mx+b counts as an artificial intelligence bot, because it can tell you where a line is going.
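To show how little is needed to "tell you where a line is going," here is a minimal Python sketch that fits y = mx + b by ordinary least squares and extrapolates one step. The data points are invented for the example, and `fit_line` is a name of my own choosing.

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by ordinary least squares and return (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, b = fit_line(xs, ys)
prediction = m * 5.0 + b  # extrapolate the line to x = 5

print(f"m = {m}, b = {b}, predicted y at x = 5: {prediction}")
```

This is prediction in the "predicting X from Y" sense, which is why calling it an AI bot is a stretch.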

Case study: a combination of the three

Suppose we are building a self-driving car and are working on the specific problem of getting it to stop at stop signs. We need to combine knowledge from all three fields.

Machine learning: The car must use its cameras to recognize a stop sign. We build a dataset of millions of street-scene images containing road signs and train an algorithm on it to identify stop signs accurately.

Artificial intelligence: Once the car recognizes a stop sign, it must decide for itself when to brake. Braking too early or too late is dangerous, and the car also has to take road conditions into account (such as roads made slippery by rain or snow). This is a control theory problem.

Data science: In actual road tests, we find that the car's performance is not good enough: there are false negatives in which it fails to recognize a stop sign. After analyzing the road-test data, we conclude that the false negative rate (miss rate) depends on the time of day: before sunrise and after sunset, the car is more likely to miss a stop sign. From this we discover that most of the training data was collected in daylight, so the car was never trained for night conditions. We collect a large number of nighttime stop sign images and return to the machine learning step.
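The data science step in this case study can be sketched as a simple grouped rate. The road-test log below is invented, and the fixed 6:00 and 18:00 sunrise and sunset hours are simplifying assumptions of mine, not details from the scenario.

```python
from collections import Counter

# Hypothetical road-test log: (hour_of_day, detected) for every stop sign
# the car actually passed. detected is False when the car missed the sign.
road_test_log = [
    (8, True), (9, True), (10, True), (11, True), (12, False),
    (13, True), (14, True), (15, True), (16, True), (17, True),
    (4, True), (5, False), (19, True), (20, False), (22, False),
]

def period_of(hour, sunrise=6, sunset=18):
    """Label an hour as 'day' or 'night' using assumed fixed sunrise/sunset."""
    return "day" if sunrise <= hour < sunset else "night"

def false_negative_rate_by_period(log):
    """Share of real stop signs that went undetected, per period."""
    passed = Counter()
    missed = Counter()
    for hour, detected in log:
        period = period_of(hour)
        passed[period] += 1
        if not detected:
            missed[period] += 1
    return {period: missed[period] / passed[period] for period in passed}

rates = false_negative_rate_by_period(road_test_log)
for period, rate in sorted(rates.items()):
    print(f"{period}: miss rate {rate:.0%}")
```

The computation is trivial. The data science part is deciding to split by time of day and then tracing the difference back to what the training data was missing.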
