If science fiction stories are to be believed, teaching machines to learn will inevitably lead to apocalyptic wars between machines and their makers. In the early stages, computers are taught to play simple games of tic-tac-toe and chess. Later, machines are given control of traffic lights and communications, followed by military drones and missiles. The machines' evolution takes an ominous turn once the computers become sentient and learn how to teach themselves. Having no more need for human programmers, humankind is then "deleted."
Thankfully, at the time of this writing, machines still require user input.
Your impressions of machine learning may be heavily influenced by these types of mass media depictions of artificial intelligence. Although there may be a hint of truth to such tales, in reality machine learning is focused on more practical applications. The task of teaching a computer to learn is tied more closely to a specific problem than to building a computer that can play games, ponder philosophy, or answer trivial questions. Machine learning is more like training an employee than raising a child.
Putting these stereotypes aside, by the end of this chapter, you will have gained a far more nuanced understanding of machine learning. You will be introduced to the fundamental concepts that define and differentiate the most commonly used machine learning approaches.
You will learn:
The origins and practical applications of machine learning
How knowledge is defined and represented by computers
The basic concepts that differentiate machine learning approaches
In a single sentence, you could say that machine learning provides a set of tools that use computers to transform data into actionable knowledge. To learn more about how the process works, read on.
From birth, we are inundated with data. Our body's sensors (the eyes, ears, nose, tongue, and nerves) are continually assailed with raw data that our brain translates into sights, sounds, smells, tastes, and textures. Using language, we are able to share these experiences with others.
The earliest databases recorded information from the observable environment. Astronomers recorded patterns of planets and stars; biologists noted results from experiments crossbreeding plants and animals; and cities recorded tax payments, disease outbreaks, and populations. Each of these required a human being first to observe and then to record the observation. Today, such observations are increasingly automated and recorded systematically in ever-growing computerized databases.
The invention of electronic sensors has additionally contributed to an increase in the richness of recorded data. Specialized sensors see, hear, smell, or taste. These sensors process the data far differently than a human being would, and in many ways, this is a benefit. Without the need for translation into human language, the raw sensory data remains objective.
Tip
It is important to note that although a sensor does not have a subjective component to its observations, it does not necessarily report truth (if such a concept can be defined). A camera taking photographs in black and white might provide a far different depiction of its environment than one shooting pictures in color. Similarly, a microscope provides a far different depiction of reality than a telescope.
Between databases and sensors, many aspects of our lives are recorded. Governments, businesses, and individuals are recording and reporting all manner of information, from the monumental to the mundane. Weather sensors record temperature and pressure data, surveillance cameras watch sidewalks and subway tunnels, and all manner of electronic behaviors are monitored: transactions, communications, friendships, and many others.
This deluge of data has led some to state that we have entered an era of Big Data, but this may be a bit of a misnomer. Human beings have always been surrounded by data. What makes the current era unique is that we have easy data. Larger and more interesting data sets are increasingly accessible through the tips of our fingers, only a web search away. We now live in a period with vast quantities of data that can be directly processed by machines. Much of this information has the potential to inform decision making, if only there were a systematic way of making sense of it all.
The field of study interested in the development of computer algorithms for transforming data into intelligent action is known as machine learning. This field originated in an environment where the available data, statistical methods, and computing power rapidly and simultaneously evolved. Growth in data necessitated additional computing power, which in turn spurred the development of statistical methods for analyzing large datasets. This created a cycle of advancement allowing even larger and more interesting data to be collected.

A closely related sibling of machine learning, data mining, is concerned with the generation of novel insight from large databases (not to be confused with the pejorative term "data mining," describing the practice of cherry-picking data to support a theory). Although there is some disagreement over how widely the two fields overlap, a potential point of distinction is that machine learning tends to be focused on performing a known task, whereas data mining is about the search for hidden nuggets of information. For instance, you might use machine learning to teach a robot to drive a car, whereas you would use data mining to learn which types of cars are the safest.
At its core, machine learning is primarily interested in making sense of complex data. This is a broadly applicable mission, and largely application agnostic. As you might expect, machine learning is used widely. For instance, it has been used to:
Predict the outcomes of elections
Identify and filter spam messages from e-mail
Foresee criminal activity
Automate traffic signals according to road conditions
Produce financial estimates of storms and natural disasters
Examine customer churn
Create auto-piloting planes and auto-driving cars
Identify individuals with the capacity to donate
Target advertising to specific types of consumers
For now, don't worry about exactly how the machines learn to perform these tasks; we will get into the specifics later. But across each of these contexts, the process is the same. A machine learning algorithm takes data and identifies patterns that can be used for action. In some cases, the results are so successful that they seem to reach near-legendary status.
One possibly apocryphal tale is of a large retailer in the United States, which employed machine learning to identify expectant mothers for targeted coupon mailings. If mothers-to-be were targeted with substantial discounts, the retailer hoped they would become loyal customers who would then continue to purchase profitable items like diapers, formula, and toys.
By applying machine learning methods to purchase data, the retailer believed it had learned some useful patterns. Certain items, such as prenatal vitamins, lotions, and washcloths, could be used to identify with a high degree of certainty not only whether a woman was pregnant, but also when the baby was due.
After the retailer used this data for a promotional mailing, an angry man contacted the company and demanded to know why his teenage daughter was receiving coupons for maternity items. He was furious that the merchant seemed to be encouraging teenage pregnancy. Later on, when a manager called to offer an apology, it was the father who ultimately apologized; after confronting his daughter, he had discovered that she was indeed pregnant.
Whether completely true or not, there is certainly an element of truth to the preceding tale. Retailers do, in fact, routinely analyze their customers' transaction data. If you've ever used a shopper's loyalty card at your grocer, coffee shop, or another retailer, it is likely that your purchase data is being used for machine learning.
Retailers use machine learning methods for advertising, targeted promotions, inventory management, and the layout of items in the store. Some retailers have even equipped checkout lanes with devices that print coupons for promotions based on the items in the current transaction. Websites routinely do something similar, serving advertisements based on your web browsing history. Given the data from many individuals, a machine learning algorithm learns typical patterns of behavior that can then be used to make recommendations.
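To make the idea of learning typical patterns of behavior more concrete, here is a minimal Python sketch. It is not any retailer's actual method. It simply counts how often pairs of items appear in the same basket and suggests the items that most often accompany what is in the current transaction. The transaction data, the item names, and the function names `count_pairs` and `recommend` are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past transactions; each set is one customer's basket.
TRANSACTIONS = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"coffee", "milk"},
    {"coffee", "milk", "sugar"},
    {"bread", "butter", "milk"},
]


def count_pairs(transactions):
    """Count how often each ordered pair of items shares a basket."""
    pair_counts = Counter()
    for basket in transactions:
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts


def recommend(current_basket, pair_counts, top_n=2):
    """Suggest items that most often co-occur with the current basket."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a in current_basket and b not in current_basket:
            scores[b] += n
    # Sort by descending score, then alphabetically so ties are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


pair_counts = count_pairs(TRANSACTIONS)
print(recommend({"bread"}, pair_counts))  # ['butter', 'jam']
```

In this toy data, butter shares a basket with bread three times and jam twice, so those two items are suggested first. Real systems work with far more data and far more careful statistics, but the underlying step is the same: patterns observed across many individuals drive the recommendation for one.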
Although I am familiar with the machine learning methods working behind the scenes, it still feels a bit like magic when a retailer or website seems to know me better than I know myself. Others may be less thrilled to discover that their data is being used in this manner. Therefore, any person wishing to use machine learning or data mining would be remiss not to at least briefly consider the ethical implications of the art.
Due to the relative youth of machine learning as a discipline and the speed at which it is progressing, the associated legal issues and social norms are often quite uncertain and constantly in flux. Caution should be exercised when obtaining or analyzing data in order to avoid breaking laws, violating terms of service or data use agreements, abusing the trust of customers or the public, or violating their privacy.
Tip
The informal corporate motto of Google, an organization that collects perhaps more data on individuals than any other, is "don't be evil." This may serve as a reasonable starting point for forming your own ethical guidelines, but it may not be sufficient.
Certain jurisdictions may prevent you from using racial, ethnic, religious, or other protected class data for business reasons, but keep in mind that excluding this data from your analysis may not be enough—machine learning algorithms might inadvertently learn this information independently. For instance, if a certain segment of people generally live in a certain region, buy a certain product, or otherwise behave in a way that uniquely identifies them as a group, some machine learning algorithms can infer the protected information from seemingly innocuous data. In such cases, you may need to fully "de-identify" these people by excluding any potentially identifying data in addition to the protected information.
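The way seemingly innocuous data can stand in for protected information can be illustrated with a small Python sketch. It assumes a made-up table of customers in which the protected attribute, labeled here only as group "A" or "B", is withheld from the analysis while the customer's region is not. A trivial rule that guesses the most common group for each region still recovers the withheld attribute for most records. The records and the function names `majority_group_by_region` and `proxy_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records: (region, protected_group).
# The group is what an analyst intends to exclude; region is the proxy.
RECORDS = [
    ("north", "A"), ("north", "A"), ("north", "A"), ("north", "B"),
    ("south", "B"), ("south", "B"), ("south", "B"), ("south", "A"),
    ("east", "A"), ("east", "A"),
    ("west", "B"), ("west", "B"),
]


def majority_group_by_region(records):
    """Learn the most common group for each region."""
    counts = defaultdict(Counter)
    for region, group in records:
        counts[region][group] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


def proxy_accuracy(records, rule):
    """Fraction of records whose group is guessed correctly from region alone."""
    hits = sum(1 for region, group in records if rule[region] == group)
    return hits / len(records)


rule = majority_group_by_region(RECORDS)
accuracy = proxy_accuracy(RECORDS, rule)
print(rule)
print(f"Group recovered from region alone: {accuracy:.0%}")
```

With this toy data, region alone identifies the group for 10 of the 12 records. The accuracy is measured on the same records used to build the rule, so it is optimistic; the point is only that dropping the protected column does not remove the information when another column is strongly associated with it.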
Apart from the legal consequences, using data inappropriately may hurt your bottom line. Customers may feel uncomfortable or become spooked if aspects of their lives they consider private are made public. Recently, several high-profile web applications have experienced a mass exodus of users who felt exploited when the applications' terms of service agreements changed and their data was used for purposes beyond what the users had originally agreed upon. The fact that privacy expectations differ by context, by age cohort, and by locale adds complexity to deciding the appropriate use of personal data. It would be wise to consider the cultural implications of your work before you begin your project.