Learning from Big Data

In the first two chapters, we set the context for intelligent machines: the big data revolution and how big data is fueling rapid advances in artificial intelligence. We also emphasized the need for a global vocabulary for universal knowledge representation, and saw how ontologies fulfill that need and help construct a semantic view of the world.

The quest is for knowledge, which is derived from information, which is in turn derived from the vast amounts of data that we are generating. Knowledge facilitates a rational decision-making process for machines that complements and augments human capabilities. We have seen how the Resource Description Framework (RDF) provides the schematic backbone for knowledge assets, along with the fundamentals of the Web Ontology Language (OWL) and SPARQL, the query language for RDF.

In this chapter, we are going to look at some of the basic concepts of machine learning and take a deep dive into some of the algorithms. We will use Spark's machine learning libraries. Spark is one of the most popular computing frameworks for implementing algorithms, and it also serves as a generic computation engine on big data. Spark fits into the big data ecosystem well, with a simple programming interface, and very effectively leverages the power of distributed and resilient computing frameworks. Although this chapter does not assume any background in statistics or mathematics, some programming background will greatly help the reader understand the code snippets and experiment with the examples.

In this chapter, we will look at the two broad categories of machine learning, supervised and unsupervised learning, before taking a deep dive, with examples, into:

  • Regression analysis
  • Data clustering
      • K-means
  • Data dimensionality reduction
      • Singular value decomposition
      • Principal component analysis (PCA)

At the end of the chapter, we will give an overview of the Spark programming model and Spark's machine learning library (Spark MLlib). With all this background knowledge at our disposal, we will conclude the chapter by implementing a recommendation system.