Architecture and integration with applications

The architecture is well covered in the official documentation at http://predictionio.incubator.apache.org/system/. In this section, we expand on its most important aspects so that the flexibility of the platform and what it offers are clear.

The following diagram is from the official documentation of PredictionIO:

The key things to understand from the preceding diagram are as follows:

  • Event Server provides a RESTful endpoint where applications can drop events in real time. For an application such as a product recommender, events may include a view event when a buyer looks at a product, an add-to-cart event when a buyer adds a product to a cart, an event from an IoT device, and so on. In the current version of PredictionIO, Event Server can use PostgreSQL 9.1/MySQL 5.1 or Apache HBase/Elasticsearch for the event data store. PredictionIO allows different engines to be used in training, but many algorithms come from Spark's MLlib. For scalable applications with large data volumes, it is better to consider Apache HBase, which is an open source, distributed, versioned, non-relational database capable of handling billions of transactions to supply data for training.
  • Training: PredictionIO uses Apache Spark to train models on the dataset. Apache Spark offers extensive API support for developers working with its data structures, and most of the templates use libraries such as Spark MLlib to directly access machine learning functions developed by data scientists.
  • Prediction Server is a RESTful endpoint that accepts queries in real time and returns predictive results. The output of training has two parts: a model and its metadata. The model is then stored in Hadoop Distributed File System (HDFS), a local file system, or Elasticsearch.
HDFS is a distributed filesystem from Hadoop; it allows storage to be shared among clustered machines. It is used to stage data for batch import into PredictionIO (PIO), to export Event Server datasets, and to store some models. Elasticsearch is a distributed, RESTful search and analytics engine; it is at the core of the Elastic Stack and stores your data centrally so that you can discover the expected and uncover the unexpected.
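To make the two RESTful endpoints more concrete, the following is a minimal Python sketch of how an application might build a request for Event Server and a query for Prediction Server. It rests on several assumptions that are not stated in this section: a local installation with Event Server on port 7070 and a deployed engine on port 8000, a placeholder access key, and a recommendation-style query with `user` and `num` fields. The query shape depends on the engine template in use, and the helper names `build_event_request` and `build_query_request` are hypothetical. The sketch only builds the requests; it does not send them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed defaults for a local installation; adjust for your deployment.
EVENT_SERVER_URL = "http://localhost:7070"
PREDICTION_SERVER_URL = "http://localhost:8000"


def build_event_request(access_key, user_id, item_id, event="view"):
    """Build (but do not send) a POST that records a user-item event."""
    payload = {
        "event": event,               # e.g. "view" or an add-to-cart event
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
    query_string = urlencode({"accessKey": access_key})
    return Request(
        f"{EVENT_SERVER_URL}/events.json?{query_string}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(query):
    """Build (but do not send) a POST that asks a deployed engine for a prediction."""
    return Request(
        f"{PREDICTION_SERVER_URL}/queries.json",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # "YOUR_ACCESS_KEY" is a placeholder, not a real key.
    event_req = build_event_request("YOUR_ACCESS_KEY", "u1", "i1")
    # The query fields below are an assumption; they vary by engine template.
    query_req = build_query_request({"user": "u1", "num": 4})
    print(event_req.get_method(), event_req.full_url)
    print(query_req.get_method(), query_req.full_url)
```

To actually submit either request, pass it to `urllib.request.urlopen` once the corresponding server is running. Keeping request construction separate from sending makes it easy to inspect the payloads before any event reaches the event data store.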