
Architecture and integration with applications

The architecture is well covered in the official documentation at http://predictionio.incubator.apache.org/system/. This section expands on the most important aspects so that the flexibility of the platform and what it offers are clear.

The following diagram is from the official documentation of PredictionIO:

The key things to understand from the preceding diagram are as follows:

  • Event Server provides a RESTful endpoint to which applications send events in real time. For an application such as a product recommender, events may include a view event when a buyer views a product, an add-to-cart event when a buyer adds a product to a cart, an event from an IoT device, and so on. In the current version of PredictionIO, Event Server can use PostgreSQL 9.1, MySQL 5.1, or Apache HBase with Elasticsearch as the event data store. PredictionIO allows different engines to be used in training, but many algorithms come from Spark's MLlib. For scalable, high-volume applications, it is better to consider Apache HBase, an open source, distributed, versioned, non-relational database capable of handling the billions of events used as training data.
  • Training: PredictionIO uses Apache Spark to train models on the dataset. Apache Spark offers extensive API support for developers working with data structures, and most templates use libraries such as Spark MLlib to directly access machine learning functions developed by data scientists.
  • Prediction Server provides a RESTful endpoint to which applications submit queries in real time and from which they receive predictive results. The output of training has two parts: a model and its metadata. The model is then stored in the Hadoop Distributed File System (HDFS), a local file system, or Elasticsearch.
HDFS is the distributed filesystem from Hadoop; it allows storage to be shared among clustered machines. It is used to stage data for batch import into PredictionIO (PIO), for the export of Event Server datasets, and for the storage of some models. Elasticsearch is a distributed, RESTful search and analytics engine at the core of the Elastic Stack; it stores data centrally so that it can be searched and analyzed.
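
To make the two RESTful endpoints described above more concrete, the following is a minimal sketch in Python of how an application might prepare a request for Event Server and a query for Prediction Server. It rests on several assumptions: the hosts and ports (localhost:7070 for events, localhost:8000 for queries) are the usual defaults of a local installation and may differ in your deployment; the access key is a placeholder; and the query fields user and num follow the recommendation template, so other engines will expect different fields. The helper names build_event_request, build_query_request, and send are hypothetical and are not part of any PredictionIO SDK.

```python
import json
from urllib import request
from urllib.parse import urlencode

# Assumed defaults for a local installation; adjust for your deployment.
EVENT_SERVER = "http://localhost:7070"
ENGINE_SERVER = "http://localhost:8000"


def build_event_request(access_key, event, entity_type, entity_id,
                        target_entity_type=None, target_entity_id=None,
                        properties=None, base_url=EVENT_SERVER):
    """Build (but do not send) a POST to Event Server's /events.json."""
    payload = {
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
    }
    # A target entity is optional, for example the item a user viewed.
    if target_entity_type is not None and target_entity_id is not None:
        payload["targetEntityType"] = target_entity_type
        payload["targetEntityId"] = target_entity_id
    if properties:
        payload["properties"] = properties
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_request(query, base_url=ENGINE_SERVER):
    """Build (but do not send) a POST to a deployed engine's /queries.json."""
    return request.Request(
        f"{base_url}/queries.json",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=5):
    """Send a prepared request and decode the JSON response body."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the prepared requests; call send(...) against a running server.
    view = build_event_request("YOUR_ACCESS_KEY", "view", "user", "u1", "item", "i42")
    print(view.get_method(), view.full_url)
    print(view.data.decode("utf-8"))

    query = build_query_request({"user": "u1", "num": 4})
    print(query.get_method(), query.full_url)
    print(query.data.decode("utf-8"))
```

Building the request separately from sending it keeps the sketch runnable without a live server. Against a running installation, passing either request to send would return the decoded JSON response, whose shape depends on the server and on the engine template in use.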