Preface

Apache Spark has captured the imagination of analytics and big data developers, and rightfully so. In a nutshell, Spark enables distributed computing at scale, in the lab or in production. Until now, the collect-store-transform pipeline was distinct from the data science Reason-Model pipeline, which was in turn distinct from the deployment of analytics and machine learning models. Now, with Spark and technologies such as Kafka, we can seamlessly span the data management and data science pipelines. Moreover, we can now build data science models on larger datasets rather than on sampled data alone. And whatever models we build can be deployed into production (with added work from engineering on the “ilities”, of course). It is our hope that this book will help data engineers become familiar with the fundamentals of the Spark platform and gain hands-on experience with some of its advanced capabilities.
