
Preface

Apache Spark has captured the imagination of analytics and big data developers, and rightfully so. In a nutshell, Spark enables distributed computing at scale, in the lab or in production. Until now, the collect-store-transform pipeline was distinct from the data science Reason-Model pipeline, which was in turn distinct from the deployment of analytics and machine learning models. Now, with Spark and technologies such as Kafka, we can seamlessly span the data management and data science pipelines. Moreover, we can now build data science models on larger datasets rather than relying on sampled data. And whatever models we build can be deployed into production (with added work from engineering on the “ilities”, of course). It is our hope that this book will help data engineers become familiar with the fundamentals of the Spark platform and gain hands-on experience with some of its advanced capabilities.
