Summary

In this chapter, we have discussed many ways to prepare data for machine learning and other forms of AI. Raw data from source systems has to be transported across the data layers of a modern data lake, including a historical data archive, a set of (virtualized) analytics datasets, and a machine learning environment. There are several kinds of tools for creating such a data pipeline: simple scripts and traditional software, ETL tools, big data processing frameworks, and streaming data engines.

We have also introduced the concept of feature engineering. This is an important part of any AI system: it is the step in which data is prepared for consumption by a machine learning model. Regardless of the programming language and frameworks chosen, an AI team has to spend significant time writing features and ensuring that the resulting code and binaries are well managed and deployed together with the models themselves.

In the exercises and activities, we have worked with Bash scripts, Jupyter Notebooks, Spark, and, finally, stream processing with live Twitter data.

In the next chapter, we will look into a less technical but very important topic for data engineering and machine learning: the ethics of AI.