
Summary

In this chapter, we have seen how out-of-core learning is possible by streaming data, no matter how big it is, from a text file or a database on your hard disk. These methods apply to datasets much bigger than the examples we used to demonstrate them (examples that could, in fact, be solved in memory on sufficiently powerful hardware).
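
As a reminder of the basic idea, here is a minimal sketch of streaming a file from disk in fixed-size chunks; the file name 'huge_dataset.csv' and the chunk size are placeholders chosen for illustration:

```python
import pandas as pd

# Stream the data from disk in fixed-size chunks rather than loading
# the whole file into memory at once.
CHUNK_SIZE = 10_000
n_rows = 0

for chunk in pd.read_csv('huge_dataset.csv', chunksize=CHUNK_SIZE):
    # Each chunk is an ordinary DataFrame with at most CHUNK_SIZE rows,
    # so memory usage stays bounded regardless of the file size.
    n_rows += len(chunk)

print(f'Streamed {n_rows} rows without ever holding the full file in memory')
```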

We also explained the core algorithm that makes out-of-core learning possible, stochastic gradient descent (SGD), and examined its strengths and weaknesses, emphasizing that streams must be truly stochastic (that is, presented in random order) for the algorithm to be effective, unless the order itself is part of the learning objective. In particular, we introduced the Scikit-learn implementation of SGD, limiting our focus to the linear and logistic regression loss functions.
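
The following short sketch, built on synthetic data, recaps how both loss functions can be fit incrementally with partial_fit after shuffling the stream; note that the loss strings shown here ('log_loss' and 'squared_error') are the names used by recent Scikit-learn releases, while older versions called them 'log' and 'squared_loss':

```python
import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRegressor

# Synthetic data standing in for a data stream.
rng = np.random.RandomState(42)
X = rng.randn(1000, 5)
y_class = (X[:, 0] - X[:, 1] > 0).astype(int)
y_reg = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(1000)

clf = SGDClassifier(loss='log_loss', random_state=42)      # logistic regression loss
reg = SGDRegressor(loss='squared_error', random_state=42)  # linear regression loss

# Shuffle once so the "stream" arrives in random order, then feed it
# to the models one mini-batch at a time via partial_fit.
order = rng.permutation(len(X))
for start in range(0, len(X), 100):
    idx = order[start:start + 100]
    clf.partial_fit(X[idx], y_class[idx], classes=[0, 1])
    reg.partial_fit(X[idx], y_reg[idx])

print('classification accuracy:', clf.score(X, y_class))
print('regression R^2:', reg.score(X, y_reg))
```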

Finally, we discussed data preparation, introduced the hashing trick and validation strategies for streams, and wrapped up by applying the acquired knowledge on SGD to fit two different models, one for classification and one for regression.
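
As a quick illustration of those two ingredients together, the sketch below combines the hashing trick with one possible validation scheme for streams, scoring each incoming batch before learning from it; the toy text batches and labels are made up purely for demonstration:

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Two illustrative mini-batches of (texts, labels) as they might arrive from a stream.
batches = [
    (['cheap pills buy now', 'meeting at noon', 'win a free prize'], [1, 0, 1]),
    (['lunch tomorrow?', 'free money click here', 'project update'], [0, 1, 0]),
]

# The hashing trick: a fixed-size feature space, no vocabulary to fit or store.
vectorizer = HashingVectorizer(n_features=2 ** 20)
clf = SGDClassifier(loss='log_loss', random_state=42)

for i, (texts, labels) in enumerate(batches):
    X_batch = vectorizer.transform(texts)
    if i > 0:
        # Score on data the model has not seen yet, then learn from it.
        print(f'batch {i} accuracy: {clf.score(X_batch, labels):.2f}')
    clf.partial_fit(X_batch, labels, classes=[0, 1])
```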

In the next chapter, we will continue enriching our out-of-core capabilities by figuring out how to introduce non-linearity into our learning schema and the hinge loss for support vector machines. We will also present alternatives to Scikit-learn, such as Liblinear, Vowpal Wabbit, and StreamSVM. Although they operate as external shell commands, all of them can be easily wrapped and controlled by Python scripts.
