- Large Scale Machine Learning with Python
- Bastiaan Sjardin Luca Massaron Alberto Boschetti
- 2021-07-14 10:39:49
Summary
In this chapter, we have seen how out-of-core learning is possible by streaming data, however large, from a text file or a database on your hard disk. These methods apply to datasets much bigger than the examples we used to demonstrate them (which could in fact be solved in memory on sufficiently powerful, above-average hardware).
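As a reminder of what streaming looks like in practice, here is a minimal sketch using only the standard library. The function name `stream_rows`, the batch size, and the tiny in-memory CSV are assumptions made for illustration; a real run would pass an open file handle instead.

```python
import csv
import io

def stream_rows(handle, batch_size=2):
    """Yield lists of parsed rows, holding at most batch_size rows in memory."""
    reader = csv.DictReader(handle)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch

# Hypothetical data standing in for a file on disk.
demo = io.StringIO("x,y\n1,2\n3,4\n5,6\n")
batches = list(stream_rows(demo))
```

Because the generator never holds more than `batch_size` rows, memory use stays roughly constant regardless of how long the file is.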
We also explained SGD, the core algorithm that makes out-of-core learning possible, and examined its strengths and weaknesses. We emphasized that, for SGD to be effective, streams must be truly stochastic (that is, presented in random order), unless the order is itself part of the learning objective. In particular, we introduced the Scikit-learn implementation of SGD, limiting our focus to the linear and logistic regression loss functions.
Finally, we discussed data preparation, introduced the hashing trick and validation strategies for streams, and consolidated what we learned about SGD by fitting two different models, one for classification and one for regression.
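To make the hashing trick and stream validation concrete, here is a small standard-library sketch. It assumes a CRC32 hash and a test-then-train (progressive) validation scheme; `hash_features`, `progressive_validation`, and the toy `RunningMean` model are hypothetical names for this illustration. Scikit-learn's `FeatureHasher` and `HashingVectorizer` use a different hash function.

```python
import zlib

def hash_features(tokens, n_buckets=16):
    """Map tokens to a fixed-size count vector without storing a vocabulary."""
    vector = [0] * n_buckets
    for token in tokens:
        # crc32 is deterministic across runs, unlike the built-in hash() on str.
        index = zlib.crc32(token.encode("utf-8")) % n_buckets
        vector[index] += 1
    return vector

class RunningMean:
    """Toy model that predicts the mean of the targets seen so far."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, x):
        return self.total / self.count if self.count else 0.0

    def update(self, x, y):
        self.total += y
        self.count += 1

def progressive_validation(stream, model):
    """Score each example before training on it; return mean absolute error."""
    total_error = 0.0
    count = 0
    for x, y in stream:
        total_error += abs(model.predict(x) - y)  # test first
        model.update(x, y)                        # then train
        count += 1
    return total_error / count if count else 0.0

vector = hash_features(["red", "green", "red"])
score = progressive_validation([(None, 4.0)] * 4, RunningMean())
```

Every example is used for evaluation exactly once, before the model has seen it, so no separate hold-out file needs to be kept in memory.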
In the next chapter, we will continue to enrich our out-of-core capabilities by figuring out how to enable non-linearity in our learning schema and hinge loss for support vector machines. We will also present alternatives to Scikit-learn, such as Liblinear, Vowpal Wabbit, and StreamSVM. Although they operate as external shell commands, all of them can easily be wrapped and controlled by Python scripts.