- Hands-On Big Data Analytics with PySpark
- Rudy Lai, Bartłomiej Potaczek
- 2021-06-24 15:52:34
The UCI machine learning repository
We can access the UCI machine learning repository by navigating to https://archive.ics.uci.edu/ml/. So, what is the UCI machine learning repository? UCI stands for the University of California, Irvine, and its machine learning repository is a very useful resource for obtaining free, openly available datasets for machine learning. Although PySpark's main purpose is not machine learning, we can use the repository as an opportunity to get big datasets that help us test out PySpark's functions.
Let's take a look at the KDD Cup 1999 dataset: we will download it and then load the whole dataset into PySpark.
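The download-then-load workflow can be sketched in a few lines of Python. This is a minimal sketch, not the book's own listing: it assumes a local Spark installation with the `pyspark` package, and the download URL and local file name are assumptions (the text does not give them), so check the repository page for the dataset's current location. The helper names `download_dataset`, `load_raw_rdd` and `main` are hypothetical.

```python
import os
import urllib.request

# Assumed location of the full KDD Cup 1999 file (gzip-compressed);
# verify it on the UCI repository page, as archive paths can change.
KDD_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "kddcup99-mld/kddcup.data.gz"
)
LOCAL_PATH = "kddcup.data.gz"  # hypothetical local file name


def download_dataset(url: str, local_path: str) -> str:
    """Download url to local_path, skipping the download if the file exists."""
    if not os.path.exists(local_path):
        urllib.request.urlretrieve(url, local_path)
    return local_path


def load_raw_rdd(local_path: str, app_name: str = "kdd-cup-1999"):
    """Load the file into an RDD with one text line per connection record."""
    # Imported lazily so the download helper works without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName(app_name).getOrCreate()
    # textFile decompresses .gz input transparently; a gzip file is not
    # splittable, so Spark reads it as a single partition.
    return spark.sparkContext.textFile(local_path)


def main() -> None:
    path = download_dataset(KDD_URL, LOCAL_PATH)
    raw_data = load_raw_rdd(path)
    print(raw_data.count())  # action: scans the whole file
    print(raw_data.take(2))  # action: returns the first two raw lines
```

Calling `main()` performs the download once (later runs reuse the local file) and then triggers two Spark actions. Because RDDs are evaluated lazily, nothing is read until `count()` or `take()` is called.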