官术网_书友最值得收藏!

Python Libraries

Having now seen all the basics of Jupyter Notebooks, and even some more advanced features, we'll shift our attention to the Python libraries we'll be using in this book. Libraries, in general, extend the default set of Python functions. Examples of commonly used standard libraries are datetime, time, and os. These are called standard libraries because they come standard with every installation of Python.

For data science with Python, the most important libraries are external, which means they do not come standard with Python.

 The external data science libraries we'll be using in this book are NumPy, Pandas, Seaborn, matplotlib, scikit-learn, Requests, and Bokeh. Let's briefly introduce each.

It's a good idea to import libraries using industry standards, for example, import numpy as np; this way, your code is more readable. Try to avoid doing things such as from numpy import *, as you may unwittingly overwrite functions. Furthermore, it's often nice to have modules linked to the library via a dot (. ) for code readability.
  • NumPy offers multi-dimensional data structures (arrays) on which operations can be performed far quicker than standard Python data structures (for example, lists). This is done in part by performing operations in the background using C. NumPy also offers various mathematical and data manipulation functions.
  • Pandas is Python's answer to the R DataFrame. It stores data in 2-D tabular structures where columns represent different variables and rows correspond to samples. Pandas provides many handy tools for data wrangling such as filling in NaN entries and computing statistical descriptions of the data. Working with Pandas DataFrames will be a big focus of this book.
  • Matplotlib is a plotting tool inspired by the MATLAB platform. Those familiar with R can think of it as Python's version of ggplot. It's the most popular Python library for plotting figures and allows for a high level of customization.
  • Seaborn works as an extension to matplotlib, where various plotting tools useful for data science are included. Generally speaking, this allows for analysis to be done much faster than if you were to create the same things manually with libraries such as matplotlib and scikit-learn.
  • Scikit-learn is the most commonly used machine learning library. It offers top-of the-line algorithms and a very elegant API where models are instantiated and then fit with data. It also provides data processing modules and other tools useful for predictive analytics.
  • Requests is the go-to library for making HTTP requests. It makes it straightforward to get HTML from web pages and interface with APIs. For parsing the HTML, many choose BeautifulSoup4, which we will also cover in this book.
  • Bokeh is an interactive visualization library. It functions similar to matplotlib, but allows us to add hover, zoom, click, and use other interactive tools to our plots. It also allows us to render and play with the plots inside our Jupyter Notebook.

Having introduced these libraries, let's go back to our Notebook and load them, by running the import statements. This will lead us into our first analysis, where we finally start working with a dataset.

主站蜘蛛池模板: 齐齐哈尔市| 宁远县| 莒南县| 仙游县| 德江县| 洛浦县| 清原| 社会| 盐池县| 沙雅县| 尼木县| 安西县| 拜城县| 开远市| 泊头市| 故城县| 丹凤县| 钟山县| 贵港市| 瓦房店市| 盐城市| 柯坪县| 沛县| 关岭| 肃北| 海兴县| 河池市| 贞丰县| 额尔古纳市| 武陟县| 化隆| 绥芬河市| 平谷区| 肥城市| 宜兰市| 海林市| 靖西县| 塘沽区| 抚宁县| 东丰县| 延津县|