Introducing data science and Python

Data science is a relatively new knowledge domain, though its core components have been studied and researched for many years by the computer science community. Its components include linear algebra, statistical modeling, visualization, computational linguistics, graph analysis, machine learning, business intelligence, and data storage and retrieval.

Because the field is so young, its frontiers are still somewhat blurred and shifting. Since data science draws on several constituent disciplines, keep in mind that data scientists have different profiles depending on their competencies and areas of expertise (for instance, you may read the illustrative There’s More Than One Kind of Data Scientist by Harlan D Harris at radar.oreilly.com/2013/06/theres-more-than-one-kind-of-data-scientist.html, or delve into the discussion about type A and type B data scientists and other interesting taxonomies at https://stats.stackexchange.com/questions/195034/what-is-a-data-scientist).

In such a situation, what is the best tool of the trade to learn and use effectively in your career as a data scientist? We believe that the best tool is Python, and we intend to provide you with all the essential information that you will need for a quick start.

Other programming languages, such as R and MATLAB, also provide data scientists with specialized tools for specific problems in statistical analysis and matrix manipulation. However, in our view, only Python completes your data scientist skill set with all the key techniques in a scalable and effective way. This multipurpose language is suitable for both development and production; it can handle small- to large-scale data problems, and it is easy to learn and grasp, no matter what your background or experience is.

Created in 1991 as a general-purpose, interpreted, and object-oriented language, Python has slowly and steadily conquered the scientific community and grown into a mature ecosystem of specialized packages for data processing and analysis. It allows rapid, repeated experimentation, easy theory development, and prompt deployment of scientific applications.

At present, the core Python characteristics that render it an indispensable data science tool are as follows:

  • It offers a large, mature ecosystem of packages for data analysis and machine learning. You will find nearly everything that you may need in the course of a data analysis, and sometimes even more.
  • Python can easily integrate different tools and offers a unifying ground for different languages, data strategies, and learning algorithms, which can be fitted together to help data scientists forge powerful solutions. There are packages that allow you to call code written in other languages (Java, C, Fortran, R, or Julia), outsourcing some of the computations to them and improving your script's performance.
  • It is very versatile. No matter what your programming background or style is (object-oriented, procedural, or even functional), you will enjoy programming with Python.
  • It is cross-platform; your solutions will generally run on Windows, Linux, and macOS systems with little or no change. You won't have to worry all that much about portability.
  • Although interpreted, it is generally fast compared to other mainstream data analysis languages such as R and MATLAB (though it is not comparable to C, Java, and the newly emerged Julia language). Moreover, there are static compilers such as Cython, which translates Python code into C, and just-in-time compilers such as PyPy, which compiles frequently executed code into machine code at runtime, both offering higher performance.
  • It can work with large in-memory data because of its minimal memory footprint and excellent memory management. The garbage collector will often save the day when you load, transform, dice, slice, save, or discard data over repeated iterations of data wrangling.
  • It is very simple to learn and use. After you grasp the basics, there's no better way to learn more than by starting to code right away.
  • Moreover, the number of data scientists using Python is continuously growing: the community releases new packages and improvements every day, making the Python ecosystem an increasingly prolific and rich environment for data science.
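
To make the versatility point above concrete, here is a minimal sketch that computes the same arithmetic mean in procedural, object-oriented, and functional style. It uses only the standard library; the function and class names (mean_procedural, RunningMean, mean_functional) and the sample values are illustrative assumptions, not part of any package discussed here.

```python
from functools import reduce

# Hypothetical sample data, chosen only for illustration.
values = [2.0, 4.0, 6.0, 8.0]


# Procedural style: an explicit loop updating running totals.
def mean_procedural(data):
    total = 0.0
    count = 0
    for x in data:
        total += x
        count += 1
    return total / count


# Object-oriented style: the state lives inside an accumulator object.
class RunningMean:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        self.total += x
        self.count += 1

    @property
    def value(self):
        return self.total / self.count


def mean_object_oriented(data):
    acc = RunningMean()
    for x in data:
        acc.add(x)
    return acc.value


# Functional style: fold the data into a (total, count) pair without mutation.
def mean_functional(data):
    total, count = reduce(
        lambda acc, x: (acc[0] + x, acc[1] + 1), data, (0.0, 0)
    )
    return total / count


results = (
    mean_procedural(values),
    mean_object_oriented(values),
    mean_functional(values),
)
print(results)  # (5.0, 5.0, 5.0)
```

All three versions return the same result; which one reads best depends mostly on your background, which is the point of the bullet on versatility.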