  • Practical Data Wrangling
  • Allan Visochek
  • 2021-07-02 15:16:05

R

R is both a programming language and an environment built specifically for statistical computing. The R website (r-project.org/about.html) explains what it means by 'environment':

The term 'environment' is intended to characterize [R] as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software.

In other words, one of the major differences between R and Python is that some of the most common functionality for working with data (data handling and storage, visualization, statistical computation, and so on) comes built in. A good example of this is linear modeling, a basic statistical method for modeling numerical data.

In R, linear modeling is built in and straightforward to use, as we will see in Chapter 5, Manipulating Text Data - An Introduction to Regular Expressions. There are a number of ways to do linear modeling in Python, but they all require external libraries and often extra work to get the data into the right format.
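To give a rough sense of the extra work involved, the following is a minimal sketch of a one-variable least-squares fit written in plain Python, with no external library doing the work. The function name `fit_line`, the column names `hours` and `score`, and the data itself are made up for illustration; this is not code from the book.

```python
# Ordinary least squares for a single predictor, using only plain Python.
# All names and data here are hypothetical, chosen for illustration.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Tabular data stored row by row has to be split into columns first.
rows = [
    {"hours": 1.0, "score": 3.0},
    {"hours": 2.0, "score": 5.0},
    {"hours": 3.0, "score": 7.0},
    {"hours": 4.0, "score": 9.0},
]
hours = [row["hours"] for row in rows]
scores = [row["score"] for row in rows]

intercept, slope = fit_line(hours, scores)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

In R, assuming the same data were held in a data frame named `df` (a hypothetical name), the comparable fit would be a single call to the built-in `lm` function, along the lines of `lm(score ~ hours, data = df)`.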

R also has a built-in data structure called a data frame, which can make manipulating tabular data more intuitive.

The big takeaway here is that both languages come with benefits and trade-offs. In general, being able to use the right tool for the job can save an immense amount of time spent on data wrangling. It is therefore quite useful for a data programmer to have a good working knowledge of both languages and to know when to use each.
