
Chapter 1. Understanding R's Performance – Why Are R Programs Sometimes Slow?

R is a great tool for statistical analysis and data processing. When it was first developed in 1993, it was designed as a tool for teaching data analysis courses. Because it is so easy to use, it became increasingly popular over the next 20 years, not only in academia but also in government and industry. R is also open source, so anyone can use it for free and contribute new statistical packages to its public repository, the Comprehensive R Archive Network (CRAN). As CRAN grew to more than 6,000 well-documented, ready-to-use packages at the time of writing this book, R became even more attractive.

Over these 20 years, the volume of data being created, transmitted, stored, and analyzed by organizations and individuals alike has also grown exponentially. R programmers who need to process and analyze this ever-growing volume of data sometimes find that R's performance suffers under such heavy loads. Why does R sometimes perform poorly, and how can we overcome its performance limitations? This book examines the factors behind R's performance and offers a variety of techniques to improve the performance of R programs, such as optimizing memory usage, performing computations in parallel, or even tapping the computing power of external data processing systems.

Before we can find solutions to R's performance problems, we need to understand what makes R perform poorly in certain situations. This chapter begins our exploration of high-performance R programming by taking a peek under the hood to understand how R is designed, and how its design can limit the performance of R programs.

We will examine three main constraints faced by any computational task—CPU, RAM, and disk input/output (I/O)—and then look at how these play out specifically in R programs. By the end of this chapter, you will have some insights into the bottlenecks that your R programs could run into.

This chapter covers the following topics:

  • Three constraints on computing performance—CPU, RAM, and disk I/O
  • R is interpreted on the fly
  • R is single-threaded
  • R requires all data to be loaded into memory
  • Algorithm design affects time and space complexity