官术网_书友最值得收藏!

R requires all data to be loaded into memory

All data that is processed in R has to be fully loaded into the RAM. This means that once the data has been loaded, all of it is available for processing by the CPU, which is great for performance. On the other hand, it also means that the maximum size of data that you can process depends on the amount of free RAM available on your system. Remember that not all the RAM on your computer is available to R. The operating system, background processes, and any other applications that are running in the CPU also compete for the RAM. What is available for R to use might be a fraction of the total RAM installed on the system.

On top of that, R also requires free RAM to store the results of its computations. Depending on what kinds of computations you are performing, you might need the available RAM to be twice or even more times as large as the size of your data.

32-bit versions of R are also limited by the amount of RAM they can access. Depending on the operating system, they might be limited to 2 GB to 4 GB of RAM even when there is actually more RAM available. Furthermore, due to memory address limits, data structures in 32-bit versions of R can contain at most 231-1 = 2,147,483,647 elements. Because of these limits, you should use the 64-bit versions of R whenever you can.

Note

In all versions of R prior to 3.0, even 64-bit versions, vectors and other data structures faced this 2,147,483,647-element limit. If you have data that exceeds this size, you need to use a 64-bit version of R 3.0 or one of its later versions.

What happens when we try to load a dataset that is larger than the available RAM? Sometimes, the data loads successfully, but once the available RAM is used up, the operating system starts to swap the data in RAM into a swapfile on the hard disk. This is not a feature of R; it depends on the operating system. When this happens, R thinks that all the data has been loaded into the RAM when in fact the operating system is hard at work in the background swapping data between RAM and the swapfile on the disk. When such a situation occurs, we have a disk I/O bottleneck on top of the memory bottleneck. Because disk I/O is so slow (hard drive's speed is typically measured in milliseconds, while RAM's speed in nanoseconds), it can cause R to appear as if it is frozen or becomes unresponsive. Of the three performance limitations we looked at, disk I/O often has the largest impact on R's performance.

Chapter 6, Simple Tweaks to Use Less RAM and Chapter 7, Processing Large Datasets with Limited RAM will discuss how to optimize memory usage and work with datasets that are too large to fit into the memory.

主站蜘蛛池模板: 灌阳县| 巴塘县| 通渭县| 新疆| 织金县| 陈巴尔虎旗| 遵义市| 三都| 太仓市| 嵊泗县| 宁德市| 尉氏县| 平潭县| 韶关市| 勃利县| 连州市| 文成县| 黄骅市| 大庆市| 平邑县| 北安市| 北碚区| 宜宾市| 阿拉善右旗| 佛坪县| 商河县| 涞源县| 蒙山县| 那曲县| 葵青区| 西峡县| 贺兰县| 浑源县| 光泽县| 册亨县| 修文县| 聂拉木县| 嫩江县| 泾源县| 云阳县| 商河县|