- Data Analysis with R
- Tony Fischetti
- 2021-07-30 09:55:12
Univariate data
In this chapter, we are going to deal with univariate data, which is a fancy way of saying samples of one variable—the kind of data that goes into a single R vector. Analysis of univariate data isn't concerned with the why questions—causes, relationships, or anything like that; the purpose of univariate analysis is simply to describe.
In univariate data, one variable—let's call it x—can represent categories like soy ice cream flavors, heads or tails, names of cute classmates, the roll of a die, and so on. In cases like these, we call x a categorical variable.
> categorical.data <- c("heads", "tails", "tails", "heads")
Categorical data is represented, in the preceding statement, as a vector of character type. In this particular example, we could further specify that this is a binary or dichotomous variable, because it only takes on two values, namely "heads" and "tails".
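As a rough illustration of the point above, the following sketch inspects the coin-flip vector. The conversion to a factor and the names coin.factor, is.dichotomous, and coin.counts are not from the text; they are assumptions used here only to show one way to check that a variable takes exactly two values.

```r
# Same coin-flip vector as in the text
categorical.data <- c("heads", "tails", "tails", "heads")

# The vector is stored as character type
class(categorical.data)

# Assumption: converting to a factor to list the distinct categories (levels)
coin.factor <- factor(categorical.data)
levels(coin.factor)

# A dichotomous variable has exactly two distinct values
is.dichotomous <- nlevels(coin.factor) == 2
is.dichotomous

# Count how often each category occurs
coin.counts <- table(categorical.data)
coin.counts
```

Here nlevels() simply counts the distinct categories, so the same check would report FALSE for a variable such as ice cream flavors with three or more values.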
Our variable x could also represent a number like air temperature, the prices of financial instruments, and so on. In such cases, we call this a continuous variable.
> contin.data <- c(198.41, 178.46, 165.20, 141.71, 138.77)
Univariate data of a continuous variable is represented, as seen in the preceding statement, as a vector of numeric type. These data are the stock prices of a hypothetical company that offers a hypothetical commercial statistics platform inferior to R.
You might come to the conclusion that if a vector contains character types, it is a categorical variable, and if it contains numeric types, it is a continuous variable. Not quite! Consider the case of data that contains the results of the roll of a six-sided die. A natural approach to storing this would be by using a numeric vector. However, this isn't a continuous variable, because each result can only take on six distinct values: 1, 2, 3, 4, 5, and 6. This is a discrete numeric variable. Other examples of discrete numeric variables are the number of bacteria in a petri dish or the number of love letters sent to cute classmates.
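A minimal sketch of the die example might look like the following. The vector die.rolls and its values are made up for illustration and do not come from the text.

```r
# Hypothetical results of ten die rolls (made-up values for illustration)
die.rolls <- c(3, 6, 1, 4, 4, 2, 6, 5, 1, 3)

# The storage type is numeric, just like a continuous variable would be
is.numeric(die.rolls)

# But every value belongs to a fixed set of six whole numbers
all(die.rolls %in% 1:6)

# The distinct values that actually occur
sort(unique(die.rolls))
```

The storage type alone cannot tell the two kinds of variable apart; it is the restricted set of possible values that marks the variable as discrete.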
The mark of a continuous variable is that it could take on any value between some theoretical minimum and maximum. A die roll has a minimum of 1 and a maximum of 6, but it can never be 2.3. Contrast this with, say, the example of the stock prices, which could be zero, zillions, or anything in between.
On occasion, we are unable to neatly classify non-categorical data as either continuous or discrete. In some cases, discrete variables may be treated as if there is an underlying continuum. Additionally, continuous variables can be discretized, as we'll see soon.
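One possible way to discretize a continuous variable in R is the base function cut(). The sketch below applies it to the stock prices from earlier; the break points (100, 150, 200), the labels "low" and "high", and the name price.bins are arbitrary assumptions chosen only for illustration.

```r
# Stock prices from the earlier example
contin.data <- c(198.41, 178.46, 165.20, 141.71, 138.77)

# Assumption: two arbitrary bins, (100, 150] and (150, 200]
price.bins <- cut(contin.data,
                  breaks = c(100, 150, 200),
                  labels = c("low", "high"))

# Each price is now replaced by the category it falls into
price.bins

# Count the prices in each bin
table(price.bins)
```

The result is a factor, so the binned prices can be treated like any other categorical variable.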