- Applied Data Visualization with R and ggplot2
- Dr. Tania Moulik
- 2021-07-23 16:59:46
Histograms
Histograms are used to group and represent numerical (continuous) variables. For example, you may want to know the distribution of voters' ages in an election. A histogram is often confused with a bar chart; however, a bar chart is more general, and we will cover those later. In a histogram, a continuous variable is grouped into bins of a specific width, and together the bins span the range from the minimum to the maximum of the variable in question.
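The following minimal base R sketch makes the binning concrete. The ages vector is made-up illustrative data, not from the book, and the 10-year bin width is an arbitrary choice. With plot = FALSE, hist() only computes the bins and counts; it does not draw anything.

```r
# Made-up voter ages, for illustration only
ages <- c(19, 23, 25, 31, 34, 38, 42, 45, 47, 51, 56, 63, 68, 72)

# Group the ages into 10-year bins whose breaks span the data's min and max;
# plot = FALSE returns the binning without drawing the histogram
h <- hist(ages, breaks = seq(10, 80, by = 10), plot = FALSE)

h$breaks  # bin edges: 10 20 30 ... 80
h$counts  # number of ages in each bin: 1 2 3 3 2 2 1
```

By default, hist() uses right-closed bins, so the bin (20, 30] contains 23 and 25 but not 20. The first bin also includes its left edge, because include.lowest = TRUE by default.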
Histograms can be classified as follows:
- Unimodal: A distribution with a single maximum, or mode; for example, a normal distribution.
- A normal distribution (or a bell-shaped curve) is symmetrical. An example is the grade distribution of students in a class. A unimodal distribution may or may not be symmetrical. It can be positively or negatively skewed, as well.
- Positively or negatively skewed (also known as right-skewed or left-skewed): Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
- A left-skewed distribution has a long tail to the left, while a right-skewed distribution has a long tail to the right. An example of a right-skewed distribution might be the distribution of US household income, with a long tail of higher-income groups.
- Bimodal: A bimodal distribution resembles the back of a two-humped camel. It often arises when the outcomes of two processes with different distributions are combined into one set of data. For example, you might expect to see a bimodal distribution in the heights of an entire population: there would be a peak around the average height of a man and a peak around the typical height of a woman.
- Uniform distribution: This distribution follows a flat pattern, with approximately the same number of values in each bin. In the real world, one can only find approximately uniform distributions. An example is the position of a car moving at constant speed (zero acceleration), sampled at regular time intervals, or the uniform distribution of heat in a microwave.
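The shapes in the list above can be reproduced with simulated data. The sketch below is an illustration under stated assumptions, not the book's code. The distributions are a normal, an exponential, a two-component normal mixture and a uniform, and their parameters are arbitrary. sample_skewness() is a hypothetical helper that computes the moment-based sample skewness g1 = m3 / m2^(3/2), where mk is the k-th central moment. Its sign should follow the direction of the long tail.

```r
set.seed(42)  # reproducible simulated samples
n <- 10000

# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2)
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Simulated samples for each shape (parameters chosen only for illustration)
samples <- list(
  unimodal     = rnorm(n, mean = 0, sd = 1),                 # symmetric bell curve
  right_skewed = rexp(n, rate = 1),                          # long tail to the right
  left_skewed  = -rexp(n, rate = 1),                         # long tail to the left
  bimodal      = c(rnorm(n / 2, 0, 1), rnorm(n / 2, 5, 1)),  # two processes combined
  uniform      = runif(n, min = 0, max = 1)                  # roughly flat
)

# Skewness is near 0 for the symmetric shapes, positive for right-skewed
# data, and negative for left-skewed data
skew <- sapply(samples, sample_skewness)
print(round(skew, 2))

# Draw one histogram per shape in a 2 x 3 grid
op <- par(mfrow = c(2, 3))
for (name in names(samples)) {
  hist(samples[[name]], breaks = 40, main = name, xlab = "value")
}
par(op)  # restore the previous graphics settings
```

The theoretical skewness of an exponential distribution is 2. The right_skewed value should therefore come out close to 2, and the left_skewed value close to -2.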
Let's take a look at another image:
It's important to study the shapes of distributions, as they can reveal a lot about the nature of data. We will see some real-world examples of histograms in the datasets that we will explore.
You can read more about the shapes of histograms at https://www.moresteam.com/toolbox/histogram.cfm and https://www.siyavula.com/read/maths/grade-11/statistics/11-statistics-05.
Find out more about normal distributions at http://onlinestatbook.com/2/normal_distribution/history_normal.html.
You will find more real-world examples at https://stats.stackexchange.com/questions/33776/real-life-examples-of-common-distributions.
We discussed the different kinds of geometric objects that we will be working with, and we created our first plot using two different methods (qplot and hist). Now, let's use another command: ggplot. We will use the humidity data that we loaded previously.
As seen in the preceding section, we can create a default histogram by using the base R function hist(), as in the following command:
```r
hist(df_hum$Vancouver)
```
The default histogram that will be created is as follows:
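The book's df_hum data frame is not reproduced here, so the sketch below builds a synthetic stand-in. It assumes df_hum has a numeric Vancouver column of relative humidity in percent, and draws those values from a scaled beta distribution. The sketch shows three things:

- hist() returns an object of class "histogram".
- The default breaks = "Sturges" rule suggests ceiling(log2(n) + 1) bins.
- Explicit bin edges and labels can be set with the breaks, main and xlab arguments.

```r
# Synthetic stand-in for df_hum (NOT the book's data): 500 humidity values in percent
set.seed(1)
df_hum <- data.frame(Vancouver = 100 * rbeta(500, shape1 = 8, shape2 = 2))

# With plot = FALSE, hist() only computes the bins; the default is breaks = "Sturges"
h_default <- hist(df_hum$Vancouver, plot = FALSE)
class(h_default)                  # "histogram"
nclass.Sturges(df_hum$Vancouver)  # suggested bin count: ceiling(log2(500) + 1) = 10
length(h_default$counts)          # actual bin count after pretty() rounds the breaks

# Explicit 5-percentage-point bins covering 0-100, with readable labels
h_custom <- hist(df_hum$Vancouver,
                 breaks = seq(0, 100, by = 5),
                 main = "Humidity in Vancouver (synthetic data)",
                 xlab = "Relative humidity (%)",
                 col = "lightblue")
```

In ggplot2, the equivalent layer is geom_histogram(). For example, ggplot(df_hum, aes(x = Vancouver)) + geom_histogram(binwidth = 5) draws 5-point-wide bins. By default, ggplot2 places the bin boundaries differently from hist(); its boundary and center arguments control this.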