- Hands-On Data Science with SQL Server 2017
- Marek Chmel, Vladimír Mužný
- 2021-06-10 19:13:53
Visualizing the types of data
Visualizing and communicating data is incredibly important, especially for young companies that are making data-driven decisions for the first time, or for companies where data scientists are seen as the people who help others make data-driven decisions. Communicating means describing your findings, or how techniques work, to both technical and non-technical audiences. Different types of data call for different representations. For categorical values, the most suitable visuals are the following:
- Bar charts
- Pie charts
- Pareto diagrams
- Frequency distribution tables
A bar chart visually represents the values stored in a frequency distribution table, with each bar representing one categorical value. A bar chart is also the baseline for a Pareto diagram, which includes the relative and cumulative frequency of the categorical values:
If we add the cumulative frequency to the bar chart, we get a Pareto diagram of the same data:
Another very useful visualization for categorical data is the pie chart, which displays the percentage of the total for each categorical value. In statistics, this percentage is called the relative frequency: the frequency of a category divided by the total frequency of all categories. This type of visual is commonly used to represent market share:
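The numbers behind the bar chart, the pie chart, and the Pareto diagram can all be computed with a single T-SQL query. The following is a minimal sketch, assuming a hypothetical table variable `@observations` with one row per observation of a categorical variable; the categories A to D and their counts are made up purely for illustration. The query returns the absolute frequency (bar chart), the relative frequency in percent (pie chart), and the cumulative relative frequency in descending order of frequency (the cumulative line of a Pareto diagram):

```sql
-- Hypothetical sample data: one row per observation of a categorical variable.
DECLARE @observations TABLE ([category] NVARCHAR(20) NOT NULL);

INSERT INTO @observations ([category])
VALUES (N'A'), (N'A'), (N'A'), (N'A'), (N'A'),
       (N'B'), (N'B'), (N'B'),
       (N'C'),
       (N'D');

-- Absolute frequency per category: the frequency distribution table.
WITH freq AS (
    SELECT [category], COUNT(*) AS abs_freq
    FROM @observations
    GROUP BY [category]
)
SELECT
    [category],
    abs_freq,
    -- Relative frequency: share of the total, in percent (pie chart).
    CAST(100.0 * abs_freq / SUM(abs_freq) OVER () AS DECIMAL(5, 2)) AS rel_freq_pct,
    -- Running total of the relative frequency, most frequent category first (Pareto line).
    CAST(100.0 * SUM(abs_freq) OVER (ORDER BY abs_freq DESC, [category]
                                     ROWS UNBOUNDED PRECEDING)
         / SUM(abs_freq) OVER () AS DECIMAL(5, 2)) AS cum_rel_freq_pct
FROM freq
ORDER BY abs_freq DESC, [category];
```

With these sample rows, category A accounts for 50 percent of the observations and the cumulative column reaches 100 for the last category. Window aggregates with `ROWS` framing require SQL Server 2012 or later, so they are available in SQL Server 2017.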
For numeric data, a good starting point is a frequency distribution table, which can contain ordered or unordered values. Numeric data is frequently displayed with histograms or scatter plots. When grouping values into intervals, the rule of thumb is to use 5 to 20 intervals to get a meaningful representation of the data.
Let's create a table with 20 discrete data points, which we'll display visually. To create the table, we can use the following T-SQL script:
CREATE TABLE [dbo].[dataset](
[datapoint] [int] NOT NULL
) ON [PRIMARY]
To insert values into the table, we can use the following script:
INSERT [dbo].[dataset] ([datapoint]) VALUES (7)
INSERT [dbo].[dataset] ([datapoint]) VALUES (28)
INSERT [dbo].[dataset] ([datapoint]) VALUES (50)
-- ...and so on, with further INSERT statements until the table holds 20 values in total
The table will contain numbers in the range 0 to 300, and its content can be retrieved with the following query:
SELECT * FROM [dbo].[dataset]
ORDER BY datapoint
To visualize a dataset of discrete values, we need to build a histogram. The histogram will have six intervals, and the interval length is calculated as (largest value - smallest value) / number of intervals. When we build the frequency distribution table and the intervals for the histogram, we end up with the following results:
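The interval calculation and the resulting frequency distribution table can be sketched in T-SQL as follows. This is one possible approach, not necessarily the book's own, assuming `[dbo].[dataset]` has been populated with the 20 values as described above. The variable names, and the convention that each interval includes its lower bound while the largest value is placed in the last interval, are assumptions of this sketch. The `bar` column uses `REPLICATE` to draw a rough text histogram of the absolute frequencies:

```sql
-- Sketch: six-interval frequency distribution for [dbo].[dataset].
-- Assumes the table was created and filled with 20 values as shown above.
DECLARE @intervals INT = 6;
DECLARE @min FLOAT, @max FLOAT, @width FLOAT;

SELECT @min = MIN([datapoint]), @max = MAX([datapoint])
FROM [dbo].[dataset];

-- Interval length = (largest value - smallest value) / number of intervals.
SET @width = (@max - @min) / @intervals;

WITH bins AS (
    -- Recursive CTE producing interval indexes 0 .. @intervals - 1.
    SELECT 0 AS bin
    UNION ALL
    SELECT bin + 1 FROM bins WHERE bin + 1 < @intervals
),
binned AS (
    -- Zero-based interval index of each data point; the largest value would
    -- otherwise fall into index @intervals, so it is placed in the last interval.
    SELECT CASE
               WHEN [datapoint] = @max THEN @intervals - 1
               ELSE CAST(FLOOR(([datapoint] - @min) / NULLIF(@width, 0)) AS INT)
           END AS bin
    FROM [dbo].[dataset]
)
SELECT
    b.bin + 1                    AS interval_no,
    @min + b.bin * @width        AS lower_bound,
    @min + (b.bin + 1) * @width  AS upper_bound,
    COUNT(d.bin)                 AS abs_freq,    -- absolute frequency
    SUM(COUNT(d.bin)) OVER (ORDER BY b.bin
                            ROWS UNBOUNDED PRECEDING) AS cum_freq,
    REPLICATE('*', COUNT(d.bin)) AS bar          -- rough text histogram
FROM bins AS b
LEFT JOIN binned AS d ON d.bin = b.bin
GROUP BY b.bin
ORDER BY b.bin;
```

Generating the interval numbers separately and joining the data to them with `LEFT JOIN` means that an interval containing no data points still appears with a frequency of 0, which a histogram needs.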
A histogram based on the absolute frequency of the discrete values looks like this: