
Sanitizing and getting the data

For this example, we'll download data from https://www.ssa.gov/oact/babynames/limits.html. This site provides data on baby names in the US since 1880; to protect privacy, only names given to at least five babies in a given year are included. The page offers both national and state-specific data. For this example, download the national dataset. Once you've downloaded and extracted it, you'll see one file per year, plus a ReadMe:

$ ls -1 
NationalReadMe.pdf
yob1880.txt
yob1881.txt
yob1882.txt
yob1883.txt
yob1884.txt
yob1885.txt
...
yob2013.txt
yob2014.txt
yob2015.txt

As you can see, we have data from 1880 through 2015. For this example, I've used the data from 2015, but any year will work. Now let's take a closer look at the data:

$ cat yob2015.txt 
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286
Isabella,F,15504
Mia,F,14820
Abigail,F,12311
Emily,F,11727
Charlotte,F,11332
Harper,F,10241
...
Zynique,F,5
Zyrielle,F,5
Noah,M,19511
Liam,M,18281
Mason,M,16535
Jacob,M,15816
William,M,15809
Ethan,M,14991
James,M,14705
Alexander,M,14460
Michael,M,14321
Benjamin,M,13608
Elijah,M,13511
Daniel,M,13408

Each row contains a name, the sex (M or F), and the number of babies given that name in that year. All the girls' names come first, ordered from most to least common, followed by all the boys' names in the same order. The data is already quite usable, so we don't need much processing before we can use it. The only change we make is to add a header line to the file, so that it looks like this:

name,sex,amount
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286

This makes loading the data into D3 easier, since D3's default CSV parsing (for example, d3.csv) treats the first line as a header and uses its fields as the property names of each parsed row. The sanitized data we use in this example can be found at <DVD3>/src/chapter-01/data/yob2015.txt.
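If you want to add the header yourself, a minimal sketch for a POSIX shell follows. The add_header function and the output file name yob2015_with_header.txt are hypothetical choices for illustration, not taken from the book's own code.

```shell
#!/bin/sh
# Hypothetical helper: print the CSV header, then copy stdin through unchanged.
add_header() {
  printf 'name,sex,amount\n'
  cat
}

# Typical use (output file name is an assumption): keep the raw download
# intact and write the copy with a header to a new file.
#   add_header < yob2015.txt > yob2015_with_header.txt

# Quick demonstration using two rows from the listing above.
printf 'Emma,F,20355\nOlivia,F,19553\n' | add_header
```

Writing to a new file sidesteps two pitfalls: redirecting output back onto the input file (add_header < yob2015.txt > yob2015.txt) truncates it before it is read, and the in-place sed -i option behaves differently in GNU and BSD sed.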

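Because every row has the same three fields, the file is easy to sanity-check before loading it into D3. The sketch below totals the amount column per sex while skipping the name,sex,amount header line. It assumes a POSIX shell with awk and that no name contains a comma (true of the rows shown above); the totals_by_sex function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sum column 3 (amount) per value of column 2 (sex).
# NR > 1 skips the header line, so run it on the file that already has one.
totals_by_sex() {
  awk -F',' 'NR > 1 { total[$2] += $3 }
             END   { for (s in total) print s, total[s] }' | sort
}

# Typical use (file name assumed): totals_by_sex < yob2015_with_header.txt

# Demonstration with three rows from the listing above.
printf 'name,sex,amount\nEmma,F,20355\nOlivia,F,19553\nNoah,M,19511\n' | totals_by_sex
```

The sort at the end is there because awk's for (s in total) loop visits keys in an unspecified order; sorting makes the output stable.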