Sanitizing and getting the data

For this example, we'll download data from https://www.ssa.gov/oact/babynames/limits.html. This site provides data on baby names in the US since 1880, both nationally and per state. For this example, download the national dataset. Once you've downloaded and extracted it, you'll see one file per year of birth:

$ ls -1 
NationalReadMe.pdf
yob1880.txt
yob1881.txt
yob1882.txt
yob1883.txt
yob1884.txt
yob1885.txt
...
yob2013.txt
yob2014.txt
yob2015.txt

As you can see, we have data from 1880 through 2015. For this example, I've used the data from 2015, but any other year works just as well. Now let's take a closer look at the data:

$ cat yob2015.txt 
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286
Isabella,F,15504
Mia,F,14820
Abigail,F,12311
Emily,F,11727
Charlotte,F,11332
Harper,F,10241
...
Zynique,F,5
Zyrielle,F,5
Noah,M,19511
Liam,M,18281
Mason,M,16535
Jacob,M,15816
William,M,15809
Ethan,M,14991
James,M,14705
Alexander,M,14460
Michael,M,14321
Benjamin,M,13608
Elijah,M,13511
Daniel,M,13408
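
Since the full file is long, a short summary can be easier to read than the raw `cat` output. The following is a minimal sketch, assuming a POSIX shell with `awk` and `sort` available, and assuming the three-column layout shown above. The function name `summarize_by_sex` is hypothetical and not part of the book's sources. For each value in the second column, it prints the number of rows and the sum of the third column:

```shell
# Print one line per sex: "<sex> <number of rows> <sum of third column>".
# Usage: summarize_by_sex FILE
# Intended for the original, headerless file.
summarize_by_sex() {
    awk -F, '
        # Column 2 is the sex, column 3 is the count for that name.
        { rows[$2]++; total[$2] += $3 }
        END { for (s in rows) printf "%s %d %d\n", s, rows[s], total[s] }
    ' "$1" | sort
}

# Example call (file name taken from the listing above):
# summarize_by_sex yob2015.txt
```

The trailing `sort` is only there to make the output order predictable, since `awk` does not guarantee an iteration order for `for (s in rows)`.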

In this data, we've got a large number of rows, where each row shows a name, the sex (M or F), and the number of babies given that name. All the girls' names are listed first, followed by all the boys' names. The data already looks pretty usable as it is, so we don't need to do much processing before we can use it. The only thing we do is add a header row to the file, so that it looks like this:

name,sex,amount
Emma,F,20355
Olivia,F,19553
Sophia,F,17327
Ava,F,16286

This makes loading the data into D3 a little easier, since D3's default CSV parsing treats the first line as a header and uses it to name the fields of each row. The sanitized data we use in this example can be found here: <DVD3>/src/chapter-01/data/yob2015.txt.
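
The header row can be added by hand in an editor, or the step can be scripted. What follows is a minimal sketch, assuming a POSIX shell. The function name `add_header` and the output file name in the example call are hypothetical and not something the book's sources define:

```shell
# Prepend the CSV header line to a headerless names file.
# Usage: add_header INPUT OUTPUT
# INPUT and OUTPUT must be different files; redirecting onto the input
# would truncate it before it is read.
add_header() {
    input=$1
    output=$2
    # Write the header first, then append the original rows unchanged.
    { printf '%s\n' 'name,sex,amount'; cat "$input"; } > "$output"
}

# Example call (the output file name is an assumption):
# add_header yob2015.txt yob2015-with-header.txt
```

Writing to a separate file keeps the downloaded original intact, so the step can be rerun safely if something goes wrong.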
