
Get and clean up the data

You can get a CSV file of the data from https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ASE_2014_00CSA01&prodType=table. Just hit the Download button and click OK. The result is a CSV file that has lots of interesting information in it. If you open it, though, it doesn't really look like an easy-to-use data file.

A single data row looks like this:

00100000US,,United States,,00,Total for all sectors,,001,All firms,001,All firms,00,All firms,003,Equally veteran-/nonveteran-owned,319,Firms with 4 to 5 years in business,2014,12174,11571648,107722,2746052,6.3,15.3,17.8,16.4

So, we'll sanitize the data a bit before we start processing it with D3. There are many different ways you can do this. You can open the file in Excel and select the fields you want, you can use some command-line filtering utilities to get the required data, or you can even write a simple Python or R script to return the data you want. Since we're already working with JavaScript and we installed Node.js in Chapter 1, Getting Started with D3, let's write a simple script that filters our data. We won't filter too much; we'll just get rid of the data we're not interested in:

  • We're not interested in the data for a specific industry sector, so we start by keeping only the rows whose NAICS.id column is 00, which is the code for Total for all sectors.
  • Next, we'll drop the columns that aren't interesting for us. What we want are the columns that indicate gender, ethnic group, race, veteran status, time in business, and finally, the column that contains the number of businesses.

We use the following simple Node.js script for that:

var d3 = require('d3');
var fs = require('fs');

// read the data
fs.readFile('./ASE_2014_00CSA02.csv', function (err, fileData) {
  // stop early if the file could not be read
  if (err) throw err;

  var rows = d3.csvParse(fileData.toString());

  // keep only the rows for 'Total for all sectors' (NAICS.id 00)
  var allSectors = rows.filter(function (row) {
    return row['NAICS.id'] === '00';
  });

  // remove unused columns, and make nice headers
  var mapped = allSectors.map(function (el) {
    return {
      sex: el['SEX.id'],
      sexLabel: el['SEX.display-label'],
      ethnicGroup: el['ETH_GROUP.id'],
      ethnicGroupLabel: el['ETH_GROUP.display-label'],
      raceGroup: el['RACE_GROUP.id'],
      raceGroupLabel: el['RACE_GROUP.display-label'],
      vetGroup: el['VET_GROUP.id'],
      vetGroupLabel: el['VET_GROUP.display-label'],
      yearsInBusiness: el['YIBSZFI.id'],
      yearsInBusinessLabel: el['YIBSZFI.display-label'],
      count: el['FIRMPDEMP']
    };
  });

  // write the result as CSV; recent Node.js versions require the callback
  fs.writeFile('./businessFiltered.csv', d3.csvFormat(mapped), function (err) {
    if (err) throw err;
  });
});

In this script, we use the fs.readFile API of Node.js to read the downloaded file from the filesystem, and then use D3 to parse the CSV file. After parsing, we use filter to keep only the rows we want, and use map to convert each row to a simpler object with readable property names. Finally, we use the d3.csvFormat function to convert the data back to CSV and the fs.writeFile API call to write it out. To run this script yourself, navigate to the <DVD3>/src/chapter-02/data/ directory and run node ./cleanBusinesses.js. The result is a clean and easy-to-understand CSV that we can process in our visualization:

sex,sexLabel,ethnicGroup,ethnicGroupLabel,raceGroup,raceGroupLabel, ... 
001,All firms,001,All firms,00,All firms, ...
001,All firms,001,All firms,00,All firms, ...

With this data, we can now very easily select specific groups to visualize by just filtering on the sex, ethnicGroup, raceGroup, and vetGroup properties.
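As a minimal sketch of such a selection, the following snippet filters rows on one or more of these properties. The selectGroup helper and the sampleRows array are hypothetical and only serve as an illustration. In practice, the rows would come from running d3.csvParse on businessFiltered.csv. The ids 001, 00, 003, and 319 are taken from the sample row shown earlier; the other ids and all counts are placeholders, not census figures.

```javascript
// Hypothetical rows in the shape produced by the cleanup script.
// Note that CSV parsing yields strings, so ids and counts are strings here.
var sampleRows = [
  { sex: '001', ethnicGroup: '001', raceGroup: '00', vetGroup: '003', yearsInBusiness: '319', count: '10' },
  { sex: '001', ethnicGroup: '001', raceGroup: '00', vetGroup: '001', yearsInBusiness: '319', count: '20' },
  { sex: '002', ethnicGroup: '001', raceGroup: '00', vetGroup: '003', yearsInBusiness: '319', count: '30' }
];

// Keep only the rows that match every property in the criteria object.
// An empty criteria object matches all rows.
function selectGroup(rows, criteria) {
  return rows.filter(function (row) {
    return Object.keys(criteria).every(function (key) {
      return row[key] === criteria[key];
    });
  });
}

// Select the rows for one combination of sex and veteran status.
var selected = selectGroup(sampleRows, { sex: '001', vetGroup: '003' });

// Convert the string counts to numbers before adding them up.
var total = selected.reduce(function (sum, row) {
  return sum + Number(row.count);
}, 0);

console.log(selected.length, total);
```

Passing the criteria as an object keeps the helper reusable for any combination of the sex, ethnicGroup, raceGroup, and vetGroup properties.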
