官术网_书友最值得收藏!

Reading delimited data and altered output format

Every day, we open many files in many different formats. However, when thinking about large amounts of data, it is always a good practice to use standard formats. One of these is called Comma Separated Values, or CSVs, and it uses a comma (,) to separate elements or delimit on each row. This is particularly useful when you have large amounts of data or records, and that data will be used in a scripted fashion. For example, in every school semester, Bob, the system administrator, needs to create a series of new users and set their information. Bob also gets a standardized CSV (like in the following snippet) from the people in charge of attendance:

Rbrash,Ron,Brash,01/31/88,+11234567890,rbrash@acme.com,FakePassword9000
...

If Bob the administrator wishes to only read this information into an array and create users, it is relatively trivial for him to parse a CSV and create each record in one single scripted action. This allows Bob to focus his time and effort on other important issues such as troubleshooting end-user WiFi issues.

While this is a trivial example, these same files may be in different forms with delimiters (the , or $ sign, for example), different data, and different structures. However, each file works on the premise that each line is a record that needs to be read into some structure (whatever it may be) in SQL, Bash arrays, and so on:

Line1Itself: Header (optional and might not be present)
Line2ItselfIsOneREc:RecordDataWithDelimiters:endline (windows \r\n, in Linux \n)
....

In the preceding example of a pseudo CSV, there is a header, which may be optional (not present), and then several lines (each being a record). Now, for Bob to parse the CSV, he has many ways to do this, but he may use specialized functions that apply a strategy such as:

$ Loop through each item until done
for each line in CSV:
# Do something with the data such as create a user
# Loop through Next item if it exists

To read in the data, Bob or yourself may resort to using:

  • For loops and arrays
  • A form of iterator
  • Manually walking through each line (not efficient)

Once any input data has been read in, the next step is to do something with the data itself. Is it to be transformed? Is it to be used immediately? Sanitized? Stored? Or converted to another format? Just like Bob, there are many things that can be performed using the data read in by the script.

In regards to outputting the data, we can also convert it to XML, JSON, or even insert it into a database as SQL. Unfortunately, this process requires being able to know at least two things: the format of the input data and the format of the output data.

Knowing common data formats and how they often have validation applied can be a great asset when building automated scripts and identifying any changes in the future. Enforcement of data validation also has several benefits and can help save the day when all of a sudden the script breaks without warning!

This recipe aims at walking you through reading a trivial CSV and outputting the data into some arbitrary formats.

主站蜘蛛池模板: 长顺县| 资阳市| 筠连县| 汉寿县| 佛学| 荣昌县| 中卫市| 灌南县| 南乐县| 舞钢市| 巫山县| 合川市| 天峨县| 兴宁市| 大安市| 淮南市| 葫芦岛市| 土默特左旗| 禹州市| 松溪县| 朔州市| 兴城市| 武乡县| 蒙自县| 金寨县| 曲阳县| 梅州市| 江阴市| 金平| 侯马市| 吉木萨尔县| 鄂伦春自治旗| 黄骅市| 东方市| 樟树市| 平武县| 武安市| 渝中区| 普兰店市| 仁怀市| 西城区|