- Python Data Visualization Cookbook (Second Edition)
- Igor Milovanović, Dimitry Foures, Giuseppe Vettigli
- 2021-07-30 10:05:50
Importing data from tab-delimited files
Another very common flat datafile format is the tab-delimited file. It can come from an Excel export, but it can also be the output of some custom software we must get our input from.
The good thing is that this format can usually be read in almost the same way as CSV files, because the Python csv module supports so-called dialects, which let us use the same principles to read variations of similar file formats, one of them being the tab-delimited format.
Getting ready
By now, you should already be able to read CSV files. If not, please refer to the Importing data from CSV recipe first.
How to do it...
We will reuse the code from the Importing data from CSV recipe, where all we need to change is the dialect we are using as shown in the following code:
    import csv
    import sys

    filename = 'ch02-data.tab'
    data = []

    try:
        # newline='' lets the csv module handle line endings itself
        with open(filename, newline='') as f:
            reader = csv.reader(f, dialect=csv.excel_tab)
            # the first row holds the column names
            header = next(reader)
            data = [row for row in reader]
    except csv.Error as e:
        print("Error reading CSV file at line %s: %s" % (reader.line_num, e))
        sys.exit(-1)

    if header:
        print(header)
        print('===================')

    for datarow in data:
        print(datarow)
How it works...
The dialect-based approach is very similar to what we already did in the Importing data from CSV recipe, except for the line where we instantiate the csv reader object, giving it the dialect parameter and specifying the excel_tab dialect that we want.
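To make the role of the dialect concrete, here is a minimal sketch that parses a small in-memory sample instead of a file. The sample text and the read_rows helper are made up for illustration and are not part of the recipe's data. It suggests that, for reading, the excel_tab dialect behaves like the default dialect with the delimiter set to a tab.

```python
import csv
import io

# Hypothetical sample, standing in for a real tab-delimited file
sample = "day\tamount\n2012-01-01\t323\n2012-01-02\t233\n"


def read_rows(text, **reader_kwargs):
    """Parse text with csv.reader and return all rows as lists."""
    return list(csv.reader(io.StringIO(text, newline=''), **reader_kwargs))


# Same input, read once via the dialect and once via an explicit delimiter
rows_dialect = read_rows(sample, dialect=csv.excel_tab)
rows_delimiter = read_rows(sample, delimiter='\t')

print(rows_dialect)
print(rows_dialect == rows_delimiter)
```

Passing the dialect rather than individual formatting parameters keeps the reading code identical across formats, which is what the recipe relies on.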
There's more...
A CSV-based approach alone will not give clean rows if the data is "dirty", that is, if certain lines do not end with just a newline character but carry additional \t (tab) markers. We need to clean such lines separately before splitting them. A sample "dirty" tab-delimited file can be found in ch02-data-dirty.tab. The following code sample cleans the data as it reads it:
    datafile = 'ch02-data-dirty.tab'

    with open(datafile, 'r') as f:
        for line in f:
            # uncomment the next line to see the line before cleanup
            # print('DIRTY: ', line.split('\t'))

            # strip whitespace (including tabs) from both ends of the line
            line = line.strip()

            # now split the line on the tab delimiter
            print(line.split('\t'))
This also shows another approach to reading such files: the split('\t') string method.
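As a rough illustration of the difference, the following sketch runs both approaches over the same in-memory lines. The sample string is invented for this example and only imitates the kind of trailing tabs described above; it is not the content of ch02-data-dirty.tab.

```python
import csv
import io

# Hypothetical "dirty" lines: extra tabs sit before each newline
dirty = "day\tamount\t\t\n2012-01-01\t323\t\n"

# csv.reader treats every trailing tab as the start of another (empty) field
csv_rows = list(csv.reader(io.StringIO(dirty, newline=''),
                           dialect=csv.excel_tab))

# strip() removes leading and trailing whitespace, tabs included,
# so only the real fields remain when we split
clean_rows = [line.strip().split('\t') for line in io.StringIO(dirty)]

print(csv_rows)
print(clean_rows)
```

One caveat with this cleanup is that strip() cannot tell stray tabs from fields that are legitimately empty at the start or end of a row, so it suits files where such empty fields are not expected.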
The advantage of using the csv module over split() is that we can reuse the same reading code by just changing the dialect, which we can detect from the file extension (.csv and .tab) or by some other method (for example, using the csv.Sniffer class).
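One possible way to wire this up is sketched below. The pick_dialect function, the DIALECT_BY_EXTENSION mapping, and the data.txt filename are hypothetical names introduced for this example; only csv.excel, csv.excel_tab, and csv.Sniffer come from the standard library.

```python
import csv
import os

# Hypothetical mapping from file extension to csv dialect
DIALECT_BY_EXTENSION = {
    '.csv': csv.excel,
    '.tab': csv.excel_tab,
}


def pick_dialect(filename, sample=None):
    """Return a dialect for filename, falling back to csv.Sniffer on sample."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in DIALECT_BY_EXTENSION:
        return DIALECT_BY_EXTENSION[ext]
    if sample is not None:
        # sniff() raises csv.Error if it cannot determine the delimiter
        return csv.Sniffer().sniff(sample, delimiters=',\t')
    raise ValueError('cannot determine dialect for %r' % filename)


# Known extension: the dialect comes straight from the mapping
by_extension = pick_dialect('ch02-data.tab')

# Unknown extension: let the Sniffer inspect a sample of the content
sniffed = pick_dialect('data.txt', sample='a\tb\tc\n1\t2\t3\n4\t5\t6\n')

print(by_extension.delimiter == '\t')
print(repr(sniffed.delimiter))
```

The returned dialect can be passed directly as the dialect argument of csv.reader, so the rest of the reading code stays unchanged.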