
There's more...

Although it's impossible to discuss all the variations of output coming from sequencers, paired-end reads are worth mentioning because they are common and require a different processing approach. With paired-end sequencing, both ends of a DNA fragment are sequenced with a gap in the middle (called the insert). In this case, two files will be produced: X_1.FASTQ and X_2.FASTQ. Both files will have the same order and exactly the same number of sequences. The first sequence in X_1 pairs with the first sequence in X_2, and so on. With regard to the programming technique, if you want to keep the pairing information, you might do something like this:

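import gzip
from Bio import SeqIO

# Open both mate files in text mode; they must list reads in the same order
f1 = gzip.open('X_1.filt.fastq.gz', 'rt', encoding='utf-8')
f2 = gzip.open('X_2.filt.fastq.gz', 'rt', encoding='utf-8')
recs1 = SeqIO.parse(f1, 'fastq')
recs2 = SeqIO.parse(f2, 'fastq')
cnt = 0
# zip pairs the nth record of X_1 with the nth record of X_2
for rec1, rec2 in zip(recs1, recs2):
    cnt += 1
print('Number of pairs: %d' % cnt)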

The preceding code reads all pairs in order and just counts the number of pairs. You will probably want to do something more, but this shows an idiom based on the Python zip function that allows you to iterate through both files simultaneously. Remember to replace X with your FASTQ prefix.
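As one example of "something more", the following sketch checks that the two reads in each pair actually carry the same name, which is a cheap way to detect files that have fallen out of sync. It is only an illustration: to stay self-contained it uses a simplified, hypothetical read_fastq helper (strictly four lines per record) instead of SeqIO, reads from small in-memory stand-ins for X_1 and X_2, and assumes that mates share an identifier once an optional /1 or /2 suffix is removed. The helper names base_name and count_pairs are likewise made up for this sketch.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples.

    Simplified reader: assumes exactly four lines per record.
    """
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        seq = handle.readline().rstrip('\n')
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip('\n')
        yield header.rstrip('\n')[1:], seq, qual


def base_name(name):
    """Drop any description and a trailing /1 or /2 mate suffix (assumed convention)."""
    name = name.split()[0]
    if name.endswith('/1') or name.endswith('/2'):
        name = name[:-2]
    return name


def count_pairs(handle1, handle2):
    """Walk both files in lockstep and count matching and mismatching names."""
    matched = mismatched = 0
    pairs = zip(read_fastq(handle1), read_fastq(handle2))
    for (name1, _, _), (name2, _, _) in pairs:
        if base_name(name1) == base_name(name2):
            matched += 1
        else:
            mismatched += 1
    return matched, mismatched


# Tiny in-memory stand-ins for X_1 and X_2; the second pair is deliberately out of sync
fq1 = io.StringIO('@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n')
fq2 = io.StringIO('@r1/2\nTTAA\n+\nIIII\n@r3/2\nCCGG\n+\nIIII\n')
result = count_pairs(fq1, fq2)
print('matched=%d mismatched=%d' % result)  # matched=1 mismatched=1
```

With real data, you would pass the two gzip handles instead of the StringIO objects, and you would probably prefer the record IDs provided by SeqIO over this simplified reader.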

Note that the preceding code will most probably run out of memory (and crash) under Python 2, because the zip function is eager there (that is, it reads all records from both files into a list before the loop starts). Indeed, the lazy behavior of iterators in Python 3 is one of the many features that make it more suitable for big data analysis. If you really need to use Python 2, then consider the itertools module, which provides lazy implementations of common iterators, such as itertools.izip.
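The lazy behavior of zip in Python 3 can be seen in a small sketch that needs no sequence files. The records generator below is a hypothetical stand-in for a FASTQ parser: it logs every record it actually produces, so we can see that taking three pairs pulls only three records from each side, even though each generator could produce a million.

```python
from itertools import islice


def records(label, total, pulled):
    """Yield fake read names, logging each one that is actually produced."""
    for i in range(1, total + 1):
        name = '%s_%d' % (label, i)
        pulled.append(name)
        yield name


pulled = []
# Creating the zip object reads nothing yet
pairs = zip(records('X_1', 1000000, pulled), records('X_2', 1000000, pulled))
pulled_before = len(pulled)

# Consume only the first three pairs
first_three = list(islice(pairs, 3))

print(pulled_before)  # 0
print(first_three)    # [('X_1_1', 'X_2_1'), ('X_1_2', 'X_2_2'), ('X_1_3', 'X_2_3')]
print(len(pulled))    # 6
```

An eager zip would instead have produced all two million names before the first pair became available.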

Finally, if you are sequencing human genomes, you may want to use sequencing data from Complete Genomics. In this case, read the There's more section in the next recipe, where we briefly discuss Complete Genomics data.
