There's more...

Although it's impossible to discuss every variation of sequencer output, paired-end reads are worth mentioning because they are common and require a different processing approach. With paired-end sequencing, both ends of a DNA fragment are sequenced, leaving a gap in the middle (called the insert). In this case, two files are produced: X_1.FASTQ and X_2.FASTQ. Both files contain exactly the same number of sequences, in the same order: the first sequence in X_1 pairs with the first sequence in X_2, and so on. With regard to the programming technique, if you want to keep the pairing information, you might do something like this:

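import gzip
from Bio import SeqIO

# Open both compressed FASTQ files in text mode
f1 = gzip.open('X_1.filt.fastq.gz', 'rt', encoding='utf-8')
f2 = gzip.open('X_2.filt.fastq.gz', 'rt', encoding='utf-8')
recs1 = SeqIO.parse(f1, 'fastq')
recs2 = SeqIO.parse(f2, 'fastq')
cnt = 0
# zip pairs the nth record of one file with the nth record of the other
for rec1, rec2 in zip(recs1, recs2):
    cnt += 1
print('Number of pairs: %d' % cnt)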

The preceding code reads all pairs in order and simply counts them. You will probably want to do something more, but it illustrates an idiom based on the Python zip function, which lets you iterate through both files simultaneously. Remember to replace X with your FASTQ prefix.
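One caveat with the zip idiom is that zip stops silently at the end of the shorter input, so a truncated file would go unnoticed. The following is a minimal sketch of one way to guard against that, not part of the recipe itself: iter_pairs is a hypothetical helper name, and plain lists of strings stand in for the record iterators that SeqIO.parse would return.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking an exhausted input


def iter_pairs(recs1, recs2):
    """Lazily yield (rec1, rec2) tuples; raise if one input ends early.

    Hypothetical helper: works with any two iterables, for example
    the iterators returned by SeqIO.parse.
    """
    for rec1, rec2 in zip_longest(recs1, recs2, fillvalue=_MISSING):
        if rec1 is _MISSING or rec2 is _MISSING:
            raise ValueError('Inputs have different numbers of records')
        yield rec1, rec2


# Stand-in data (assumption): real code would pass recs1 and recs2 instead
mates1 = ['read1/1', 'read2/1', 'read3/1']
mates2 = ['read1/2', 'read2/2', 'read3/2']

cnt = sum(1 for _ in iter_pairs(mates1, mates2))
print('Number of pairs: %d' % cnt)
```

Because the helper is a generator, it keeps the lazy, one-pair-at-a-time behavior of zip and only adds the length check.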

Note that the preceding code will most probably crash on Python 2 with large files, because the zip function is eager in Python 2 (that is, it builds the complete list of pairs, reading every record from both files, before the loop starts). Indeed, the lazy behavior of iterators in Python 3 is one of the many features that make it more suitable for big data analysis. If you really need to use Python 2, consider the itertools module, which provides lazy implementations of common iterators (for example, itertools.izip).
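To make the laziness of the Python 3 zip concrete, here is a small illustrative sketch. The generator numbered and its log list are hypothetical names introduced only for this demonstration; they record when each item is actually produced.

```python
def numbered(label, n, log):
    """Yield n labelled items, logging each one as it is produced."""
    for i in range(n):
        item = '%s%d' % (label, i)
        log.append(item)
        yield item


log = []
pairs = zip(numbered('a', 3, log), numbered('b', 3, log))

produced_before = len(log)  # zip has not pulled anything yet
first = next(pairs)         # pulls exactly one item from each input
produced_after = len(log)

print(produced_before, first, produced_after)
```

Creating the zip object consumes nothing; each call to next pulls only one item from each input, which is why the paired FASTQ loop never holds more than one pair of records at a time.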

Finally, if you are sequencing human genomes, you may want to use sequencing data from Complete Genomics. In this case, read the There's more section in the next recipe, where we briefly discuss Complete Genomics data.