
There's more...

The purpose of this recipe is to get you up to speed with the PyVCF module. At this stage, you should be comfortable with the API. We will not spend too much time on usage details, because these are the main purpose of the next recipe: using the VCF module to study the quality of your variant calls.

It will probably not be a shocking revelation that PyVCF is not the fastest module on earth. The file format is heavily text-based, which makes processing a time-consuming task. There are two main strategies for dealing with this problem. One strategy is parallel processing, which we will discuss in the last chapter, Chapter 9, Python for Big Genomics Datasets. The second strategy is to convert the data to a more efficient format; we will provide an example of this in Chapter 4, Population Genetics. Note that VCF developers are working on a binary (BCF) version to deal with parts of these problems (http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2).
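
To make the second strategy a little more concrete, here is a minimal sketch of the idea of converting text records into a more compact in-memory representation. It is not the conversion used in Chapter 4 and it does not use PyVCF: it relies only on the standard library, and the function name `vcf_positions_by_chrom` and the three sample records are made up for illustration.

```python
import io
from array import array
from collections import defaultdict

# Tiny in-memory VCF with made-up records, used only for illustration.
SAMPLE_VCF = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
20\t14370\t.\tG\tA\t29\tPASS\tDP=14
20\t17330\t.\tT\tA\t3\tq10\tDP=11
21\t1110696\t.\tA\tG\t67\tPASS\tDP=10
"""


def vcf_positions_by_chrom(handle):
    """Hypothetical helper: collect variant positions per chromosome.

    Positions are stored in unsigned-integer arrays, which are far more
    compact than re-parsing the text every time they are needed.
    """
    positions = defaultdict(lambda: array("L"))
    for line in handle:
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        # Only the first two tab-separated columns (CHROM, POS) are needed.
        chrom, pos = line.split("\t", 2)[:2]
        positions[chrom].append(int(pos))
    return dict(positions)


columns = vcf_positions_by_chrom(io.StringIO(SAMPLE_VCF))
print({chrom: list(pos) for chrom, pos in columns.items()})
```

The text is parsed once, and later passes work on the numeric arrays instead. The arrays could also be written to disk in binary form with `array.tofile`. A real conversion would normally keep more than positions (genotypes, quality, and so on), so treat this as an outline of the approach rather than a replacement for a proper format.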
