
Getting ready

As discussed in the previous recipe, we will use data from the 1,000 Genomes Project. We will use the exome alignment for chromosome 20 of female NA18489. This is just 312 MB. The whole exome alignment for this individual is 14.2 GB, and the whole genome alignment (at a low coverage of 4x) is 40.1 GB. This data is paired-end, with reads of 76 bp. Paired-end data is common nowadays, but it is slightly more complex to process, and this recipe takes that into account. If your data is not paired, just simplify the following recipe appropriately.

As usual, if you use the Notebook, the cell at the top of Chapter02/Working_with_BAM.ipynb will download the data for you. If you don't use the Notebook, get the data from our dataset list at https://github.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-Second-Edition/blob/master/Datasets.ipynb. The files you will want are NA18490_20_exome.bam and NA18490_20_exome.bam.bai.

We will use pysam, a Python wrapper to the SAMtools C API. This was installed in Chapter 1, Python and the Surrounding Software Ecology.
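Before starting, it can help to confirm that both the BAM file and its index are in place. The following is a minimal sketch, assuming the two files sit in the current working directory under the names given above. The helper names missing_inputs and open_alignment are hypothetical and only serve this illustration; pysam.AlignmentFile is the pysam class for reading BAM files.

```python
from pathlib import Path

# File names as given in the text; the directory is an assumption.
BAM_NAME = "NA18490_20_exome.bam"
INDEX_NAME = BAM_NAME + ".bai"


def missing_inputs(data_dir="."):
    """Return the names of the expected input files not found in data_dir."""
    base = Path(data_dir)
    return [name for name in (BAM_NAME, INDEX_NAME)
            if not (base / name).is_file()]


def open_alignment(data_dir="."):
    """Open the BAM file with pysam once both files are confirmed present."""
    missing = missing_inputs(data_dir)
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))
    # Imported here so that the file check above also works without pysam.
    import pysam
    return pysam.AlignmentFile(str(Path(data_dir) / BAM_NAME), "rb")
```

Calling missing_inputs() first gives a readable list of what still needs to be downloaded, rather than a less specific error from pysam later in the recipe.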
