
Getting ready

As discussed in the previous recipe, we will use data from the 1000 Genomes Project. We will use the exome alignment for chromosome 20 of female NA18489, which is just 312 MB. The whole exome alignment for this individual is 14.2 GB, and the whole genome alignment (at a low coverage of 4x) is 40.1 GB. This data is paired-end, with reads of 76 bp. Paired-end data is common nowadays, but slightly more complex to process, and we will take this into account. If your data is not paired, just simplify the following recipe appropriately.

As usual, if you use Jupyter Notebook, the cell at the top of Chapter02/Working_with_BAM.ipynb will download the data for you. If you don't use notebooks, get the data from our dataset list at https://github.com/PacktPublishing/Bioinformatics-with-Python-Cookbook-Second-Edition/blob/master/Datasets.ipynb. The files you need are NA18490_20_exome.bam and its index, NA18490_20_exome.bam.bai.

We will use pysam, a Python wrapper around the SAMtools C API. It was installed in Chapter 1, Python and the Surrounding Software Ecology.
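Before starting the recipe, it can help to confirm that the downloaded file really is paired-end with 76 bp reads. The following is a minimal sketch, assuming pysam is installed and NA18490_20_exome.bam sits in the working directory; the helper names `summarize_reads` and `main` are hypothetical, not part of the book's code. It opens the BAM with `pysam.AlignmentFile` and tallies the pairing flag and read length over the first reads in file order.

```python
from collections import Counter
from itertools import islice


def summarize_reads(reads, limit=1000):
    """Tally pairing and read length over the first `limit` reads.

    `reads` is any iterable of objects exposing `is_paired` and
    `query_length`, such as pysam.AlignedSegment instances.
    """
    total = 0
    paired = 0
    lengths = Counter()
    for read in islice(reads, limit):
        total += 1
        if read.is_paired:
            paired += 1
        lengths[read.query_length] += 1
    return total, paired, lengths


def main(path="NA18490_20_exome.bam"):
    # Imported here so summarize_reads can be reused without pysam installed.
    import pysam

    # "rb" opens a BAM (binary) file; until_eof=True reads records in file
    # order without needing the .bai index.
    with pysam.AlignmentFile(path, "rb") as bam:
        total, paired, lengths = summarize_reads(bam.fetch(until_eof=True))
    print(f"{paired}/{total} reads flagged as paired")
    print(f"Read lengths seen: {dict(lengths)}")
```

Calling `main()` prints how many of the sampled reads carry the paired flag and which read lengths occur; for this dataset you would expect the paired count to match the total and 76 to dominate the lengths. Keeping the tallying in a separate function makes it easy to run the same check on any other BAM, or on a filtered subset of reads.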
