- Bioinformatics with Python Cookbook
- Tiago Antao
- 2021-06-10 19:01:47
There's more...
Although we discuss data filtering in the Studying genome accessibility and filtering SNP data recipe in this chapter, it is not our objective to explain the SAM format in detail or to give a full course in data filtering. That would require a book of its own, but with the basics of pysam you can navigate SAM/BAM files. In the last recipe of this chapter, we will also look at extracting genome-wide metrics from BAM files (via annotations on VCF files that represent metrics of BAM files) to understand the overall quality of our dataset.
You will probably have very large data files to work with, and some BAM processing may take too long. One of the first approaches to reducing computation time is subsampling. For example, if you subsample at 10 percent, you ignore 9 records out of 10. For many tasks, such as some of the analysis done for the quality assessment of BAM files, subsampling at 10 percent (or even 1 percent) is enough to get the gist of the quality of the file.
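The subsampling idea can be sketched as follows. This is a minimal illustration, not code from the recipe: the helper names `subsample` and `mean_mapq_subsampled`, the `seed` parameter, and the choice of mean mapping quality as the metric are all assumptions made for the example. Each record is kept independently with probability `fraction`, so at 0.1 roughly 1 record in 10 is processed.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def subsample(records: Iterable[T], fraction: float,
              seed: Optional[int] = None) -> Iterator[T]:
    """Yield each record independently with probability `fraction`.

    Hypothetical helper: works on any iterable, including the reads
    yielded by a pysam.AlignmentFile.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    for record in records:
        if rng.random() < fraction:
            yield record


def mean_mapq_subsampled(bam_path: str, fraction: float = 0.1,
                         seed: Optional[int] = None) -> float:
    """Estimate the mean mapping quality of mapped reads from a subsample.

    Illustrative metric only; any per-read statistic could be used instead.
    """
    import pysam  # imported here so subsample() is usable without pysam

    total = count = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True reads the file sequentially, so no index is needed
        for read in subsample(bam.fetch(until_eof=True), fraction, seed):
            if read.is_unmapped:
                continue
            total += read.mapping_quality
            count += 1
    return total / count if count else float("nan")
```

Note that this approach still reads and decompresses every record in the file; the saving comes from skipping the per-record analysis for the discarded reads, which is usually the expensive part.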
If you work with human data, it may have been sequenced at Complete Genomics. In that case, the alignment files will be in a different format. Although Complete Genomics provides tools to convert to standard formats, you may be better served by working with their own data directly.