書名： Bioinformatics with Python Cookbook
作者名： Tiago Antao
本章字數： 187字
更新時間： 2021-06-10 19:01:49

Processing NGS data with HTSeq

HTSeq (https://htseq.readthedocs.io) is an alternative library that's used for processing NGS data. Most of the functionality made available by HTSeq is actually available in other libraries covered in this book, but you should be aware of it as an alternative way of processing NGS data. HTSeq supports, among others, FASTA, FASTQ, SAM (via pysam), VCF, GFF, and Browser Extensible Data (BED) file formats. It also includes a set of abstractions for processing (mapped) genomic data, encompassing concepts like genomic positions and intervals or alignments. A complete examination of the features of this library is beyond our scope, so we will concentrate on a small subset of features. We will take this opportunity to also introduce the BED file format.

The BED format allows for the specification of features for annotations tracks. It has many uses, but it's common to load BED files into genome browsers to visualize features. Each line includes information about at least the position (chromosome, start and end) and also optional fields such as name or strand. Full details about the format can be found at https://genome.ucsc.edu/FAQ/FAQformat.html#format1.

官术网_书友最值得收藏!

Bioinformatics with Python Cookbook

Processing NGS data with HTSeq