
ADAM for large-scale genomics data processing

Analyzing DNA and RNA sequencing data requires large-scale data processing to interpret the data according to its context. Excellent tools and solutions have been developed at academic labs, but they often fall short on scalability and interoperability. ADAM addresses this gap: it is a genomics analysis platform with specialized file formats, built on Apache Avro, Apache Spark, and Apache Parquet.

Large-scale data processing solutions such as ADAM on Spark can be applied directly to the output of a sequencing pipeline, that is, to the data produced after quality control, mapping, read preprocessing, and variant quantification on single-sample data. Examples of such output are DNA variants for DNA sequencing and read counts for RNA sequencing.

See more at http://bdgenomics.org/ and in the related publication: Massie, Matt, Nothaft, Frank, et al., ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing, Technical Report UCB/EECS-2013-207, EECS Department, University of California, Berkeley.

In our study, we use ADAM as a scalable genomics data analytics platform. Its support for the VCF file format lets us load genotype data as an RDD and then transform that genotype-based RDD into a Spark DataFrame.
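To make the shape of genotype data more concrete, here is a minimal sketch in plain Python of how VCF records can be flattened into one row per variant and sample, which is roughly the tabular form a DataFrame of genotypes would hold. This is not ADAM's API and does not use Spark; the function name `parse_vcf_genotypes`, the row field names, and the two-sample VCF snippet are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Hypothetical two-sample VCF snippet, used only for illustration.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
               "FILTER", "INFO", "FORMAT", "S1", "S2"]),
    "\t".join(["1", "100", "rs1", "A", "G", "50",
               "PASS", ".", "GT:DP", "0/1:12", "1|1:9"]),
])


def parse_vcf_genotypes(vcf_text: str) -> List[Dict[str, object]]:
    """Flatten VCF text into one row per (variant, sample) genotype call."""
    rows: List[Dict[str, object]] = []
    samples: List[str] = []
    for line in vcf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        chrom, pos, variant_id, ref, alt = fields[:5]
        # FORMAT (column 9) names the colon-separated per-sample sub-fields.
        gt_index = fields[8].split(":").index("GT")
        for sample, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # Count non-reference alleles; missing alleles (".") are not counted.
            alt_count = sum(1 for a in alleles if a not in ("0", "."))
            rows.append({
                "chrom": chrom,
                "pos": int(pos),
                "id": variant_id,
                "ref": ref,
                "alt": alt,
                "sample": sample,
                "genotype": genotype,
                "alt_allele_count": alt_count,
            })
    return rows


if __name__ == "__main__":
    for row in parse_vcf_genotypes(SAMPLE_VCF):
        print(row)
```

Each resulting dictionary corresponds to one row of a genotype table. A distributed pipeline would produce the same kind of rows per partition rather than in a single in-memory list.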
