Population-scale clustering and geographic ethnicity

Next-generation genome sequencing (NGS) reduces the overhead and time of genomic sequencing, producing data at an unprecedented scale. Analyzing this large-scale data, however, is computationally expensive and is increasingly the key bottleneck. The growth in NGS data, both in the overall number of samples and in the number of features per sample, demands massively parallel data processing, which poses extraordinary challenges for machine learning and bioinformatics approaches. Using genomic information in medical practice requires efficient analytical methods that can cope with data from thousands of individuals and millions of their variants.

Among the most important tasks are the analysis of genomic profiles to attribute individuals to specific ethnic populations and the analysis of nucleotide haplotypes for disease susceptibility. Data from the 1000 Genomes Project serves as the prime source for analyzing genome-wide single nucleotide polymorphisms (SNPs) at scale to predict an individual's ancestry with regard to continental and regional origins.
