
Population-scale clustering and geographic ethnicity

Next-generation genome sequencing (NGS) reduces the overhead and time of genomic sequencing, producing big data at an unprecedented rate. Analyzing this large-scale data, however, is computationally expensive and is increasingly the key bottleneck. The growth in NGS data, both in the overall number of samples and in the number of features per sample, demands massively parallel data processing, which poses extraordinary challenges for machine learning and bioinformatics approaches. Using genomic information in medical practice requires efficient analytical methods that can cope with data from thousands of individuals and millions of their variants.

One of the most important tasks is analyzing genomic profiles to attribute individuals to specific ethnic populations, or analyzing nucleotide haplotypes for disease susceptibility. Data from the 1000 Genomes Project serves as the prime source for analyzing genome-wide single nucleotide polymorphisms (SNPs) at scale to predict an individual's ancestry with regard to continental and regional origins.
