
There's more...

There are many more databases at NCBI. You will probably want to check the Sequence Read Archive (SRA) database (previously known as Short Read Archive) if you are working with NGS data. The SNP database contains information on single-nucleotide polymorphisms (SNPs), whereas the protein database has protein sequences, and so on. A full list of databases in Entrez is linked in the See also section of this recipe.

Another NCBI database that you probably already know is PubMed, which holds scientific and medical citations, abstracts, and, in many cases, links to full texts. You can also access it via Biopython. Furthermore, GenBank records often contain links to PubMed. For example, we can follow those links from our previous record, as shown here:

from Bio import Medline
# Entrez and rec are the module and GenBank record used earlier in this recipe
refs = rec.annotations['references']
for ref in refs:
    # Skip references that carry no PubMed identifier
    if ref.pubmed_id != '':
        print(ref.pubmed_id)
        handle = Entrez.efetch(db='pubmed', id=[ref.pubmed_id], rettype='medline', retmode='text')
        records = Medline.parse(handle)
        for med_rec in records:
            for k, v in med_rec.items():
                print('%s: %s' % (k, v))

This will take all reference annotations, check whether they have a PubMed identifier, and then access the PubMed database to retrieve the records, parse them, and then print them.
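To make the retrieval step more concrete, here is a minimal sketch of the kind of request that an efetch call corresponds to, built with the standard library only and without touching the network. It assumes the public E-utilities efetch endpoint and reuses the same db, id, rettype, and retmode parameters as the call above. The helper name build_pubmed_url and the two identifiers are made up for illustration; this is not how Biopython is implemented internally.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for efetch requests
EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'


def build_pubmed_url(pubmed_ids):
    """Return an efetch URL asking for MEDLINE-formatted text records.

    Hypothetical helper for illustration; it only builds the URL and
    sends nothing over the network.
    """
    params = {
        'db': 'pubmed',
        'id': ','.join(str(i) for i in pubmed_ids),  # comma-separated list
        'rettype': 'medline',
        'retmode': 'text',
    }
    return EFETCH_URL + '?' + urlencode(params)


# Placeholder identifiers, not taken from any real GenBank record
url = build_pubmed_url(['12345678', '23456789'])
print(url)
```

In practice, you would normally let Biopython build and send the request for you, since it also takes care of details such as your contact email.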

The output per record is a Python dictionary. Note that there are many references to external databases on a typical GenBank record.
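To illustrate why a dictionary is a natural shape for a MEDLINE record, the following is a simplified sketch of a parser for MEDLINE-formatted text. It assumes the usual layout of a four-character tag, a hyphen and a space, and then the value, with continuation lines indented by six spaces. The function parse_medline and the sample text are invented for this example; Biopython's own parser handles more cases, so treat this only as a rough picture of the idea.

```python
def parse_medline(text):
    """Parse MEDLINE-formatted text into a list of dictionaries.

    Simplified sketch: every field is stored as a list of strings,
    and blank lines separate records.
    """
    records = []
    record = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if there is one
            if record:
                records.append(record)
                record = {}
                key = None
        elif line.startswith('      ') and key is not None:
            # Continuation line: append to the last value of the current tag
            record[key][-1] += ' ' + line.strip()
        else:
            # New field: tag in the first four columns, value from column seven
            key = line[:4].strip()
            record.setdefault(key, []).append(line[6:].strip())
    if record:
        records.append(record)
    return records


# Invented sample text in MEDLINE layout, not real PubMed data
sample = (
    'PMID- 12345678\n'
    'TI  - An invented title that wraps\n'
    '      onto a second line.\n'
    'AU  - Doe J\n'
    'AU  - Roe R\n'
    '\n'
    'PMID- 23456789\n'
    'TI  - Another invented title.\n'
)

records = parse_medline(sample)
for med_rec in records:
    for k, v in med_rec.items():
        print('%s: %s' % (k, v))
```

Repeated tags, such as the author field here, end up as several entries under the same key, which is why each value is kept as a list in this sketch.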

Of course, there are many other biological databases outside NCBI, such as Ensembl (http://www.ensembl.org) and UCSC Genome Bioinformatics (http://genome.ucsc.edu/). Python support for these databases varies considerably.

An introductory recipe on biological databases would not be complete without at least a passing reference to BLAST. The Basic Local Alignment Search Tool (BLAST) is an algorithm that assesses the similarity of sequences. NCBI provides a service that allows you to compare your sequence of interest against its own database. Of course, you can use a local BLAST database instead of NCBI's service. Biopython provides extensive support for this, but as this recipe is only introductory, I will just refer you to the Biopython tutorial.
