
How to do it...

Let's take a look at the following steps:

  1. As our sequence of interest is available in a Biopython sequence object, let's start by saving it to a FASTA file on our local disk:
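from Bio import SeqIO
# `seq` holds our SeqRecord; slicing a SeqRecord returns a new SeqRecord,
# which is the type SeqIO.write expects
w_seq = seq[11:5795]
# The with block closes the file even if writing fails
with open('example.fasta', 'w') as w_hdl:
    SeqIO.write([w_seq], w_hdl, 'fasta')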

The SeqIO.write function is designed to write a collection of records, so here we wrap our single record in a list. Be careful with this idiom. If you want to write many sequences (with NGS data, you could easily have millions), do not build a list as in the preceding code, because that holds every record in memory at once. Instead, pass an iterator (such as a generator), or call SeqIO.write several times, each time with a subset of the sequences.
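
The memory argument above can be seen with a minimal plain-Python sketch. This is not Biopython's implementation: the hypothetical `write_fasta` function and `fake_records` generator exchange `(identifier, sequence)` tuples, chosen only for illustration. Because the records come from a generator, only one record exists at a time no matter how many are written. With Biopython itself, the same effect comes from passing a generator of SeqRecord objects to SeqIO.write instead of a list.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO, width: int = 60) -> int:
    """Write (identifier, sequence) pairs as FASTA, pulling one record at a time.

    If `records` is a generator, only the current record needs to exist
    in memory while the file is being written.
    """
    count = 0
    for identifier, sequence in records:
        handle.write(f">{identifier}\n")
        # Wrap the sequence at a fixed line width, as FASTA writers commonly do
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


def fake_records(n: int) -> Iterator[Tuple[str, str]]:
    """Hypothetical generator standing in for, e.g., reads from a sequencer."""
    for i in range(n):
        yield f"read_{i}", "ACGT" * 20


out = io.StringIO()
written = write_fasta(fake_records(3), out)
print(written)  # 3
```

Passing `list(fake_records(n))` instead would produce the same file but build all n records in memory first, which is exactly what the warning above is about.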

  2. In most situations, your sequence will already be on disk, so you will want to read it:
recs = SeqIO.parse('example.fasta', 'fasta')
for rec in recs:
    seq = rec.seq
    print(rec.description)
    print(seq[:10])
    print(seq.alphabet)

Here, we are concerned with a single sequence, but FASTA files can contain multiple records, and reading them is easy: SeqIO.parse returns an iterator over the records, so you simply use standard Python iteration, as shown in the preceding code. For our example, this code prints the following:

NM_002299.3 Homo sapiens lactase (LCT), mRNA
ATGGAGCTGT
SingleLetterAlphabet()

Note that we printed seq[:10]: sequence objects support standard Python slicing, so you can extract part of a sequence just as you would from a string or list.
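
The record-by-record iteration in this step can be illustrated with a stdlib-only sketch of a FASTA reader that, like SeqIO.parse, yields one record at a time. This is a simplified stand-in, not Biopython's parser: the hypothetical `parse_fasta` returns plain `(description, sequence)` tuples and ignores edge cases that a real parser handles. The two-record input is made up for illustration.

```python
import io
from typing import Iterator, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (description, sequence) pairs, one record at a time."""
    description = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one
            if description is not None:
                yield description, "".join(chunks)
            description = line[1:]
            chunks = []
        else:
            chunks.append(line)
    # Emit the last record in the file
    if description is not None:
        yield description, "".join(chunks)


# Made-up two-record FASTA content, used only for illustration
fasta_text = ">rec1 first example\nATGGAGCTGTACGT\nACGTAC\n>rec2 second example\nGGGCCC\n"

for description, seq in parse_fasta(io.StringIO(fasta_text)):
    print(description)
    print(seq[:10])  # ordinary slicing, as with Seq objects
```

Because the function is a generator, a file with millions of records is never loaded all at once, which is the same reason SeqIO.parse returns an iterator rather than a list.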

  3. We will now change the alphabet of our sequence:
from Bio import Seq
from Bio.Alphabet import IUPAC
seq = Seq.Seq(str(seq), IUPAC.unambiguous_dna)

Probably the biggest advantage of a sequence object over a plain string is the alphabet information: based on the expected alphabet, the sequence object can impose useful constraints and operations on the underlying string. The generic alphabet assigned when reading the FASTA file (SingleLetterAlphabet) is not very informative, but in this case we know that we have DNA, so we create a new sequence with a more informative alphabet. Note that Bio.Alphabet was removed in Biopython 1.78, so the alphabet-related code in this recipe requires an earlier Biopython version.
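
To make the idea of alphabet-imposed constraints concrete, here is a hypothetical `DNASequence` class in plain Python. It is not Biopython's Seq; it only accepts the four letters G, A, T, and C, the letter set of IUPAC.unambiguous_dna. Checking every letter eagerly at construction time is a choice made here for illustration.

```python
class DNASequence:
    """Hypothetical sequence type that only accepts unambiguous DNA letters."""

    LETTERS = frozenset("GATC")  # same letter set as IUPAC unambiguous DNA

    def __init__(self, data: str) -> None:
        data = data.upper()
        invalid = set(data) - self.LETTERS
        if invalid:
            raise ValueError(f"not unambiguous DNA: {sorted(invalid)}")
        self._data = data

    def __str__(self) -> str:
        return self._data

    def __len__(self) -> int:
        return len(self._data)

    def __getitem__(self, index):
        # Slices return the same type, so the constraint travels with the data
        if isinstance(index, slice):
            return DNASequence(self._data[index])
        return self._data[index]


dna = DNASequence("ATGGAGCTGT")
print(dna[:3])  # ATG
try:
    DNASequence("ATGNNN")  # N (any base) is ambiguous, so it is rejected
except ValueError as err:
    print(err)
```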

  4. Now that we have an unambiguous DNA sequence, we can transcribe it:
rna = seq.transcribe()
print(rna)

Note that the Seq constructor used in the previous step takes a string (hence str(seq)), not a sequence object. You will see that the alphabet of the rna variable is now IUPACUnambiguousRNA.
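
Transcribing a coding-strand DNA sequence amounts to replacing every T with U; Biopython's transcribe method works this way and assumes its input is the coding strand. A plain-string sketch of that rule, using a hypothetical `transcribe` function that also rejects non-DNA letters:

```python
def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T -> U)."""
    dna = coding_dna.upper()
    # Reject anything that is not unambiguous DNA, mirroring the alphabet check
    invalid = set(dna) - set("GATC")
    if invalid:
        raise ValueError(f"unexpected letters: {sorted(invalid)}")
    return dna.replace("T", "U")


print(transcribe("ATGGAGCTGT"))  # AUGGAGCUGU
```

If you started from the template strand instead, you would take its reverse complement before applying this replacement.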

  5. Finally, we can translate our gene into a protein:
prot = seq.translate()
print(prot)

We now have a protein sequence whose alphabet is annotated as containing a stop codon (printed as *); in this case, this indicates that our protein is complete.
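
Translation reads the sequence three bases at a time and looks each codon up in a genetic code table; by default, Biopython uses NCBI table 1, the standard code. The sketch below builds that table from NCBI's compact 64-letter encoding and marks stop codons with *, the same stop symbol Biopython prints. The `translate` function and `CODON_TABLE` name are hypothetical, and the sketch skips details Biopython handles, such as ambiguous bases and warnings about incomplete codons.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence codon by codon; stop codons become '*'."""
    dna = dna.upper()
    # Silently drop a trailing partial codon, if any
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate("ATGGAGCTGTAA")
print(protein)  # MEL*
print(protein.endswith("*"))  # True: the reading frame ends in a stop codon
```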
