Question

Which Alphabet Type Should I Use With Fasta Files In Biopython?

12.3 years ago

sameer ▴ 10

If I'm using the FASTA files from the link below, what Alphabet type should I use in Biopython? Would it be IUPAC.unambiguous_dna?

link to FASTA files: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/?C=S;O=A

fasta biopython • 3.0k views

updated 10.9 years ago by Biostar 20 • written 12.3 years ago by sameer ▴ 10

score 1 · Answer 1 · 2013-03-18

This is a duplicate of your question to the Biopython mailing list: http://lists.open-bio.org/pipermail/biopython/2013-March/008415.html

My answer is here: http://lists.open-bio.org/pipermail/biopython/2013-March/008416.html

Essentially, I would use generic_dna rather than unambiguous_dna for now. The IUPAC alphabet object has a whitelist of expected letters, but current versions of Biopython do not enforce it in the sequence objects. That may change.

from Bio.Alphabet import generic_dna
from Bio.Alphabet.IUPAC import unambiguous_dna
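
To see why the stricter alphabet could be a poor fit, here is a minimal plain-Python sketch of a whitelist check. It does not use Biopython. The letter sets are the ones I understand IUPAC.unambiguous_dna and IUPAC.ambiguous_dna to list, and both the helper name letters_outside and the example fragment are made up for illustration. As far as I know, the UCSC hg19 files use N for gaps and lowercase for soft-masked repeats, so check your own files rather than relying on this.

```python
# Letters whitelisted by the two IUPAC DNA alphabets (assumed to match
# IUPAC.unambiguous_dna.letters and IUPAC.ambiguous_dna.letters).
UNAMBIGUOUS_DNA_LETTERS = set("GATC")
AMBIGUOUS_DNA_LETTERS = set("GATCRYWSMKHBVDN")


def letters_outside(sequence, allowed):
    """Return the sorted letters in `sequence` that are not in `allowed`."""
    return sorted(set(sequence) - allowed)


# Hypothetical fragment, made up for illustration (not copied from hg19).
fragment = "NNNNacgtACGTTGCAnn"

# Raw sequence: N and the lowercase letters fall outside the strict whitelist.
print(letters_outside(fragment, UNAMBIGUOUS_DNA_LETTERS))

# After upper-casing, only N remains outside the strict whitelist.
print(letters_outside(fragment.upper(), UNAMBIGUOUS_DNA_LETTERS))

# The ambiguous whitelist includes N, so nothing is left over.
print(letters_outside(fragment.upper(), AMBIGUOUS_DNA_LETTERS))
```

If a check like this reports anything beyond G, A, T and C, a strictly enforced unambiguous_dna would reject the sequence, which is one reason generic_dna is the safer choice. Also note that Biopython 1.78 and later removed Bio.Alphabet altogether, so the imports above only work on older releases.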

You don't actually need to specify an alphabet at all, but telling Biopython the sequence is DNA will prevent some user errors.