If I'm using the FASTA files from the link below, what Alphabet type should I use in Biopython? Would it be IUPAC.unambiguous_dna?
link to FASTA files: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/?C=S;O=A
This is a duplicate of your question to the Biopython mailing list: http://lists.open-bio.org/pipermail/biopython/2013-March/008415.html
My answer is here: http://lists.open-bio.org/pipermail/biopython/2013-March/008416.html
Essentially I would use generic_dna rather than unambiguous_dna for now. The IUPAC alphabet object has a white list of expected letters, but current versions of Biopython do not enforce this in the sequence objects. That may change.
from Bio.Alphabet import generic_dna
from Bio.Alphabet.IUPAC import unambiguous_dna
You don't actually need to specify an alphabet at all, but telling Biopython the sequence is DNA will prevent some user errors.
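To see why the strict alphabet is a poor fit, you can check a sequence against the white list yourself. The following is only a minimal sketch in plain Python, not Biopython's own validation: the helper name `unexpected_letters` and the example fragment are made up for illustration, and the white list is the four letters of IUPAC.unambiguous_dna (GATC). It does not import Bio.Alphabet, which was removed in Biopython 1.78, so it should run regardless of the installed version.

```python
from collections import Counter

# The four letters of Biopython's IUPAC.unambiguous_dna alphabet.
UNAMBIGUOUS_DNA_LETTERS = set("GATC")


def unexpected_letters(sequence, allowed=UNAMBIGUOUS_DNA_LETTERS):
    """Count the letters in `sequence` that are not in the `allowed` white list.

    Hypothetical helper for illustration; the check is case sensitive.
    """
    return Counter(ch for ch in sequence if ch not in allowed)


# Made-up fragment imitating what the UCSC files tend to contain:
# N for unknown bases and lowercase letters for soft-masked repeats.
example = "NNNNacgtACGTnn"
print(unexpected_letters(example))
```

Any non-empty result means the sequence contains letters outside the strict alphabet. The UCSC chromosome files usually do, because of the N runs and lowercase soft-masking, which is another reason to prefer the generic DNA alphabet here.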