Question

How To Read DNA Sequences From More Than One FASTA File From A Directory?

11.8 years ago

viv_bio ▴ 50

import os
from Bio import SeqIO
import glob
list_of_files = glob.glob("directory path/./*.fasta")
for file_name in list_of_files:
    R = SeqIO.parse(file_name)
    for records in R:
        print(records)

With this I can loop over all the files in the directory, but I am not able to print the sequence records in them.

biopython python • 3.7k views

updated 11.8 years ago by Leonor Palmeira 3.9k • written 11.8 years ago by viv_bio ▴ 50

Is that pseudo code? The SeqIO parse function requires a format argument as well, e.g. "fasta" or "gb".

11.8 years ago by Peter 6.0k
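
To illustrate the format argument mentioned above, here is a minimal sketch, assuming Python 3 and a recent Biopython. The FASTA text and the name `fasta_text` are made up for the example, and an in-memory handle stands in for a real file.

```python
import io
from Bio import SeqIO

# Hypothetical two-record FASTA content, used here instead of a real file.
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nGGCCTTAA\n"

# The second argument names the file format; here it is "fasta".
records = list(SeqIO.parse(io.StringIO(fasta_text), "fasta"))

for record in records:
    # record.id is the first word of the header line
    print(record.id, len(record.seq))
```

The same call works with a file name in place of the handle, e.g. `SeqIO.parse(file_name, "fasta")`.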

score 3 · Answer 1 · 2012-07-01

SeqIO.parse() returns an iterator over SeqRecord objects, and the __str__() method of a SeqRecord (called implicitly whenever you run a 'print x') returns a summary of the whole record, not just the sequence:

ID: Z78439.1
Name: Z78439
Description: P.barbatum 5.8S rRNA gene and ITS1 and ITS2 DNA.
Number of features: 5
/source=Paphiopedilum barbatum
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', ..., 'Paphiopedilum']
/keywords=['5.8S ribosomal RNA', '5.8S rRNA gene', 'internal transcribed spacer', 'ITS1', 'ITS2']
/references=[<Bio.SeqFeature.Reference ...>, <Bio.SeqFeature.Reference ...>]
/data_file_division=PLN
/date=30-NOV-2006
/organism=Paphiopedilum barbatum
/gi=2765564
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACTTTGGTC ...', IUPACAmbiguousDNA())

Have you tried

print(records.seq)

or

print(records.format("fasta"))

One of these is probably what you are looking for.
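
Putting the pieces together, the whole loop might look like the sketch below, assuming Python 3 and a recent Biopython. The helper name `read_fasta_dir`, the temporary directory, and the sequences in it are all hypothetical and only there so the example runs on its own; in practice you would pass your own directory.

```python
import glob
import os
import tempfile
from Bio import SeqIO

def read_fasta_dir(directory, pattern="*.fasta"):
    """Return (file name, record id, sequence string) for every record found."""
    rows = []
    # sorted() gives a stable file order; glob alone does not guarantee one
    for file_name in sorted(glob.glob(os.path.join(directory, pattern))):
        for record in SeqIO.parse(file_name, "fasta"):
            rows.append((os.path.basename(file_name), record.id, str(record.seq)))
    return rows

# Demo on a throwaway directory holding two made-up FASTA files.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_files = [("a.fasta", ">a1\nACGT\n>a2\nTTGA\n"), ("b.fasta", ">b1\nGGCC\n")]
    for name, text in demo_files:
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(text)
    rows = read_fasta_dir(demo_dir)

for file_name, record_id, sequence in rows:
    print(file_name, record_id, sequence)
```

Using `str(record.seq)` gives the bare sequence as a plain string; swap in `record.format("fasta")` if you want the header line as well.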