How To Read Dna Sequences From More Than One Fasta File From A Directory?
11.8 years ago
viv_bio ▴ 50
import glob
from Bio import SeqIO

list_of_files = glob.glob("directory path/./*.fasta")
for file_name in list_of_files:
    # SeqIO.parse() needs the file format ("fasta") as its second argument
    for records in SeqIO.parse(file_name, "fasta"):
        print(records)

With this I can iterate over all the files in the directory, but I am not able to print the sequence records in them.

biopython python

Is that pseudocode? The SeqIO.parse() function also requires a format argument, e.g. "fasta" or "gb".
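For illustration, a minimal sketch of passing the format string (assuming Python 3 and a current Biopython release). The in-memory FASTA text and its IDs are made up for the example; a file path works the same way as the first argument:

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA text standing in for a file on disk
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nGGCCTTAA\n"

# The second argument tells SeqIO which file format to parse
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    print(record.id, record.seq)
```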

11.8 years ago

SeqIO.parse() returns an iterator of SeqRecord objects, and the __str__() method of SeqRecord (called implicitly whenever you run print on a record) returns a whole summary of the record, not just the sequence. For a GenBank record, for example:

ID: Z78439.1
Name: Z78439
Description: P.barbatum 5.8S rRNA gene and ITS1 and ITS2 DNA.
Number of features: 5
/source=Paphiopedilum barbatum
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', ..., 'Paphiopedilum']
/keywords=['5.8S ribosomal RNA', '5.8S rRNA gene', 'internal transcribed spacer', 'ITS1', 'ITS2']
/references=[<Bio.SeqFeature.Reference ...>, <Bio.SeqFeature.Reference ...>]
/data_file_division=PLN
/date=30-NOV-2006
/organism=Paphiopedilum barbatum
/gi=2765564
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACTTTGGTC ...', IUPACAmbiguousDNA())
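To see the difference between printing the whole record and printing only its sequence, here is a small sketch on a made-up FASTA record (assuming Python 3 and a recent Biopython; a FASTA record carries fewer annotations than the GenBank one above, so its summary is shorter):

```python
from io import StringIO

from Bio import SeqIO

# Made-up single-record FASTA text for illustration
record = next(SeqIO.parse(StringIO(">seq1 example record\nACGTACGT\n"), "fasta"))

summary = str(record)           # what print(record) shows: ID, Name, Description, ..., Seq(...)
sequence = str(record.seq)      # just the sequence as a plain string
fasta = record.format("fasta")  # the record written back out as FASTA text

print(summary)
print(sequence)
print(fasta)
```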

Have you tried instead

print(records.seq)

or

print(records.format("fasta"))

This is possibly what you are looking for.
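Putting it together for the original question, a minimal sketch that collects the sequences from every .fasta file in a directory (assuming Python 3 and Biopython). The helper name read_sequences, the file names and the sequences are hypothetical, made up for the demo:

```python
import glob
import os
import tempfile

from Bio import SeqIO


def read_sequences(directory, pattern="*.fasta"):
    """Hypothetical helper: map each FASTA file name to a list of (id, sequence) pairs."""
    results = {}
    for file_name in sorted(glob.glob(os.path.join(directory, pattern))):
        # SeqIO.parse yields one SeqRecord per FASTA entry; the list consumes it fully
        results[os.path.basename(file_name)] = [
            (record.id, str(record.seq)) for record in SeqIO.parse(file_name, "fasta")
        ]
    return results


# Demo: write two small made-up FASTA files into a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.fasta"), "w") as handle:
        handle.write(">a1\nACGT\n>a2\nTTGG\n")
    with open(os.path.join(tmp, "b.fasta"), "w") as handle:
        handle.write(">b1\nCCCC\n")
    sequences = read_sequences(tmp)

for file_name, entries in sequences.items():
    for seq_id, seq in entries:
        print(file_name, seq_id, seq)
```

Converting record.seq with str() keeps plain strings in the result, which is handy if you only need the raw sequences; keep the SeqRecord objects instead if you also need IDs, descriptions or annotations later.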
