Question: Create SeqRecord object(s) from my SORTED list!
2.4 years ago, incensefrenzie2006 wrote:

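I have a program that reads in ~20 FASTA files and parses out each record's ID and sequence. Each (ID, file path, sequence) tuple is appended to a list, which is then sorted. Because Python compares tuples element by element, this orders them by the first element (the record ID) and then by the directory/filename.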
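To illustrate how that sort behaves, here is a minimal sketch with hypothetical IDs and paths (not the real data): `sorted()` with no key compares the tuples field by field, while `key=operator.itemgetter(1)` sorts on the file path alone.

```python
from operator import itemgetter

# Hypothetical (record id, file path, sequence) tuples, same shape as list_fasta
list_fasta = [
    ("L_MV-22", "/data/P_3_FASTA.txt", "MAAK"),
    ("A_XY-01", "/data/P_5_FASTA.txt", "KEIL"),
    ("L_MV-22", "/data/P_2_FASTA.txt", "ASQK"),
]

# No key: tuples compare field by field (record id, then path, then sequence)
by_id = sorted(list_fasta)

# key=itemgetter(1): compare on the file path only
by_path = sorted(list_fasta, key=itemgetter(1))
```

Paths compare as plain strings, so with ~20 files `P_10_FASTA.txt` sorts before `P_2_FASTA.txt`.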

(Next I will concatenate all of the similar sequences, so I would like to use Biopython)

Here is the raw output of the sorted list:

   [ ('L_MV-22', '/Users/sueparks/P_2_FASTA.txt', '-------------------------------------------ASQKVDELRKK-LDEW----AEDYYAKDAPVVEDAVYDKAYQELVELEKQF----------------PDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKLVGHPVAYNVELKIDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKR---------------------N-------LSTFIYTWVNPPKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTP-----TAVMDPVQLAGTVVSRASL-HNPDYLRE--------KGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINP--------------------------------------------MCPAQV-------EEGIIHFASRGAMNIMGLGPRIVKQL-IDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQN-SAELLLFGL---------------------GIDHVGAKAARLILENIN--LEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVEEEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTG-SVSKKTDYLIYGADAGSKKDKAEKLQ-VPMLTEQEAIA---------------------------------------------'), ('L_MV-22', '/Users/sueparks/P_3_FASTA.txt', '-MAAKKTARKRRVKKHVESGVAHIHSTFNNTLVMITDVQGNAVAWSSAGALGFKGSRKSTPFAAQMAAEAAAKSAMDQGMKHVEVSVKGPGAGREAAIRALQAAGLEITAIRDVTPVPHNGSRPPKRRRV'), ('L_MV-22', '/Users/sueparks/P_4_FASTA.txt', '---------------ASQKVDELRKKLDEWAEDYYAKDAPVVEDAVYDKAYQELVELEKQFPDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKL--VGHPVAYNVEIKLDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKRNLSTFIYTWVNP-PKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTPTAVMDPVQLAGTVVSRASLHNPDYLREKGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINPMCPAQVEEGIIHFASRGAMNIMGLGPRIVKQLIDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQNSAELLLFGLGIDHVGAKAARLILENIKNLEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVE--EEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTGSVSKKTDYLIYGADAGSKKDKAEKLQVPMLTEQEAIA-------'), ('L_MV-22', '/Users/sueparks/P_5_FASTA.txt', 
'-----KEILLTGDRPTGKLHIGHYIGSLKNRVKLQNEGKYEPYIMIADTQALTDNARDPEKIKNSLIQVALDYLAVGLDPEKSTIYVQSQIPALFELTAYYMDLVTVARLERNPTVKTEIKQKNFKDSIPVGFLNYPVSQAADITAFKATVVPVGDDQEPMLEQTREIVRTFNRVYNTDILVEPKGYFPPKGQGRLPGLDGNAKMSKSLGNAIYLSDDAKTVQKKVMSMYTDPNHI-------------------------------------------------------------------------------------------------------------')]

Here is the script:

import re, glob, sys
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC



# Create a file path for glob function
path =  '/Users/sueparks/' + 'P_*' + '_FASTA.txt'
list_of_files = glob.glob(path)

#Create empty list to hold data
list_fasta = []
# Open the output file for writing
with open ('sorted_fasta_Files.txt', 'a+') as fw:
    # Loop through each FASTA file
    for file in list_of_files:
        protein = file
        # Parse each FASTA file
        for s_record in SeqIO.parse(file, 'fasta'):
            #print s_record
            name = s_record.id # Assign a variable for the record id
            seq = str(s_record.seq) # Assign a variable for the sequence
            link_record = name, protein, seq

            list_fasta.append(link_record) # Append to list

    list_fasta.sort() # sort data
record_list = []
for (name, seq) in enumerate(list_fasta):
    record = SeqRecord(Seq(seq, IUPAC.protein),id=name,  description="")
    #record_list.append(record)


    # write to file
    SeqIO.write(record_list, fw, 'fasta')

My error:

  File "sortFASTA.py", line 32, in <module>
    record = SeqRecord(Seq(seq, IUPAC.protein),id=name,  description="")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/biopython-1.65-py2.7-macosx-10.6-intel.egg/Bio/Seq.py", line 110, in __init__
    raise TypeError("The sequence data given to a Seq object should be a string (not another Seq object etc)
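A stdlib-only sketch with one made-up tuple (hypothetical values) shows what the loop over `enumerate(list_fasta)` actually binds, compared with unpacking each tuple directly:

```python
# Hypothetical tuple with the same (id, path, sequence) layout as list_fasta
list_fasta = [("L_MV-22", "/data/P_2_FASTA.txt", "ASQK")]

# Loop from the question: enumerate() yields (index, item) pairs
for (name, seq) in enumerate(list_fasta):
    buggy = (name, seq)  # name is the index 0, seq is the whole tuple

# Corrected loop: unpack the three fields of each tuple directly
for (name, protein, seq) in list_fasta:
    fixed = (name, seq)  # the record id and the sequence string
```

In the buggy loop `seq` is a tuple, not a string, which is what the `Seq()` constructor rejects.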

Maybe you’d have better luck reading the SeqRecords directly into a list, like so:

recs = list(SeqIO.parse(...))

Then you can sort the list directly (it’s not entirely clear to me on what basis you are sorting them), since lists have a list.sort() method whose key and reverse parameters control how the sorting is done.
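A sketch of that suggestion, assuming Biopython is installed. The FASTA text is a hypothetical stand-in for one input file, read through `io.StringIO` so the example runs without files on disk; `SeqIO.parse()` accepts any text handle, and `sort(key=...)` orders the SeqRecord objects by `record.id`.

```python
import io

from Bio import SeqIO

# Hypothetical FASTA content standing in for one of the P_*_FASTA.txt files
fasta_text = """>L_MV-22
MAAKKTARKR
>A_XY-01
KEILLTGDRP
>C_QQ-07
ASQKVDELRK
"""

# Read every record into a list of SeqRecord objects
recs = list(SeqIO.parse(io.StringIO(fasta_text), "fasta"))

# Sort in place by record ID; reverse=True would give descending order
recs.sort(key=lambda rec: rec.id)

sorted_ids = [rec.id for rec in recs]
```

To collect records from several files, the list can be built with a comprehension such as `[rec for f in list_of_files for rec in SeqIO.parse(f, "fasta")]`.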

(reply by Joe, 2.4 years ago)
2.4 years ago, incensefrenzie2006 wrote:

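The enumerate() call was the problem. enumerate(list_fasta) yields (index, tuple) pairs, so name received the integer index and seq received the whole (id, path, sequence) tuple, which Seq() rejects with the TypeError above. The fixed version unpacks the three fields directly, appends each record to record_list, and calls SeqIO.write() once, inside the with block, so the file handle is still open: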

import glob

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Collect the FASTA files with glob
path = '/Users/sueparks/' + 'P_*' + '_FASTA.txt'
list_of_files = glob.glob(path)

# Empty list to hold (record id, file path, sequence) tuples
list_fasta = []
# Open the output file for writing
with open('sorted_fasta_Files.txt', 'a+') as fw:
    # Loop through each FASTA file
    for file in list_of_files:
        protein = file
        # Parse each FASTA file
        for s_record in SeqIO.parse(file, 'fasta'):
            name = s_record.id       # record ID
            seq = str(s_record.seq)  # sequence as a plain string
            link_record = name, protein, seq
            list_fasta.append(link_record)  # append to list

    list_fasta.sort()  # sort by record ID, then by file path
    record_list = []
    # Unpack all three fields of each tuple (no enumerate())
    for (name, protein, seq) in list_fasta:
        # Seq() without IUPAC.protein: Bio.Alphabet was removed in Biopython 1.78
        record = SeqRecord(Seq(seq), id=name, description="")
        record_list.append(record)

    # Write all records once, while fw is still open
    SeqIO.write(record_list, fw, 'fasta')
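For the concatenation step mentioned in the question, one possible sketch, assuming "similar sequences" means records that share the same ID and that they should be joined in path order (both are assumptions). It uses `itertools.groupby` on the sorted tuples, which only merges adjacent items, so sorting first matters, and writes the result with `SeqIO.write()`. The tuples are hypothetical stand-ins for `list_fasta`.

```python
import io
from itertools import groupby
from operator import itemgetter

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical (record id, file path, sequence) tuples, shaped like list_fasta
list_fasta = [
    ("L_MV-22", "/data/P_3_FASTA.txt", "MAAK"),
    ("A_XY-01", "/data/P_2_FASTA.txt", "KEIL"),
    ("L_MV-22", "/data/P_2_FASTA.txt", "ASQK"),
]

# Sort so tuples with the same id are adjacent, ordered by path within an id
list_fasta.sort()

concatenated = []
for name, group in groupby(list_fasta, key=itemgetter(0)):
    # Join the sequences of all tuples sharing this id, in path order
    joined = "".join(seq for _, _, seq in group)
    concatenated.append(SeqRecord(Seq(joined), id=name, description=""))

# Write to an in-memory handle; pass an open file handle instead to write a file
out = io.StringIO()
SeqIO.write(concatenated, out, "fasta")
fasta_out = out.getvalue()
```

Alignment gap characters ('-') are kept as they are in the joined sequences.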
Powered by Biostar version 2.3.0