Create Seq Record(s) Object from my SORTED list!
5.9 years ago

I have a program that reads in ~20 FASTA files and then parses the record id and sequence from each. This information is appended to a list and sorted. I am sorting on the first element, the directory/filename.

(Next I will concatenate all of the similar sequences, so I would like to use Biopython)

Here is raw output of the sorted list

   [ ('L_MV-22', '/Users/sueparks/P_2_FASTA.txt', '-------------------------------------------ASQKVDELRKK-LDEW----AEDYYAKDAPVVEDAVYDKAYQELVELEKQF----------------PDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKLVGHPVAYNVELKIDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKR---------------------N-------LSTFIYTWVNPPKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTP-----TAVMDPVQLAGTVVSRASL-HNPDYLRE--------KGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINP--------------------------------------------MCPAQV-------EEGIIHFASRGAMNIMGLGPRIVKQL-IDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQN-SAELLLFGL---------------------GIDHVGAKAARLILENIN--LEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVEEEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTG-SVSKKTDYLIYGADAGSKKDKAEKLQ-VPMLTEQEAIA---------------------------------------------'), ('L_MV-22', '/Users/sueparks/P_3_FASTA.txt', '-MAAKKTARKRRVKKHVESGVAHIHSTFNNTLVMITDVQGNAVAWSSAGALGFKGSRKSTPFAAQMAAEAAAKSAMDQGMKHVEVSVKGPGAGREAAIRALQAAGLEITAIRDVTPVPHNGSRPPKRRRV'), ('L_MV-22', '/Users/sueparks/P_4_FASTA.txt', '---------------ASQKVDELRKKLDEWAEDYYAKDAPVVEDAVYDKAYQELVELEKQFPDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKL--VGHPVAYNVEIKLDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKRNLSTFIYTWVNP-PKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTPTAVMDPVQLAGTVVSRASLHNPDYLREKGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINPMCPAQVEEGIIHFASRGAMNIMGLGPRIVKQLIDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQNSAELLLFGLGIDHVGAKAARLILENIKNLEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVE--EEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTGSVSKKTDYLIYGADAGSKKDKAEKLQVPMLTEQEAIA-------'), ('L_MV-22', '/Users/sueparks/P_5_FASTA.txt', 
'-----KEILLTGDRPTGKLHIGHYIGSLKNRVKLQNEGKYEPYIMIADTQALTDNARDPEKIKNSLIQVALDYLAVGLDPEKSTIYVQSQIPALFELTAYYMDLVTVARLERNPTVKTEIKQKNFKDSIPVGFLNYPVSQAADITAFKATVVPVGDDQEPMLEQTREIVRTFNRVYNTDILVEPKGYFPPKGQGRLPGLDGNAKMSKSLGNAIYLSDDAKTVQKKVMSMYTDPNHI-------------------------------------------------------------------------------------------------------------')]

Here is the script:

import re, glob, sys
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC



# Create a file path for glob function
path =  '/Users/sueparks/' + 'P_*' + '_FASTA.txt'
list_of_files = glob.glob(path)

#Create empty list to hold data
list_fasta = []
# Open file to be write
with open ('sorted_fasta_Files.txt', 'a+') as fw:
    # Loop through each FASTA file
    for file in list_of_files:
        protein = file
        # Parse each FASTA file from
        for s_record in SeqIO.parse(file, 'fasta'):
            #print s_record
            name = s_record.id # Assign a variable for the record id #
            seq = str(s_record.seq) # Assign a variable for the sequence
            link_record = name, protein, seq

            list_fasta.append(link_record) # Append to list

    list_fasta.sort() # sort data
record_list = []
for (name, seq) in enumerate(list_fasta):
    record = SeqRecord(Seq(seq, IUPAC.protein),id=name,  description="")
    #record_list.append(record)


    # write to file
    SeqIO.write(record_list, fw, 'fasta')

My error:

      File "sortFASTA.py", line 32, in <module>
    record = SeqRecord(Seq(seq, IUPAC.protein),id=name,  description="")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/biopython-1.65-py2.7-macosx-10.6-intel.egg/Bio/Seq.py", line 110, in __init__
    raise TypeError("The sequence data given to a Seq object should be a string (not another Seq object etc)

Maybe you’d have better luck reading the SeqRecords directly into a list, à la:

recs = list(SeqIO.parse(...))

Then you can sort the list directly (it’s not entirely clear to me on what basis you are sorting them): lists have a list.sort() method that takes parameters, such as key and reverse, which control how the sorting is done.
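As a rough illustration of those sort parameters, here is a minimal sketch using plain tuples shaped like the (record id, file path, sequence) entries in the question. The ids, paths and sequences below are made up for the example.

```python
from operator import itemgetter

# Hypothetical stand-ins for the (record id, file path, sequence) tuples
# built in the question; these values are invented for illustration.
entries = [
    ("seq_B", "/data/P_3_FASTA.txt", "MKV"),
    ("seq_A", "/data/P_2_FASTA.txt", "ASQ"),
    ("seq_A", "/data/P_1_FASTA.txt", "KEI"),
]

# Sorted copy ordered by file path only (element 1 of each tuple).
by_path = sorted(entries, key=itemgetter(1))

# Sorted copy ordered by record id first, then by file path.
by_id_then_path = sorted(entries, key=lambda e: (e[0], e[1]))

# In-place sort by file path, descending.
entries.sort(key=itemgetter(1), reverse=True)
```

The same idea should carry over to a list of SeqRecord objects, for example recs.sort(key=lambda r: r.id) to order by record id.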

5.9 years ago

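The problem was my use of enumerate: it yields (index, item) pairs, so name was bound to an integer and seq to the whole tuple rather than to a string. Here is the corrected script: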

import glob

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Build the glob pattern that matches the input FASTA files
path = '/Users/sueparks/' + 'P_*' + '_FASTA.txt'
list_of_files = glob.glob(path)

# Collect one (record id, file path, sequence) tuple per FASTA record
list_fasta = []
for fasta_path in list_of_files:
    for s_record in SeqIO.parse(fasta_path, 'fasta'):
        name = s_record.id
        seq = str(s_record.seq)  # plain string, as Seq() expects
        list_fasta.append((name, fasta_path, seq))

# Tuples compare element by element: record id first, then file path
list_fasta.sort()

# Unpack each tuple directly; enumerate() would yield (index, tuple) pairs
record_list = []
for name, fasta_path, seq in list_fasta:
    # No alphabet argument: Bio.Alphabet was removed in Biopython 1.78
    record = SeqRecord(Seq(seq), id=name, description="")
    record_list.append(record)

# Write all records in one call ('a+' appends, so reruns add duplicates)
with open('sorted_fasta_Files.txt', 'a+') as fw:
    SeqIO.write(record_list, fw, 'fasta')
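To see why the loop in the question raised the TypeError, here is a small sketch that needs no Biopython. It uses a single entry shaped like the tuples in list_fasta, with the sequence shortened for illustration.

```python
# One entry shaped like the tuples in list_fasta (sequence shortened
# here for illustration).
list_fasta = [("L_MV-22", "/Users/sueparks/P_2_FASTA.txt", "ASQKVDELRKK")]

# enumerate() pairs each item with its position, so a two-name unpack
# binds the index to `name` and the whole tuple to `seq`.
for name, seq in enumerate(list_fasta):
    wrong = (name, seq)

# Iterating over the list directly unpacks the three fields instead.
for name, protein, seq in list_fasta:
    right = (name, protein, seq)

print(type(wrong[0]).__name__, type(wrong[1]).__name__)
print(type(right[2]).__name__)
```

In the first loop seq is a tuple, which is what Seq() rejected; in the second it is the sequence string.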