                    7.4 years ago
        
    I have a program that reads in ~20 FASTA files and parses each record's ID and sequence. Each (ID, file path, sequence) tuple is appended to a list, which is then sorted. I want to sort on the directory/filename; note that a plain sort() on these tuples compares the record ID first and the file path second.
(Next I will concatenate all of the similar sequences, so I would like to use Biopython)
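A minimal sketch of that concatenation step, using only the standard library rather than Biopython. It assumes "similar" means records that share the same ID, and that the list holds (id, file path, sequence) tuples already sorted so that equal IDs are adjacent. The function name concatenate_by_id and the /tmp paths in the toy data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def concatenate_by_id(sorted_records):
    """Join the sequences of consecutive records that share an ID.

    sorted_records: list of (record_id, file_path, sequence) tuples,
    already sorted so that records with the same ID are adjacent.
    Returns a list of (record_id, concatenated_sequence) pairs.
    """
    merged = []
    for record_id, group in groupby(sorted_records, key=itemgetter(0)):
        # Within a group, sequences are joined in their existing list order
        merged.append((record_id, "".join(seq for _, _, seq in group)))
    return merged


# Hypothetical toy data in the same shape as the sorted list
records = [
    ("L_MV-22", "/tmp/P_2_FASTA.txt", "AC-"),
    ("L_MV-22", "/tmp/P_3_FASTA.txt", "GT"),
    ("X_1", "/tmp/P_2_FASTA.txt", "MK"),
]
print(concatenate_by_id(records))  # [('L_MV-22', 'AC-GT'), ('X_1', 'MK')]
```

groupby only merges adjacent items, which is why the list must be sorted (or at least grouped) by ID first.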
Here is the raw output of the sorted list:
   [ ('L_MV-22', '/Users/sueparks/P_2_FASTA.txt', '-------------------------------------------ASQKVDELRKK-LDEW----AEDYYAKDAPVVEDAVYDKAYQELVELEKQF----------------PDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKLVGHPVAYNVELKIDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKR---------------------N-------LSTFIYTWVNPPKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTP-----TAVMDPVQLAGTVVSRASL-HNPDYLRE--------KGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINP--------------------------------------------MCPAQV-------EEGIIHFASRGAMNIMGLGPRIVKQL-IDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQN-SAELLLFGL---------------------GIDHVGAKAARLILENIN--LEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVEEEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTG-SVSKKTDYLIYGADAGSKKDKAEKLQ-VPMLTEQEAIA---------------------------------------------'), ('L_MV-22', '/Users/sueparks/P_3_FASTA.txt', '-MAAKKTARKRRVKKHVESGVAHIHSTFNNTLVMITDVQGNAVAWSSAGALGFKGSRKSTPFAAQMAAEAAAKSAMDQGMKHVEVSVKGPGAGREAAIRALQAAGLEITAIRDVTPVPHNGSRPPKRRRV'), ('L_MV-22', '/Users/sueparks/P_4_FASTA.txt', '---------------ASQKVDELRKKLDEWAEDYYAKDAPVVEDAVYDKAYQELVELEKQFPDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKL--VGHPVAYNVEIKLDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKRNLSTFIYTWVNP-PKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTPTAVMDPVQLAGTVVSRASLHNPDYLREKGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINPMCPAQVEEGIIHFASRGAMNIMGLGPRIVKQLIDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQNSAELLLFGLGIDHVGAKAARLILENIKNLEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVE--EEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTGSVSKKTDYLIYGADAGSKKDKAEKLQVPMLTEQEAIA-------'), ('L_MV-22', '/Users/sueparks/P_5_FASTA.txt', 
'-----KEILLTGDRPTGKLHIGHYIGSLKNRVKLQNEGKYEPYIMIADTQALTDNARDPEKIKNSLIQVALDYLAVGLDPEKSTIYVQSQIPALFELTAYYMDLVTVARLERNPTVKTEIKQKNFKDSIPVGFLNYPVSQAADITAFKATVVPVGDDQEPMLEQTREIVRTFNRVYNTDILVEPKGYFPPKGQGRLPGLDGNAKMSKSLGNAIYLSDDAKTVQKKVMSMYTDPNHI-------------------------------------------------------------------------------------------------------------')]
Here is the script:
import glob
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Create a file path pattern for glob
path = '/Users/sueparks/' + 'P_*' + '_FASTA.txt'
list_of_files = glob.glob(path)
# Create an empty list to hold (id, file path, sequence) tuples
list_fasta = []
# Loop through each FASTA file
for file in list_of_files:
    protein = file
    # Parse each record in the FASTA file
    for s_record in SeqIO.parse(file, 'fasta'):
        name = s_record.id  # record ID
        seq = str(s_record.seq)  # sequence as a plain string
        link_record = name, protein, seq
        list_fasta.append(link_record)
list_fasta.sort()  # tuples sort by ID, then file path, then sequence
record_list = []
# Unpack each tuple directly: enumerate() would yield (index, tuple),
# which passed a whole tuple to Seq() and raised the TypeError below
for name, protein, seq in list_fasta:
    # No alphabet argument: Bio.Alphabet (IUPAC) was removed in Biopython 1.78
    record = SeqRecord(Seq(seq), id=name, description="")
    record_list.append(record)
# Write all records once, while the output file is still open
with open('sorted_fasta_Files.txt', 'w') as fw:
    SeqIO.write(record_list, fw, 'fasta')
My error:
      File "sortFASTA.py", line 32, in <module>
    record = SeqRecord(Seq(seq, IUPAC.protein),id=name,  description="")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/biopython-1.65-py2.7-macosx-10.6-intel.egg/Bio/Seq.py", line 110, in __init__
    raise TypeError("The sequence data given to a Seq object should be a string (not another Seq object etc)
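The TypeError comes from the loop header rather than from Seq itself. A small standard-library illustration, using a hypothetical two-item list shaped like list_fasta (the /tmp paths are made up), shows what enumerate() actually hands to the loop variables:

```python
# Hypothetical list in the same (id, path, sequence) shape as list_fasta
list_fasta = [
    ("L_MV-22", "/tmp/P_2_FASTA.txt", "ASQK"),
    ("L_MV-22", "/tmp/P_3_FASTA.txt", "MAAK"),
]

# enumerate() yields (index, item), so "name" is an int and "seq" is the whole tuple
for name, seq in enumerate(list_fasta):
    print(type(name).__name__, type(seq).__name__)  # int tuple

# Unpacking the three fields directly gives the plain string Seq() expects
for name, path, seq in list_fasta:
    print(name, type(seq).__name__)  # L_MV-22 str
```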
                    
                
                
Maybe you’d have better luck reading the SeqRecords directly into a list, a la:
Then you can perform various sorts on the list directly (it’s not entirely clear to me on what basis you are sorting them), since lists have a list.sort() method that takes parameters (key and reverse) controlling how the sorting is done.
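A short sketch of the key and reverse parameters, using hypothetical (id, file path, sequence) tuples and /tmp paths. It also shows a pitfall with ~20 files: plain string order puts P_10 before P_2. The file_number helper is hypothetical and assumes paths of the form .../P_&lt;number&gt;_FASTA.txt.

```python
import re
from operator import itemgetter

# Hypothetical (id, file path, sequence) tuples, as built by the question's loop
list_fasta = [
    ("L_MV-22", "/tmp/P_10_FASTA.txt", "MK"),
    ("L_MV-22", "/tmp/P_2_FASTA.txt", "ASQKVD"),
    ("L_MV-22", "/tmp/P_3_FASTA.txt", "MAA"),
]

# Sort by file path only; string comparison puts "P_10" before "P_2"
by_path = sorted(list_fasta, key=itemgetter(1))


def file_number(record):
    """Assumes paths look like .../P_<number>_FASTA.txt; returns that number."""
    match = re.search(r"P_(\d+)_FASTA", record[1])
    # Paths that do not match the pattern sort last
    return int(match.group(1)) if match else float("inf")


# In-place sort on the numeric file index: P_2, P_3, P_10
list_fasta.sort(key=file_number)

# Longest sequence first, using the reverse parameter
by_length = sorted(list_fasta, key=lambda r: len(r[2]), reverse=True)
```

With a list of SeqRecord objects the same approach applies, for example key=lambda r: r.id.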