5.9 years ago
I have a program that reads in ~20 FASTA files and parses the record id and sequence from each. This information is then appended to a list, along with the file path, and sorted. I am sorting on the first element, the directory/filename.
(Next I will concatenate all of the similar sequences, so I would like to use Biopython.)
Here is the raw output of the sorted list:
[ ('L_MV-22', '/Users/sueparks/P_2_FASTA.txt', '-------------------------------------------ASQKVDELRKK-LDEW----AEDYYAKDAPVVEDAVYDKAYQELVELEKQF----------------PDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKLVGHPVAYNVELKIDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKR---------------------N-------LSTFIYTWVNPPKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTP-----TAVMDPVQLAGTVVSRASL-HNPDYLRE--------KGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINP--------------------------------------------MCPAQV-------EEGIIHFASRGAMNIMGLGPRIVKQL-IDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQN-SAELLLFGL---------------------GIDHVGAKAARLILENIN--LEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVEEEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTG-SVSKKTDYLIYGADAGSKKDKAEKLQ-VPMLTEQEAIA---------------------------------------------'), ('L_MV-22', '/Users/sueparks/P_3_FASTA.txt', '-MAAKKTARKRRVKKHVESGVAHIHSTFNNTLVMITDVQGNAVAWSSAGALGFKGSRKSTPFAAQMAAEAAAKSAMDQGMKHVEVSVKGPGAGREAAIRALQAAGLEITAIRDVTPVPHNGSRPPKRRRV'), ('L_MV-22', '/Users/sueparks/P_4_FASTA.txt', '---------------ASQKVDELRKKLDEWAEDYYAKDAPVVEDAVYDKAYQELVELEKQFPDLVTSDSITQRVGGEIKSDLSKVEHPVPMLSMGDVFSKDELKEFDERITKL--VGHPVAYNVEIKLDGLSLSLEYTAGKLTRASTRGNGRVGEDVTANAKFIKDIPQTLPEPLTTEVRGECYMEKEAFAKLNAERDEKGEQVFANPRNAAAGSLRQLDARITKKRNLSTFIYTWVNP-PKNITSQHQAIDEMNRLGFHTNQTGQRLESMDEVFKFIDEYTAKRNDLSYGIDGIVLKVDDLSLQNELGNTVKVPRWEIAYKFPPEEQETVVKEIEWTVGRTGVVTPTAVMDPVQLAGTVVSRASLHNPDYLREKGVRIGDTVKLHKAGDIIPEISSVVLNKRPKDSEPYQIPNKCPSCGQDLVHLQDEVALRCINPMCPAQVEEGIIHFASRGAMNIMGLGPRIVKQLIDKNFVNDVADLYHLTNEQLSQLDHFKDKSITNLLTSIENSKQNSAELLLFGLGIDHVGAKAARLILENIKNLEKVSQLTVPELTSIDTIGETIAESLTAYFNQPSAQKLLQELRDSGLNMEYLGTVE--EEAPDNFFKEKTVVLTGKLSDFTRSEFTKKLQDLGAKVTGSVSKKTDYLIYGADAGSKKDKAEKLQVPMLTEQEAIA-------'), ('L_MV-22', '/Users/sueparks/P_5_FASTA.txt', 
'-----KEILLTGDRPTGKLHIGHYIGSLKNRVKLQNEGKYEPYIMIADTQALTDNARDPEKIKNSLIQVALDYLAVGLDPEKSTIYVQSQIPALFELTAYYMDLVTVARLERNPTVKTEIKQKNFKDSIPVGFLNYPVSQAADITAFKATVVPVGDDQEPMLEQTREIVRTFNRVYNTDILVEPKGYFPPKGQGRLPGLDGNAKMSKSLGNAIYLSDDAKTVQKKVMSMYTDPNHI-------------------------------------------------------------------------------------------------------------')]
Here is the file...
import re, glob, sys
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC

# Create a file path for glob function
path = '/Users/sueparks/' + 'P_*' + '_FASTA.txt'
list_of_files = glob.glob(path)

# Create empty list to hold data
list_fasta = []

# Open the output file for writing
with open('sorted_fasta_Files.txt', 'a+') as fw:
    # Loop through each FASTA file
    for file in list_of_files:
        protein = file
        # Parse each record from the FASTA file
        for s_record in SeqIO.parse(file, 'fasta'):
            #print s_record
            name = s_record.id       # Assign a variable for the record id
            seq = str(s_record.seq)  # Assign a variable for the sequence
            link_record = name, protein, seq
            list_fasta.append(link_record)  # Append to list
    list_fasta.sort()  # sort data
    record_list = []
    for (name, seq) in enumerate(list_fasta):
        record = SeqRecord(Seq(seq, IUPAC.protein),id=name, description="")
        #record_list.append(record)
    # write to file
    SeqIO.write(record_list, fw, 'fasta')
My error:
  File "sortFASTA.py", line 32, in <module>
    record = SeqRecord(Seq(seq, IUPAC.protein),id=name, description="")
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/biopython-1.65-py2.7-macosx-10.6-intel.egg/Bio/Seq.py", line 110, in __init__
    raise TypeError("The sequence data given to a Seq object should be a string (not another Seq object etc)")
Maybe you'd have better luck reading the SeqRecords directly into a list, for example by wrapping the SeqIO.parse() call in list().
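A minimal sketch of that idea, assuming Python 3 and a Biopython install. The FASTA text and the second record id are made up for illustration, and an in-memory handle stands in for one of the P_*_FASTA.txt files:

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA content (not taken from the real files).
fasta_text = ">L_MV-22\nMAAKKTARKR\n>A_XY-01\nKEILLTGDRP\n"

# SeqIO.parse returns an iterator of SeqRecord objects;
# list() collects them so they can be sorted or indexed later.
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for rec in records:
    # rec.seq is a Seq object; str() gives the plain sequence string.
    print(rec.id, str(rec.seq))
```

With real files, the StringIO handle would be replaced by the file path. Because the records are already SeqRecord objects, there is no need to rebuild them from (id, sequence) pairs, which is the step where the TypeError in the question is raised.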
Then you can perform various sorts on the list directly (it's not entirely clear to me on what basis you are sorting them), as lists have a list.sort() method that can take parameters that control how the sorting is done.
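As a rough illustration of those sorting parameters, here is a sketch using (id, file path, sequence) tuples shaped like the ones in the question. The ids, paths, and sequences are invented, and sorting by file path first is only an assumption about what is wanted:

```python
from operator import itemgetter

# Made-up (record id, file path, sequence) tuples.
entries = [
    ("L_MV-22", "/data/P_3_FASTA.txt", "MAAKK"),
    ("A_XY-01", "/data/P_2_FASTA.txt", "KEILL"),
    ("L_MV-22", "/data/P_2_FASTA.txt", "ASQKV"),
]

# key= picks what to sort on: here the file path (index 1), then the id (index 0).
entries.sort(key=itemgetter(1, 0))

# sorted() returns a new list; reverse=True flips the order.
by_id_desc = sorted(entries, key=itemgetter(0), reverse=True)

for name, path, seq in entries:
    print(name, path, seq)
```

The same key= approach works on a list of SeqRecord objects, for instance key=lambda rec: rec.id. Note that unpacking each tuple directly, rather than going through enumerate(), keeps the sequence a plain string.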