Extracting named FASTA sequences according to a list with Biopython
7 weeks ago
lachiemck • 0

Hi all, I'm trying to work out a quick script to extract a set of sequences from a multifasta and write them all to a new, single FASTA file. To elaborate: I've got a proteome, and I want to extract a group of 15 or so proteins associated with a certain process and write them to a new multifasta. To do so, I want the script to read a document containing a list of sequence names and filter the original multifasta using that list. I'm aiming to do this with Biopython.

This is my code so far:

from Bio import SeqIO
import sys

sample_file = open(str(sys.argv[2]), "r")
seq_list = []

outfile = open(str(sys.argv[3]), "w")

#This reads the guide document and turns each line into a list item in seq_list.
for line in sample_file:
    stripped_line = line.strip()
    line_list = stripped_line.split()
    seq_list.append(line_list)

#This print function is to confirm that seq_list is indeed storing the names.
print(seq_list)

for record in SeqIO.parse(str(sys.argv[1]), "fasta"):
    for n in seq_list:
        if n == record.id:
            SeqIO.write(record, outfile, "fasta")


The main problem so far is that I can load the document of names into seq_list and print the list, but filtering the SeqIO.parse output with it doesn't seem to do anything. However, hardcoding the names into the script seemed to work fine. Any help would be greatly appreciated.

Thanks, Lachlan

Biopython FASTA
6 weeks ago

The problem is that the elements of your seq_list are other lists (each line gets split into a list of words), whereas record.id is a string, so the comparison never matches. Also, for membership tests you should use a set rather than a list, since set lookups are constant-time. Here is a better solution:

collect = set()
for line in sample_file:
    stripped_line = line.strip()
    if stripped_line:
        collect.update(stripped_line.split())
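To see why the set matters, here is a quick standalone sketch (with made-up IDs) comparing a membership test on a list against the same test on a set:

```python
import timeit

# Hypothetical ID pool, purely for timing illustration.
ids_list = [f"seq{i}" for i in range(100_000)]
ids_set = set(ids_list)

# Looking up the last element: the list scans all 100k entries,
# the set hashes straight to it.
t_list = timeit.timeit(lambda: "seq99999" in ids_list, number=100)
t_set = timeit.timeit(lambda: "seq99999" in ids_set, number=100)
print(t_set < t_list)
```

With 15 names the difference is negligible, but for larger ID lists the list version makes the whole script quadratic.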

then later:

for record in SeqIO.parse(str(sys.argv[1]), "fasta"):
    if record.id in collect:
        SeqIO.write(record, outfile, "fasta")
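Putting it together: SeqIO.write also accepts any iterable of records and returns the number of records written, so the filter can be a single pass. A minimal self-contained sketch using a toy in-memory multifasta (the sequence names here are made up, standing in for your real files):

```python
from io import StringIO
from Bio import SeqIO

# Toy multifasta standing in for sys.argv[1] (hypothetical data).
fasta_text = ">seq1 desc\nMKVL\n>seq2 desc\nGGAT\n>seq3 desc\nTTPS\n"
# Stands in for the set built from the names file.
wanted = {"seq1", "seq3"}

out = StringIO()
# A generator expression filters records as they stream past;
# SeqIO.write returns the count of records it wrote.
written = SeqIO.write(
    (rec for rec in SeqIO.parse(StringIO(fasta_text), "fasta") if rec.id in wanted),
    out,
    "fasta",
)
print(written)  # 2
```

Note that record.id is only the first whitespace-separated token of the header line, which is why "seq1 desc" matches "seq1".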


