Question: Python 3.4, BioPython 1.65: Specifying dictionary keys in FASTA file and writing sequences with certain key values to file
tyleraelliott (Canada) wrote, 5.4 years ago:

I have a multi-fasta file containing sequences with headers such as:

>AAST01014508.1|(1..6240)|LTR/Pao|ROOA_I-int:ROOA_LTR

>AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int


The second field of each header gives the position. I want to index all of the sequences using this field and write to a file all those sequences from the complement (c) strand, e.g. >AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int

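I easily wrote a function that extracts this field:

def get_compstrandTE(record):
    """Return the location field, e.g. 'c(2745..6820)', from a record ID."""
    parts = record.id.split("|")
    assert len(parts) == 4  # accession, location, class/family, name
    return parts[1]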
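The function above returns the raw field as a string. If the coordinates themselves are needed for indexing, a small parser can split the field into strand and positions. This is a sketch only: `parse_location` and `LOCATION_RE` are hypothetical names, and the pattern assumes every field is either `(start..end)` or `c(start..end)`, as in the two example headers. The strand uses the common +1/-1 convention.

```python
import re

# Hypothetical pattern: optional "c" (complement), then "(start..end)".
LOCATION_RE = re.compile(r"^(c?)\((\d+)\.\.(\d+)\)$")


def parse_location(field):
    """Split a field such as 'c(2745..6820)' into (strand, start, end)."""
    match = LOCATION_RE.match(field)
    if match is None:
        raise ValueError("Unexpected location field: %r" % field)
    strand = -1 if match.group(1) == "c" else 1
    return strand, int(match.group(2)), int(match.group(3))


print(parse_location("c(2745..6820)"))  # (-1, 2745, 6820)
print(parse_location("(1..6240)"))      # (1, 1, 6240)
```

Headers in any other layout raise `ValueError` rather than being silently misread.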


However, I am now stuck on how to search through the keys of the dictionary, find only those with the 'c' prefix, and write the matching records to a file. I tried adapting the example from the Biopython manual but kept running into difficulties.
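For comparison, the dictionary approach described here could look roughly like the sketch below. It is hedged in two ways: a hand-rolled reader (`read_fasta`, a hypothetical helper) stands in for Biopython's `SeqIO` so the filtering logic stands alone, and the sequences are placeholders. The dictionary is keyed on the full header rather than the location field alone, because the same coordinates could occur on different accessions and would then collide as keys.

```python
import sys


def read_fasta(lines):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def complement_records(records):
    """Keep only records whose second '|' field starts with 'c'."""
    return {key: seq for key, seq in records.items()
            if key.split("|")[1].startswith("c")}


def write_fasta(records, handle):
    """Write records in FASTA format and return how many were written."""
    for key, seq in records.items():
        handle.write(">%s\n%s\n" % (key, seq))
    return len(records)


# Headers from the question; the sequences are placeholders.
example = """>AAST01014508.1|(1..6240)|LTR/Pao|ROOA_I-int:ROOA_LTR
ACGTACGT
>AAST01026747.1|c(2745..6820)|LTR/Pao|ROO_I-int
TTGGCCAA
""".splitlines()

complements = complement_records(read_fasta(example))
write_fasta(complements, sys.stdout)
```

To write to a file instead, pass a handle from `open("complements.fasta", "w")` to `write_fasta`. Note that duplicate headers would silently overwrite each other in this sketch.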


If anyone has any suggestions I would really appreciate it.


Tags: biopython, python
Answer: Peter (Scotland, UK) wrote, 5.4 years ago:

You don't need a dictionary for this task.

Something like this should work, loosely based on one of the filtering examples in the Biopython Tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.html), using a generator expression:

from Bio import SeqIO
input_file = "big_file.fasta"
output_file = "complements.fasta"

def wanted(record):
    """Returns True if name scheme suggests from complement stand."""
    parts = record.id.split("|")
    assert len(parts) == 4
    return parts[1].startswith("c")

records = (r for r in SeqIO.parse(input_file, "fasta") if wanted(r))
count = SeqIO.write(records, output_file, "fasta")
print("Saved %i records from %s to %s" % (count, input_file, output_file))
tyleraelliott replied, 5.4 years ago:

Wonderful, thanks again!