Question: How to filter a fasta file?
oussama.badad asked (4.4 years ago):

Dear All

I am trying to remove the sequences with no functional information from a functional annotation FASTA file. A typical header looks like this:

>Oeu043107.1|g7lir3_medtr alpha beta-hydrolase superfamily protein os=medicago truncatula gn=mtr_8g086260 pe=4 sv=1

I was wondering if someone could help me with a Python script or a shell script.

Thank you



geek_y commented (4.4 years ago):

You could convert to single-line FASTA and use a simple grep to remove lines with --NA--, or use Biopython to have more control.
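A minimal Python sketch of the first option in this comment (linearize, then filter), using only the standard library instead of Biopython. It assumes, as the answers in this thread do, that unannotated records carry ---NA--- in their header; the function names linearize and drop_unannotated and the toy records are hypothetical.

```python
import io


def linearize(handle):
    """Collapse a multi-line FASTA into (header, sequence) pairs,
    i.e. the 'single-line FASTA' form: one header, one sequence string."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            # sequence line: collect it for the current record
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_unannotated(pairs, marker="---NA---"):
    """grep-like step: keep only records whose header lacks the marker."""
    for header, seq in pairs:
        if marker not in header:
            yield header, seq


if __name__ == "__main__":
    demo = io.StringIO(
        ">seq1|---NA---\nACGT\nACGT\n"
        ">seq2|alpha beta-hydrolase superfamily protein\nGGCC\nTT\n"
    )
    for header, seq in drop_unannotated(linearize(demo)):
        print(header)
        print(seq)
```

Working on (header, sequence) pairs means the sequence lines of a dropped record go with it, which a plain line-by-line grep on a multi-line file would not do.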
novice (United States) answered (4.4 years ago):

Too easy bro:


#!/usr/bin/env perl
use strict; use warnings;

my $print;

while (<>) {
    # on each header line, decide whether this record should be printed
    $print = m/---NA---/ ? 0 : 1 if m/^>/;
    print if $print;
}

Usage: perl <script> sequences.fasta > filtered.fasta, or ./<script> sequences.fasta > filtered.fasta if you made the script executable.
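Since the question asked for Python, here is a sketch of the same one-pass idea: a flag is recomputed at every header line and decides whether that header and the sequence lines after it are emitted. The function name filter_fasta_stream and the demo records are hypothetical.

```python
import io
import sys


def filter_fasta_stream(lines, marker="---NA---"):
    """Yield the lines of records whose header does not contain `marker`.

    Only the current line is held in memory, so this works on large files.
    """
    keep = False  # lines before the first header are dropped, as in the Perl version
    for line in lines:
        if line.startswith(">"):
            keep = marker not in line
        if keep:
            yield line


if __name__ == "__main__":
    demo = io.StringIO(">a|---NA---\nACGT\n>b|kinase\nGGTT\n")
    sys.stdout.writelines(filter_fasta_stream(demo))
```

To run it on a real file, pass an open file handle instead of the StringIO demo, e.g. filter_fasta_stream(open("sequences.fasta")).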

Since I've got time on my hands and like Perl, I wrote you another script. This one "slurps" the entire file into memory, so it's best avoided if your file is huge.


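#!/usr/bin/env perl
use strict; use warnings;

# slurp mode: undefine the input record separator to read the whole file at once
my $whole = do { local $/; <> };

# split into records on '>', drop records containing ---NA---,
# then put back the '>' that split removed
my @keep = map { s/\A([^>])/>$1/; $_ }
    grep { ! m/---NA---/; } split />/, $whole;
print @keep;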
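For comparison, the same split-on-'>' idea as a Python function; filter_fasta_slurp is a hypothetical name. Unlike the Perl script above, this sketch checks only the header line of each chunk, so an aligned sequence that happens to contain ---NA--- does not get its record dropped. It shares the assumption that '>' appears only at the start of header lines.

```python
def filter_fasta_slurp(text, marker="---NA---"):
    """Whole-file variant: split on '>', drop unannotated records, rejoin."""
    kept = []
    for rec in text.split(">"):
        if not rec:
            continue  # the empty chunk before the first '>'
        header = rec.split("\n", 1)[0]  # first line of the chunk is the header
        if marker not in header:
            kept.append(">" + rec)  # restore the '>' removed by split
    return "".join(kept)


if __name__ == "__main__":
    demo = ">a|---NA---\nACGT\n>b|kinase\nGGTT\n"
    print(filter_fasta_slurp(demo), end="")
```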
A user from Philadelphia, PA answered (4.4 years ago):

Since you mentioned you wanted Python:

Edit '':

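#!/usr/bin/env python

import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

headers = []
seqs = []

with open(input_file, 'r') as f:
    for line in f:
        line = line.rstrip('\n')
        if line.startswith(">"):
            # new record: store the header and start an empty sequence
            headers.append(line)
            seqs.append('')
        elif headers:
            # sequence line: append it to the current record
            seqs[-1] += line.strip()

# zip() keeps file order and duplicate headers (a dict would merge duplicates)
with open(output_file, 'w') as out:
    for header, seq in zip(headers, seqs):
        if '---NA---' not in header:
            out.write(header + '\n' + seq + '\n')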


Usage: python <script> input.fasta output.fasta
Powered by Biostar version 2.3.0