Filter out fasta entries by header pattern
6.9 years ago
Assa Yeroslaviz ★ 1.6k

Hi,

I would like to filter out some sequences from a fasta file by using a specific pattern.

For example, I have this file:

>input1
UGAGGUAGUAGG
>input2
CUAUGCUUACC
>out1
UCCCUGAGACCGUGA
>out2
CUCCGGGUACC
>desc1
ACUUCCUUACAUGCCC

I already know how to extract all the fasta sequences matching a specific pattern into a new file using awk.

What I would like to do instead is remove all entries matching a specific pattern from the original fasta file and save the result to a new file. In my file above, for example, I would like to remove all sequences whose header matches the pattern out, and save only the others to a new file.

Is there a tool for doing that, or is it possible with awk, sed, or even grep?

Thanks,

Assa

6.9 years ago

> I would like for example to remove all sequences with the header pattern out. and save only the other to a new file

awk '/^>/ {P=index($0,"out")==0} {if(P) print} ' in.fasta > out.fasta
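The awk command sets a flag P on every header line (true when the header does not contain "out") and prints each line while the flag is set, so sequence lines inherit the decision made for their header. A rough Python equivalent of the same logic, with hypothetical function names, could look like this:

```python
from typing import Iterable, Iterator


def drop_records(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose header contains `pattern`."""
    keep = False  # like awk's unset P: nothing is printed before the first header
    for line in lines:
        if line.startswith(">"):
            # Same test as index($0, "out") == 0: the substring is absent from the header
            keep = pattern not in line
        if keep:
            yield line


def filter_file(in_path: str, out_path: str, pattern: str) -> None:
    """File-to-file wrapper, the counterpart of: awk '...' in.fasta > out.fasta"""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_records(src, pattern))
```

For the example file in the question, filter_file("in.fasta", "out.fasta", "out") should keep only the input1, input2 and desc1 records, as the awk command does.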

Thanks, that was fast :-)

Hi Pierre! Thanks for your command.

I have a follow-up question. Instead of a single header pattern ("out" in your case), I have many patterns stored in a file. How can I filter on all of them? Thanks in advance for your help.

You could modify my answer below:

$ pip install pyfaidx
$ xargs faidx in.fasta -g > out.fasta < patterns.txt
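If installing pyfaidx is not an option, the flag logic of the awk answer extends naturally to a list of patterns. Below is a minimal Python sketch; the function names are hypothetical and it assumes patterns.txt holds one pattern per line. By default a record is dropped if its header contains any pattern as a substring; with exact=True the first word of the header is looked up in a set instead, which avoids accidental partial matches (out1 would otherwise also remove out10) and stays fast for long pattern lists.

```python
from typing import Iterable, Iterator, List


def load_patterns(path: str) -> List[str]:
    """Read one pattern per line from `path`, skipping blank lines (assumed format)."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def drop_records_multi(
    lines: Iterable[str], patterns: Iterable[str], exact: bool = False
) -> Iterator[str]:
    """Yield FASTA lines, skipping records whose header matches any pattern.

    exact=False: drop if any pattern occurs as a substring of the header line.
    exact=True:  drop if the record ID (first word after ">") is in the pattern set.
    """
    pats = set(patterns) if exact else list(patterns)
    keep = False  # as in the awk answer, nothing is printed before the first header
    for line in lines:
        if line.startswith(">"):
            if exact:
                fields = line[1:].split()
                name = fields[0] if fields else ""
                keep = name not in pats  # constant-time set lookup per header
            else:
                keep = not any(p in line for p in pats)
        if keep:
            yield line
```

Typical use would be to pass the lines of in.fasta together with load_patterns("patterns.txt") and write the yielded lines to out.fasta.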
6.9 years ago

I like Pierre's answer since it's simple. However, I had been thinking about adding regular-expression filtering to my pyfaidx project, and this morning I finished adding this functionality:

$ pip install pyfaidx
$ faidx in.fasta -g "out" > out.fasta

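The (small) advantage here is that faidx filters using the FASTA index (.fai), so once the index exists only the sequence names need to be matched, rather than reading the entire file through your filter. Note that -g selects the records whose names match the expression, so this command keeps the out entries; to drop them instead, check faidx --help for the inverted selection (recent pyfaidx versions document -v/--invert-match).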
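To illustrate why an index helps, here is a rough Python sketch of the idea; it is not pyfaidx's actual code, and all function names are hypothetical. A samtools-style .fai table (name, length, byte offset of the first base, bases per line, bytes per line) lets the filter test only the record names and then seek straight to the records it keeps. For self-containment the sketch builds the index in memory with one pass; samtools faidx and pyfaidx store it in a .fai file next to the FASTA so later runs can skip that pass. It assumes well-formed records whose sequence lines all have the same length except the last, as the .fai format requires.

```python
import re
from typing import BinaryIO, Dict, Iterator, NamedTuple, Optional, Tuple


class FaiEntry(NamedTuple):
    length: int     # number of bases in the record
    offset: int     # byte offset of the record's first base
    linebases: int  # bases per full sequence line
    linewidth: int  # bytes per full sequence line, newline included


def build_index(path: str) -> Dict[str, FaiEntry]:
    """One pass over the file to build an in-memory, samtools-style .fai table."""
    index: Dict[str, FaiEntry] = {}
    name: Optional[str] = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position of the current line
    with open(path, "rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                if name is not None:
                    index[name] = FaiEntry(length, offset, linebases, linewidth)
                fields = raw[1:].split()
                name = fields[0].decode() if fields else ""
                length = linebases = linewidth = 0
                offset = pos + len(raw)  # sequence starts right after the header
            else:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0:  # first sequence line sets the line geometry
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        index[name] = FaiEntry(length, offset, linebases, linewidth)
    return index


def fetch(fh: BinaryIO, entry: FaiEntry) -> str:
    """Seek straight to one record and read only its bytes."""
    if entry.length == 0:
        return ""
    full_lines, rest = divmod(entry.length, entry.linebases)
    span = full_lines * entry.linewidth + rest  # bytes covering all bases
    fh.seek(entry.offset)
    return fh.read(span).replace(b"\r", b"").replace(b"\n", b"").decode()


def filter_indexed(
    path: str, regex: str, invert: bool = True
) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence), choosing records by matching only the index keys.

    invert=True drops names matching `regex` (what the question asks for);
    invert=False keeps only the matching names.
    """
    index = build_index(path)  # a real tool would load a cached .fai instead
    pattern = re.compile(regex)
    with open(path, "rb") as fh:
        for name, entry in index.items():
            if bool(pattern.search(name)) != invert:
                yield name, fetch(fh, entry)
```

For the example file in the question, dict(filter_indexed("in.fasta", "out")) would contain only the input1, input2 and desc1 records.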
