Question: Filter out FASTA file entries by pattern
Assa Yeroslaviz (Munich) wrote, 4.1 years ago:

Hi,

I would like to filter out some sequences from a fasta file by using a specific pattern.

For example, I have this file:

>input1
UGAGGUAGUAGG
>input2
CUAUGCUUACC
>out1
UCCCUGAGACCGUGA
>out2
CUCCGGGUACC
>desc1
ACUUCCUUACAUGCCC

I already know how to extract all FASTA sequences matching a specific pattern into a new file using awk.

What I would like to do instead is remove all entries matching a specific pattern from the original FASTA file and save the result to a new file. In the file above, for example, I would like to remove all sequences whose header contains the pattern "out" and keep only the others in the new file.

Is there a tool for doing that, or is it possible with awk, sed, or even grep?

Thanks,

Assa

Tags: awk, sed, pattern, regexp, fasta
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.1 years ago:

> I would like for example to remove all sequences with the header pattern out. and save only the other to a new file

awk '/^>/ {P=index($0,"out")==0} {if(P) print} ' in.fasta > out.fasta
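
To see what the one-liner above does, here is a small sketch that applies the same logic to the example file from the question. The `pat` variable, the temporary directory, and the output name `filtered.fasta` are illustrative assumptions and not part of the original answer. The match is a literal substring test on the header line, as in the original `index()` call.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example file from the question.
cat > in.fasta <<'EOF'
>input1
UGAGGUAGUAGG
>input2
CUAUGCUUACC
>out1
UCCCUGAGACCGUGA
>out2
CUCCGGGUACC
>desc1
ACUUCCUUACAUGCCC
EOF

# On each header line, set keep=1 if the header does NOT contain the
# literal string in `pat`, else keep=0. Sequence lines inherit the flag
# from the most recent header, so whole records are kept or dropped.
awk -v pat="out" '
    /^>/ { keep = (index($0, pat) == 0) }
    keep
' in.fasta > filtered.fasta

cat filtered.fasta
```

With the sample input, only the `input1`, `input2`, and `desc1` records should remain in `filtered.fasta`.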

Assa Yeroslaviz replied, 4.1 years ago:

Thanks, that was fast :-)

xiachongjing replied, 3.7 years ago:

Hi Pierre! Thanks for your command.

I have a follow-up question. Instead of a single header pattern ("out" in your example), I have many patterns stored in a file. How should I handle that? Thanks in advance for your help.
Matt Shirley replied, 3.7 years ago:

You could modify my answer below:

$ pip install pyfaidx
$ xargs faidx in.fasta -g > out.fasta < patterns.txt
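
For the multi-pattern case asked about above, one possible plain-awk sketch extends the same keep/drop flag idea to a list of patterns read from a file. This is an assumption-laden illustration rather than anything posted in the thread: it assumes `patterns.txt` holds one literal substring per line, and the file contents and the name `filtered.fasta` are hypothetical. The `NR == FNR` idiom also assumes the pattern file is not empty; with an empty file, awk would treat `in.fasta` as the pattern list.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Example FASTA file from the question.
cat > in.fasta <<'EOF'
>input1
UGAGGUAGUAGG
>input2
CUAUGCUUACC
>out1
UCCCUGAGACCGUGA
>out2
CUCCGGGUACC
>desc1
ACUUCCUUACAUGCCC
EOF

# Hypothetical pattern file: one literal substring per line.
printf 'out\ndesc\n' > patterns.txt

# First file (NR == FNR): store each non-empty line as a pattern.
# Second file: on every header, drop the record if any stored pattern
# occurs in the header; sequence lines follow their header's flag.
awk '
    NR == FNR { if ($0 != "") pats[$0]; next }
    /^>/ {
        keep = 1
        for (p in pats) if (index($0, p) > 0) { keep = 0; break }
    }
    keep
' patterns.txt in.fasta > filtered.fasta

cat filtered.fasta
```

Replacing `index($0, p) > 0` with `$0 ~ p` would treat each line of the pattern file as a regular expression instead of a literal string.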
Matt Shirley (Cambridge, MA) wrote, 4.1 years ago:

I like Pierre's answer since it's simple. However, I had been thinking about adding regular expression filtering to my pyfaidx project, and this morning I finished adding this functionality:

$ pip install pyfaidx
$ faidx in.fasta -g "out" > out.fasta

The (small) advantage here is that faidx filters on an indexed file, so the entire file does not have to be read through your filter.
