Question: Filtering fasta file based on identifier
4.4 years ago, jyu429 (United States) wrote:

Hi, I have a fasta file with many segments and I want to filter out all the segments that have a "P" in the identifier of the segment. Is there a conventional way to do so? Thanks.


Thank you! 

written 4.4 years ago by jyu429
4.4 years ago, RamRS (Houston, TX) wrote:

bioawk -c fastx '$name ~ /P/ { print ">"$name; print $seq }' <sequences.fa

If you wanna take all except those with a "P",

bioawk -c fastx '$name !~ /P/ { print ">"$name; print $seq }' <sequences.fa

bioawk here: https://github.com/lh3/bioawk


Neat. I've not found a use for bioawk before but this seems perfect.

written 4.4 years ago by Matt Shirley

It kinda clicked out of the blue for me yesterday. Now I'm gonna add this to my arsenal of regular-use tools :-)

written 4.4 years ago by RamRS

4.4 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

just awk

awk '/^>/{N=0} /^>P/{N=1} {if(N)print}' *.fa

Maybe

/^>\S*P\S*/

To match identifiers (up to the first space) that contain P rather than just identifiers that start with P.

written 4.4 years ago by Rob Syme
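
Putting the two suggestions above together, here is a rough sketch in plain awk that tests only the identifier (the header text up to the first space or tab) for a "P", without relying on \S, which is a gawk extension. The function names keep_p and drop_p are made up for this example, and it assumes every header starts with ">" immediately followed by the identifier.

```shell
# keep_p and drop_p are hypothetical helper names, not part of any tool.
# On a header line, $1 is ">identifier"; substr($1, 2) strips the ">".
# The flag "keep" then carries over to the sequence lines that follow.

# Keep only the records whose identifier contains "P".
keep_p() {
  awk '/^>/ { id = substr($1, 2); keep = (id ~ /P/) } keep' "$@"
}

# Keep everything except the records whose identifier contains "P".
drop_p() {
  awk '/^>/ { id = substr($1, 2); keep = (id !~ /P/) } keep' "$@"
}
```

Something like `drop_p sequences.fa > no_P.fa` would then write the filtered records to a new file. A "P" that appears only in the description after the identifier does not count as a match here.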

Would this not print only headers, Pierre?

written 4.4 years ago by RamRS

no, if there is no 'next' statement, awk continues to scan all the patterns.

written 4.4 years ago by Pierre Lindenbaum
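
To see the fall-through behaviour described above, a small experiment on a made-up two-record input may help. The function names without_next and with_next are only for this illustration; the first wraps the awk program from the answer unchanged, the second adds a `next` to the second rule.

```shell
# Hypothetical wrappers for comparison; the awk program is the one from the answer.

# No `next`: a ">P..." header matches rule 1 (N=0), then rule 2 (N=1),
# then reaches the print rule, so the header is printed with its sequence.
without_next() {
  awk '/^>/{N=0} /^>P/{N=1} {if(N)print}' "$@"
}

# With `next` in rule 2: awk stops processing the header line right there,
# so the header is skipped and only the sequence lines after it are printed.
with_next() {
  awk '/^>/{N=0} /^>P/{N=1; next} {if(N)print}' "$@"
}

printf '>P1\nACGT\n>Q2\nTTTT\n' | without_next
# prints:
# >P1
# ACGT

printf '>P1\nACGT\n>Q2\nTTTT\n' | with_next
# prints:
# ACGT
```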

Oops, I read it wrong. I read it as the if(N) print being in the same {} as the N=1. My bad!

written 4.4 years ago by RamRS

But where does the output go? Sorry for my ignorance.

written 2.4 years ago by jahn.davik

"standard out" or "stdout". You can redirect this to a file like:

awk '/^>/{N=0} /^>P/{N=1} {if(N)print}' *.fa > out.fa
written 2.4 years ago by Matt Shirley