Question: Filtering fasta file based on identifier
2
5.0 years ago, jyu429 120 (United States) wrote:

Hi, I have a fasta file with many segments and I want to filter out all the segments that have a "P" in the identifier of the segment. Is there a conventional way to do so? Thanks.

filter fasta • 2.2k views
modified 4.8 years ago by Biostar ♦♦ 20 • written 5.0 years ago by jyu429 120

Thank you! 

written 5.0 years ago by jyu429 120
4
5.0 years ago, RamRS 24k (Houston, TX) wrote:
bioawk -c fastx '$name ~ /P/ { print ">"$name; print $seq }' <sequences.fa

If you wanna take all except those with a "P",

bioawk -c fastx '$name !~ /P/ { print ">"$name; print $seq }' <sequences.fa

bioawk here: https://github.com/lh3/bioawk
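
For anyone without bioawk installed, a rough plain-awk equivalent of the "all except those with a P" filter might look like the sketch below. The file names `demo.fa` and `no_P.fa` and the records in them are made up for illustration. Unlike the bioawk command, this keeps the full header line and the original line wrapping of each sequence.

```shell
# Build a tiny demo FASTA (hypothetical records, for illustration only).
cat > demo.fa <<'EOF'
>seq1 sample A
ACGT
ACGT
>segP2 contains P in the id
GGCC
>Pseg3
TTAA
>seq4 Protein in description only
CCGG
EOF

# On each header line, take the identifier (text after ">" up to the first
# whitespace) and decide whether to keep the record. Sequence lines inherit
# the decision made for the header above them.
awk '/^>/ { id = substr($1, 2); keep = (id !~ /P/) } keep' demo.fa > no_P.fa

cat no_P.fa
```

Only `seq1` and `seq4` should survive: the "P" in the description of `seq4` is ignored because only the identifier is tested.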

modified 18 months ago • written 5.0 years ago by RamRS 24k
1

Neat. I've not found a use for bioawk before but this seems perfect.

written 5.0 years ago by Matt Shirley 9.2k

It kinda clicked out of the blue for me yesterday. Now I'm gonna add this to my arsenal of regular-use tools :-)

modified 18 months ago • written 5.0 years ago by RamRS 24k

How can I supply transcriptome.fasta and headerlist.txt to this command?

written 3 months ago by Shahzad 10

What are those two files?

written 3 months ago by RamRS 24k
0
5.0 years ago, Pierre Lindenbaum 124k (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

just awk

awk '/^>/{N=0} /^>P/{N=1} {if(N)print}' *.fa
modified 18 months ago by RamRS 24k • written 5.0 years ago by Pierre Lindenbaum 124k

Maybe

/^>\S*P\S*/

To match identifiers (up to the first space) that contain P rather than just identifiers that start with P.
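
Dropped into the one-liner, that suggestion might look like the sketch below. As far as I know `\S` is a GNU awk extension, so the sketch spells it as `[^ \t]`, which other awks should also accept. The trailing `\S*` is left off because it does not change which lines match. The file names `demo2.fa` and `with_P.fa` and the records are hypothetical.

```shell
# Tiny demo FASTA (made-up records, for illustration only).
cat > demo2.fa <<'EOF'
>Pseg1
AAAA
>segP2 some description
CCCC
>seg3 P only in the description
GGGG
EOF

# Same flag-based logic as the original one-liner, but the second pattern
# matches a "P" anywhere in the identifier, i.e. in the run of non-blank
# characters that follows ">".
awk '/^>/{N=0} /^>[^ \t]*P/{N=1} {if(N)print}' demo2.fa > with_P.fa

cat with_P.fa
```

`Pseg1` and `segP2` are printed with their sequences; `seg3` is skipped because its only "P" comes after the first space.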

modified 18 months ago by RamRS 24k • written 5.0 years ago by Rob Syme 540

Would this not print only headers, Pierre?

written 5.0 years ago by RamRS 24k

no, if there is no 'next' statement, awk continues to scan all the patterns.
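
A small sketch of that behaviour, using a made-up two-line input and a hypothetical output file `rules.txt`: with no `next`, every rule whose pattern matches the current line runs, in order, and a rule with no pattern runs for every line.

```shell
# Three rules and no `next`: awk tests each line against all of them.
printf '>P1\nACGT\n' | awk '
  /^>/  { print "rule 1 matched: " $0 }
  /^>P/ { print "rule 2 matched: " $0 }
        { print "rule 3 matched: " $0 }
' > rules.txt

cat rules.txt
```

The header line triggers all three rules, while the sequence line only reaches the pattern-less third rule. That is why the final `{if(N)print}` block in the one-liner also sees the sequence lines.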

written 5.0 years ago by Pierre Lindenbaum 124k

Oops, I read it wrong. I read it as the if(N) print being in the same {} as the N=1. My bad!

modified 3.0 years ago • written 5.0 years ago by RamRS 24k

But where does the output go? Sorry for my ignorance.

written 3.0 years ago by jahn.davik 0

To "standard out" ("stdout"), which by default is your terminal. You can redirect it to a file like this:

awk '/^>/{N=0} /^>P/{N=1} {if(N)print}' *.fa > out.fa
written 3.0 years ago by Matt Shirley 9.2k