Question: Extract fasta-headers that have the same word
8 months ago, genomes_and_MGEs wrote:

Hey guys,

I have a multi-fasta protein file like this

>SF_hydrolase MKG...
>LH_reductase MKI...
>SM_hydrolase MSN...

Basically, I would like to extract only the fasta headers that contain the word "reductase". I know how to extract headers that match the ones present on a list, but I don't know how to extract fasta headers based solely on one of the words. Hope you can help me! Cheers

sequencing assembly • 317 views
8 months ago, genomax (United States) wrote:

grep is your friend. It prints every line that contains the word you specify.
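A minimal sketch of that suggestion, assuming each header sits on its own line starting with `>`. The temporary file and the short sequences are placeholders made up for illustration, not data from the question.

```shell
# Hypothetical sample file: headers from the question, placeholder sequences.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>SF_hydrolase
MKG
>LH_reductase
MKI
>SM_hydrolase
MSN
EOF

# Anchor on '>' so that only header lines can match the word.
headers=$(grep '^>.*reductase' "$fasta")
echo "$headers"

rm -f "$fasta"
```

Anchoring the pattern with `^>` keeps the match on header lines only; a plain `grep reductase` would behave the same here, since amino acid sequences are unlikely to spell out the word.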


...and you can use grep -A and grep -B to also extract lines after or before, respectively, the line on which the matching keyword was found.
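As a rough sketch of the `-A` option, assuming every sequence occupies exactly one line below its header (the file and sequences are again made-up placeholders):

```shell
# Hypothetical sample file: one header line followed by one sequence line.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>SF_hydrolase
MKG
>LH_reductase
MKI
>SM_hydrolase
MSN
EOF

# -A 1 prints the matching header plus the one line after it.
# grep writes "--" between non-adjacent match groups; the second grep drops it.
records=$(grep -A 1 '^>.*reductase' "$fasta" | grep -v '^--$')
echo "$records"

rm -f "$fasta"
```

This only returns whole records when the number given to `-A` matches the number of sequence lines per record.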

written 8 months ago by Kevin Blighe

That's a valid option, but fasta sequences are sometimes wrapped across several lines, in which case a fixed number of context lines won't reliably capture the whole record. Why not use grep and parse it using a fasta parser?
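One way to cope with wrapped sequences without a dedicated parser is a small awk filter. This is a stand-in for a real fasta parser, not a replacement for one, and the file and sequences below are invented placeholders.

```shell
# Hypothetical sample file with sequences wrapped over several lines.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>SF_hydrolase
MKGAA
LLV
>LH_reductase
MKITT
PQR
>SM_hydrolase
MSN
EOF

# On each header line, decide whether the record should be kept;
# every line is then printed while the keep flag is set.
records=$(awk '/^>/ { keep = ($0 ~ /reductase/) } keep' "$fasta")
echo "$records"

rm -f "$fasta"
```

Because the flag is only re-evaluated on header lines, the record is printed in full regardless of how many lines its sequence spans.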

written 8 months ago by harish