Question: Extract FASTA headers that contain a specific word
genomes_and_MGEs wrote (17 months ago):

Hey guys,

I have a multi-FASTA protein file like this:

>SF_hydrolase
MKG...
>LH_reductase
MKI...
>SM_hydrolase
MSN...

Basically, I would like to extract only the FASTA headers that contain the word "reductase". I know how to extract headers that exactly match the ones in a list, but I don't know how to extract FASTA headers based on just one of the words they contain. Hope you can help me! Cheers

Tags: sequencing, assembly
genomax wrote (17 months ago):

grep is your friend. It prints every line that contains the word (pattern) you specify.
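For anyone who prefers Python, here is a minimal sketch of what that grep search does when restricted to header lines. The function name matching_headers and the toy records (the question's headers with the truncated sequences shortened to placeholders) are hypothetical. It uses a plain substring test, which is what grep does by default. Note that grep -w would not match here, because grep counts the underscore in LH_reductase as part of the word.

```python
def matching_headers(lines, word):
    """Return the FASTA header lines (those starting with '>') that contain `word`.

    Plain substring test, like grep with a fixed word as the pattern.
    """
    return [line.rstrip("\n") for line in lines
            if line.startswith(">") and word in line]


# Toy input mirroring the question (sequences are placeholders).
example = [">SF_hydrolase\n", "MKG\n", ">LH_reductase\n", "MKI\n",
           ">SM_hydrolase\n", "MSN\n"]

print(matching_headers(example, "reductase"))  # ['>LH_reductase']
```

With a real file you would pass an open file handle instead of the list, e.g. inside `with open("proteins.fasta") as fh:` (proteins.fasta being a placeholder name).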


...and you can use grep -A and grep -B to also print a given number of lines after (-A) or before (-B) each line on which the keyword was found.

Reply written 17 months ago by Kevin Blighe
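To make the -A idea concrete, here is a minimal Python sketch that mimics grep -A on a list of lines. The function grep_after and both toy inputs are hypothetical, and it assumes you want a fixed number of lines after each match. The second call shows the catch with wrapped sequences: a fixed count of trailing lines captures only the first line of the sequence.

```python
def grep_after(lines, word, after=1):
    """Rough Python analogue of `grep -A <after> <word>`.

    Emits every line containing `word` plus up to `after` lines following it.
    Unlike GNU grep, it does not print '--' separators between groups.
    """
    out = []
    remaining = 0  # context lines still owed after the last match
    for line in lines:
        line = line.rstrip("\n")
        if word in line:
            out.append(line)
            remaining = after  # a new match restarts the context window
        elif remaining > 0:
            out.append(line)
            remaining -= 1
    return out


# One line per sequence: -A 1 captures each full record.
single = [">SF_hydrolase", "MKG", ">LH_reductase", "MKI", ">SM_hydrolase", "MSN"]
print(grep_after(single, "reductase"))  # ['>LH_reductase', 'MKI']

# Wrapped sequence: -A 1 keeps only the first sequence line.
wrapped = [">LH_reductase", "MKIAA", "LLVV", ">SM_hydrolase", "MSN"]
print(grep_after(wrapped, "reductase"))  # ['>LH_reductase', 'MKIAA']
```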

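That's a valid option, but FASTA sequences are sometimes wrapped over several lines, in which case grep -A cannot reliably capture the whole sequence, because the number of lines per record varies. Why not use grep to find the matching headers and then extract the records with a FASTA parser?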

Reply written 17 months ago by harish
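A minimal sketch of the parser-based approach, assuming plain FASTA input where every record starts with a '>' line and its sequence may span any number of following lines. The names read_fasta and records_with_word, and the made-up toy records, are hypothetical and not from any library.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # one of possibly several sequence lines
    if header is not None:
        yield header, "".join(chunks)


def records_with_word(lines: Iterable[str], word: str) -> List[Tuple[str, str]]:
    """Keep only the records whose header contains `word` (substring test)."""
    return [(h, s) for h, s in read_fasta(lines) if word in h]


# Toy input with wrapped sequences (made-up residues).
wrapped = """>SF_hydrolase
MKGAA
LLV
>LH_reductase
MKIAA
LLVV
>SM_hydrolase
MSN
""".splitlines()

for header, seq in records_with_word(wrapped, "reductase"):
    print(f">{header}\n{seq}")
```

This sketch filters inside the parser rather than running grep first; the result is the same, and each matching sequence comes back on a single line regardless of how it was wrapped. If Biopython is installed, Bio.SeqIO.parse(handle, "fasta") gives an equivalent record iterator, with the full header text available as record.description.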