Is it possible to delete part of the header string?
2.5 years ago
Riku ▴ 80

Hi, all.

I would like to remove "locus=" and "gene=" from the headers of a FASTA file, as shown below. I tried tr, but it deleted other characters as well.

Before:

>3R5.1a wormpep=CE24758 gene=WBGene00007065 locus=pot-3 insdc=CAA21777.2 product="POT1PC domain-containing protein"
>2RSSE.1a wormpep=CE32785 gene=WBGene00007064 locus=rga-9 insdc=CCD61138.1 product="Rho-GAP domain-containing protein"
>2L52.1a wormpep=CE32090 gene=WBGene00007063 insdc=CCD61130.1

After:

>3R5.1a wormpep=CE24758 WBGene00007065 pot-3 insdc=CAA21777.2 product="POT1PC domain-containing protein"
>2RSSE.1a wormpep=CE32785 WBGene00007064 rga-9 insdc=CCD61138.1 product="Rho-GAP domain-containing protein"
>2L52.1a wormpep=CE32090 WBGene00007063 insdc=CCD61130.1

Is there a way to delete only a specific substring from a non-delimited string? Could you please help me?

Thank you very much for your help!

bash Linux fasta • 1.0k views
2.5 years ago
GenoMax 141k
sed -e 's/locus=//g' -e 's/gene=//g' your.fa > new.fa 
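
For anyone wondering why tr did not work: tr -d takes a set of single characters, not a string, so it removes every matching character on the line. A minimal sketch of the difference, assuming a small test file (example.fa and cleaned.fa are made-up names, and the short sequence lines are placeholders). The /^>/ address is an optional addition to the command above that limits the substitution to header lines.

```shell
# Hypothetical sample file built from two headers in the question;
# the sequence lines are placeholders.
cat > example.fa <<'EOF'
>3R5.1a wormpep=CE24758 gene=WBGene00007065 locus=pot-3 insdc=CAA21777.2
ACGT
>2L52.1a wormpep=CE32090 gene=WBGene00007063 insdc=CCD61130.1
TTGA
EOF

# tr -d treats 'gene=' as the character set {g, e, n, =} and deletes every
# occurrence of those characters, so other parts of the header are damaged.
printf '%s\n' 'wormpep=CE24758 gene=WBGene00007065' | tr -d 'gene='

# sed substitutes whole strings; the /^>/ address restricts each
# substitution to lines that start with ">" (the FASTA headers).
sed -e '/^>/ s/gene=//g' -e '/^>/ s/locus=//g' example.fa > cleaned.fa
cat cleaned.fa
```

Restricting to header lines is not strictly needed for nucleotide or protein FASTA, since sequence lines should not contain "gene=" or "locus=", but it makes the intent explicit.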

That's the simplest answer! I was able to remove them thanks to you.

Thank you very much for your quick answer!

2.5 years ago
$ seqkit replace -p 'gene=|locus=' -r "" file.fa
$ awk '/^>/ {gsub(/gene=|locus=/,"",$0)}1' file.fa
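
One caveat with an unanchored pattern such as gene=|locus= is that it also matches inside longer keys. A small sketch of a stricter awk variant, assuming keys are always preceded by a space; the file names and the "pseudogene=" key are invented here purely for illustration and do not come from the original data.

```shell
# Hypothetical sample; the "pseudogene=" key is made up to show the difference.
cat > sample.fa <<'EOF'
>2L52.1a wormpep=CE32090 gene=WBGene00007063 pseudogene=no
ACGT
EOF

# Unanchored pattern: also strips "gene=" from inside "pseudogene=".
awk '/^>/ {gsub(/gene=|locus=/, "")} 1' sample.fa

# Anchored pattern: the key must follow a space, so "pseudogene=" is kept.
# The matched space is put back by the replacement string.
awk '/^>/ {gsub(/ (gene|locus)=/, " ")} 1' sample.fa > anchored.fa
cat anchored.fa
```

For the headers shown in the question both patterns give the same result, so the extra anchoring only matters if other keys happen to end in "gene" or "locus".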

So I can do the same thing with "seqkit" and "awk". It's good to know other ways, too.

Thank you very much!
