Reduce the headers in a fasta file to just the gi number?
6.5 years ago
I want to reduce the headers in a FASTA file from the long version below to just the GI number, i.e. from:

    >gi|103058628|gb|DQ517338.1| Staphylococcus phage 80alpha, complete sequence
    AGGTATCTGCATAGTTATTCCGAACTTCCAATTAATAAAACTCTATACCCGTAATCTTCAATGAGTTCTG
    GCGCTTCCCTTTAATTCCTTTTACATATTCAAAATGAATGTTTTTGATTGCCATCTTTATGAATTCAGTT
    TTTAACTCATCTTCCATTAATTCCCAGCCGTTTAGCAATGAATACTTGAAATTTTTAATCTTCTCATAGT

To:

    >103058628
    AGGTATCTGCATAGTTATTCCGAACTTCCAATTAATAAAACTCTATACCCGTAATCTTCAATGAGTTCTG
    GCGCTTCCCTTTAATTCCTTTTACATATTCAAAATGAATGTTTTTGATTGCCATCTTTATGAATTCAGTT
    TTTAACTCATCTTCCATTAATTCCCAGCCGTTTAGCAATGAATACTTGAAATTTTTAATCTTCTCATAGT

I'm guessing awk or grep has the technology!
grep awk fasta sequence • 2.5k views

Tip: awk and grep extract things, sed alters things. See Pierre's answer.

6.5 years ago

Just sed:

    sed 's/^>gi|\([0-9]*\)|.*/>\1/' < in.fasta
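
For anyone unpacking the expression, here is the same substitution with each piece commented. This is only a sketch: the wrapper name `gi_only` and the shortened sample sequence are made up for illustration and are not part of the answer above.

```shell
# gi_only is a hypothetical helper name; it reads the files given as
# arguments, or standard input when none are given.
gi_only() {
    # ^>gi|        match header lines that start with ">gi|"
    #              ("|" is a literal character in basic regular expressions)
    # \([0-9]*\)   capture the run of digits that follows: the GI number
    # |.*          match the next "|" and everything after it
    # >\1          replace the whole line with ">" plus the captured digits
    sed 's/^>gi|\([0-9]*\)|.*/>\1/' "$@"
}

# Sample input: the header from the question plus a shortened sequence line.
printf '%s\n' \
    '>gi|103058628|gb|DQ517338.1| Staphylococcus phage 80alpha, complete sequence' \
    'AGGTATCTGC' | gi_only
```

Sequence lines never start with `>gi|`, so they pass through untouched.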

Excellent! Well sed

How can I do this if I need to keep both the gi and gb numbers, e.g. ">gi|103058628|gb|DQ517338.1|"?
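
One possible approach, assuming every header follows the `>gi|<number>|gb|<accession>|` layout shown in the question: widen the captured group so it covers the first four pipe-delimited fields, then drop the rest of the line. The function name `keep_gi_gb` is hypothetical, and headers that use a different database tag (anything other than `gb`) would be left unchanged by this sketch.

```shell
# keep_gi_gb is a hypothetical helper name; it reads the files given as
# arguments, or standard input when none are given.
keep_gi_gb() {
    # \(>gi|[0-9]*|gb|[^|]*|\)  capture ">gi|<digits>|gb|<accession>|"
    # .*                        match the free-text description after it
    # \1                        keep only the captured prefix
    sed 's/^\(>gi|[0-9]*|gb|[^|]*|\).*/\1/' "$@"
}

# Sample input: the header from the question plus a shortened sequence line.
printf '%s\n' \
    '>gi|103058628|gb|DQ517338.1| Staphylococcus phage 80alpha, complete sequence' \
    'AGGTATCTGC' | keep_gi_gb
```

On a real file this would be called as `keep_gi_gb in.fasta > out.fasta`, where `out.fasta` is just an example output name.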
