Remove and Substitute Fasta Header
Asked by selplat21, 4.5 years ago

I have a concatenated FASTA file of several GenBank entries, each with a different header. In every header, I need to replace everything up to and including the space after "Archilochus alexandri" with "BCH".

For example, the header

>DQ432746.1 Archilochus alexandri voucher B02923 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial

should become

>BCH voucher B02923 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial

Here are three records from the file:

>DQ432746.1 Archilochus alexandri voucher B02923 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
GGCTGGAATAGTTGGAACCTCTCTAAGCCTACTAATCCGAGCGGAACTCGGCCAGCCAGGCACCCTCCTA
GGGGACGACCAAATTTATAATGTGATCGTCACTGCCCATGCCTTCGTAATAATCTTCTTCATAGTTATAC
CAATTATAATCGGAGGCTTTGGAAACTGATTAGTTCCCCTCATAATTGGAGCCCCCGACATAGCATTCCC
ACGTATAAATAACATAAGCTTCTGACTCCTACCACCATCGTTCCTCTTACTCCTTGCTTCCTCTACTGTC
GAAGCAGGCGCAGGCACGGGATGAACTGTATACCCCCCTCTGGCCGGCAACTTAGCCCACGCAGGAGCAT
CAGTAGACCTAGCCATCTTCTCCTTGCACCTGTCAGGCATCTCATCAATCCTAGGAGCAATTAACTTCAT
TACCACCGCAATCAATATAAAACCACCCGCCCTATCTCAATACCAAACCCCCCTATTTGTTTGATCCGTC
CTCATTACTGCCGTCCTACTCCTTCTTTCACTCCCAGTACTTGCTGCCGGAATTACCATGCTACTCACAG
ACCGAAACC

>KX011590.1 Archilochus alexandri cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
TAAGCCTACTAATCCGAGCAGAACTCGGCCAGCCAGGCACCCTCCTAGGGGACGACCAAATTTATAATGT
GATCGTCACTGCTCATGCCTTCGTAATAATCTTCTTCATAGTTATACCAATTATAATCGGAGGCTTTGGA
AACTGATTAGTCCCCCTCATAATTGGAGCCCCCGACATAGCATTCCCACGTATAAATAACATAAGCTTCT
GACTCCTACCACCATCGTTCCTCTTACTCCTTGCTTCCTCTACTATCGAAGCAGGCGCAGGCACGGGATG
AACTGTATACCCCCCTCTAGCCGGCAACTTAGCCCACGCAGGAGCATCAGTAGACCTAGCCATCTTCTCC
TTACACCTATCAGGCATCTCATCAATCCTAGGAGCAATTAACTTCATTACCACCGCAATCAATATAAAAC
CACCCGCCCTATCTCAATACCAAACCCCCCTATTTGTTTGATCCGTCCTCATTACTGCCGTCCTACTCCT
TCTTTCACTCCCAGTACTTGCTGCCGGAATTACCATGCTACTCACAGACCGAAACCTAAACACCACATTT
TTCGACCCCGCTGGAGGAGGAGACCCCATCCTCTATCAGCACTTATTCTGATTCTT

>KJ602592.1 Archilochus alexandri voucher LSUMZ_B-21848 ornithine decarboxylase gene, exons 6 through 8 and partial cds
GCGTGCAAAAGAACTTGACCTTGCCATTGTTGGAGTTAGGTGAGTTGATATCATCAAAATTAAGATTTCT
TTAAATGGTCTGCCTGACAATAGAGGAGTGTATGGTGACTTGAGTTTTGTACAGACTTCTTGATGAGTCT
GCCAAATAGCAACTGATGTTTTGTATCTTTGTAGTTTCCATGTTGGAAGTGGATGTACTGACCCTGAGAC
CTTTGTTCAAGCCATTTCTGATGCCCGCTGTGTGTTTGATATGGGAGTAAGTAGCTCTGCTCTGCTTTCT
CTGTTTCTGCTGCTCAGCTGATGTGGCAAAACTGACTCTTACATGTTTTAAAGCTAGCTAAGTTACTAAT
TTCATGTTGGAATTGTTGAGTCGTGATGGCTTATCTTGACCTGTTCTGCAAAACTCACTTCTATATGTAG
TTAAAATAATCAGCTCAAACTGAAGTGACTTGAACATGATGAATTAGCTCTGTTCCAATATTAATGAAAT
TACTTTGCATTACTTTTTAAGCAAAAATAATAACATAACTGCTTTCTTGACAGTATTGCTGTTAATCTCT
TCTCAGGCTGAACTTGGCTTCRACATGTGTCTGCTTGATATTG

Any help would be greatly appreciated!


What have you tried? Have you looked into regular expressions (regex)?
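As a minimal illustration of the regex approach, the sketch below applies one extended regular expression to a single header from the question. It assumes every header starts with ">", then an accession made of capital letters and digits, a literal dot, a version number, and then the species name followed by a space. The function name rename_headers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite ">ACCESSION.VERSION Archilochus alexandri " at the
# start of a line to ">BCH ". The dot is escaped (\.) so it matches only a literal
# dot, and -E enables extended regex so "+" works in both GNU and BSD/macOS sed.
# Sequence lines never start with ">", so the ^> anchor leaves them untouched.
rename_headers() {
  sed -E 's/^>[A-Z0-9]+\.[0-9]+ Archilochus alexandri />BCH /' "$@"
}

# Demo on a single header taken from the question
printf '%s\n' '>DQ432746.1 Archilochus alexandri voucher B02923 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial' | rename_headers
# prints: >BCH voucher B02923 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
```

The same function also accepts file names, e.g. rename_headers seq.fasta > new_seq.fasta.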


This is great! Thanks so much!

Answer by Dave Carlson, 4.5 years ago

How about:

sed -E 's/^>[A-Z0-9]+\.[0-9]+ Archilochus alexandri />BCH /' seq.fasta > new_seq.fasta

Using your examples, this will produce:

>BCH voucher B02923 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
GGCTGGAATAGTTGGAACCTCTCTAAGCCTACTAATCCGAGCGGAACTCGGCCAGCCAGGCACCCTCCTA
GGGGACGACCAAATTTATAATGTGATCGTCACTGCCCATGCCTTCGTAATAATCTTCTTCATAGTTATAC
CAATTATAATCGGAGGCTTTGGAAACTGATTAGTTCCCCTCATAATTGGAGCCCCCGACATAGCATTCCC
ACGTATAAATAACATAAGCTTCTGACTCCTACCACCATCGTTCCTCTTACTCCTTGCTTCCTCTACTGTC
GAAGCAGGCGCAGGCACGGGATGAACTGTATACCCCCCTCTGGCCGGCAACTTAGCCCACGCAGGAGCAT
CAGTAGACCTAGCCATCTTCTCCTTGCACCTGTCAGGCATCTCATCAATCCTAGGAGCAATTAACTTCAT
TACCACCGCAATCAATATAAAACCACCCGCCCTATCTCAATACCAAACCCCCCTATTTGTTTGATCCGTC
CTCATTACTGCCGTCCTACTCCTTCTTTCACTCCCAGTACTTGCTGCCGGAATTACCATGCTACTCACAG
ACCGAAACC

>BCH cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
TAAGCCTACTAATCCGAGCAGAACTCGGCCAGCCAGGCACCCTCCTAGGGGACGACCAAATTTATAATGT
GATCGTCACTGCTCATGCCTTCGTAATAATCTTCTTCATAGTTATACCAATTATAATCGGAGGCTTTGGA
AACTGATTAGTCCCCCTCATAATTGGAGCCCCCGACATAGCATTCCCACGTATAAATAACATAAGCTTCT
GACTCCTACCACCATCGTTCCTCTTACTCCTTGCTTCCTCTACTATCGAAGCAGGCGCAGGCACGGGATG
AACTGTATACCCCCCTCTAGCCGGCAACTTAGCCCACGCAGGAGCATCAGTAGACCTAGCCATCTTCTCC
TTACACCTATCAGGCATCTCATCAATCCTAGGAGCAATTAACTTCATTACCACCGCAATCAATATAAAAC
CACCCGCCCTATCTCAATACCAAACCCCCCTATTTGTTTGATCCGTCCTCATTACTGCCGTCCTACTCCT
TCTTTCACTCCCAGTACTTGCTGCCGGAATTACCATGCTACTCACAGACCGAAACCTAAACACCACATTT
TTCGACCCCGCTGGAGGAGGAGACCCCATCCTCTATCAGCACTTATTCTGATTCTT

>BCH voucher LSUMZ_B-21848 ornithine decarboxylase gene, exons 6 through 8 and partial cds
GCGTGCAAAAGAACTTGACCTTGCCATTGTTGGAGTTAGGTGAGTTGATATCATCAAAATTAAGATTTCT
TTAAATGGTCTGCCTGACAATAGAGGAGTGTATGGTGACTTGAGTTTTGTACAGACTTCTTGATGAGTCT
GCCAAATAGCAACTGATGTTTTGTATCTTTGTAGTTTCCATGTTGGAAGTGGATGTACTGACCCTGAGAC
CTTTGTTCAAGCCATTTCTGATGCCCGCTGTGTGTTTGATATGGGAGTAAGTAGCTCTGCTCTGCTTTCT
CTGTTTCTGCTGCTCAGCTGATGTGGCAAAACTGACTCTTACATGTTTTAAAGCTAGCTAAGTTACTAAT
TTCATGTTGGAATTGTTGAGTCGTGATGGCTTATCTTGACCTGTTCTGCAAAACTCACTTCTATATGTAG
TTAAAATAATCAGCTCAAACTGAAGTGACTTGAACATGATGAATTAGCTCTGTTCCAATATTAATGAAAT
TACTTTGCATTACTTTTTAAGCAAAAATAATAACATAACTGCTTTCTTGACAGTATTGCTGTTAATCTCT
TCTCAGGCTGAACTTGGCTTCRACATGTGTCTGCTTGATATTG
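If some headers do not follow the "ACCESSION.VERSION Archilochus alexandri" pattern, sed leaves them unchanged without saying so. A minimal sketch of a variant that also reports how many headers were renamed is shown below; it swaps sed for awk, relies on awk's sub() returning the number of substitutions, and uses a hypothetical function name, bch_headers. The same header format assumption as above applies.

```shell
#!/bin/sh
# Hypothetical helper: on header lines only, replace
# ">ACCESSION.VERSION Archilochus alexandri " with ">BCH ", print every line,
# and report "<renamed> of <total> headers renamed" on stderr so that headers
# which did not match the pattern are easy to spot.
bch_headers() {
  awk '
    /^>/ {
      seen++
      # sub() returns 1 when the pattern matched and was replaced, else 0
      if (sub(/^>[A-Z0-9]+\.[0-9]+ Archilochus alexandri /, ">BCH ")) changed++
    }
    { print }
    END { printf "%d of %d headers renamed\n", changed, seen > "/dev/stderr" }
  ' "$@"
}
```

Usage: bch_headers seq.fasta > new_seq.fasta. With the three records above, the summary on stderr should report 3 of 3; a lower first number means some headers need a different pattern.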
