Reformatting Fasta Files
5.5 years ago
Gritz122 • 0

Hi,

I am new to regular expressions and programming in general, and I need help reformatting FASTA files. The records currently look like this:

>AB000667.1.2469.3417   root;Eukaryota;Chordata;Actinopteri;Pleuronectiformes;Paralichthyidae;Paralichthys;Paralichthys olivaceus
CAAAGGCTTGGTCCTGACTTTACTGTCGACTCTAACTAGACTTACACATGCAAGTATCCG
CCCCCCTGTGAGAATGCCCATAACGCCCTGCTCGGGAACAAGGAGCTGGCATCAGGCACA
...

Some of them have this in the header:

>AB002132.1._1.403

I want to change each record to:

>AccessionNumber.Version SpeciesName
nucleotide sequence
nucleotide sequence, continued
nucleotide sequence, continued
etc...

The requirements are:

  • The sequence name is just the Accession number followed by the version number
  • This is followed by a space and then the species name. There should be a space between the genus and species name

Thanks!

Regular expressions FASTA header • 1.8k views
5.5 years ago
$ sed '/>/ s/\(.*\)\s.*;\(.*\)$/\1\2/g' test.fa 

>AB000667.1.2469.3417  Paralichthys olivaceus
CAAAGGCTTGGTCCTGACTTTACTGTCGACTCTAACTAGACTTACACATGCAAGTATCCG
CCCCCCTGTGAGAATGCCCATAACGCCCTGCTCGGGAACAAGGAGCTGGCATCAGGCACA

input:

$ cat test.fa 

>AB000667.1.2469.3417   root;Eukaryota;Chordata;Actinopteri;Pleuronectiformes;Paralichthyidae;Paralichthys;Paralichthys olivaceus
CAAAGGCTTGGTCCTGACTTTACTGTCGACTCTAACTAGACTTACACATGCAAGTATCCG
CCCCCCTGTGAGAATGCCCATAACGCCCTGCTCGGGAACAAGGAGCTGGCATCAGGCACA

The more test data you provide, the better the solution will be, Gritz122.
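
Note that the output above still carries the trailing coordinates (.2469.3417) and two spaces before the species name, whereas the question asks for only Accession.Version followed by a single space. A minimal sketch of a stricter variant follows, assuming the accession and version are the first two dot-separated fields of the ID and the species name is the last semicolon-separated taxonomy field. The function name reformat_headers is hypothetical, and the pattern has only been reasoned through against the example header shown in this thread.

```shell
# Hypothetical helper: reads FASTA on stdin and rewrites header lines
# as ">Accession.Version Genus species". Sequence lines are untouched.
reformat_headers() {
  # ^>([^.]+\.[^.[:space:]]+)  capture "Accession.Version" (first two dot-separated fields)
  # [^[:space:]]*              skip the rest of the ID (e.g. ".2469.3417")
  # [[:space:]]+.*;            skip whitespace and taxonomy up to the last semicolon
  # ([^;]*)$                   capture the final taxonomy field (the species name)
  sed -E '/^>/ s/^>([^.]+\.[^.[:space:]]+)[^[:space:]]*[[:space:]]+.*;([^;]*)$/>\1 \2/'
}

# Usage: reformat_headers < test.fa > reformatted.fa
```

Header lines that do not match the pattern (for example, a header with no semicolon-separated taxonomy after the ID) pass through unchanged, so it is worth checking the output for leftovers.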


How would I have to change the regular expression if I just need Genus_species as the header?

$ sed '/>/ s/.*;\(.*\)$/>\1/g;s/ /_/g' test.fa

>Paralichthys_olivaceus
CAAAGGCTTGGTCCTGACTTTACTGTCGACTCTAACTAGACTTACACATGCAAGTATCCG
CCCCCCTGTGAGAATGCCCATAACGCCCTGCTCGGGAACAAGGAGCTGGCATCAGGCACA
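
One caveat with the command above: the /> / address only applies to the first substitution, so s/ /_/g runs on every line, including sequence lines. That is harmless when sequence lines contain no spaces, but a sketch that restricts both substitutions to header lines could look like this. The function name genus_species_headers is hypothetical, and the species name is again assumed to be the last semicolon-separated field.

```shell
# Hypothetical helper: reads FASTA on stdin and rewrites header lines
# as ">Genus_species". Both substitutions are grouped under the /^>/
# address, so non-header lines are never modified.
genus_species_headers() {
  sed -E '/^>/ { s/^>.*;([^;]*)$/>\1/; s/[[:space:]]+/_/g; }'
}

# Usage: genus_species_headers < test.fa > renamed.fa
```

A header with no semicolon keeps its ID, and any whitespace in it is still replaced with underscores, so such records may need separate handling.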

Thanks so much, huge help!


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
