Replace one number in the header of a fasta file
7.2 years ago
beacamara • 0

Hi everybody! I am pretty new to the bioinformatics field. I hope you can help me with this matter.

I am working with a fasta file with the following header format:

>r2044088.2 |SOURCES={KEY=a0ea3476...,fw,2795082-2795162}|ERRORS={8:G,40:C}|SOURCE_1="FN869568.1 Halomonas elongata DSM 2581, complete genome" (a0ea3476c9d169a3045db6ff30c679db40e694f8)
TGCGGTCAGCGTCAAGTCGAGCAGCACGCCGCGCAGCACCCCATCCTTGTCGTCCAGCGACAGGT

>r2044089.2 |SOURCES={KEY=a0ea3476...,fw,2675676-2675756}|ERRORS={}|SOURCE_1="FN869568.1 Halomonas elongata DSM 2581, complete genome" (a0ea3476c9d169a3045db6ff30c679db40e694f8)
TACTTGGGAAGCGCTGGGAGCCAATGCAACCCCCATGGCATGGACTGAAGTCTACACCGCCCTCC

What I would like to do is change the header of each read entry slightly, in the following way, using a Python script:

>r2044088.1 |SOURCES={KEY=a0ea3476...,fw,2795082-2795162}|ERRORS={8:G,40:C}|SOURCE_1="FN869568.1 Halomonas elongata DSM 2581, complete genome" (a0ea3476c9d169a3045db6ff30c679db40e694f8)
TGCGGTCAGCGTCAAGTCGAGCAGCACGCCGCGCAGCACCCCATCCTTGTCGTCCAGCGACAGGT

>r2044089.1 |SOURCES={KEY=a0ea3476...,fw,2675676-2675756}|ERRORS={}|SOURCE_1="FN869568.1 Halomonas elongata DSM 2581, complete genome" (a0ea3476c9d169a3045db6ff30c679db40e694f8)
TACTTGGGAAGCGCTGGGAGCCAATGCAACCCCCATGGCATGGACTGAAGTCTACACCGCCCTCC

Any idea?

Thanks a lot in advance

Bea

7.2 years ago

It's hard to see what has changed. And wrapping a fasta record in " is odd.

It seems that you want to change the version number, e.g., from r2044088.2 to r2044088.1.

Use seqkit:

seqkit replace -p "^(.+?)\.2" -r "\$1.1" seqs.fa > new_seqs.fa

Thank you so much, shenwei356, for your quick reply. I tried to paste the read entries as they appear in my fasta file, but the ">" symbol did not show up. That's why I added " at the beginning and at the end of the read entries.

As you mentioned, what I am trying to do is change the version number. Thank you very much. I will try your suggestion.

I've edited your post using code formatting to better display the fasta, with the 101010 button.

7.2 years ago
kloetzl ★ 1.1k
sed 's/^\(>r[0-9]*\.\)2/\11/' in.fasta > out.fasta

Thank you so much. I will also try this option.
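
Since the question asked for a Python script, here is a minimal sketch of the same substitution using only the standard library. It is not taken from either answer above. It assumes that every header starts with ">r", followed by digits, a dot, and the version number, as in the example records. The names fix_header and main, and the command-line file arguments, are hypothetical choices for illustration.

```python
import re
import sys

# Assumed header layout (from the example records): ">r<digits>.<version> ...".
# The \b keeps a version such as ".20" from being matched as ".2".
HEADER_VERSION = re.compile(r"^(>r\d+\.)2\b")


def fix_header(line: str) -> str:
    """Change the version ".2" to ".1" on header lines; return other lines unchanged."""
    if line.startswith(">"):
        # \g<1> is used instead of \1 so the following "1" is not read as part of the group number.
        return HEADER_VERSION.sub(r"\g<1>1", line, count=1)
    return line


def main(in_path: str, out_path: str) -> None:
    # Stream line by line so large fasta files do not need to fit in memory.
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_header(line))


if __name__ == "__main__":
    # Hypothetical usage: python fix_version.py in.fasta out.fasta
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
```

Sequence lines never start with ">", so they pass through untouched, and only the first match at the start of each header is rewritten.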
