How to change the chromosome names in the header lines of a FASTA file?
21 months ago
Dan ▴ 180

I want to add "chr" to the chromosome names in the header lines of a FASTA file, changing:

>1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF

to

>chr1 dna:chromosome chromosome:GRCm38:chr1:1:195471971:1 REF

I tried

cat Mus_musculus.GRCm38.dna.primary_assembly.fa | sed -e 's/^>\([0-9XY]\)/>chr\1/' -e 's/.*GRCm38:\([0-9XY]\):.*/chr\1/'

but this only changes the first position. How should I change the second position?

Thanks

sed • 499 views
21 months ago
ATpoint 81k

Since the patterns to replace are so unique, you can just use the sledgehammer method:

awk '{gsub("^>",">chr");gsub(":GRCm38:",":GRCm38:chr");print}' your.fa
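If you would rather stay with sed and only touch the numbered and X/Y chromosomes (as the `[0-9XY]` in the question suggests), a single substitution with two capture groups might do it. This is a minimal sketch, not a tested pipeline: `demo.fa` and `demo.chr.fa` are hypothetical file names, the X and MT header lines are illustrative stand-ins modeled on the header in the question, and leaving MT and scaffold names unprefixed is an assumption about what is wanted.

```shell
# Build a tiny stand-in FASTA (hypothetical file; the X and MT lines are illustrative)
printf '%s\n' \
  '>1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF' \
  'ACGT' \
  '>X dna:chromosome chromosome:GRCm38:X:1:171031299:1 REF' \
  'TTGA' \
  '>MT dna:chromosome chromosome:GRCm38:MT:1:16299:1 REF' \
  'GGCC' > demo.fa

# Group 1: the chromosome name right after ">" (digits, X or Y).
# Group 2: everything from the following space up to and including ":GRCm38:".
# The replacement re-emits both groups and puts "chr" in front of each name.
sed -E 's/^>([0-9]+|[XY])( .*:GRCm38:)/>chr\1\2chr/' demo.fa > demo.chr.fa

cat demo.chr.fa
```

Lines that do not match the pattern (sequence lines, MT, scaffolds) pass through unchanged. The attempt in the question most likely fails because its second expression starts with `.*` and ends with `.*`, so it replaces the whole header line instead of only the name after `GRCm38:`.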