How to get an hg38 reference genome with the 'chr' prefix
2.8 years ago
alan

I downloaded several versions of hg38 from GATK and NCBI. All of them name the sequences with a 'chromosome N' prefix. I need to change this to the 'chr' style, e.g. 'chromosome 1' to 'chr1'. I know hg19 uses the chr prefix, but in my situation I can't use hg19.


Have you checked out the GENCODE reference files?

2.8 years ago
glarue

If I understand your issue correctly (and assuming you can use a Unix-like command line), this is a perfect use-case for sed:

~ $ cat example.fa 
>chromosome 1
seq1
>chromosome 2
seq2
>chromosome 3
seq3
~ $ sed 's/>chromosome\s/>chr/' example.fa 
>chr1
seq1
>chr2
seq2
>chr3
seq3
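If you are not on GNU sed, or want a quick check that no header was missed, here is a minimal sketch in the same shell style. It assumes the headers literally begin with `>chromosome` as in the example above. `\s` is a GNU extension that not every sed accepts, so the sketch uses `[[:space:]]` instead; the helper names `rename_chromosome_prefix` and `list_unrenamed_headers` are made up for illustration.

```shell
# Rename ">chromosome N" headers to ">chrN".
# [[:space:]] replaces the GNU-only \s so non-GNU seds (e.g. BSD sed on macOS)
# accept the pattern too; the ^ anchor limits the match to header starts.
rename_chromosome_prefix() {
  sed -E 's/^>chromosome[[:space:]]+/>chr/' "$@"
}

# Print header lines that still do not start with ">chr" after renaming,
# so records with an unexpected header format are not missed silently.
# "|| true" keeps the exit status 0 when every header was renamed.
list_unrenamed_headers() {
  grep '^>' "$@" | grep -v '^>chr' || true
}

# Example usage (file names are placeholders):
#   rename_chromosome_prefix example.fa > example.chr.fa
#   list_unrenamed_headers example.chr.fa
```

Writing to a new file rather than editing in place keeps the original FASTA available if the pattern turns out not to fit your headers.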

Thanks for the answer. I downloaded the GATK hg38 reference, and its headers look like 'CM000663.2 Homo sapiens chromosome 1,...'. I need to change these to 'chr1', matching the headers in hg19. The underlying issue is that I need to integrate my ChIP-seq and Hi-C data, but I just found that my ChIP-seq data were aligned to hg38 while my Hi-C data were aligned to hg19. So now I need hg38 with the 'chr' prefix to realign my Hi-C data.


This is still easily done with sed (though I strongly caution you against adopting the "copy what someone else wrote online" research methodology; learning the basics of the common GNU utilities is essential for this kind of work):

~ $ cat example.fa 
>CM000663.2 Homo sapiens chromosome 1, blah blah
seq1
>CM000663.2 Homo sapiens chromosome 2, blah blah
seq2
>CM000663.2 Homo sapiens chromosome 3, blah blah
seq3
~ $ sed -r 's/(>.+chromosome\s)([0-9XY]+)(.+)/>chr\2/' example.fa 
>chr1
seq1
>chr2
seq2
>chr3
seq3
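For a complete GRCh38 FASTA a few more cases matter: a digits-only pattern such as `[0-9]+` leaves chromosomes X and Y untouched, and the mitochondrial record has no "chromosome" in its header at all. Below is a minimal sketch, assuming the NCBI/GATK header layout shown above (`chromosome N,` for the primary chromosomes, with the number followed directly by a comma) and assuming the mitochondrial header contains the word `mitochondrion`; neither is guaranteed for every download, so look at your own headers first. Requiring the comma is meant to stop unlocalized or unplaced scaffolds, whose headers continue with other text after the number, from also being renamed to e.g. `chr1` and colliding with the real chromosome. Every other record keeps only its first word. The function names are hypothetical.

```shell
# Rename GRCh38 FASTA headers from the NCBI/GATK style
#   >CM000663.2 Homo sapiens chromosome 1, GRCh38 reference primary assembly
# to UCSC-style names such as >chr1.
# ASSUMPTIONS (check against your own file first):
#   - primary chromosomes read "chromosome N," where N is 1-22, X or Y and
#     the number is followed directly by a comma;
#   - the mitochondrial record contains the word "mitochondrion" (-> chrM);
#   - every other record keeps only its first word (usually its accession).
# Sequence lines pass through unchanged (/^>/!b); each "t" ends processing
# of a header as soon as one rule has renamed it.
rename_grch38_headers() {
  sed -E \
    -e '/^>/!b' \
    -e 's/^>.*[[:space:]]chromosome[[:space:]]+([0-9]+|X|Y),.*/>chr\1/' \
    -e 't' \
    -e 's/^>.*[[:space:]]mitochondrion.*/>chrM/' \
    -e 't' \
    -e 's/^>([^[:space:]]+).*/>\1/' \
    "$@"
}

# Print any sequence name that occurs more than once; after a correct
# rename this prints nothing.
find_duplicate_names() {
  grep '^>' "$@" | sort | uniq -d
}

# Example usage (file names are placeholders):
#   rename_grch38_headers GRCh38.fa > GRCh38.chr.fa
#   find_duplicate_names GRCh38.chr.fa
```

`chrM` follows the UCSC hg19 naming; if your hg19 Hi-C setup names the mitochondrion differently, adjust that rule. Since sequence names are stored in aligner indices, the index has to be rebuilt from the renamed FASTA.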