I downloaded several versions of hg38 from GATK and NCBI. All of them use a 'chromosome N' prefix in the sequence headers. I need to change it to a 'chr' prefix, e.g. 'chromosome 1' to 'chr1'. I know hg19 has the chr prefix, but in my situation I can't use hg19.
If I understand your issue correctly (and assuming you can use a Unix-like command line), this is a perfect use case for sed:
~ $ cat example.fa
>chromosome 1
seq1
>chromosome 2
seq2
>chromosome 3
seq3
~ $ sed 's/>chromosome\s/>chr/' example.fa
>chr1
seq1
>chr2
seq2
>chr3
seq3
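On a real genome you would normally write the result to a new file and then check the headers rather than print everything to the terminal. A minimal sketch of that, assuming the same simple 'chromosome N' headers as above (the file names, the temporary directory and the 'chromosome X' record are made up for illustration):

```shell
# Work in a throwaway directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Tiny stand-in FASTA; the 'chromosome X' record is an added assumption.
printf '>chromosome 1\nseq1\n>chromosome 2\nseq2\n>chromosome X\nseqX\n' > "$workdir/example.fa"

# Rewrite header lines only: '^>' anchors the match to the start of a line.
# [[:space:]] is the POSIX spelling of GNU sed's \s.
sed 's/^>chromosome[[:space:]]/>chr/' "$workdir/example.fa" > "$workdir/example.chr.fa"

# List the new headers to confirm the rename.
headers=$(grep '^>' "$workdir/example.chr.fa")
echo "$headers"

# Sequence lines should be identical before and after.
seq_before=$(grep -v '^>' "$workdir/example.fa")
seq_after=$(grep -v '^>' "$workdir/example.chr.fa")
```

Keeping the original file untouched makes it easy to rerun with a different pattern if the headers turn out to look different from what you expected.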
Thanks for the answer. I downloaded the GATK hg38 reference. The header is 'CM000663.2 Homo sapiens chromosome 1,...' and I need to change it to 'chr1', like the header in hg19. The issue is that I need to integrate my ChIP-seq and Hi-C data, but I just found that my ChIP-seq was aligned to hg38 while my Hi-C was aligned to hg19. So now I need hg38 with the 'chr' prefix for my Hi-C alignment.
This is still easily done with sed (though I strongly caution you against adopting the "copy what someone else wrote online" research methodology; learning the basics of the common GNU utilities is essential for this kind of work):
~ $ cat example.fa
>CM000663.2 Homo sapiens chromosome 1, blah blah
seq1
>CM000663.2 Homo sapiens chromosome 2, blah blah
seq2
>CM000663.2 Homo sapiens chromosome 3, blah blah
seq3
~ $ sed -r 's/^>.+chromosome\s([0-9]+|X|Y),.*/>chr\1/' example.fa
>chr1
seq1
>chr2
seq2
>chr3
seq3
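One thing to watch with a real reference: it contains more than numbered chromosomes. The sketch below is a hedged variant that also accepts X and Y and requires the comma after the chromosome name, so any header that does not follow the 'chromosome N,' pattern is left alone instead of being collapsed into a second chr1. Only the first header follows the format quoted in this thread; the other two records, the XX000000.1 accession and the file names are hypothetical.

```shell
workdir=$(mktemp -d)

# Stand-in FASTA. The second and third headers are invented for illustration.
cat > "$workdir/example.fa" <<'EOF'
>CM000663.2 Homo sapiens chromosome 1, blah blah
seq1
>CM000663.2 Homo sapiens chromosome X, blah blah
seqX
>XX000000.1 Homo sapiens chromosome 1 unlocalized scaffold blah
seqU
EOF

# -E turns on extended regular expressions (GNU sed also accepts -r).
# The capture group takes a number, X or Y, and must be followed by a comma.
sed -E 's/^>.*chromosome[[:space:]]([0-9]+|X|Y),.*/>chr\1/' \
    "$workdir/example.fa" > "$workdir/example.chr.fa"

headers=$(grep '^>' "$workdir/example.chr.fa")
echo "$headers"

# Any header printed by uniq -d occurs more than once; this should be empty.
duplicates=$(grep '^>' "$workdir/example.chr.fa" | sort | uniq -d)
echo "duplicate headers: ${duplicates:-none}"
```

Before running anything like this on the full file, look at the actual headers with grep '^>' and adjust the pattern to what is really there; the mitochondrial sequence and unplaced contigs in particular may need their own rules.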
Have you checked out the GENCODE reference files?