Question: Change chromosome notation to match a new reference
4.3 years ago by
umn_bist370 wrote:

I have a BAM file that I would like sorted karyotypically (not lexicographically), but its contig names do not match the reference file provided by GATK. Obtaining the reference that was originally used for alignment, or realigning my sample, is not an option.

My reference uses "1,2,3,...,X,Y,MT" notation but my BAM file uses "chr1,chr2,chr3,...,chrX,chrY,chrM" notation. Is there a way to remove the chr prefix and change chrM to MT in my BAM file? Can I get by with revising only the header, without touching the reads in the BAM file? Thank you for your help!

rna-seq • 3.8k views
4.3 years ago by
Chris Miller21k
Washington University in St. Louis, MO
Chris Miller21k wrote:

If all your BAMs are this way, it's probably easier to change your reference to match the BAMs. (That only means changing the sequence-name lines of the FASTA and rebuilding an index or two.)
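As a rough illustration of that reference-side rename, here is a minimal Python sketch. It assumes the reference names its primary chromosomes "1" to "22", "X", "Y" and "MT", and it leaves every other sequence name (unplaced contigs and the like) untouched, since those do not follow a simple prefix rule. The function names are hypothetical, and the FASTA index and sequence dictionary would still need to be rebuilt afterwards.

```python
import io

# Primary chromosome names that map cleanly onto the "chr" convention (assumed set).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def to_chr_name(name):
    """Map '1', 'X', 'MT' style names to 'chr1', 'chrX', 'chrM'."""
    if name == "MT":
        return "chrM"
    if name in PRIMARY:
        return "chr" + name
    return name  # anything else is left as-is


def rename_fasta_headers(src, dst):
    """Copy a FASTA stream, rewriting only the name at the start of each '>' line."""
    for line in src:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(None, 1)
            if parts:
                description = " " + parts[1] if len(parts) > 1 else ""
                line = ">" + to_chr_name(parts[0]) + description + "\n"
        dst.write(line)


if __name__ == "__main__":
    demo = io.StringIO(">1 example description\nACGT\n>MT\nGGCC\n")
    out = io.StringIO()
    rename_fasta_headers(demo, out)
    print(out.getvalue(), end="")
```

Note that this only changes names. It does not make the sequences themselves identical to whatever the reads were aligned against.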

If you do end up needing to reformat your bams, there is good advice in previous threads:
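Separately from those threads, a header-only rename can be sketched in Python as below. This is an illustration under assumptions, not a tested pipeline: the function names are hypothetical, the header text is assumed to come from something like `samtools view -H`, and the rewritten header would then be applied with `samtools reheader`. It renames contigs in place and keeps their order, because BAM records refer to contigs by position in the header; changing the order needs a tool that rewrites the records as well (Picard's ReorderSam is one option).

```python
def strip_chr(name):
    """Map 'chr1', 'chrX', 'chrM' style names to '1', 'X', 'MT'."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def rename_header(header_text):
    """Rewrite the SN: field of every @SQ line; all other lines pass through unchanged."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = [
                "SN:" + strip_chr(f[3:]) if f.startswith("SN:") else f
                for f in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"
    print(rename_header(demo), end="")
```

The LN: values are deliberately left alone, so any length disagreement with the new reference will still be reported downstream.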

modified 4 months ago by RamRS • written 4.3 years ago by Chris Miller

I had considered this, but for some reason the GATK forum moderator did not recommend this option. If I were to change the reference instead, how would I ensure that its corresponding SNP.vcf still works properly? Can I just leave the SNP.vcf alone?

Also, is there a quick fix for this error: "Discordant contig lengths: read MT LN=16571, ref MT LN=16569"? My RNA-seq was aligned against Ensembl, so the GATK reference is giving me a hard time.
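To see every contig affected by that kind of mismatch before running GATK, the @SQ lines of the BAM header can be compared with those of the reference's sequence dictionary (the .dict file, which uses the same SAM header format). The sketch below is a minimal Python illustration with hypothetical function names. It only reports differences; a differing length suggests the two files describe different sequences for that contig, which renaming alone would not reconcile.

```python
def sq_lengths(header_text):
    """Return {contig name: length} parsed from the @SQ lines of SAM header text."""
    lengths = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Each field after "@SQ" is TAG:VALUE; split on the first colon only.
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def discordant(bam_lengths, ref_lengths):
    """Return {name: (bam length, ref length)} for shared contigs whose lengths differ."""
    return {
        name: (length, ref_lengths[name])
        for name, length in bam_lengths.items()
        if name in ref_lengths and ref_lengths[name] != length
    }


if __name__ == "__main__":
    bam_header = "@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16571\n"
    ref_dict = "@HD\tVN:1.5\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
    print(discordant(sq_lengths(bam_header), sq_lengths(ref_dict)))
```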

modified 4 months ago by RamRS • written 4.3 years ago by umn_bist