Change chromosome notation to match a new reference
8.1 years ago
umn_bist ▴ 390

I have a BAM file that I would like to sort karyotypically (not lexicographically), but its contig names do not match the reference file provided by GATK. Obtaining the reference that was originally used for alignment, or realigning my sample, is not an option.

My reference uses "1,2,3,...,X,Y,MT" notation, but my BAM file uses "chr1,chr2,chr3,...,chrX,chrY,chrM" notation. Is there a way to remove the chr prefix and change chrM to MT in my BAM file? Can I get by with revising only the header, without touching the reads in the BAM file? Thank you for your help!
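For illustration, the renaming asked about here can be sketched as a small text transform on the BAM header. This is a minimal sketch, not a tested pipeline: the function names `rename_contig` and `rewrite_header` are made up for this example, and it assumes the only differences are the "chr" prefix and chrM versus MT. Unplaced or alternate contigs would likely need an explicit lookup table instead.

```python
def rename_contig(name):
    """Map a UCSC-style contig name to the Ensembl-style one (chr1 -> 1, chrM -> MT).

    Hypothetical helper: only the simple prefix case is handled.
    """
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    return name


def rewrite_header(header_text):
    """Return the SAM header text with the SN: field of every @SQ line renamed.

    All other header lines (@HD, @RG, @PG, ...) are passed through unchanged.
    """
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = [
                "SN:" + rename_contig(f[3:]) if f.startswith("SN:") else f
                for f in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n@SQ\tSN:chrM\tLN:16571\n"
    print(rewrite_header(example), end="")
```

The header text would come from `samtools view -H`, and the edited header can be applied with `samtools reheader`. BAM records refer to contigs by their position in the header rather than by name, so renaming @SQ entries in place does not require rewriting the reads. It does not reorder contigs or change their LN values, so karyotypic ordering and any length mismatch have to be handled separately.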

RNA-Seq • 7.3k views
8.1 years ago

If all your BAMs are this way, it's probably easier to change your reference to match the BAMs. (That just means changing a few header lines of a FASTA file and rebuilding an index or two.)
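A rough sketch of that FASTA edit, going in the opposite direction (1 -> chr1, MT -> chrM). The names `to_ucsc` and `rename_fasta_headers` are invented for this example, and it assumes every sequence can simply take a "chr" prefix, which is probably not true for unplaced scaffolds.

```python
def to_ucsc(name):
    """Map an Ensembl-style contig name to a UCSC-style one (1 -> chr1, MT -> chrM).

    Hypothetical helper: names that already start with "chr" are left alone.
    """
    if name == "MT":
        return "chrM"
    if name.startswith("chr"):
        return name
    return "chr" + name


def rename_fasta_headers(lines):
    """Yield FASTA lines with the sequence name on each '>' line renamed.

    Sequence lines and any description after the name are kept as they are.
    """
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(None, 1)
            if not parts:
                # Header with no name: pass through untouched.
                yield line
                continue
            desc = " " + parts[1] if len(parts) > 1 else ""
            yield ">" + to_ucsc(parts[0]) + desc + "\n"
        else:
            yield line


if __name__ == "__main__":
    demo = [">1 dna:chromosome\n", "ACGT\n", ">MT\n", "GATC\n"]
    print("".join(rename_fasta_headers(demo)), end="")
```

After renaming, the .fai index and the sequence dictionary would need to be regenerated (for example with `samtools faidx` and Picard's CreateSequenceDictionary), and any VCF used alongside the reference would need the same names.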

If you do end up needing to reformat your bams, there is good advice in previous threads:


I had considered this, but for some reason the GATK forum moderator did not recommend this option. If I were to change the reference instead, how would I ensure that its corresponding SNP.vcf still works properly? Can I just leave the snp.vcf alone?

Also, is there a quick fix for this error as well: "Discordant contig lengths: read MT LN=16571, ref MT LN=16569"? My RNA-seq data was aligned against Ensembl, so the GATK reference is giving me a hard time.
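A length difference like the one in that message means the BAM and the reference describe different sequences for the contig, so renaming alone would not resolve it. As a way to see all such conflicts up front, here is a minimal sketch that compares the @SQ lines of a BAM header with those of a reference .dict file. The function names `parse_sq` and `discordant_contigs` are made up for this example; the MT lengths are the ones from the error message, and the other contig is only there for illustration.

```python
def parse_sq(header_text):
    """Return {contig_name: length} from the @SQ lines of SAM-style header text."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in tags and "LN" in tags:
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def discordant_contigs(bam_header, ref_dict):
    """Return {name: (bam_length, ref_length)} for shared contigs whose lengths differ."""
    bam = parse_sq(bam_header)
    ref = parse_sq(ref_dict)
    return {
        name: (bam[name], ref[name])
        for name in bam
        if name in ref and bam[name] != ref[name]
    }


if __name__ == "__main__":
    bam_header = "@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16571\n"
    ref_dict = "@HD\tVN:1.4\n@SQ\tSN:1\tLN:249250621\n@SQ\tSN:MT\tLN:16569\n"
    print(discordant_contigs(bam_header, ref_dict))
```

Contigs that appear in only one of the two files are ignored here; a stricter check could report those as well.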

