Should all hg19 references be the same?
9.4 years ago
jz6002 ▴ 10

I have some BAM files that were generated against the hg19 reference, and I started running them through the GATK pipeline. But when I used Picard to reorder the files, I got this error:

Exception in thread "main" net.sf.picard.PicardException: Discordant contig lengths: read chrM LN=16569, ref chrM LN=16571

I guess this problem is caused by using a different reference genome. I know the hg19 reference comes in different versions (such as the 1000 Genomes or UCSC ones), but from what I found online there should be no difference between them.

So I am confused. Does anyone know what the problem is?

Thank you

9.4 years ago
Denise CS ★ 5.2k

The sequence of the mtDNA genome differs depending on where you got the reference from. For GRCh37, UCSC was working with NC_001807, which is 16,571 bp long, whereas Ensembl was working with the Cambridge Reference Sequence of the mtDNA genome, i.e. NC_012920, which is 16,569 bp long.
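To see which contigs disagree before Picard complains, one option is to compare the `@SQ` lengths in the BAM header with the lengths in the reference index. The sketch below is only an illustration: it assumes you have already dumped the header as text (for example with `samtools view -H`) and have a `.fai` index for the reference (for example from `samtools faidx`). The function names and the toy inputs are made up for this example and simply mirror the lengths in the error message.

```python
def parse_sq_lengths(sam_header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:VALUE.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def parse_fai_lengths(fai_text):
    """Return {contig_name: length} from the first two columns of a .fai index."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def discordant_contigs(read_lengths, ref_lengths):
    """List (name, read_length, ref_length) for shared contigs whose lengths differ."""
    return [
        (name, read_lengths[name], ref_lengths[name])
        for name in read_lengths
        if name in ref_lengths and read_lengths[name] != ref_lengths[name]
    ]


# Toy inputs (not real files); the .fai offset columns are placeholders.
header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:249250621\n"
    "@SQ\tSN:chrM\tLN:16569\n"
)
fai = "chr1\t249250621\t6\t50\t51\nchrM\t16571\t254235646\t50\t51\n"

print(discordant_contigs(parse_sq_lengths(header), parse_fai_lengths(fai)))
# → [('chrM', 16569, 16571)]
```

A chrM length of 16,569 in the BAM against 16,571 in the reference is the signature of the NC_012920 versus NC_001807 mismatch described above.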


Thank you. Do you know where I can download a reference file based on the Cambridge Reference Sequence for use with GATK? It seems there is only one hg19 reference file on the GATK FTP.


You can download it from the Ensembl FTP site, where the FASTA DNA sequence is available. The sequences are provided unmasked, soft-masked (sm) and hard-masked (rm). More details can be found in the README file. Good luck with your analyses.


Thank you very much. But I notice that these files are GRCh37. So we are talking about the difference between GRCh37 and hg19.

Ok, I get it.

I guess it was the analysis software that fooled me. The BAM files came from the Ion Torrent software, where we selected hg19 as the reference. I then tried to call variants from the BAM files with the GATK pipeline. That is when this problem came up, and I thought it shouldn't happen because I had also used hg19 as the reference there.

So it turns out that Ion Torrent was actually using GRCh37 as the reference. Am I correct?

Thank you.


In short, yes. The "hg19" on Ion Torrent uses the MT sequence from GRCh37 but follows the UCSC hg19 naming conventions, so it is named chrM. (Chromosomes in UCSC hg19, and also in Ion Torrent hg19, are named chr1, chr2, ..., chrX, chrY, chrM, whereas in GRCh37 they are named 1, 2, ..., X, Y, MT.) Apart from chrM, the sequences of all other chromosomes are the same. Also, the Ion Torrent hg19 drops all the unlocalized/unplaced/alternate sequences ("chrUn*", etc.) that are in UCSC hg19.
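The naming difference for the primary chromosomes can be sketched as a small mapping. This is only an illustration with made-up function names, and it covers only the names listed above (chr1..chr22, chrX, chrY, chrM versus 1..22, X, Y, MT); unplaced and alternate contigs follow other naming rules and are not handled. Note that renaming alone does not make UCSC hg19 chrM equal to GRCh37 MT, because the underlying sequences differ.

```python
def ucsc_to_ensembl(name):
    """Map a UCSC-style primary chromosome name to the GRCh37/Ensembl style."""
    if name == "chrM":
        return "MT"  # the mitochondrion is the one name that is not just a prefix change
    if name.startswith("chr"):
        return name[3:]  # chr1 -> 1, chrX -> X
    return name  # already without the prefix


def ensembl_to_ucsc(name):
    """Map a GRCh37/Ensembl-style primary chromosome name to the UCSC style."""
    if name == "MT":
        return "chrM"
    if name.startswith("chr"):
        return name  # already UCSC style
    return "chr" + name  # 1 -> chr1, X -> chrX


for ucsc_name in ["chr1", "chrX", "chrM"]:
    print(ucsc_name, "->", ucsc_to_ensembl(ucsc_name))
```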


Thank you so much.


Sorry, I am now confused. I thought we were talking about the differences between versions of hg19. Your link is about the difference between GRCh37 and hg19. So the Ensembl version is GRCh37, not a different version of hg19?


Ah yes, sorry, I thought you wanted to compare different versions of the same build.
