I try to call snps to learn how to do it. I got myt bam file, I hadn't made it. I have the snp, indel report from this bam on excel file, thats because I thought the most easy way for me to learn call snps and see if I do right.
I get this error message (see above)
As I understand chrM has different length as my reference chrM (hg19). These people when aligned the file used another chrM reference than mine? or is another reason why I get this error?Is a way to fix it?
I follow this tutorial What Is The Best Pipeline For Human Whole Exome Sequencing?
java -jar /home/cri/Desktop/GATK/GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -I 010.bam -o 010.intervals ##### ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see https://www.broadinstitute.org/gatk/guide/article?id=63for more information. Error details: Found contigs with the same name but different lengths or MD5s: ##### ERROR contig reads is named chrM with length 16569 ##### ERROR contig reference is named chrM with length 16571 and MD5 d2ed829b8a1628d16cbeee88e88e39eb. ##### ERROR reads contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM] ##### ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
For UCSC hg38, "chrM" is the sequence that NCBI and Ensembl call "MT".
For UCSC hg19, a long time ago, had to pick a mitochondrial reference sequence as it wasn't part of the assembly release and UCSC picked a different mito genome not the Cambridge reference sequence. But they added the reference later. So in the updated hg19 UCSC genome files you have both "chrM" (for backwards compatibility) and "chrMT" (for newer results) See the discussion at https://genome.ucsc.edu/FAQ/FAQgenes.html#mito This is one good reason to download the sequence file "for analysis" (=alignment, etc) from https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/analysisSet/ rather than the complete/full/patched files at https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/latest/ the "latest" directory has fasta files with all and any sequences, patches, fixes, alternates, most of which you never need for alignments or most other analysis. The "latest" directory is rather made if you want the entire full sequence that the genome browser can show, including all the weird sequences that most people usually ignore or gloss over. :-)
If you are using your own reference files then why are you getting the length mismatch error? If you are not using your own copy of the hg19 that you used for the alignment then use that.
I dont use my own copy, I use this one instead because I have the results of snp calling already done, just for learn the steps and see if I do right. Isn't recommended?
Anyhow why this miss match in just chrM? Is any reason for that? (Just for know it)
Thanks again genomax!
As you discovered you can't do that (since there may be subtle differences in the data, as you discovered here). In NGS analyses copy of the genome (and ALL allied files e.g. indexes, annotation) have to be identical from start to finish. If you got them from the same source (e.g. iGenomes) then you could use the same files that someone else did.
If you want to use some other copy of genome/indexes/data then you need to do the entire set of alignment steps again :-)
oh...ok! Thanks for explanation...Didn't think all of this is sooooo tricky O_o
then I try with my bam file :)
Hi, I've got a similar issue, just posted it around here yesterday Variant annotation help - predictCoding error message
I'm not sure how can I best move forward - I don't have the fastq files for the data used to generate the vcf files so the option re-map and align this data is not valid. Is there a way I could alter those for the annotation database (txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene) ?
So to put this in context, I'm following the tutorial here to annotate my VCF files https://rockefelleruniversity.github.io/RU_GenomicVariants/presentations/singlepage/RU_GenomicVariant_session2.html#Predict_amino_acid_changes25
and once I reach the step to run predictCoding to predict the amino acid changes that happen when I run this command:
I get the following error message
I had initially subset chr1, so the VCF file should contain chr1 only so I'm finding it a bit odd that it says sequence chrM has incompatible seqlengths. (Instead of vcf_chr1 above, my file name is HCC_Tissue_041_chr1)
Any suggestions, please? Thank you!