How to fix mismatch between snp file and reference?
6.1 years ago
Sharon ▴ 600

Hi everyone, I am doing RNA-seq variant calling on mouse. I am using the reference, indels and SNPs for mouse from the same source (ftp://ftp-mouse.sanger.ac.uk/). The indels file indels.dbSNP142 did not cause any issues with IndelRealigner, but the SNP file snps.dbSNP142 throws the following error with BaseRecalibrator:

java -jar ${GATK}/GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -R ${WHOLEGENOME} \
    -I ${WHERE}/${CURRENT}-realigned.bam \
    -knownSites ${DBSNP} \
    -o ${WHERE}/${CURRENT}.recal_data.table

ERROR MESSAGE: Input files snps.dbSNP142.vcf and reference have incompatible contigs. Error details: The contig order in snps.dbSNP142.vcf and reference is not the same; to fix this please see: (https://www.broadinstitute.org/gatk/guide/article?id=1328), which describes reordering contigs in BAM and VCF files..
##### ERROR snps.dbSNP142.vcf contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, X, Y, MT]
##### ERROR reference contigs = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 3, 4, 5, 6, 7, 8, 9, MT, X, Y, JH584295.1, JH584292.1, GL456368.1, GL456396.1, GL456359.1, GL456382.1, GL456392.1, GL456394.1, GL456390.1, GL456387.1, GL456381.1, GL456370.1, GL456372.1, GL456389.1, GL456378.1, GL456360.1, GL456385.1, GL456383.1, GL456213.1, GL456239.1, GL456367.1, GL456366.1, GL456393.1, GL456216.1, GL456379.1, JH584304.1, GL456212.1, JH584302.1, JH584303.1, GL456210.1, GL456219.1, JH584300.1, JH584298.1, JH584294.1, GL456354.1, JH584296.1, JH584297.1, GL456221.1, JH584293.1, GL456350.1, GL456211.1, JH584301.1, GL456233.1, JH584299.1]
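The two lists in the error come from the `##contig` header lines of the VCF and from the reference's sequence dictionary. A minimal sketch for printing both orders yourself, assuming a plain-text (uncompressed) VCF and a samtools-style `.fai` index sitting next to the FASTA as `${WHOLEGENOME}.fai`; the function names `vcf_contigs` and `fai_contigs` are hypothetical helpers, not part of GATK or Picard:

```bash
# Hypothetical helper: contig IDs from the ##contig=<ID=...> header lines
# of a plain-text VCF, in the order the header declares them.
vcf_contigs() {
  awk '/^#CHROM/ { exit }                      # stop at the column-header line
       /^##contig=<ID=/ {
         id = $0
         sub(/^##contig=<ID=/, "", id)         # drop the prefix
         sub(/[,>].*$/, "", id)                # keep text up to the first , or >
         print id
       }' "$1"
}

# Hypothetical helper: contig names from the first column of a .fai index,
# in reference order.
fai_contigs() {
  cut -f1 "$1"
}
```

For example, `paste <(vcf_contigs "${DBSNP}") <(fai_contigs "${WHOLEGENOME}.fai")` shows the two orders side by side, which makes it easy to see that the reference lists 10 to 19 before 2.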

I also tried the fix described in the link in the error message:

java -jar ${PICARD}/picard.jar SortVcf \
        I=${DBSNP} \
        O=sorted.vcf \
        SEQUENCE_DICTIONARY=GRCm38_68.dict

But then I got:

Exception in thread "main" java.lang.IllegalArgumentException: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=X,length=171031299,dict_index=19,assembly=null) was found when SAMSequenceRecord(name=MT,length=16299,dict_index=19,assembly=null) was expected.
    at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:126)
    at picard.vcf.SortVcf.doWork(SortVcf.java:95)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:228)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104)
Caused by: java.lang.AssertionError: SAM dictionaries are not the same: SAMSequenceRecord(name=X,length=171031299,dict_index=19,assembly=null) was found when SAMSequenceRecord(name=MT,length=16299,dict_index=19,assembly=null) was expected.
    at htsjdk.samtools.SAMSequenceDictionary.assertSameDictionary(SAMSequenceDictionary.java:170)
    at picard.vcf.SortVcf.collectFileReadersAndHeaders(SortVcf.java:124)
    ... 4 more
make: *** [sortvcf] Error 1

Any hints?

Thanks

RNA-Seq • 2.6k views

When you did the original alignment, you did not include the unplaced and unlocalized contigs in your reference. The solution you linked to only applies when the sort order is wrong but there are no mismatches. I suppose you could remove the lines with the offending contigs from your SNP file.
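A minimal sketch of that suggestion, assuming a plain-text VCF and a non-empty `.fai` index for the reference; the function name `keep_reference_contigs` is hypothetical. It keeps every header line except `##contig` lines for contigs the reference lacks, and keeps a record only if its CHROM appears in the reference index. Note that this only helps when the VCF contains contigs the reference does not have; in the error above, every VCF contig is also in the reference, so the problem there is the order rather than extra contigs.

```bash
# Hypothetical helper: drop VCF records (and ##contig header lines) whose
# contig is not listed in the reference .fai index. Writes to stdout.
keep_reference_contigs() {
  local vcf="$1" fai="$2"
  awk 'BEGIN { FS = "\t" }
       NR == FNR { known[$1] = 1; next }       # first file: reference contig names
       /^##contig=<ID=/ {
         id = $0
         sub(/^##contig=<ID=/, "", id)
         sub(/[,>].*$/, "", id)
         if (id in known) print                # keep only declared reference contigs
         next
       }
       /^#/ { print; next }                    # other header lines unchanged
       ($1 in known)                           # data lines: keep known contigs
      ' "$fai" "$vcf"
}
```

Usage would look like `keep_reference_contigs snps.dbSNP142.vcf "${WHOLEGENOME}.fai" > snps.filtered.vcf`.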


I did not understand the part about "unplaced and unlocalized" contigs. Also, do you mean I should manually remove these extra contigs from the SNP file? [JH584295.1, JH584292.1, GL456368.1, GL456396.1, GL456359.1, GL456382.1, GL456392.1, GL456394.1, GL456390.1, GL456387.1, GL456381.1, GL456370.1, GL456372.1, GL456389.1, GL456378.1, GL456360.1, GL456385.1, GL456383.1, GL456213.1, GL456239.1, GL456367.1, GL456366.1, GL456393.1, GL456216.1, GL456379.1, JH584304.1, GL456212.1, JH584302.1, JH584303.1, GL456210.1, GL456219.1, JH584300.1, JH584298.1, JH584294.1, GL456354.1, JH584296.1, JH584297.1, GL456221.1, JH584293.1, GL456350.1, GL456211.1, JH584301.1, GL456233.1, JH584299.1] Thanks


As in the human genome, the GL* and JH* contigs are sequences known to be part of the mouse genome whose precise location is not known. Did you delete the index file before running SortVcf?
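To see which contigs exist only in the reference, a minimal sketch assuming a plain-text VCF whose header declares its contigs with `##contig=<ID=...>` lines and a samtools-style `.fai` index; the function name `reference_only_contigs` is hypothetical. Given the two lists in the error message, running it on this reference and snps.dbSNP142.vcf should print only the GL* and JH* scaffolds, i.e. they are in the reference, not in the SNP file.

```bash
# Hypothetical helper: print reference contigs (first .fai column) that have
# no ##contig header line in the VCF, in reference order.
reference_only_contigs() {
  local vcf="$1" fai="$2"
  # sed stops reading at the #CHROM line, so the VCF body is never scanned;
  # grep -vxF drops exact, whole-line matches of the declared IDs.
  cut -f1 "$fai" |
    grep -vxF -f <(sed -n '/^#CHROM/q; s/^##contig=<ID=\([^,>]*\).*/\1/p' "$vcf")
}
```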


No, I did not delete anything.


@Goutham says this in the post you linked above.

Note that you may need to delete the index file that gets created automatically for your new VCF by the Picard tool. GATK will automatically regenerate an index file for your VCF.


This is what I don't understand: which index do they mean? The index I downloaded with the reference? The index of the SNP file has already been deleted.
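The quoted note refers to the index written for the new, sorted VCF (the output of SortVcf), not to the reference's `.fai` or `.dict`. For a plain-text VCF such an index would be a Tribble index named like `sorted.vcf.idx`; for a bgzipped VCF it would be a tabix `.tbi`. A minimal sketch, assuming those naming conventions; the function name `remove_vcf_index` is hypothetical:

```bash
# Hypothetical helper: delete any index sitting next to a VCF so that GATK
# regenerates it from the re-sorted file on the next run.
remove_vcf_index() {
  local vcf="$1"
  # .idx = Tribble index (plain-text VCF); .tbi = tabix index (bgzipped VCF).
  # rm -f ignores files that do not exist.
  rm -f -- "${vcf}.idx" "${vcf}.tbi"
}
```

For example, `remove_vcf_index sorted.vcf` after SortVcf and before rerunning BaseRecalibrator.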

6.1 years ago
ibelcarri ▴ 10

I had the exact same problem and this helped me: How to sort a VCF file lexicographically by chromosome number?
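The reference order in the error (1, 10 to 19, 2 to 9, MT, X, Y) is plain byte-wise (C-locale) lexicographic order, which is why a lexicographic sort fixes the mismatch. A minimal sketch of that approach, not necessarily the exact commands from the linked post, assuming a plain-text VCF whose `##contig` lines start with `ID=`; the function name `sort_vcf_lexicographic` is hypothetical. Both the `##contig` header lines and the records are reordered, since GATK compares the header's contig list with the reference.

```bash
# Hypothetical helper: sort a plain-text VCF so contigs follow byte-wise
# (C-locale) lexicographic order in the ##contig lines and in the records.
# Writes the sorted VCF to stdout.
sort_vcf_lexicographic() {
  local vcf="$1"
  # 1) Meta-information lines other than ##contig, in original order
  #    (so ##fileformat stays first).
  grep '^##' "$vcf" | grep -v '^##contig='
  # 2) ##contig lines, sorted byte-wise by their ID value.
  grep '^##contig=<ID=' "$vcf" |
    awk '{ id = $0; sub(/^##contig=<ID=/, "", id); sub(/[,>].*$/, "", id); print id "\t" $0 }' |
    LC_ALL=C sort -t $'\t' -k1,1 |
    cut -f2-
  # 3) The #CHROM column-header line.
  grep '^#CHROM' "$vcf"
  # 4) Records: chromosome byte-wise, then position numerically.
  grep -v '^#' "$vcf" | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n
}
```

Usage would be `sort_vcf_lexicographic snps.dbSNP142.vcf > snps.dbSNP142.sorted.vcf`, then pointing `-knownSites` at the new file. Writing to stdout leaves the original untouched; the file is read several times, which is simple but not the fastest option for a large dbSNP VCF. This only works because this reference happens to be in lexicographic order; for a reference in another order, the records would have to follow the `.fai` order instead.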


Thanks ibelcarri, I used the post you mentioned and it works now!
