BaseRecalibrator User error.
3 months ago

Hi everyone,

I am new to bioinformatics and I am struggling with GATK's somatic mutation variant calling pipeline.

I have completed most of the preprocessing steps: CreateSequenceDictionary, bwa index, bwa mem, and MarkDuplicatesSpark.

Yet, I've been struggling with a UserError on the BaseRecalibrator step.

For my known sites file, I have been using a C57BL/6 known sites VCF file I found on the Mouse Genomes Project website.

For the reference genome, I used the GRCm39 latest release.

My initial error with BaseRecalibrator was that my contigs were incompatible between reference and vcf file. I tried to solve this by using bcftools annotate --rename-chrs to alter the vcf files.
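For readers following along, here is a minimal Python sketch of the two-column map file that `bcftools annotate --rename-chrs` reads (old name, whitespace, new name, one pair per line). The helper name `build_rename_map` and the example pair are illustrative assumptions, not taken from the post. Note that renaming only changes contig labels; it does not change coordinates or contig lengths.

```python
# Sketch: build the text of a rename map for `bcftools annotate --rename-chrs`.
# Each line holds "old_name<TAB>new_name". The helper name is hypothetical.

def build_rename_map(pairs):
    """Return map-file text from an iterable of (old_name, new_name) pairs."""
    lines = []
    for old, new in pairs:
        # Contig names must be non-empty and must not contain whitespace,
        # because whitespace separates the two columns.
        if not old or not new or any(c.isspace() for c in old + new):
            raise ValueError(f"invalid contig name pair: {old!r} -> {new!r}")
        lines.append(f"{old}\t{new}")
    return "\n".join(lines) + "\n"


# Assumed example: the VCF names chromosome 1 as "1" and the reference
# uses the RefSeq accession that appears in the error message.
example_pairs = [("1", "NC_000067.7")]
rename_map_text = build_rename_map(example_pairs)
print(rename_map_text, end="")
```

The resulting text would be saved to a file and passed to `--rename-chrs`. A complete map needs one line per contig in the VCF.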

Yet, now I am getting a new error:

A USER ERROR has occurred: Input files reference and features have incompatible contigs: Found contigs with the same name but different lengths: contig reference = NC_000067.7 / 195154279 contig features = NC_000067.7 / 195471971.

At this point, I am not sure if I should just redo the analysis with an older version of the mouse reference genome, or if this error can be fixed. Any pointers?

GATK Mutect2 • 710 views
3 months ago
GenoMax 148k

> For my known sites file, I have been using a C57BL/6 known sites VCF file I found on the Mouse Genomes Project website.

Do you know what genome build this was based on?

> For the reference genome, I used the GRCm39 latest release.
>
> My initial error with BaseRecalibrator was that my contigs were incompatible between reference and vcf file. I tried to solve this by using bcftools annotate --rename-chrs to alter the vcf files.

Are you certain you are not mixing and matching genome builds? You can't do this. The results will be nonsense if you do.

> Found contigs with the same name but different lengths:

Contigs with the same name but different lengths are a strong indication that the files you are using come from different genome builds.
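One way to check for this before running BaseRecalibrator is to compare the contig lengths declared in the VCF header with those in the reference index. The sketch below is a rough illustration that assumes the VCF header carries `##contig=<ID=...,length=...>` lines and that the reference has a samtools-style `.fai` index. The function names are hypothetical, and the offset columns in the example `.fai` line are placeholders. Only the two lengths come from the error message above.

```python
import re


def vcf_contig_lengths(header_text):
    """Parse ##contig=<ID=...,length=...> header lines into {name: length}."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("##contig=<"):
            continue
        id_match = re.search(r"[<,]ID=([^,>]+)", line)
        len_match = re.search(r"[<,]length=(\d+)", line)
        if id_match and len_match:
            lengths[id_match.group(1)] = int(len_match.group(1))
    return lengths


def fai_contig_lengths(fai_text):
    """Parse .fai lines (name<TAB>length<TAB>...) into {name: length}."""
    lengths = {}
    for line in fai_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def length_mismatches(ref_lengths, vcf_lengths):
    """Return {name: (ref_length, vcf_length)} for shared names that disagree."""
    shared = ref_lengths.keys() & vcf_lengths.keys()
    return {
        name: (ref_lengths[name], vcf_lengths[name])
        for name in shared
        if ref_lengths[name] != vcf_lengths[name]
    }


# Lengths from the error message; the trailing .fai columns are placeholders.
ref = fai_contig_lengths("NC_000067.7\t195154279\t100\t80\t81\n")
vcf = vcf_contig_lengths(
    "##fileformat=VCFv4.2\n##contig=<ID=NC_000067.7,length=195471971>\n"
)
mismatches = length_mismatches(ref, vcf)
print(mismatches)
```

Any entry in `mismatches` means the two files describe different assemblies of that contig, so renaming contigs alone will not make them compatible.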


I'm an idiot... I just checked, yes, the vcf file was for the GRCm38_68 from Sanger. That makes total sense. I think this was the issue. Thanks a lot!


I am running into the same issue, except with a BALB/c reference genome whose contigs don't match those in my known sites VCF file or in my sequencing data.

In this case, do you recommend that I use any particular tool to convert the contigs of my reference genome to match my sequencing data? Is this common practice?

If not, what is usually done when there is only one BALB/c genome build available and my sequencing data contigs don't match it?


For anybody who may be wondering in the future, I figured it out.

Use Picard UpdateVcfSequenceDictionary with the input set to your old VCF and -SD set to your reference genome.

Then index the newly generated VCF file and use it as the known sites file for your GATK analysis.
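As a rough sketch, the two steps could be scripted from Python as below. Several things here are assumptions rather than details from the post: the file names are placeholders, a `picard` wrapper script and the `gatk` launcher are assumed to be on the PATH, and the `-I`/`-O` flags and the `IndexFeatureFile` indexing step should be checked against the installed versions. As far as I understand, this tool only replaces the sequence dictionary in the VCF header and leaves variant coordinates untouched, so it is only appropriate when the VCF and the reference really are the same build.

```python
import subprocess


def update_dictionary_cmd(old_vcf, new_vcf, sequence_dictionary):
    """Build the Picard UpdateVcfSequenceDictionary command (new-style flags)."""
    return [
        "picard", "UpdateVcfSequenceDictionary",
        "-I", old_vcf,
        "-O", new_vcf,
        "-SD", sequence_dictionary,
    ]


def index_cmd(vcf):
    """Build a GATK IndexFeatureFile command for the updated VCF."""
    return ["gatk", "IndexFeatureFile", "-I", vcf]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder file names, chosen for illustration only.
commands = [
    update_dictionary_cmd(
        "known_sites.vcf", "known_sites.updated.vcf", "reference.dict"
    ),
    index_cmd("known_sites.updated.vcf"),
]
run_all(commands)  # dry run: prints the commands without executing them
```

Calling `run_all(commands, dry_run=False)` would execute the steps and stop at the first failure.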
