Question: Mouse Training Set For VariantRecalibrator
7.0 years ago by
Paris
Leandro Batista wrote:

I am calling SNPs from mouse whole genome sequences by using GATK.

Right now I'm stuck on the Variant Quality Score Recalibration (VQSR) step because I don't know what to use as a training set for mouse SNPs. Every example I have seen concerns human genome analyses, where people generally use both HapMap and Omni data.

Is there someone doing the same thing in mouse who might know a good training set for this species?

Thanks

mouse snp gatk • 2.5k views
modified 15 months ago by Biostar • written 7.0 years ago by Leandro Batista
7.0 years ago by
United States
Zev.Kronenberg wrote:

The Sanger Institute's Mouse Genomes Project has 18 strains in VCF that you can use for variant quality score recalibration.

Look under Data Release:

http://www.sanger.ac.uk/resources/mouse/genomes/

written 7.0 years ago by Zev.Kronenberg

To overcome that, I was thinking of repeating the SNP calling on all the strains I'm comparing against. That way I would follow exactly the same steps and parameters for every dataset.

Reply written 7.0 years ago by Leandro Batista

I've already tried those VCFs, but they seem to be in an older version of the format, VCF3, which GATK no longer supports. At least, that is what the error message says. I also tried to convert them using vcftools, but the files are so big that the conversion takes too long.

Reply written 7.0 years ago by Leandro Batista
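
Before committing to a long conversion, it can help to confirm which VCF version each downloaded file actually declares. The following is a minimal Python sketch, not part of GATK or vcftools. It assumes the version is declared on the first line of the file, either as `##fileformat=VCFvX.Y` or with the older `##format=VCFvX.Y` spelling (the latter is an assumption about how VCF3 files are labelled). The function names are hypothetical.

```python
import gzip
import re
from typing import Optional, Tuple

# Accepts "##fileformat=VCFv4.1" and, as an assumption, the older
# "##format=VCFv3.3" spelling.
_VERSION_RE = re.compile(r"^##(?:file)?format=VCFv(\d+)\.(\d+)")


def parse_vcf_version(first_line: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a VCF version header line, or None."""
    match = _VERSION_RE.match(first_line.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def vcf_version(path: str) -> Optional[Tuple[int, int]]:
    """Read only the first line of a plain or gzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_vcf_version(handle.readline())
```

Because only the first line is read, the check stays fast even on very large files. A result with a major version below 4 would suggest the file needs converting before GATK will accept it.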

Welcome to the world of big data. Can I ask what your end goal is? Perhaps there is another way. I have been working on calling variants in mouse tumors. There is a fair amount of mouse sequence data in the Short Read Archive, but if you don't want to convert the VCF you almost certainly won't want to deal with raw reads.

Reply written 7.0 years ago by Zev.Kronenberg

There's no problem with converting this file; I just wanted to know if there was another way or other files. I just received the whole-exome sequence for one strain, and our goal is to call variants, especially SNPs, and compare them to some other strains that were completely sequenced in Sanger's mouse project, as you mentioned. Actually, the file is being converted right now.

Reply written 7.0 years ago by Leandro Batista

Good deal. One thing I would watch out for is pipeline discrepancies. I found that you can get a lot of false positives if you don't have control of the backgrounds (what you're comparing your exome to).

Reply written 7.0 years ago by Zev.Kronenberg

Still on this question of the training set: do you use the Sanger VCFs as truth sites or just for training?

Reply written 7.0 years ago by Leandro Batista
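
For context on the truth-versus-training question: VariantRecalibrator takes each resource with separate `known`, `training` and `truth` flags plus a `prior`, so a single VCF can be marked as training only or as both training and truth. The sketch below just assembles such a command line in Python. It assumes GATK4-style option names (older releases spell several of them differently), and the file names, the resource label `sanger`, the prior of 10.0 and the annotation list are all hypothetical placeholders, not recommendations from this thread.

```python
from typing import List


def resource_arg(name: str, vcf: str, *, known: bool, training: bool,
                 truth: bool, prior: float) -> List[str]:
    """Build one --resource argument pair in GATK4-style syntax (assumed)."""
    flags = (f"known={str(known).lower()},"
             f"training={str(training).lower()},"
             f"truth={str(truth).lower()},"
             f"prior={prior}")
    return [f"--resource:{name},{flags}", vcf]


def build_vqsr_command(reference: str, input_vcf: str,
                       resources: List[List[str]],
                       annotations: List[str],
                       out_prefix: str) -> List[str]:
    """Assemble a VariantRecalibrator command for SNPs as an argument list."""
    cmd = ["gatk", "VariantRecalibrator", "-R", reference, "-V", input_vcf]
    for resource in resources:
        cmd += resource
    for annotation in annotations:
        cmd += ["-an", annotation]
    cmd += ["-mode", "SNP",
            "-O", out_prefix + ".recal",
            "--tranches-file", out_prefix + ".tranches"]
    return cmd


# Hypothetical example: the Sanger strain calls used for training only.
example = build_vqsr_command(
    reference="mm_reference.fa",          # placeholder path
    input_vcf="my_strain.raw.vcf",        # placeholder path
    resources=[resource_arg("sanger", "sanger_strains.vcf",
                            known=False, training=True,
                            truth=False, prior=10.0)],
    annotations=["QD", "MQ", "FS"],       # placeholder annotation set
    out_prefix="my_strain.snps",
)
print(" ".join(example))
```

Switching `truth=False` to `truth=True` is the only change needed to treat the same file as truth sites as well. Which setting is appropriate depends on how much the resource calls are trusted.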