Question: Confusion with High Confidence SNP Calls (Overlap Between RNA-seq & WGBS Calls) and Genome Assemblies Used for Alignment and Annotation
5.9 years ago by
Dataminer (2.7k) wrote:


Here is what I have: SNP calls from RNA-Seq (paired-end) and SNP calls from WGBS, both performed on the same sample.

I used GATK (UnifiedGenotyper) for the RNA-seq sample; the WGBS calls come from data at 30x coverage.

What I have done is take the overlap between the SNP calls from RNA-seq (irrespective of whether the FILTER column says PASS, Low, or undetermined) and the calls from WGBS (PASS only), on the assumption that a call found by both methods can be considered high confidence. The idea is to reduce the number of calls and keep only the high-confidence ones. Let me know if this is the wrong approach.

Secondly, the RNA-Seq data for GATK was aligned to the hg19 assembly from UCSC (provided by GATK), and after generating the VCF file I used SnpEff to annotate it. For SnpEff I was forced to use GRCh37.75. Will this change in builds be a cause for concern, or is it fine?

Thank you for your time




rna-seq snp wgbs snpeff • 2.6k views
modified 5.9 years ago by Devon Ryan (96k) • written 5.9 years ago by Dataminer (2.7k)

Just to clarify: SnpEff does not "force" you to use GRCh37.75 at all. I provide pre-built databases for RefSeq (hg19), ENSEMBL (GRCh37.*) and KnownGenes (hg19kg). You can use whichever reference genome you prefer.

Although some genes and transcripts differ from hg19 to GRCh, the reference sequence is the same in all three cases. So it's perfectly OK to align to hg19 and annotate with GRCh.

modified 5.9 years ago • written 5.9 years ago by Pablo (1.9k)

Hi Pablo,

Thanks for pointing me to the hg19 database for SnpEff. I never meant to demean what SnpEff does; it is a wonderful annotation tool.


written 5.9 years ago by Dataminer (2.7k)
5.9 years ago by
Devon Ryan (96k), Freiburg, Germany, wrote:

Given how much bisulfite treatment can degrade DNA quality, I'd generally be hesitant to then try to use it for variant calling. If you do, I would recommend that you filter fairly stringently. You'll also need to screen out apparent C>T and G>A variants, since bisulfite conversion of unmethylated cytosines produces exactly these changes in the reads. In general, you might look into Bis-SNP for handling the WGBS data.
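
As a rough illustration of the screening step, here is a minimal Python sketch that drops SNV records whose REF/ALT pair could be explained by bisulfite conversion alone. The function names are hypothetical, and it assumes a plain-text, tab-delimited VCF with REF in column 4 and ALT in column 5. It is deliberately crude: it throws away true C>T and G>A SNPs as well, which a bisulfite-aware caller would try to keep.

```python
# Bisulfite converts unmethylated C to T; on the opposite strand this reads as G to A.
BISULFITE_CONFOUNDED = {("C", "T"), ("G", "A")}


def is_bisulfite_confounded(ref, alt):
    """Return True if a REF/ALT pair could be explained by bisulfite conversion alone."""
    return (ref.upper(), alt.upper()) in BISULFITE_CONFOUNDED


def drop_confounded(vcf_lines):
    """Yield VCF lines, skipping data records in which every ALT allele is confounded.

    Header lines (starting with '#') are passed through unchanged.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if all(is_bisulfite_confounded(ref, alt) for alt in alts):
            continue
        yield line
```

Usage would be along the lines of `kept = list(drop_confounded(open("wgbs.vcf")))`, where `wgbs.vcf` is a placeholder file name.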

For SnpEff, just keep in mind that UCSC and Ensembl use different chromosome names (e.g. "chr1" versus "1"). Perhaps SnpEff knows to convert between them, which would be good. Either way, you need to ensure that the resulting VCF files use the same chromosome names.
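
If the names do need harmonizing by hand, something like the following sketch would do it for the primary chromosomes. The function names are hypothetical. It assumes UCSC-style input ("chr1", "chrX", "chrM") and converts to Ensembl-style ("1", "X", "MT"); unplaced and alternate contigs are named differently in the two schemes and are not handled here, and `##contig` header lines are left untouched.

```python
def ucsc_to_ensembl(chrom):
    """Convert a UCSC primary chromosome name to its Ensembl equivalent."""
    if chrom == "chrM":
        return "MT"  # the mitochondrial genome is named differently, not just prefixed
    if chrom.startswith("chr"):
        return chrom[3:]
    return chrom  # already Ensembl-style, or an unrecognized name


def rename_chromosomes(vcf_lines):
    """Yield VCF lines with the CHROM column of each data record converted."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines are passed through unchanged
            continue
        chrom, rest = line.split("\t", 1)
        yield ucsc_to_ensembl(chrom) + "\t" + rest
```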

written 5.9 years ago by Devon Ryan (96k)

Hi Devon, 

I have variant calls from both RNA-Seq and WGBS. Since the WGBS data has a depth of 30x, I was thinking I could take the overlap between the calls that have a PASS tag in both the RNA-Seq and WGBS sets. This would give me high-confidence SNP calls for my data. Or am I completely wrong in taking the overlap? I am not very experienced with SNP calling from WGBS, so a little help and guidance would be deeply appreciated.


written 5.9 years ago by Dataminer (2.7k)

The only concern is decreasing the false-positive rate on the WGBS dataset to a reasonable level before doing the overlap. If you overlap noise with noise, you get noise out.
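
To make the "filter first, then overlap" order concrete, here is a minimal sketch in Python. The function names are hypothetical. It assumes both inputs are plain-text VCFs that already use the same chromosome naming, and it treats `PASS` in the FILTER column as the only filtering criterion; in practice the WGBS set would probably need stricter filtering than that before this step.

```python
def load_pass_sites(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys for records whose FILTER is PASS."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, filt = fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        if filt != "PASS":
            continue
        # Multi-allelic records contribute one key per ALT allele.
        for alt in alts.split(","):
            sites.add((chrom, pos, ref, alt))
    return sites


def high_confidence_sites(rna_vcf_lines, wgbs_vcf_lines):
    """Intersect the PASS calls of the two call sets."""
    return load_pass_sites(rna_vcf_lines) & load_pass_sites(wgbs_vcf_lines)
```

Matching on REF and ALT as well as position means that two different alleles called at the same site are not counted as agreement.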

written 5.9 years ago by Devon Ryan (96k)