Question

How do you validate called SNPs from NGS data?



9.5 years ago

mattbawn ▴ 60

I am new to bioinformatics in general and have the following situation:

I have just had 4 samples sequenced by whole-exome sequencing (WES) at Macrogen. I have received the variants called by their bioinformatics pipeline, and I have also run the FASTQ data from each sample through my own pipeline. I am looking for a novel disease-causing mutation on chromosome 2.

Both pipelines call a mutation in a potentially interesting gene. However, the chromosomal coordinates differ between the two. I understand that this can reasonably happen, but is there a way to infer that one location is more likely than the other?
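One way to see how isolated the discrepancy is would be to compare the two call sets directly. The following is a minimal sketch, assuming both pipelines' calls are available as plain VCF text; `parse_vcf_lines`, `compare_calls` and the 50 bp window are hypothetical choices for illustration, not part of any existing tool. It reports exact matches, calls unique to each pipeline, and pairs of unique calls that sit close together on the same chromosome (candidates for "the same variant, different coordinates"). Note that coordinate differences can also come from the pipelines using different reference builds or representing the same indel differently, which a position-only comparison like this does not resolve.

```python
from typing import Iterable, List, Tuple

# (chrom, pos, ref, alt); a minimal, hypothetical record type for this sketch.
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> List[Variant]:
    """Parse VCF data lines into (chrom, pos, ref, alt) tuples.

    Header lines are skipped, a leading "chr" is stripped so "chr2" and "2"
    compare equal, and multi-allelic ALT fields are split into one record each.
    """
    variants: List[Variant] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0][3:] if fields[0].startswith("chr") else fields[0]
        pos, ref = int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            variants.append((chrom, pos, ref, alt))
    return variants


def compare_calls(a: List[Variant], b: List[Variant], window: int = 50):
    """Split two call sets into shared calls, calls unique to each set, and
    pairs of unique calls on the same chromosome within `window` bp."""
    set_a, set_b = set(a), set(b)
    shared = sorted(set_a & set_b)
    only_a = sorted(set_a - set_b)
    only_b = sorted(set_b - set_a)
    near = [
        (va, vb)
        for va in only_a
        for vb in only_b
        if va[0] == vb[0] and abs(va[1] - vb[1]) <= window
    ]
    return shared, only_a, only_b, near


# Toy input: one call agrees exactly, one differs by 12 bp between pipelines.
vendor_vcf = ["##fileformat=VCFv4.2", "chr2\t1000\t.\tA\tG", "chr2\t5000\t.\tC\tT"]
mine_vcf = ["2\t1000\t.\tA\tG", "2\t5012\t.\tC\tT"]
shared, only_a, only_b, near = compare_calls(
    parse_vcf_lines(vendor_vcf), parse_vcf_lines(mine_vcf)
)
print(near)  # [(('2', 5000, 'C', 'T'), ('2', 5012, 'C', 'T'))]
```

A near pair flagged this way is a prompt to look at the aligned reads at both positions, not evidence for either one.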

I used GATK for variant calling and was thinking of using its Variant Quality Score Recalibration (VQSR), but since, as I understand it, VQSR depends on previously known SNPs, I think it would bias against novel mutations.

Any ideas or suggestions would be appreciated.

sequencing SNP • 2.8k views

updated 3.1 years ago by Ram 43k • written 9.5 years ago by mattbawn ▴ 60

Ram · Accepted Answer · 2014-10-20



9.5 years ago

Devon Ryan 104k

You'll want to use VQSR, since it will decrease the false-positive rate. It doesn't really bias against novel calls; rather, it uses the information gleaned from known sites to better gauge what a real call looks like, and then applies that to every call, known or novel.
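To illustrate why training on known sites does not by itself penalize novel calls, here is a deliberately simplified, hypothetical sketch of the idea. It is not GATK's implementation: real VQSR fits Gaussian mixture models over several annotations and reports a VQSLOD score with sensitivity tranches, whereas this toy fits one independent Gaussian per annotation. The annotation names (`QD`, `MQ`), the toy training values and the 90% sensitivity are assumptions chosen for the example. The point is that a candidate is scored only on how its annotations compare with the training distribution, not on whether its position is already known.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical annotation records, e.g. {"QD": 20.1, "MQ": 59.8}.
Site = Dict[str, float]
Model = Dict[str, Tuple[float, float]]


def fit_model(training: List[Site], keys: List[str]) -> Model:
    """Fit an independent Gaussian (mean, sd) per annotation on known sites."""
    return {
        k: (mean([s[k] for s in training]), stdev([s[k] for s in training]))
        for k in keys
    }


def log_density(site: Site, model: Model) -> float:
    """Sum of per-annotation Gaussian log-densities (higher = more 'real-looking')."""
    total = 0.0
    for k, (mu, sigma) in model.items():
        z = (site[k] - mu) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total


def sensitivity_cutoff(training: List[Site], model: Model, sensitivity: float) -> float:
    """Pick a score cutoff that keeps roughly `sensitivity` of the training sites."""
    scores = sorted(log_density(s, model) for s in training)
    n_keep = math.ceil(sensitivity * len(scores))
    return scores[len(scores) - n_keep]


# Toy training set standing in for calls at known (e.g. dbSNP/HapMap) sites.
training = [{"QD": 18.0 + i % 5, "MQ": 58.0 + i % 3} for i in range(30)]
model = fit_model(training, ["QD", "MQ"])
cutoff = sensitivity_cutoff(training, model, sensitivity=0.9)

# Two candidate calls that are NOT in the known set: they are judged only
# on how their annotations compare with the training distribution.
novel_good = {"QD": 20.5, "MQ": 59.5}
novel_poor = {"QD": 3.0, "MQ": 35.0}
passes_good = log_density(novel_good, model) >= cutoff
passes_poor = log_density(novel_poor, model) >= cutoff
print(passes_good, passes_poor)  # True False
```

A novel call with well-behaved annotations passes, while a call with low quality-by-depth and poor mapping quality is filtered, whether or not either site appears in a database.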

For actual validation, you want to use an orthogonal technology (e.g., Sanger sequencing would suffice for a single gene).

updated 3.1 years ago by Ram 43k • written 9.5 years ago by Devon Ryan 104k