Question: Allelic imbalance in multiplex PCR-based amplicon data
2.3 years ago by
J.F.Jiang760
China
J.F.Jiang760 wrote:

Hi all,

We used multiplex PCR to enrich the target regions and then sequenced them on the HiSeq platform.

For germline variants, the alt allele fraction (alt reads / total reads) should ideally be around 0.5 at heterozygous sites.

However, in our data, this ratio is sometimes less than 0.1 according to the GATK calling results.

I am wondering why this could happen for germline variants.

Most confusing of all, the genotype calls differ even when this ratio differs only slightly, e.g., 0.06 is called homozygous but 0.07 is called heterozygous.

It would be great if you could give me some suggestions.
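
To make the ratio in question concrete, here is a minimal Python sketch (standard library only) that computes the alt allele fraction from ref/alt read depths, such as the two values in a VCF AD field, and runs an exact two-sided binomial test against the expected 0.5. The depths 930 and 70 are made-up illustration values and the function names are hypothetical; this is not how GATK itself assigns genotypes.

```python
from math import exp, lgamma, log


def alt_fraction(ref_depth, alt_depth):
    """Fraction of reads supporting the alt allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_depth / total


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed in log space."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def binom_two_sided_p(alt_depth, total, p=0.5):
    """Exact two-sided p-value: sum over outcomes no more likely than the observed one."""
    observed = binom_pmf(alt_depth, total, p)
    tail = sum(
        prob
        for prob in (binom_pmf(k, total, p) for k in range(total + 1))
        if prob <= observed * (1 + 1e-7)  # small tolerance for symmetric ties
    )
    return min(1.0, tail)


# Hypothetical depths for one site (not taken from the thread's data).
ref_depth, alt_depth = 930, 70
frac = alt_fraction(ref_depth, alt_depth)
p_value = binom_two_sided_p(alt_depth, ref_depth + alt_depth)
print(f"alt fraction = {frac:.3f}, binomial p-value vs 0.5 = {p_value:.3g}")
```

A very small p-value only says that the read counts are unlikely under even sampling of two alleles; it does not say why the imbalance arose.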

pcr multiplex allelic amplicon • 611 views
modified 2.3 years ago by Kevin Blighe52k • written 2.3 years ago by J.F.Jiang760

Solving this kind of problem during genotype assignment and variant calling is not easy, because it depends on several factors, such as sequencing and mapping errors, that any variant caller takes into account. Some of these can be handled with the methods Kevin mentioned, but you can also filter further by applying a stricter genotype quality threshold (although HaplotypeCaller already applies one by default). Most of these quality values are Phred-scaled likelihoods, so they are the tool's estimate of the genotype given sequencing and mapping errors, and that estimate is the best it can make from the sequencing data.

You can also try GATK's genotype refinement tool to refine your assigned genotypes if you have a truth set for the kind of data you are exploring.
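
As a rough illustration of what "Phred-scaled" means here, the sketch below converts a PL triple (the Phred-scaled genotype likelihoods for 0/0, 0/1 and 1/1 in a biallelic VCF record) back to normalised probabilities and derives a GQ-like value as the gap between the two smallest PLs, capped at 99. The PL values are invented, the function name is hypothetical, and the flat normalisation is a simplification, so treat it as a reading aid and not as GATK's implementation.

```python
def summarise_pl(pl):
    """Return (most likely genotype, GQ-like value, normalised probabilities)."""
    labels = ("0/0", "0/1", "1/1")
    if len(pl) != len(labels):
        raise ValueError("expected three PL values for a biallelic site")
    # Undo the Phred scaling: PL = -10 * log10(likelihood).
    likelihoods = [10 ** (-value / 10) for value in pl]
    total = sum(likelihoods)
    probabilities = [value / total for value in likelihoods]
    # Rank genotypes from most to least likely (smallest PL first).
    ranked = sorted(range(len(pl)), key=lambda i: pl[i])
    best, runner_up = ranked[0], ranked[1]
    gq = min(99, pl[runner_up] - pl[best])
    return labels[best], gq, probabilities


# Two invented PL triples that differ only slightly but give different calls.
for pl in [(0, 3, 120), (4, 0, 150)]:
    call, gq, probs = summarise_pl(pl)
    print(pl, call, gq, [round(p, 3) for p in probs])
```

With PLs this close together, the call can flip between genotypes on a difference of a few reads, which is why a stricter GQ cutoff removes such borderline genotypes.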

modified 2.3 years ago • written 2.3 years ago by prasundutta87360

Thanks,

Yes, I do find it tough to determine the "true" call when allelic imbalance appears.

The GATK refinement workflow requires a truth set, such as trio/pedigree data, as prior knowledge to adjust the variant calls. However, our data come from a sporadic population, and adjusting without such a dataset sometimes makes the calls even worse.

We also used the 1KG dataset as the truth set and found similar results.

So we believe refinement will not work well unless trio/pedigree data are provided.

written 2.3 years ago by J.F.Jiang760
2.3 years ago by
Kevin Blighe52k
Kevin Blighe52k wrote:

This is a frequent problem in NGS data analysis, i.e., expecting a germline heterozygous variant at ~50% frequency but observing it at <20% frequency.

Just to be sure:

  • remove PCR duplicates from your aligned BAM file using Picard (http://broadinstitute.github.io/picard/)
  • ensure that only reads with high mapping quality (MAPQ), e.g. 50, are retained using samtools view -bq 50 Input.bam > out.bam
  • when using the GATK, use HaplotypeCaller, not UnifiedGenotyper

If you still have problems after that, I would be somewhat surprised.
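
For readers who want to see what the MAPQ step does, here is a small sketch in Python that applies the same threshold to uncompressed SAM text: header lines are passed through, and alignment records are kept only when column 5 (MAPQ) is at least the cutoff, which is how samtools view -q treats its argument. The function name and the two example records are made up for illustration; on real BAM files, the samtools command above is the practical route.

```python
def filter_sam_by_mapq(lines, min_mapq=50):
    """Yield SAM header lines plus alignment records with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ and are always kept.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            raise ValueError("malformed SAM record: fewer than 11 columns")
        # Column 5 (index 4) of a SAM record is the mapping quality.
        if int(fields[4]) >= min_mapq:
            yield line


# Invented records: read1 has MAPQ 60, read2 has MAPQ 12.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read2\t0\tchr1\t100\t12\t5M\t*\t0\t0\tACGTT\tIIIII",
]
kept = list(filter_sam_by_mapq(example))
print(f"kept {len(kept)} of {len(example)} lines")
```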

written 2.3 years ago by Kevin Blighe52k

Thanks Kevin,

  • Since this is PCR-enriched amplicon data, duplicates cannot be removed.
  • I did not remove the low-mapping-quality reads with samtools, but I applied -mmq 30 during variant calling with GATK to get confident calls.
  • HaplotypeCaller takes much longer than UnifiedGenotyper, and GGA mode is not recommended for HC according to GATK forum threads.

Junfeng

written 2.3 years ago by J.F.Jiang760