How to tell whether indel calls in NGS are artefacts caused by homopolymer (e.g. poly-A) regions
8.7 years ago
CMM • 0

Hi,

I am relatively new to working with NGS data (i.e. I am familiar with the basic concepts of QC, alignment and variant calling, but have less experience with downstream analysis).

I have hybrid-capture sequencing data from human tumour cells, and I would like to check whether the indels we are calling are real or are artefacts of homopolymer regions (which I have heard is a common problem).

My approach would be to identify the insertions/deletions in the VCF file and then manually match them against the reference genome to see whether there are polyA or polyT regions there.
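A minimal sketch of that manual check, assuming the reference sequence for one chromosome is already loaded as a plain string and the indel is left-aligned, so the VCF POS is the anchor base just before the inserted or deleted bases. The function name `homopolymer_run_at` and the `min_len=5` threshold are illustrative choices, not a standard.

```python
def homopolymer_run_at(ref_seq, pos, min_len=5):
    """Return (base, run_length) if the base right after the VCF anchor
    position sits in a homopolymer run of at least min_len, else None.

    ref_seq: reference sequence of one chromosome as a string (assumed loaded)
    pos:     1-based VCF POS of the indel (the anchor base before the event)
    min_len: hypothetical threshold for calling a run a homopolymer
    """
    i = pos  # 0-based index of the first base after the 1-based anchor
    if i >= len(ref_seq):
        return None
    base = ref_seq[i].upper()
    if base == "N":
        return None

    # Extend left and right while the same base repeats.
    start = i
    while start > 0 and ref_seq[start - 1].upper() == base:
        start -= 1
    end = i + 1
    while end < len(ref_seq) and ref_seq[end].upper() == base:
        end += 1

    run = end - start
    return (base, run) if run >= min_len else None


ref = "GATTACAAAAAAAGCT"
print(homopolymer_run_at(ref, 6))  # deletion CA>C anchored at the C before the A run
print(homopolymer_run_at(ref, 2))  # only a 2-base T run here
```

This covers any base, not only poly-A and poly-T, since G and C runs can be checked the same way.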

Is this the correct approach, or can someone suggest a better one?

Thanks in advance.

sequencing alignment next-gen Assembly • 4.0k views
8.7 years ago

We annotate our variant files with repeat tracks, including homopolymers, and in most cases discard variants that fall within homopolymers, since they have a high likelihood of being false positives. You can do this with SnpEff (http://snpeff.sourceforge.net/SnpEff_manual.html, under the title "adding your own annotations").
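To illustrate the idea outside of SnpEff, here is a rough Python sketch that splits VCF records into kept and flagged lists based on a homopolymer BED track. It is not how SnpEff does it; the function names are made up for the example, and it assumes non-overlapping BED intervals, left-aligned indels and plain sequence alleles (no symbolic ALTs such as `<DEL>`).

```python
import bisect


def load_bed(lines):
    """Parse BED lines into {chrom: sorted [(start, end), ...]} (0-based, half-open)."""
    intervals = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    for chrom in intervals:
        intervals[chrom].sort()
    return intervals


def in_homopolymer(intervals, chrom, pos):
    """True if the base after the 1-based VCF POS lies inside an interval."""
    ivs = intervals.get(chrom, [])
    idx0 = pos  # 0-based index of the first base after the anchor
    # Last interval whose start is <= idx0 (intervals assumed non-overlapping).
    k = bisect.bisect_right(ivs, (idx0, float("inf"))) - 1
    return k >= 0 and ivs[k][0] <= idx0 < ivs[k][1]


def split_indels(vcf_lines, intervals):
    """Return (kept, flagged): indels in homopolymer intervals go to flagged."""
    kept, flagged = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # header lines are skipped in this sketch
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4].split(",")
        is_indel = any(len(alt) != len(ref) for alt in alts)
        if is_indel and in_homopolymer(intervals, chrom, pos):
            flagged.append(line)
        else:
            kept.append(line)
    return kept, flagged


bed = ["chr1\t6\t13"]
vcf = [
    "##fileformat=VCFv4.2",
    "chr1\t6\t.\tCA\tC\t.\t.\t.",
    "chr1\t3\t.\tT\tTG\t.\t.\t.",
    "chr1\t8\t.\tA\tG\t.\t.\t.",
]
kept, flagged = split_indels(vcf, load_bed(bed))
print(len(kept), "kept,", len(flagged), "flagged")
```

Flagging rather than deleting outright makes it easier to review the discarded calls later.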


Thanks Wouter, that sounds like a good solution! I'll give it a go.

8.7 years ago

I just came across this blog post as another potential solution: http://apol1.blogspot.be/2014/05/how-do-i-identify-all-homopolymer-run.html

Surprisingly fast!
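For anyone who wants to see the general idea in code, a regex-based sketch for listing homopolymer runs as BED-style intervals is below. This is my own illustration and not necessarily the method used in the linked post; `homopolymer_bed` and the default `min_len=5` are assumptions for the example.

```python
import re


def homopolymer_bed(chrom, seq, min_len=5):
    """Yield (chrom, start, end, base) for each run of one base >= min_len.

    Coordinates are 0-based, half-open, as in BED. Runs of N are ignored.
    """
    # One of A/C/G/T followed by at least min_len - 1 copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(seq.upper()):
        yield (chrom, match.start(), match.end(), match.group(1))


for chrom, start, end, base in homopolymer_bed("chr1", "GATTACAAAAAAAGCT"):
    print(f"{chrom}\t{start}\t{end}\t{base}")
```

The output can be written to a file and used as a custom interval track when annotating the VCF.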
