Hi,
I am relatively new to working with NGS data (i.e. I am familiar with the basic concepts of QC, alignment and variant calling, but have less experience with downstream analysis of the data).
I have hybrid-capture sequencing data from human tumour cells and I would like to check whether the indels we are calling are "real" or whether they are artefacts caused by homopolymer regions (which I have heard can be a common problem).
My approach would be to identify the insertions/deletions in the VCF file and then manually match them against the reference genome to see whether there are polyA or polyT regions there.
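To make the idea concrete, here is a rough Python sketch of the check I have in mind. It is only an illustration: the function names (`is_indel`, `homopolymer_run`, `flag_homopolymer_indels`) and the `min_run=5` cut-off are my own placeholders, the reference is assumed to be already loaded as a dict of chromosome name to sequence string, and the VCF is assumed to be plain tab-separated text. It also looks at any single-base run, not only polyA/polyT.

```python
def is_indel(ref, alt):
    """Return True if any ALT allele has a different length from REF."""
    alleles = [a for a in alt.split(",") if a != "*" and not a.startswith("<")]
    return any(len(a) != len(ref) for a in alleles)


def homopolymer_run(seq, pos):
    """Return (base, run_length) for the single-base run next to an indel.

    seq: reference sequence of one chromosome (string)
    pos: 1-based VCF POS. For indels this is the anchor base just before
         the event, so the affected sequence starts at 0-based index `pos`.
    """
    start = pos
    if start >= len(seq):
        return ("", 0)
    base = seq[start].upper()
    # Extend to the right while the same base repeats.
    end = start
    while end < len(seq) and seq[end].upper() == base:
        end += 1
    # Extend to the left as well, in case the call is not left-aligned.
    left = start
    while left > 0 and seq[left - 1].upper() == base:
        left -= 1
    return (base, end - left)


def flag_homopolymer_indels(vcf_lines, genome, min_run=5):
    """Collect indels that sit next to a run of at least `min_run` bases.

    min_run is an arbitrary threshold chosen for illustration only.
    """
    flagged = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if not is_indel(ref, alt):
            continue
        base, run = homopolymer_run(genome[chrom], pos)
        if run >= min_run:
            flagged.append((chrom, pos, ref, alt, base, run))
    return flagged
```

For a real genome I would presumably pull the sequence around each variant out of an indexed FASTA (for example with `samtools faidx`) rather than holding whole chromosomes in memory.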
Is this the correct approach, or can someone suggest a better one?
Thanks in advance.
Thanks Wouter, that sounds like a good solution! I'll give it a go.