Question: How do you do variant calling on tumor samples without a matched normal tissue sample?
b10hazard wrote (2.9 years ago):

I read a paper on somatic variant callers for tumor tissue and I have a few questions. The variant callers seem to work by subtracting the variants found in normal tissue from tumor tissue. Here is the article.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4803342/

My questions are...

Is there a more accurate way to do this? Is this currently the most accurate technique?

What if you only have a tumor sample and no normal tissue sample to subtract from? Is variant calling on tumor tissue still possible?


"The variant callers seem to work by subtracting the variants found in normal tissue from tumor tissue."

Actually, most methods use something a bit more complex than a simple subtraction. The article you reference says in the introduction:

"Newer algorithms utilize advanced statistical methods for the complex task of detecting somatic events. Several somatic variant callers use a Bayesian approach [8,11–15], modified in different ways, while others uses a Fisher’s exact statistics [16,17]. Each somatic variant caller is built on its own mathematical algorithm, ..."
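To make the "Fisher's exact statistics" idea a little more concrete, here is a toy sketch (not the implementation of any of the cited callers) that tests whether the alt-allele read count at a single site differs between the tumor and its matched normal. The function names and read counts are made up for illustration, and real callers apply many additional filters (base and mapping quality, strand bias, etc.). Note that it needs the normal row of the table, which is exactly what is missing in a tumor-only setting.

```python
from math import comb


def _table_prob(tumor_alt, tumor_depth, total_alt, total_reads):
    """Hypergeometric probability of seeing `tumor_alt` alt reads in the tumor,
    given the fixed row and column totals of the 2x2 table."""
    return (comb(total_alt, tumor_alt)
            * comb(total_reads - total_alt, tumor_depth - tumor_alt)
            / comb(total_reads, tumor_depth))


def fisher_exact_two_sided(tumor_ref, tumor_alt, normal_ref, normal_alt):
    """Two-sided Fisher's exact test on the read-count table

                ref          alt
        tumor   tumor_ref    tumor_alt
        normal  normal_ref   normal_alt

    A small p-value means the alt allele frequency differs between
    tumor and normal (toy example, hypothetical helper).
    """
    tumor_depth = tumor_ref + tumor_alt
    total_alt = tumor_alt + normal_alt
    total_reads = tumor_depth + normal_ref + normal_alt
    low = max(0, tumor_depth + total_alt - total_reads)
    high = min(tumor_depth, total_alt)
    observed = _table_prob(tumor_alt, tumor_depth, total_alt, total_reads)
    # Sum the probabilities of every table that is no more likely than the
    # observed one; the tiny tolerance absorbs floating-point ties.
    p = 0.0
    for a in range(low, high + 1):
        prob = _table_prob(a, tumor_depth, total_alt, total_reads)
        if prob <= observed * (1 + 1e-7):
            p += prob
    return min(1.0, p)


if __name__ == "__main__":
    # Invented counts: 10 of 30 tumor reads carry the alt allele, 0 of 30 normal reads do.
    print(fisher_exact_two_sided(20, 10, 30, 0))
```

With identical allele fractions in tumor and normal the p-value is close to 1; a strongly tumor-specific alt allele gives a small p-value.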

I don't know if others agree, but in my (limited and anecdotal) experience, the concordance between callers varies a lot, even between samples prepared in the same way and sequenced at the same depth. It seems to me that concordance depends strongly on the number and frequency of mutations in the sample. I found that the concordance between Mutect2 and Strelka for SNV detection can be as high as ~80% (i.e. 80% of the union of calls is found by both) for samples with a high mutational load, but it can drop to a few percentage points in samples with very few mutations. So it's hard to draw conclusions from studies that consider only a few samples prepared in different ways (exome or panel) and sequenced at different depths.
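The concordance measure described above (calls found by both callers as a fraction of the union of calls) is the Jaccard index of the two call sets. A minimal sketch, assuming each call is keyed by (chrom, pos, ref, alt); the function name and the call sets are invented for illustration.

```python
def concordance(calls_a, calls_b):
    """Fraction of the union of two call sets that both callers report.

    Calls are compared as exact keys such as (chrom, pos, ref, alt), so indels
    should be normalised the same way in both sets before comparing.
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return float("nan")  # undefined when neither caller reports anything
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Made-up call sets standing in for two callers' SNV output.
    caller_1 = {("chr1", 100, "C", "T"), ("chr1", 250, "G", "A"),
                ("chr2", 500, "A", "G"), ("chr3", 42, "T", "C")}
    caller_2 = {("chr1", 100, "C", "T"), ("chr1", 250, "G", "A"),
                ("chr2", 500, "A", "G"), ("chr5", 7, "G", "T")}
    print(concordance(caller_1, caller_2))  # 3 shared / 5 in the union -> 0.6
```

Because the denominator is the union, a few discordant calls matter much more in a sample with very few mutations, which fits the pattern described above.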

Kevin Blighe wrote (2.9 years ago):

Regarding the manuscript you linked to, the story is similar for other variant callers (samtools mpileup, GATK, etc.): they disagree on the calls that they make (and don't make). I'm therefore not surprised that somatic variant callers also differ. The authors noted in particular that sequencing depth had a major impact on variant calling, which I have also seen in my own analyses (greater depth is not necessarily better).

If you don't have a matched normal, you can still call variants in your sample using the standard tools and then filter out variants that are most likely germline (i.e. not somatic) using public population resources. This will not remove all germline variants, because every individual carries rare, private germline variants that are not seen in any other individual. Useful resources include:

• dbSNP / ClinVar
• 1000 Genomes Phase III
• Exome Variant Server
• Greater Middle East (GME) Variome
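A minimal sketch of this filtering step, assuming you have already downloaded a population VCF (e.g. from one of the resources above) and represent each of your calls as (chrom, pos, ref, alt). The function names and the `protect` allow-list (for known germline risk alleles you do not want discarded) are hypothetical, and the inline VCF is invented. In practice, chromosome naming ("1" vs "chr1") and the reference build must match between your calls and the database.

```python
import io


def load_vcf_sites(handle):
    """Collect (chrom, pos, ref, alt) keys from a VCF text stream.

    Multi-allelic ALT fields (e.g. "A,T") are split into one key per allele.
    """
    sites = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            sites.add((chrom, int(pos), ref, allele))
    return sites


def remove_population_variants(calls, population_sites, protect=frozenset()):
    """Drop calls seen in the population database, except those in `protect`."""
    return [c for c in calls if c not in population_sites or c in protect]


if __name__ == "__main__":
    # Invented population VCF with one bi-allelic and one multi-allelic site.
    pop_vcf = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\n"
        "1\t100\t.\tC\tT\n"
        "1\t300\t.\tG\tA,C\n"
    )
    known = load_vcf_sites(pop_vcf)
    calls = [("1", 100, "C", "T"), ("1", 300, "G", "C"), ("2", 50, "A", "G")]
    print(remove_population_variants(calls, known))  # only ("2", 50, "A", "G") is left
```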

Remember, however, that if you are interested in how germline variants increase the risk of cancer, you don't want to filter these out just because they are present in one of the databases listed above. These large-scale sequencing projects don't contain true healthy controls, and each individual who took part in them carries their own inherited risk alleles (even hg19 / GRCh37 is not a 'healthy' reference genome, and neither is hg38; no truly healthy genome exists).

You can also filter your data against cancer-specific data to increase the likelihood that the remaining variants are indeed somatic, using resources such as:

• COSMIC
• TCGA
• ICGC

For both TCGA and ICGC, download the level 3 / open access data and build your own dataset for filtering.
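For building your own filtering set from downloaded TCGA/ICGC data, a sketch along these lines may help. It assumes a MAF-style tab-separated file with `Chromosome`, `Start_Position`, `Reference_Allele` and `Tumor_Seq_Allele2` columns; the column names are parameters because ICGC and other exports name them differently. It counts how often each variant occurs so you can prioritise recurrent ones. The function names and example rows (including the gene names) are invented for illustration.

```python
import csv
import io
from collections import Counter


def load_cancer_sites(handle, chrom_col="Chromosome", pos_col="Start_Position",
                      ref_col="Reference_Allele", alt_col="Tumor_Seq_Allele2"):
    """Count how often each (chrom, pos, ref, alt) appears in a MAF-style TSV."""
    rows = (line for line in handle if not line.startswith("#"))  # skip '#version' lines
    counts = Counter()
    for row in csv.DictReader(rows, delimiter="\t"):
        key = (row[chrom_col], int(row[pos_col]), row[ref_col], row[alt_col])
        counts[key] += 1
    return counts


def annotate_with_cancer_counts(calls, cancer_counts):
    """Pair each call with how often it was seen in the cancer dataset (0 if never)."""
    return [(call, cancer_counts.get(call, 0)) for call in calls]


if __name__ == "__main__":
    maf = io.StringIO(
        "#version 2.4\n"
        "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\tTumor_Seq_Allele2\tTumor_Sample_Barcode\n"
        "GENE1\tchr7\t1000\tA\tT\tS1\n"
        "GENE1\tchr7\t1000\tA\tT\tS2\n"
        "GENE2\tchr12\t2000\tC\tG\tS1\n"
    )
    counts = load_cancer_sites(maf)
    print(annotate_with_cancer_counts([("chr7", 1000, "A", "T"), ("chr1", 5, "G", "C")], counts))
```

A non-zero count only raises the likelihood that a call is somatic; it does not prove it.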

Finally, you can annotate your variants with in silico prediction tools (don't rely on just one) to further refine the list to those likely to be pathogenic. You will find, however, that the majority of somatic variants are just 'passenger' variants that don't actually alter anything. It is the key 'driver' variants that you ought to find: those that make the genome or cell cycle (or something else) unstable and that would have led to the cells becoming neoplastic.
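One simple way to combine several predictors instead of trusting a single one is a vote. This is only a sketch: the predictor names are placeholders, and turning each tool's own score into a damaging/tolerated call (each tool has its own scale and cut-offs) is left to you. Missing predictions are ignored rather than counted as benign, which is a design choice you may want to change.

```python
def consensus_damaging(predictions, min_votes=2):
    """Return True if at least `min_votes` tools call the variant damaging.

    `predictions` maps a (hypothetical) tool name to True (damaging),
    False (tolerated) or None (no prediction available).
    """
    votes = sum(1 for call in predictions.values() if call is True)
    return votes >= min_votes


if __name__ == "__main__":
    variant_a = {"predictor_A": True, "predictor_B": True, "predictor_C": False}
    variant_b = {"predictor_A": True, "predictor_B": None, "predictor_C": False}
    print(consensus_damaging(variant_a), consensus_damaging(variant_b))
```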

Edit 20th June 2018: see this list for in silico pathogenicity / functional impact predictors: A: pathogenicity predictors of cancer mutations

Kevin