Non-cancer somatic mutation calling
4.9 years ago
DVA ▴ 550

I'm having trouble choosing somatic mutation calling software for non-cancer samples, and would really appreciate some suggestions. Basically, we are comparing multiple small pieces of healthy tissue against a blood sample, which serves as the germline reference. (I won't go into all the details here.) I have briefly looked into and tried the following tools, and none of them is completely satisfying:

1. MuTect 2. It is too sensitive, because it considers "varying allelic fraction for each variant, as is often seen in tumors with purity less than 100%, multiple subclones, and/or copy number variation" (reference). That is not to say that my tissue samples are perfectly pure, but rather than modelling the impurity, I would rather discard such calls for now. Also, MuTect 2 does not handle changes in heterozygosity (e.g. a heterozygous site becoming homozygous) very well. I observed a situation similar to the one described in this post: http://gatkforums.broadinstitute.org/gatk/discussion/7619/how-does-mutect2-treat-heterozygous-mutant-to-homozygous-mutant

2. Samtools. It is quite flexible, especially if I stop at the pileup step and do all the filtering myself. However, it is very slow and generates huge output files.

3. VarScan 2. It seems to work, but it gives me far more mutation calls than I was expecting, even after hard filtering. I do like the detailed output. I haven't yet compared its calls against any other software, so I'm not sure whether the result is reliable.

4. GATK (applied to each sample individually). I know it is not designed for somatic mutation calling, but I tried it because I'm familiar with it. (In fact, if I assume each of my samples is pure, then the usual diploid assumption still applies and GATK should be okay to use, I think.) It is also too sensitive, and its output is not at all comparable to that of VarScan 2.

Is there any other software you would recommend for my case? (I don't mind filtering out bad or even mediocre reads and positions at this point, so I guess I care more about specificity than sensitivity. I also care a lot about changes in heterozygosity at single sites, since they are supposed to be common in healthy tissues.) Thanks a lot in advance.

somatic mutations mutect varscan samtools gatk • 2.5k views

I am not really recommending samtools, but if your only problem with samtools is the huge output file, you can use a Unix pipe, e.g.: samtools mpileup sample1.bam sample2.bam | post-filter.py.
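For illustration, here is a minimal sketch of what such a post-filter could look like. It is only one possible reading of "post-filter.py", not an existing tool: the function names, the thresholds (`min_depth`, `min_alt`) and the "-" argument convention are all assumptions, and it assumes mpileup was run with a reference (`-f ref.fa`) so that matching bases appear as `.` or `,`.

```python
import sys


def count_alt_bases(bases):
    """Count mismatching bases in one mpileup read-base string."""
    alt = 0
    i = 0
    n = len(bases)
    while i < n:
        c = bases[i]
        if c == "^":
            # Read start marker: the next character is a mapping quality.
            i += 2
        elif c in "+-":
            # Indel: +<len><seq> or -<len><seq>; skip the inserted/deleted bases.
            j = i + 1
            while j < n and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j] or 0)
            i = j + length
        else:
            if c in "ACGTNacgtn":
                alt += 1
            i += 1
    return alt


def filter_pileup(lines, min_depth=10, min_alt=3):
    """Yield pileup lines where every sample is covered well enough and at
    least one sample shows min_alt or more non-reference bases.
    Both thresholds are illustrative assumptions, not recommendations."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Columns: chrom, pos, ref, then (depth, bases, quals) per sample.
        samples = [fields[i:i + 3] for i in range(3, len(fields) - 2, 3)]
        if not samples:
            continue
        if any(int(s[0]) < min_depth for s in samples):
            continue
        if any(count_alt_bases(s[1]) >= min_alt for s in samples):
            yield line if line.endswith("\n") else line + "\n"


# Hypothetical convention: pass "-" to read the pileup from stdin.
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    sys.stdout.writelines(filter_pileup(sys.stdin))
```

Used as `samtools mpileup -f ref.fa sample1.bam sample2.bam | python post-filter.py -`, only the candidate positions are ever written to disk, which avoids the huge intermediate file.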


Wow, thanks so much for your reply. I really like your software!! Yes, I will use piping if I decide to use samtools in the end, but what makes you say samtools is not recommended?

4.9 years ago
Amitm ★ 2.1k

hi,

Have you tried testing the tools (+ parameters) you are using on the Genome in a Bottle data, esp. NA12878? It's not a 'somatic' scenario, but the sample has been sequenced extensively on multiple platforms and a 'reference' set of variants has been documented. Details here and here.

More relevant, probably, are two publications that I am aware of where generating a reference set of somatic calls (albeit in tumor samples) has been attempted: 1) Somatic ref. standard for cancer genome seq. 2) Assessment of somatic mut. detection in cancer using WGS

You can follow the tool guidelines from GIAB or the above publications, or (time permitting) arrive at your own params. by examining the above datasets yourself.

Lastly, some points from my own experience with the above tools: 1) VarScan2 is very customizable and works well, but it can get spooked by noisy seq. data. If that is the case, the var. allele read depth of the calls will mostly be much lower than would be expected from the read depth of your data.

2) Under default settings, GATK HaplotypeCaller & MuTect are way more sensitive than VarScan2, so your var. calls should probably reflect that.

3) When I have run multiple callers on a sample, then in addition to a simple overlap of coordinates, I also make plots of the read depths of the PASS calls. They are very informative for judging whether things are going right.
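The overlap-plus-depth check in point 3 could be sketched roughly as follows. This is an illustrative sketch rather than the exact procedure used above: it assumes each caller's VCF marks accepted records with `PASS` in the FILTER column and reports total depth as `DP=` in the INFO column (not every caller does), and all function names are made up for the example. It produces binned depth counts that can then be handed to any plotting tool.

```python
from collections import Counter


def info_depth(info):
    """Return the integer DP value from a VCF INFO string, or None."""
    for item in info.split(";"):
        if item.startswith("DP="):
            try:
                return int(item[3:])
            except ValueError:
                return None
    return None


def read_pass_calls(vcf_lines):
    """Map (chrom, pos, ref, alt) -> depth for records whose FILTER is PASS."""
    calls = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 8 or f[6] != "PASS":
            continue
        calls[(f[0], int(f[1]), f[3], f[4])] = info_depth(f[7])
    return calls


def compare_calls(calls_a, calls_b):
    """Return (shared, only_a, only_b) sets of variant keys."""
    keys_a, keys_b = set(calls_a), set(calls_b)
    return keys_a & keys_b, keys_a - keys_b, keys_b - keys_a


def depth_histogram(calls, keys, bin_width=10):
    """Count the selected calls per depth bin (bin start -> count)."""
    bins = Counter()
    for key in keys:
        dp = calls.get(key)
        if dp is not None:
            bins[(dp // bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))
```

Comparing `depth_histogram(calls_a, shared)` with `depth_histogram(calls_a, only_a)` is one way to see whether the calls unique to one caller pile up at low depth, which would hint at noise rather than real variants.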


Thank you for the very detailed reply. I really like your idea of plotting the read depths, going to do it now! I will also look into the publications, GIAB, and NA12878. (The longer I stay in this field, the more there is to learn, which is exciting :)

My genome is mostly covered at 25X-33X in my project, and I'm thinking of not calling variants in regions that are not well covered. Do you think MuTect would let me apply custom, harsh filtering on read quality and coverage before/after the core variant calling?


hi, MuTect has a complex mixture of fixed filters and parameter options, but nonetheless gives a sensitive call set. The only parameter that I have played with is the downsampling setting (not MuTect-specific, but a GATK engine param.). Comparing with calls from other callers like VarScan and with manual inspection in IGV, I have found that for moderately sequenced WES (60-100x), MuTect calls (from default settings) can be accepted as they are for downstream analysis. The page here from Geraldine describes the filters in detail, in case you want to alter them.

However, I do use the VariantFiltration utility when I have done single-sample calling with GATK HaplotypeCaller. Its filter param. can accept any tag-value pair to create custom filters. This can potentially be used on MuTect VCFs as well.

Alternatively, a simple parser can be written that filters the VCF based on the DP (total depth), AD (allele-wise depth) and BQ (avg. base qual) values (in the last two cols.). Check the comment (header) lines of the VCF for more detail.
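A rough sketch of such a parser is given below. It assumes the per-sample columns carry `AD`, and optionally `DP` and `BQ`, as named in the FORMAT column; the thresholds and function names are placeholders chosen for the example, not recommended values. A record is kept only if every sample column is deep enough and at least one sample has enough alt-supporting reads of sufficient base quality.

```python
def sample_fields(format_col, sample_col):
    """Pair FORMAT keys with one sample's values, e.g. {'AD': '20,8', ...}."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def allele_depths(sample):
    """Return AD as a list of ints, or None if missing or malformed."""
    try:
        return [int(x) for x in sample["AD"].split(",")]
    except (KeyError, ValueError):
        return None


def total_depth(sample):
    """Use DP when present, otherwise fall back to the sum of AD."""
    dp = sample.get("DP", "")
    if dp.isdigit():
        return int(dp)
    ad = allele_depths(sample)
    return sum(ad) if ad else 0


def alt_support_ok(sample, min_alt_ad, min_bq):
    """True if the sample has enough alt reads and (if reported) enough BQ."""
    ad = allele_depths(sample)
    if not ad or len(ad) < 2 or max(ad[1:]) < min_alt_ad:
        return False
    try:
        return float(sample["BQ"]) >= min_bq
    except (KeyError, ValueError):
        return True  # BQ absent or '.', so it cannot be used as a filter


def filter_vcf(lines, min_dp=20, min_alt_ad=4, min_bq=20.0):
    """Yield header lines unchanged and only the records passing the filters.
    All three thresholds are illustrative assumptions."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 10:
            continue  # no FORMAT/sample columns to filter on
        samples = [sample_fields(f[8], s) for s in f[9:]]
        if any(total_depth(s) < min_dp for s in samples):
            continue
        if any(alt_support_ok(s, min_alt_ad, min_bq) for s in samples):
            yield line
```

Requiring depth in every sample column means a site is dropped when either the tissue or the blood sample is poorly covered, which matches a specificity-first setup; relaxing that to a single sample is a one-line change.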


Thank you very much! I very much appreciate all the information.

4.9 years ago
Len Trigg ★ 1.5k

You can try the somatic caller from RTG Core (which is free for non-commercial use). It does have an option to include LOH as a possibility in its Bayesian model, although for LOH detection you probably want to incorporate additional tools (in the simplest case, looking for clusters of LOH candidates).


Thank you for the suggestion, I will take a look. Sorry, I was not very clear in the question and have just updated it. LOH over a stretch of the genome is another story, and I could look at it once I have a VCF. However, I am concerned about changes in heterozygosity at single sites, which MuTect2 very often seems to treat as sample impurity.