MuSiC Analysis Workflow
8.7 years ago
Pascal ▴ 250

Hello there,

I am pretty new to the cancer field and am trying to run MuSiC from WashU to detect significantly mutated genes in our exome-seq samples. Since I am not sure I am doing the right things, I would appreciate it if someone could take a look at the tools I am using. There is probably a lot of room for improvement. Here are the steps I am running to get significantly mutated genes:

1. FastQ files
2. BWA map to reference genome
3. Mark duplicates with Picard tools
4. Use GATK for indel realignment and base quality score recalibration
5. samtools mpileup for each normal and tumor sample
6. Somatic variation calling. I use VarScan somatic for this.
7. snpEff to annotate the mutations and their consequences
8. Convert VCF file from previous step to MAF file and filter for consequences that very likely change the gene function (e.g. missense)
9. Combine filtered MAF files from all samples
10. Run MuSiC bmr calc-covg
11. Run MuSiC bmr calc-bmr
12. Run MuSiC smg

Steps 8 and 9 in particular seem tricky. I am not sure whether I am missing a more straightforward way to get from called mutations to MAF files.

Thanks

cancer exome-sequencing music snp • 4.2k views
8.7 years ago

Looks good. In step 8, do not "filter for consequences that very likely change the gene function". That's too stringent, and you might miss something novel or non-coding. Rather, reduce the noise from false-positive variants using tools like this. MuSiC's calc-bmr step will exclude Silent (synonymous) SNVs by default. Steps 7 and 8 can be solved using this script. And step 9 shouldn't be tricky... simply concatenate the MAFs.
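For step 9, "concatenate" needs one small caveat: each MAF usually starts with its own column header (and often a `#version` comment line), so a plain `cat` would repeat those in the middle of the merged file. Below is a minimal sketch of one way to handle that in Python. The function name `concat_mafs` is made up for illustration, and it assumes every per-sample MAF has the same columns in the same order, with a header row beginning with `Hugo_Symbol`.

```python
def concat_mafs(maf_paths, out_path):
    """Concatenate MAF files, writing the column header only once.

    Assumption: all inputs share one column layout and their header row
    starts with "Hugo_Symbol". Comment lines ("#...") are dropped.
    """
    header = None
    with open(out_path, "w") as out:
        for path in maf_paths:
            with open(path) as maf:
                for line in maf:
                    line = line.rstrip("\r\n")
                    if not line.strip() or line.startswith("#"):
                        continue  # skip blank lines and "#version ..." comments
                    if line.startswith("Hugo_Symbol"):
                        if header is None:
                            header = line
                            out.write(header + "\n")
                        elif line != header:
                            # Different columns would silently corrupt the merge.
                            raise ValueError(f"header mismatch in {path}")
                        continue
                    out.write(line + "\n")
    return out_path
```

Calling `concat_mafs(["sample1.maf", "sample2.maf"], "all_samples.maf")` (hypothetical file names) would then give a single MAF to pass on to the later steps.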

If any of the resulting SMGs (significantly mutated genes) don't make sense, then take a closer look at their variants. This is a good way to weed out recurrent false-positives - usually germline calls that are incorrectly called somatic for reasons like amplification bias, or artifacts in the reference sequence like misplaced paralogs. You can also try calc-bmr with an option called --separate-truncations, which prioritizes truncating variants in the math. "Truncations" include frame-shift, nonsense, and splice-site mutations.
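One way to start that closer look is to list variants that recur at exactly the same position and allele across several samples, since those are the usual suspects for recurrent false positives. The sketch below is only an illustration, not part of MuSiC: `recurrent_variants` is a made-up helper, and it assumes the merged MAF uses the standard column names `Hugo_Symbol`, `Chromosome`, `Start_Position`, `Reference_Allele`, `Tumor_Seq_Allele2`, and `Tumor_Sample_Barcode`.

```python
import csv
from collections import defaultdict


def recurrent_variants(maf_path, min_samples=2):
    """Map each variant to the sorted list of samples carrying it.

    Only variants seen in at least `min_samples` distinct samples are
    returned, most recurrent first. Column names are assumed to follow
    the standard MAF layout.
    """
    samples_by_variant = defaultdict(set)
    with open(maf_path, newline="") as handle:
        # Drop "#version ..." style comment lines before parsing the table.
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            key = (
                row["Hugo_Symbol"],
                row["Chromosome"],
                row["Start_Position"],
                row["Reference_Allele"],
                row["Tumor_Seq_Allele2"],
            )
            samples_by_variant[key].add(row["Tumor_Sample_Barcode"])
    hits = {
        variant: sorted(samples)
        for variant, samples in samples_by_variant.items()
        if len(samples) >= min_samples
    }
    # Sort so the most widely shared variants come first.
    return dict(sorted(hits.items(), key=lambda item: -len(item[1])))
```

A variant that shows up in many tumors is not automatically an artifact (real hotspots recur too), so the output is a shortlist for manual review rather than a filter.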


Hi Cyriac,

Sorry for asking this as a comment. I have data from 50 patients, all from targeted capture of around 120 genes. Is it OK to use MuSiC for targeted capture data? I guess an SMG analysis based on targeted capture would be biased, but I wanted an opinion on this.

Thank you.


Yea, that's totally fine. MuSiC's SMG test was meant to shortlist genes significantly altered in exome-seq, so that you could then target them for capture on larger cohorts. But when your ROI file (regions of interest) lists only about 120 genes, then the SMG test will at least help you rank them in order of significance.


Hi Cyriac,

I have mutation calls for both a set of matched tumor-normal pairs and tumor-only samples (called using a panel-of-normals approach). I doubt that music2 can be used to assess mutational significance for tumor-only samples, but any suggestions for doing so, or for comparable tools such as OncodriveFM, would be helpful.

Thanks,
Samir