Question: Mutations Not Recognized in MuSiC
Asked 2.1 years ago by Charles Warden (Duarte, CA):

Hi,

I am trying to use MuSiC to analyse mutation rates in novel, non-coding genes. I am able to run the relevant MuSiC commands successfully and the coverage statistics look correct, but the results show no mutations in any gene (which I know isn't true). My guess is that there is a formatting issue with the .maf file of somatic mutations that is causing the output of "bmr calc-bmr" to be inaccurate.

Here are the first few lines of my .maf file:

#version 2.3
Hugo_Symbol    Entrez_Gene_Id    Center    NCBI_Build    Chromosome    Start_Position    End_Position    Strand    Variant_Classification    Variant_Type    Reference_Allele    Tumor_Seq_Allele1    Tumor_Seq_Allele2    dbSNP_RS    dbSNP_Val_Status    Tumor_Sample_Barcode    Matched_Norm_Sample_Barcode    Match_Norm_Seq_Allele1    Match_Norm_Seq_Allele2    Tumor_Validation_Allele1    Tumor_Validation_Allele2    Match_Norm_Validation_Allele1    Match_Norm_Validation_Allele2    Verification_Status    Validation_Status    Mutation_Status    Sequencing_Phase    Sequence_Source    Validation_Method    Score    BAM_File    Sequencer    Tumor_Sample_UUID    Matched_Norm_Sample_UUID
Unknown    0    genome.wustl.edu    GRCh37-lite    1    322115    322115    +    Targeted_Region    SNP    G    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    G    G    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    328193    328193    +    Targeted_Region    SNP    A    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    A    A    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    384901    384901    +    Targeted_Region    SNP    G    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    G    G    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    390657    390657    +    Targeted_Region    SNP    A    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    A    A    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354
Unknown    0    genome.wustl.edu    GRCh37-lite    1    404577    404577    +    Targeted_Region    SNP    G    A    G    NA    NA    TCGA-E2-A15K    TCGA-E2-A15K    G    G    NA    NA    NA    NA    Unknown    Unknown    Somatic    PhaseI    WGS    No    NA    NA    Illumina    f289e8b7-68db-48b9-8dcc-1349269eb54b    c24945be-a051-4797-b7e6-09b32396f354

Here are the music commands that I am using:

     genome music bmr calc-covg --bam-list /path/to/bam.list --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed

     genome music bmr calc-bmr --bam-list /tcga/users/cdwarden/wgs/BRCA/MuSiC/bam.list --maf-file /path/to/somatic.maf --output-dir /path/to/output_folder --reference-sequence /path/to/GRCh37-lite.fa --roi-file /path/to/gene_coordinates.bed

     genome music smg --gene-mr-file /path/to/gene_mrs --output-file /path/to/smgs

I have also tried adding the transcript ID to the first mutation in the .maf file (so I would expect to see one mutation in the "smgs_detailed" file), but that gene is still reported to have 0 mutations.

Can you please help me troubleshoot this issue?

Thanks,

Charles

Tags: music, mutation, dna-seq, maf

I think it's because the Hugo_Symbol values are "Unknown" in your MAF file.

Reply by poisonAlien, 2.1 years ago

I changed the transcript ID for the first mutation to match the corresponding gene, and that gene was still reported to not have any mutations.  Also, I used "Unknown" (instead of NA, etc.) because that is what I thought the .maf format required for such genes.

Is there something else that should be changed besides "Unknown"?

Reply by Charles Warden, 2.1 years ago

I used this program a while back, and as I understand it, the gene names in the MAF file must match the gene names in the ROI file that you use for calc-covg. Also, by default it skips silent variants (based on the Variant_Classification column) unless you tell it not to. In your example, the variants have Hugo_Symbol set to Unknown and Variant_Classification set to Targeted_Region, which might be one reason.

Reply by poisonAlien, 2.1 years ago
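
One quick way to check the point about matching names is to compare the Hugo_Symbol column of the MAF against the gene names in the ROI file. This is a minimal sketch, not part of MuSiC: it assumes the ROI file is tab-delimited with the gene name in the fourth column (chr, start, stop, gene), that the MAF may start with '#' comment lines such as `#version 2.3`, and the function names and the ROI entry in the example are hypothetical.

```python
import csv
import io


def read_maf_rows(maf_text):
    """Parse tab-delimited MAF text, skipping '#' comment lines like '#version 2.3'."""
    data_lines = [line for line in io.StringIO(maf_text) if not line.startswith("#")]
    return list(csv.DictReader(data_lines, delimiter="\t"))


def roi_gene_names(roi_text):
    """Collect gene names, assuming the ROI layout is chr, start, stop, gene (tab-delimited)."""
    names = set()
    for line in io.StringIO(roi_text):
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 4:
            continue  # skip comments, blank lines and malformed rows
        names.add(fields[3])
    return names


def unmatched_symbols(maf_text, roi_text):
    """Return MAF Hugo_Symbol values that do not appear as a gene name in the ROI file."""
    genes = roi_gene_names(roi_text)
    return sorted({row["Hugo_Symbol"] for row in read_maf_rows(maf_text)} - genes)


if __name__ == "__main__":
    maf = (
        "#version 2.3\n"
        "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tVariant_Classification\n"
        "Unknown\t1\t322115\t322115\tTargeted_Region\n"
    )
    roi = "1\t320000\t330000\tNOVEL_NC_1\n"  # hypothetical ROI entry
    print(unmatched_symbols(maf, roi))  # ['Unknown']
```

Any symbol this reports cannot be tied back to an ROI region, so those mutations would not be counted against any gene.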

This is correct. The Hugo_Symbol needs to be properly defined. These calls seem to be annotated incorrectly as Targeted_Region, which is something that MuSiC skips as intergenic. Considering that the MAF says WGS, these might be legitimately intergenic calls. Check in a genome browser.

Reply by Cyriac Kandoth, 2.1 years ago
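
To see which rows fall into the cases described in this reply, a short script can flag rows whose Hugo_Symbol is "Unknown" or whose Variant_Classification is "Targeted_Region". This is a minimal sketch: the two value sets come from this thread rather than from an exhaustive list of what MuSiC skips, and `flag_problem_rows` is a hypothetical name.

```python
import csv
import io

# Values that, per the replies in this thread, keep MuSiC from counting a mutation
# against a gene. This is not a complete list of everything MuSiC skips.
UNDEFINED_SYMBOLS = {"Unknown"}
INTERGENIC_CLASSES = {"Targeted_Region"}


def flag_problem_rows(maf_text):
    """Return (chromosome, start, reasons) for MAF rows likely to be skipped."""
    data_lines = [line for line in io.StringIO(maf_text) if not line.startswith("#")]
    flagged = []
    for row in csv.DictReader(data_lines, delimiter="\t"):
        reasons = []
        if row["Hugo_Symbol"] in UNDEFINED_SYMBOLS:
            reasons.append("Hugo_Symbol is Unknown")
        if row["Variant_Classification"] in INTERGENIC_CLASSES:
            reasons.append("Variant_Classification is Targeted_Region")
        if reasons:
            flagged.append((row["Chromosome"], int(row["Start_Position"]), reasons))
    return flagged


if __name__ == "__main__":
    maf = (
        "#version 2.3\n"
        "Hugo_Symbol\tChromosome\tStart_Position\tVariant_Classification\n"
        "Unknown\t1\t322115\tTargeted_Region\n"
        "NOVEL_NC_1\t1\t328193\tRNA\n"  # hypothetical, already-fixed row
    )
    for chrom, start, reasons in flag_problem_rows(maf):
        print(chrom, start, "; ".join(reasons))
```

Flagged positions are also good candidates for the genome-browser check suggested above, since some WGS calls may be genuinely intergenic.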

Yes - I want to characterize mutation rates in ncRNAs (most of which will not be covered in exome designs, and many of which are novel).

What would you recommend for the Variant_Classification and Variant_Type, in this situation?

Reply by Charles Warden, 2.1 years ago

You can refer to the documentation here. When you run music bmr calc-bmr, enable the option --noskip-non-coding. You'll still need to annotate each variant with a symbol that can be matched back to a region in your ROI file. The MAF format does not distinguish between ncRNA types, so Variant_Classification will always say RNA; but as long as you give the genes distinct names using an annotator like VEP, you should be fine. Have you tried the maf2maf tool?

Reply by Cyriac Kandoth, 2.1 years ago
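
For novel genes that an annotator like VEP will not know about, one substitute for that annotation step is a plain overlap lookup against the ROI file itself: give each MAF row the name of the first ROI region it overlaps and set Variant_Classification to RNA. This is a hypothetical helper, not part of MuSiC or maf2maf. It assumes the ROI file is tab-delimited as chr, start, stop, gene with inclusive bounds, and that the ROI and MAF use the same chromosome naming (e.g. `1`, not `chr1`) and the same coordinate convention. It uses a linear scan, which is fine for a sketch but slow for genome-wide data.

```python
import csv
import io


def load_roi(roi_text):
    """Parse ROI lines, assumed tab-delimited as chr, start, stop, gene (inclusive bounds)."""
    regions = []
    for line in io.StringIO(roi_text):
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 4:
            continue  # skip comments, blank lines and malformed rows
        regions.append((fields[0], int(fields[1]), int(fields[2]), fields[3]))
    return regions


def annotate_maf(maf_text, roi_text):
    """Set Hugo_Symbol to the first overlapping ROI gene and Variant_Classification to RNA."""
    regions = load_roi(roi_text)
    out = io.StringIO()
    data_lines = []
    for line in io.StringIO(maf_text):
        if line.startswith("#"):
            out.write(line)  # keep comment lines such as '#version 2.3' at the top
        else:
            data_lines.append(line)
    reader = csv.DictReader(data_lines, delimiter="\t")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        chrom = row["Chromosome"]
        start, end = int(row["Start_Position"]), int(row["End_Position"])
        for r_chrom, r_start, r_stop, gene in regions:  # linear scan over all regions
            if r_chrom == chrom and start <= r_stop and end >= r_start:
                row["Hugo_Symbol"] = gene
                row["Variant_Classification"] = "RNA"
                break
        writer.writerow(row)  # non-overlapping rows are written back unchanged
    return out.getvalue()


if __name__ == "__main__":
    maf = (
        "#version 2.3\n"
        "Hugo_Symbol\tChromosome\tStart_Position\tEnd_Position\tVariant_Classification\n"
        "Unknown\t1\t322115\t322115\tTargeted_Region\n"
    )
    roi = "1\t320000\t330000\tNOVEL_NC_1\n"  # hypothetical novel ncRNA region
    print(annotate_maf(maf, roi), end="")
```

Rows that overlap no ROI region keep their original annotation, so MuSiC will still skip them; that is intended, since they cannot be attributed to any gene in the ROI file.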

Thank you very much!

Reply by Chirag Nepal, 2.1 years ago

This is something I also wonder about: how to prioritize such intergenic/intronic SNVs.

Reply by Chirag Nepal, 2.1 years ago
Answered 2.1 years ago by Charles Warden (Duarte, CA):

Thanks to Cyriac, I found the solution:

1) Set Variant_Classification to RNA

2) Use the "--noskip-non-coding" option when running music bmr calc-bmr
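
Putting step 2 together, the calc-bmr call from the question with the option added might be assembled like this. It is a minimal sketch: the paths are the placeholders from the original post, only flags that appear in this thread are used, and the `subprocess` call is left commented out so the snippet runs without MuSiC installed.

```python
import shlex


def calc_bmr_command(bam_list, maf_file, output_dir, reference, roi_file):
    """Build the calc-bmr argument list from the question, adding --noskip-non-coding."""
    return [
        "genome", "music", "bmr", "calc-bmr",
        "--bam-list", bam_list,
        "--maf-file", maf_file,
        "--output-dir", output_dir,
        "--reference-sequence", reference,
        "--roi-file", roi_file,
        "--noskip-non-coding",  # per this thread, needed so RNA-classified variants are counted
    ]


if __name__ == "__main__":
    cmd = calc_bmr_command(
        "/path/to/bam.list",
        "/path/to/somatic.maf",
        "/path/to/output_folder",
        "/path/to/GRCh37-lite.fa",
        "/path/to/gene_coordinates.bed",
    )
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # run once MuSiC is installed
```

Building the command as a list rather than a single string avoids shell-quoting problems if any path contains spaces.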
