Question: How to use MutSigCV correctly
bioxujintian wrote (3.5 years ago):
I'm a beginner with MutSigCV. I have run it successfully with the example files, but I really don't know how to produce the MAF, coverage and covariates tables from raw sequencing data. I have 40 CRC whole-genome samples (cancer-normal pairs) and want to detect significantly mutated genes. Which software should I use, step by step, and what is the pipeline for running MutSigCV?
poisonAlien wrote (3.5 years ago):

You don't strictly need your own coverage and covariates tables (MutSigCV ships with default versions of these files in case you don't have them), but a MAF file is necessary. Read about the MAF specification here.

For MutSigCV, these 9 fields are necessary:

Hugo_Symbol, Chromosome, Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele1, Tumor_Seq_Allele2, Variant_Classification, Tumor_Sample_Barcode

You can use this script to generate a MAF file.
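Separately from that script, here is a minimal sketch (the function name is hypothetical) that checks whether an existing tab-separated MAF contains the 9 columns listed above. It assumes lines starting with '#' are comments and treats the first other line as the header.

```python
from typing import Iterable, List

# The 9 columns listed above as necessary for MutSigCV.
REQUIRED_MAF_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Reference_Allele", "Tumor_Seq_Allele1", "Tumor_Seq_Allele2",
    "Variant_Classification", "Tumor_Sample_Barcode",
]


def missing_maf_columns(lines: Iterable[str]) -> List[str]:
    """Return the required columns absent from a tab-separated MAF header.

    Lines starting with '#' (e.g. '#version 2.4') and blank lines are
    skipped; the first remaining line is treated as the header.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        header = line.rstrip("\r\n").split("\t")
        return [col for col in REQUIRED_MAF_COLUMNS if col not in header]
    # No header line at all: every required column is missing.
    return list(REQUIRED_MAF_COLUMNS)
```

For example, `missing_maf_columns(open("cohort.maf"))` (hypothetical file name) returns an empty list when all 9 columns are present.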


Thank you for your answer. Unfortunately, I can't access the script you linked for generating the MAF file, so I have two questions:

  1. How can I produce a MAF table from raw sequencing data (cancer-normal)? Should I use BWA, GATK and other software? Can you tell me the pipeline to produce the MAF table?
  2. How can I get the coverage and covariates tables?

Thanks again for your help.

(bioxujintian, 3.5 years ago)

I assume you have FASTQ files but nothing else? How comfortable are you with Unix?

(poisonAlien, 3.5 years ago)

I only have the FASTQ files, and I work on Linux. What should I do next?

(bioxujintian, 3.5 years ago)

Okay. Since it sounds like you have just started working with NGS data, I would suggest you start by reading about some basic file formats you will commonly encounter, like FASTQ and SAM. You will also be using a lot of tools, and some of them are essential (like samtools).

Anyway, the first step is to align your FASTQ files to a reference genome. There are many aligners, but the most commonly used one is BWA (at least for WGS and WES). BWA writes SAM output, which you convert to sorted BAM files (e.g. with samtools); you will use these BAMs to detect somatic variants.

There is a great tool that does all of this for you in 2 or 3 commands: check out SpeedSeq.

(poisonAlien, 3.5 years ago)
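A minimal sketch of that alignment step, assuming paired-end FASTQs per sample and a reference FASTA already indexed with `bwa index`. It only builds the shell commands (`bwa mem` piped into `samtools sort`, then `samtools index`) so you can inspect them before running anything; the function, sample and file names are hypothetical. Align each tumor and each normal separately so every sample gets its own BAM and its own read group.

```python
import shlex
from typing import List


def align_commands(sample: str, ref_fasta: str, fq1: str, fq2: str,
                   threads: int = 8) -> List[str]:
    """Build shell commands that align one paired-end sample with BWA-MEM
    and write a coordinate-sorted, indexed BAM with samtools."""

    def q(args: List[str]) -> str:
        # Quote each argument so the string is safe to pass to a shell.
        return " ".join(shlex.quote(a) for a in args)

    # Read group with literal "\t" separators, which bwa expands itself;
    # SM identifies the sample (e.g. tumor vs normal) downstream.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    bam = f"{sample}.sorted.bam"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group,
           ref_fasta, fq1, fq2]
    # samtools sort reads the SAM stream from stdin ("-").
    sort = ["samtools", "sort", "-@", str(threads), "-o", bam, "-"]
    return [q(bwa) + " | " + q(sort), q(["samtools", "index", bam])]
```

You could then run each string with `subprocess.run(cmd, shell=True, check=True)`; `shell=True` is needed here because of the pipe.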

Thank you for your reply, poisonAlien. I have called SNPs for another data set (51 cancer-normal pairs) using BWA and GATK on each sample.

As a next step, should I use VEP for annotation?

And then, how can I concatenate these files into one VCF, and should I use vcf2maf?

(bioxujintian, 3.5 years ago)
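One way to handle the concatenation question, swapped in here in place of merging VCFs: convert each tumor-normal pair's somatic VCF to its own MAF (for example with vcf2maf), then concatenate the MAFs, since MutSigCV reads one mutation file for the whole cohort and tells samples apart by `Tumor_Sample_Barcode`. A minimal sketch of that concatenation step (hypothetical function name), assuming tab-separated MAFs that share an identical header:

```python
from typing import Iterable, List, Optional


def merge_mafs(maf_texts: Iterable[str]) -> str:
    """Concatenate tab-separated MAF texts into one cohort MAF.

    '#' comment lines and blank lines are dropped, the header from the
    first file is kept once, and every later file must have an identical
    header (otherwise the columns would be misaligned).
    """
    header: Optional[str] = None
    rows: List[str] = []
    for text in maf_texts:
        lines = [line for line in text.splitlines()
                 if line.strip() and not line.startswith("#")]
        if not lines:
            continue
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("MAF headers differ; align columns before merging")
        rows.extend(lines[1:])
    if header is None:
        return ""
    return "\n".join([header] + rows) + "\n"
```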

Hi, you need to give more info. As far as I know, GATK does not call somatic variants. Try a dedicated somatic caller such as VarScan2 or MuTect instead. Then annotate the calls using ANNOVAR or VEP (I would suggest ANNOVAR since it's simple and easy to use). After annotation, the usual protocol is to remove variants commonly found in the general population (such as those in the 1000 Genomes Project). What you are left with are candidate somatic variants, which you will use for MutSig.

(poisonAlien, 3.5 years ago)
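The population-frequency filter described above can be sketched as follows (hypothetical function name). It assumes an ANNOVAR-style tab-separated output table in which one column holds the 1000 Genomes allele frequency; the column name you pass (for example `1000g2015aug_all`, if you annotated with that database) and the 1% cutoff are assumptions to adjust. The filter is applied to calls from a somatic caller; it is not a substitute for calling against the matched normal.

```python
import csv
import io
from typing import Dict, List


def filter_common_variants(table_text: str, af_column: str,
                           max_af: float = 0.01) -> List[Dict[str, str]]:
    """Keep rows whose population allele frequency is below max_af.

    Rows with no frequency ('.' or empty), i.e. not seen in the
    population database, are kept as candidates.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    kept = []
    for row in reader:
        value = (row.get(af_column) or ".").strip()
        af = 0.0 if value in (".", "") else float(value)
        if af < max_af:
            kept.append(row)
    return kept
```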

Thanks for your reply again. I have 102 VCF files with SNPs called for each sample individually; they are not somatic SNPs, just each sample's own SNPs. Can I use these files for MutSigCV after running ANNOVAR, or do I need to do some other work?

(bioxujintian, 3.5 years ago)

There is no point in using these for MutSig. They are not somatic variants (variants present in the cancer sample but not in the matched normal). You need to identify somatic variants first.

(poisonAlien, 3.5 years ago)
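To illustrate the definition of a somatic variant (present in the tumor, absent from the matched normal), here is a deliberately naive sketch (hypothetical function names) that subtracts normal calls from tumor calls by (CHROM, POS, REF, ALT). It is not a replacement for MuTect or VarScan2: those callers weigh read evidence in both samples, whereas a plain set difference will also report germline variants that merely went uncalled in a low-coverage normal.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def parse_vcf_sites(vcf_text: str) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) for each ALT allele in a VCF."""
    sites: Set[Variant] = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list ALTs with commas
            sites.add((chrom, int(pos), ref, alt))
    return sites


def naive_somatic(tumor_vcf: str, normal_vcf: str) -> Set[Variant]:
    """Variants called in the tumor but not in the matched normal."""
    return parse_vcf_sites(tumor_vcf) - parse_vcf_sites(normal_vcf)
```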

Thanks a lot, I have learned so much from you these days. SpeedSeq is a good tool for analyzing sequencing data. I will learn SpeedSeq and MuTect first, and if I have questions, I will ask you. Thank you very much.

(bioxujintian, 3.5 years ago)

Dear poisonAlien,

I took your advice and used SpeedSeq to call somatic SNPs, but when I use the speedseq somatic command, I don't know how to create the tumor and normal BAM files, respectively, from the raw WGS samples. Can you help me? Thanks.

(bioxujintian, 3.5 years ago)
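For the tumor/normal BAM question, the usual approach is to align the tumor FASTQs and the normal FASTQs separately, giving one BAM per sample, and then pass both BAMs to `speedseq somatic`. A minimal sketch that builds those commands for one pair, assuming paired-end FASTQs, the `-o` (output prefix) and `-R` (read group) options of `speedseq align`, and that `speedseq align -o PREFIX` writes `PREFIX.bam`. The argument order for `speedseq somatic` (reference, normal BAM, tumor BAM) follows the SpeedSeq README usage; check `speedseq align -h` and `speedseq somatic -h` for your installed version. The function, pair and file names are hypothetical.

```python
import shlex
from typing import List


def speedseq_pair_commands(pair_id: str, ref_fasta: str,
                           tumor_fq: List[str],
                           normal_fq: List[str]) -> List[str]:
    """Align tumor and normal FASTQs into separate BAMs, then call somatic
    variants on the pair (normal BAM before tumor BAM)."""

    def q(args: List[str]) -> str:
        return " ".join(shlex.quote(a) for a in args)

    cmds = []
    for role, fastqs in (("normal", normal_fq), ("tumor", tumor_fq)):
        sample = f"{pair_id}_{role}"
        # Distinct SM per sample keeps tumor and normal apart downstream.
        read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tLB:{sample}"
        # ASSUMPTION: '-o sample' makes speedseq align write sample.bam.
        cmds.append(q(["speedseq", "align", "-o", sample, "-R", read_group,
                       ref_fasta, *fastqs]))
    cmds.append(q(["speedseq", "somatic", "-o", pair_id, ref_fasta,
                   f"{pair_id}_normal.bam", f"{pair_id}_tumor.bam"]))
    return cmds
```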

Hello,

I stumbled across this thread while searching for something similar.

So if I understood you correctly, after SNP annotation (I'm using HaplotypeCaller), I'm left with a VCF file from which I need to remove all the annotated SNPs. This leaves me with only unannotated variants (which are the putative somatic variants present in the tumor samples), and this will be my input into MutSig. Am I right? I'd appreciate your reply. Thanks!

(apuhegde20, 23 months ago)

Hi, Alien,
I'm having some difficulties using MutSig. I was looking for solutions and found your answer here. I think you must be an expert in bioinformatics. Could you help me?
I don't have a coverage file, so I used the full coverage file provided by MutSig. I also followed the guide and ran MutSig with 6 arguments like this:

MutSigCV('F:LUSC.MutSigCV.input.data.v1.0\LUSC.maf', 'F:exome_full192.coverage.txt', 'F:LUSC.MutSigCV.input.data.v1.0\gene.covariates.txt', 'F:LUSC.MutSigCV.input.data.v1.0\output.txt', 'F:mutation_type_dictionary_file.txt', 'F:chr_files_hg19')

But the program still tells me it cannot perform category discovery:

NOTE: unable to perform category discovery, because no chr_files available.Will use two categories: missense and null+indel

even though I did include the chr files.
Do you know how to solve this problem?
Thank you very much!

(shiyang9350, 20 months ago)
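MutSigCV itself runs in MATLAB, so this is only an outside check in Python. One thing worth ruling out is that the chr_files argument does not resolve to the unzipped directory: on Windows, a path like 'F:chr_files_hg19' (no backslash after the colon) is relative to the current directory on drive F:, not to the root of F:. The sketch below (hypothetical function name) assumes the directory holds one file per chromosome named like chr1.txt; adjust the naming to what your chr_files_hg19 download actually contains.

```python
from pathlib import Path
from typing import List, Optional


def check_chr_files(chr_dir: str,
                    chroms: Optional[List[str]] = None) -> List[str]:
    """Return a list of problems with a MutSigCV chr_files directory.

    ASSUMPTION: the directory contains one file per chromosome named
    chr<name>.txt (e.g. chr1.txt, chrX.txt). An empty list means no
    problems were found.
    """
    chroms = chroms or [str(i) for i in range(1, 23)] + ["X", "Y"]
    path = Path(chr_dir)
    if not path.is_dir():
        # resolve() shows where a drive-relative path actually points.
        return [f"not a directory: {path.resolve()}"]
    return [f"missing: chr{c}.txt" for c in chroms
            if not (path / f"chr{c}.txt").is_file()]
```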
achristofferson wrote (20 months ago):

CovGen can help with making a capture/target-specific coverage table. The default coverage table that MutSig provides may not be appropriate for all cohorts.


Thank you for the pointer to CovGen. I am working on canine data, and I was able to successfully generate the MutSigCV-formatted coverage file using CovGen.

Do you have any directions for generating the gene.covariates.txt file required by MutSig? Also, my data is WGS, so any pointers regarding that would be much appreciated.

(sutturka140, 10 months ago)