I am trying to run MutSIgCV and got stuck with this error:
MutSigCV allsamples.md.tc.ir.br.pr.ug.dbsnp.vep.maf \
"$anno"exome_full192.coverage.txt \
"$anno"gene.covariates.txt \
my_results \
"$anno"mutation_type_dictionary_file.txt \
"$anno"chr_files_hg19
======================================
MutSigCV
v1.4
(c) Mike Lawrence and Gaddy Getz
Broad Institute of MIT and Harvard
======================================
MutSigCV: PREPROCESS
--------------------
Loading mutation_file...
Error using MutSigCV>MutSig_preprocess (line 246)
MutSig is not applicable to single patients.\n
Error in MutSigCV (line 184)
I suspect this is because I did not create the maf file properly. A run-down of the experiment:
- variants were called using the GATK pipeline (unified genotyper)
- annotated with VEP
- vcf was converted to maf using the tool vcf2maf
I am not working with cancer data.
Questions:
- is the issue due to the maf?
- If yes, how can it be prepared to include patient information?
A few of the vcf (header removed, this is a test so only chr14 is present):
##INFO=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Psio10_gDNA Psio11_gDNA Psio12_gDNA Psio1_gDNA Psio2_gDNA Psio3_gDNA Psio4_gDNA Psio5_gDNA Psio6_gDNA Psio7_gDNA Psio8_gDNA Psio9_gDNA
chr14 62290315 . G A 27.75 LowQual AC=4;AF=0.250;AN=16;BaseCounts=4,0,9,0;BaseQRankSum=-1.271;DP=13;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=3;MLEAF=0.188;MQ=50.39;MQ0=0;MQRankSum=0.480;QD=13.88;ReadPosRankSum=-1.733;CSQ=intron_variant&nc_transcript_variant|||ENSG00000258882|CTD-2277K2.1|ENST00000554138|||||lincRNA GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,69 0/0:2,0:2:6:0,6,68 ./. ./. ./. ./. 1/1:0,1:1:3:32,3,0 0/0:1,0:1:3:0,3,32 1/1:0,1:1:3:33,3,0 0/0:1,1:2:3:0,3,33 0/0:2,0:2:3:0,3,33 0/0:1,0:1:3:0,3,33
chr14 62292953 . A T 10.92 LowQual AC=2;AF=1.00;AN=2;BaseCounts=0,0,0,1;DP=1;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;QD=10.92;CSQ=intron_variant&nc_transcript_variant|||ENSG00000258882|CTD-2277K2.1|ENST00000554138|||||lincRNA GT:AD:DP:GQ:PL ./. ./. 1/1:0,1:1:3:33,3,0 ./. ./. ./. ./. ./. ./. ./. ./. ./.
chr14 62293825 . G T 20.64 LowQual AC=2;AF=0.167;AN=12;BaseCounts=0,0,9,2;BaseQRankSum=-1.296;DP=11;Dels=0.00;FS=0.000;HaplotypeScore=0.1662;MLEAC=2;MLEAF=0.167;MQ=60.00;MQ0=0;MQRankSum=0.354;QD=4.13;ReadPosRankSum=0.354;CSQ=downstream_gene_variant|||ENSG00000258956|COX4I1P1|ENST00000554239|||||processed_pseudogene,intron_variant&nc_transcript_variant|||ENSG00000258882|CTD-2277K2.1|ENST00000554138|||||lincRNA GT:AB:AD:DP:GQ:PL ./. 0/0:.:2,0:2:6:0,6,68 ./. ./. ./. ./. 0/0:.:1,0:1:3:0,3,33 0/1:0.670:2,1:3:26:26,0,39 ./. 0/0:.:2,0:2:6:0,6,68 0/1:0.500:1,1:2:27:27,0,27 0/0:.:1,0:1:3:0,3,33
chr14 62294744 . G A 28.05 LowQual AC=3;AF=0.300;AN=10;BaseCounts=2,1,5,0;BaseQRankSum=-0.198;DP=8;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=3;MLEAF=0.300;MQ=57.05;MQ0=0;MQRankSum=0.198;QD=9.35;ReadPosRankSum=0.198;CSQ=downstream_gene_variant|||ENSG00000258956|COX4I1P1|ENST00000554239|||||processed_pseudogene,intron_variant&nc_transcript_variant|||ENSG00000258882|CTD-2277K2.1|ENST00000554138|||||lincRNA GT:AB:AD:DP:GQ:PL ./. ./. 1/1:.:0,1:1:3:33,3,0 ./. 0/0:.:1,0:1:3:0,3,33 0/1:0.500:1,1:2:26:26,0,26 ./. 0/0:.:1,0:1:3:0,3,32 0/0:.:2,0:2:3:0,3,45 ./. ./. ./.
chr14 62299005 . A C 144.99 . AC=1;AF=0.042;AN=24;BaseCounts=301,8,0,0;BaseQRankSum=-4.723;DP=309;Dels=0.00;FS=2.199;HaplotypeScore=0.9147;InbreedingCoeff=-0.0435;MLEAC=1;MLEAF=0.042;MQ=59.70;MQ0=0;MQRankSum=0.508;QD=7.63;ReadPosRankSum=-1.488;CSQ=non_coding_exon_variant&nc_transcript_variant|||ENSG00000258956|COX4I1P1|ENST00000554239|1/1||||processed_pseudogene,intron_variant&nc_transcript_variant|||ENSG00000258882|CTD-2277K2.1|ENST00000554138|||||lincRNA GT:AB:AD:DP:GQ:PL 0/0:.:26,0:26:78:0,78,892 0/0:.:29,0:29:81:0,81,927 0/0:.:17,0:17:45:0,45,528 0/0:.:28,0:28:75:0,75,861 0/0:.:25,0:25:69:0,69,777 0/0:.:35,0:35:99:0,102,1127 0/0:.:38,0:38:99:0,102,1149 0/1:0.580:11,8:19:99:180,0,294 0/0:.:24,0:24:66:0,66,764 0/0:.:33,0:33:93:0,93,1065 0/0:.:16,0:16:45:0,45,502 0/0:.:19,0:19:54:0,54,603
Thanks for the reply Cyriac. I was going to post your email an an answer tomorrow, but since you have done it I can't give you the rep of an accepted answer :)
One thing that I noticed while doing the conversion: For the purposes of running MutSigCV, vcf2maf needs to be run with the option
--input-vcf, and vep
(annotations version 74) must in the path. I was running a previously annotated vcf (using--input-vep
) and the annotations were lost during the conversion.As for Music, I will definitely give a go.
Thanks for the upvote! VEP can be run with many different options, and I haven't tested vcf2maf with all the different combinations that you can feed into
--input-vep
. It is always safest to let vcf2maf run VEP itself, with the runtime parameters it uses.Hello Cyriac! Thank you so much for your work and such detailed explanation, another upvote;) I am not sure if that's correct to ask here but it's related to the topic and particularly to the @fridaymeetssunday last comment. I am also trying to convert my vcf's to maf for MutSigCV. And the problem is: my data are in vcf 4.1 format and apparently VEP only accept vcf 4.0. Didn't you come across such cases? I was trying different combinations like annotating with snpEff both via vcf2maf and separately but the annotation info is always lost in the conversion. And the only thing I could do with VEP is to input data in VEP default format but then the output is not vcf either... Thank you!
Can you post a new question on Biostars with the problematic VCF line? There is not much different between VCF 4.0 and 4.1, except maybe in the support for structural variants. There's only so much VEP can do with structural variants.
Hello, can I annotate my vcf with VEP by hand first then vcf2maf skip annotation? Because I can run VEP annotate, but
vcf2maf.pl
will fail at running VEP annotation stage. Maybe the reason is using different perl but I was under same env andPERL5LIB
.The above code has helped me a lot and it works, in a typical vcf file in the last column of the header TUMOUR and NORMAL are written. before running vep make sure to rename the headers according to the sample id. then while running the vcf2maf use --tumor-id " "--normal-id " " --vcf-tumor-id " " --vcf-normal-id " ".
then use the last two commands given by Cyriac Kandoth for concatenation and it will work.