Convert VCF to MAF from RNASeq mutation analysis
7.3 years ago
Ron ★ 1.2k

Hi all,

I am using this software for RNAseq mutation analysis: https://github.com/davidliwei/rnaseqmut

My final output file is a VCF file and I want to convert it to a MAF file.

I have come across the posts below; however, the VCF output from rnaseqmut is a bit different in this case.

Converting Vcf File To Maf

Vcf To Maf (Mutation Annotation Format) Conversion ?

chr1    877831  rs6672356;COSM4144217   T   C   1.0 Sample_6385_11.DP4=0,0,1,2;Sample_G1.DP4=0,0,1,0;Sample_G2.DP4=0,0,0,0;Sample_G3.DP4=0,0,0,0;Sample_G4.DP4=0,0,0,0;Sample_G5.DP4=0,0,0,1;Sample_NY_D_TA23_PDX_T1.DP4=0,0,0,0;Sample_NY_D_TA23_PR.DP4=0,0,0,0;Sample_Pa_PDX.DP4=0,0,0,0;Sample_PA_primary.DP4=0,0,1,0;Sample_TE_PDX.DP4=0,0,0,0;Sample_TE_primary.DP4=0,0,0,0;   ASP=true;GNO=true;HD=true;INT=true;KGPROD=true;KGPhase1=true;NSM=true;OTHERKG=true;REF=true;RS=6672356;RSPOS=877831;SAO=0;SLO=true;SSR=0;VC=SNV;VP=0x050100080a05000516000100;WGT=1;dbSNPBuildID=116;AA=p.W343R;CDS=c.1027T>C;CNT=23;GENE=SAMD11;SNP=true;STRAND=+;EFF=missense_variant(MODERATE|MISSENSE|Tgg/Cgg|p.Trp343Arg/c.1027T>C|681|SAMD11|protein_coding|CODING|ENST00000342066|10|1),missense_variant(MODERATE|MISSENSE|Tgg/Cgg|p.Trp250Arg/c.748T>C|588|SAMD11|protein_coding|CODING|ENST00000341065|8|1|WARNING_TRANSCRIPT_NO_START_CODON),missense_variant(MODERATE|MISSENSE|Tgg/Cgg|p.Trp169Arg/c.505T>C|540|SAMD11|protein_coding|CODING|ENST00000455979|4|1|WARNING_TRANSCRIPT_NO_START_CODON),downstream_gene_variant(MODIFIER||1753||749|NOC2L|protein_coding|CODING|ENST00000327044||1),downstream_gene_variant(MODIFIER||1753|||NOC2L|retained_intron|CODING|ENST00000483767||1),downstream_gene_variant(MODIFIER||1754|||NOC2L|retained_intron|CODING|ENST00000477976||1),downstream_gene_variant(MODIFIER||3160||178|SAMD11|protein_coding|CODING|ENST00000420190||1),downstream_gene_variant(MODIFIER||2868|||NOC2L|processed_transcript|CODING|ENST00000496938||1),downstream_gene_variant(MODIFIER||278|||SAMD11|processed_transcript|CODING|ENST00000478729||1),non_coding_exon_variant(MODIFIER|||n.286T>C||SAMD11|retained_intron|CODING|ENST00000464948|1|1),non_coding_exon_variant(MODIFIER|||n.389T>C||SAMD11|retained_intron|CODING|ENST0000474461|3|1),non_coding_exon_variant(MODIFIER|||n.191T>C||SAMD11|retained_intron|CODING|ENST00000466827|2|1)

chr1    878314  rs142558220;COSM426784  G   C   1.0 Sample_6385_11.DP4=5,3,5,4;Sample_G1.DP4=0,0,0,0;Sample_G2.DP4=0,0,0,0;Sample_G3.DP4=1,0,0,0;Sample_G4.DP4=1,0,0,0;Sample_G5.DP4=2,1,0,0;Sample_NY_D_TA23_PDX_T1.DP4=2,3,0,0;Sample_NY_D_TA23_PR.DP4=0,1,0,0;Sample_Pa_PDX.DP4=2,1,0,0;Sample_PA_primary.DP4=1,0,0,0;Sample_TE_PDX.DP4=0,0,0,0;Sample_TE_primary.DP4=0,0,0,0;ASP=true;INT=true;KGPROD=true;KGPhase1=true;OTHERKG=true;REF=true;RS=142558220;RSPOS=878314;SAO=0;SSR=0;SYN=true;VC=SNV;VP=0x050000080305100016000100;WGT=1;dbSNPBuildID=134;AA=p.G480G;CDS=c.1440G>C;CNT=2;GENE=SAMD11;SNP=true;STRAND=+;EFF=synonymous_variant(LOW|SILENT|ggG/ggC|p.Gly480Gly/c.1440G>C|681|SAMD11|protein_coding|CODING|ENST00000342066|11|1),synonymous_variant(LOW|SILENT|ggG/ggC|p.Gly387Gly/c.1161G>C|588|SAMD11|protein_coding|CODING|ENST00000341065|9|1|WARNING_TRANSCRIPT_NO_START_CODON),synonymous_variant(LOW|SILENT|ggG/ggC|p.Gly306Gly/c.918G>C|540|SAMD11|protein_coding|CODING|ENST00000455979|5|1|WARNING_TRANSCRIPT_NO_START_CODON),downstream_gene_variant(MODIFIER||1270||749|NOC2L|protein_coding|CODING|ENST00000327044||1),downstream_gene_variant(MODIFIER||1270|||NOC2L|retained_intron|CODING|ENST00000483767||1),downstream_gene_variant(MODIFIER||1271|||NOC2L|retained_intron|CODING|ENST00000477976||1),downstream_gene_variant(MODIFIER||3643||178|SAMD11|protein_coding|CODING|ENST00000420190||1),downstream_gene_variant(MODIFIER||2385|||NOC2L|processed_transcript|CODING|ENST00000496938||1),downstream_gene_variant(MODIFIER||761|||SAMD11|processed_transcript|CODING|ENST00000478729||1),downstream_gene_variant(MODIFIER||132|||SAMD11|retained_intron|CODING|ENST00000466827||1),downstream_gene_variant(MODIFIER||42|||SAMD11|retained_intron|CODING|ENST00000464948||1),non_coding_exon_variant(MODIFIER|||n.802G>C||SAMD11|retained_intron|CODING|ENST00000474461|4|1)

For example, the per-sample column gives each sample four DP4 values, namely the reference-allele read counts (forward and reverse strand) followed by the alternate-allele read counts (forward and reverse strand), e.g. Sample_6385_11.DP4=0,0,1,2
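To make the per-sample column concrete, here is a minimal Python sketch that pulls the counts out of it. It assumes the four DP4 numbers are ordered ref-forward, ref-reverse, alt-forward, alt-reverse (the usual samtools convention); `parse_dp4` and `DP4_ENTRY` are hypothetical names, not part of rnaseqmut.

```python
import re

# Matches one "<sample>.DP4=a,b,c,d" entry in the per-sample column.
DP4_ENTRY = re.compile(r"([^;=\s]+)\.DP4=(\d+),(\d+),(\d+),(\d+)")

def parse_dp4(column):
    """Return {sample: (ref_count, alt_count)}, summed over both strands."""
    counts = {}
    for name, ref_fwd, ref_rev, alt_fwd, alt_rev in DP4_ENTRY.findall(column):
        counts[name] = (int(ref_fwd) + int(ref_rev), int(alt_fwd) + int(alt_rev))
    return counts

# First three entries of the first record shown above.
example = "Sample_6385_11.DP4=0,0,1,2;Sample_G1.DP4=0,0,1,0;Sample_G2.DP4=0,0,0,0;"
print(parse_dp4(example))
```

Because the pattern only matches `.DP4=` entries, it also works on the second record, where the sample column runs straight into the INFO fields without a separating tab.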

Any suggestions on how to get this data to a format like this?

 Chromosome Start_position  End_position    Strand  Variant_Classification  Variant_Type    Reference_Allele    Tumor_Seq_Allele1   Tumor_Seq_Allele2   dbSNP_RS    Tumor_Sample_Barcode    Matched_Norm_Sample_Barcode n_alt_count n_ref_count t_alt_count t_ref_count amino_acid_change_WU
X   47044502    47044502    +   Nonsense_Mutation   SNP G   G   T   novel   UTUC123_1       0   96  28  48  p.E667*
2   192701329   192701329   +   Missense_Mutation   SNP C   C   T   novel   UTUC123_1       0   81  18  49  p.V200M
5   112824048   112824048   +   In_Frame_Ins    INS -   #NAME?  #NAME?  novel   UTUC123_1       0   17  7   14  p.S22_nofs
11  62286810    62286810    +   Missense_Mutation   SNP T   T   C   novel   UTUC123_1       0   111 25  72  p.K5027E
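One possible way to get from a line like the ones above to rows in this layout is sketched below in Python. It is only a sketch under several assumptions: it handles single-base substitutions only, emits one row per sample that has at least one alternate read, leaves the matched-normal columns empty because the VCF does not say which sample is the normal, takes the classification from the first snpEff `EFF` entry through a small and incomplete lookup table, and uses the `AA=` field for the amino acid change. The function name `vcf_line_to_maf_rows` and the `EFFECT_TO_MAF` table are hypothetical.

```python
import re

MAF_COLUMNS = [
    "Chromosome", "Start_position", "End_position", "Strand",
    "Variant_Classification", "Variant_Type", "Reference_Allele",
    "Tumor_Seq_Allele1", "Tumor_Seq_Allele2", "dbSNP_RS",
    "Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode",
    "n_alt_count", "n_ref_count", "t_alt_count", "t_ref_count",
    "amino_acid_change_WU",
]

# Assumed, partial mapping from snpEff effect names to MAF classifications.
EFFECT_TO_MAF = {
    "missense_variant": "Missense_Mutation",
    "synonymous_variant": "Silent",
    "stop_gained": "Nonsense_Mutation",
}

DP4_ENTRY = re.compile(r"([^;=\s]+)\.DP4=(\d+),(\d+),(\d+),(\d+)")

def vcf_line_to_maf_rows(line):
    """Yield one MAF row (list of strings) per sample with alt-supporting reads."""
    chrom, pos, ids, ref, alt, _score, rest = line.split(None, 6)
    if len(ref) != 1 or len(alt) != 1:
        return  # indels need different coordinates; not handled in this sketch
    # Collect KEY=VALUE pairs whether separated by ';' or whitespace.
    info = dict(kv.split("=", 1) for kv in re.split(r"[;\s]+", rest) if "=" in kv)
    effect = info.get("EFF", "").split("(", 1)[0]
    classification = EFFECT_TO_MAF.get(effect, effect)  # fall back to raw name
    rs_id = next((i for i in ids.split(";") if i.startswith("rs")), "novel")
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    for name, ref_fwd, ref_rev, alt_fwd, alt_rev in DP4_ENTRY.findall(rest):
        t_ref = int(ref_fwd) + int(ref_rev)
        t_alt = int(alt_fwd) + int(alt_rev)
        if t_alt == 0:
            continue  # no evidence for the variant in this sample
        yield [chrom, pos, pos, "+", classification, "SNP", ref, ref, alt,
               rs_id, name, "", "", "", str(t_alt), str(t_ref),
               info.get("AA", "")]

# Shortened version of the first record, for illustration only.
line = ("chr1\t877831\trs6672356;COSM4144217\tT\tC\t1.0\t"
        "Sample_6385_11.DP4=0,0,1,2;Sample_G2.DP4=0,0,0,0;\t"
        "AA=p.W343R;GENE=SAMD11;EFF=missense_variant(MODERATE|MISSENSE)")
print("\t".join(MAF_COLUMNS))
for row in vcf_line_to_maf_rows(line):
    print("\t".join(row))
```

If one of the samples is known to be the matched normal, its counts could fill `n_ref_count` and `n_alt_count` instead of being left blank.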

Thanks, Ron

rna-seq mutation next-gen • 2.1k views
7.3 years ago

Try this Python script: https://github.com/cbare/vcf2maf
