7.5 years ago
Ron
Hi all,
I am doing mutation analysis on RNA-seq data and have a VCF file. I want to find the most frequently mutated genes (those with the highest number of mutations per gene) from the VCF file, so I used snpEff, following this thread:
count number of variants per gene from .vcf annovar&snpsift
However, I don't think this is the output I need. Please see it below:
chr start end type IDs Reads:ALLMUT_FILTERED_Annotated
1 1 11873 Intergenic;DDX11L1 1 1
1 1 11873 Intergenic;DDX11L1 1 1
1 1 249250621 Chromosome;1 638541 638541
1 6874 11873 Upstream;NR_046018.2;;DDX11L1;Non-coding_transcript 1 1
1 6874 11873 Upstream;NR_046018.2;;DDX11L1;Non-coding_transcript 1 1
1 9362 14361 Downstream;NR_024540.1;;WASH7P;Non-coding_transcript 7 7
1 9362 14361 Downstream;NR_024540.1;;WASH7P;Non-coding_transcript 7 7
1 11874 14409 Gene;DDX11L1;Non-coding_transcript 6 6
1 11874 14409 Transcript;NR_046018.2;;DDX11L1;Non-coding_transcript 6 6
1 12369 17368 Downstream;NR_106918.1.3;;MIR6859-1.3;Non-coding_transcript 170 170
1 12369 17368 Downstream;NR_106918.1.3;;MIR6859-1.3;Non-coding_transcript 170 170
1 12369 17368 Downstream;NR_107062.1.2;;MIR6859-2.2;Non-coding_transcript 170 170
1 12369 17368 Downstream;NR_107062.1.2;;MIR6859-2.2;Non-coding_transcript 170 170
1 12722 13220 Intron;intron_2_RETAINED-RETAINED;NR_046018.2;;DDX11L1;Non-coding_transcript 1 1
1 13221 14409 Exon;exon_3_3_RETAINED;NR_046018.2;;DDX11L1;Non-coding_transcript 5 5
1 14362 14829 Exon;exon_11_11_RETAINED;NR_024540.1;;WASH7P;Non-coding_transcript 74 74
1 14362 29370 Transcript;NR_024540.1;;WASH7P;Non-coding_transcript 361 361
1 14362 29370 Gene;WASH7P;Non-coding_transcript 361 361
1 14410 19409 Downstream;NR_046018.2;;DDX11L1;Non-coding_transcript 233 233
1 14410 19409 Downstream;NR_046018.2;;DDX11L1;Non-coding_transcript 233 233
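One possible reading of the output above is that the rows whose type column starts with "Gene;" already hold one total per gene. Below is a minimal Python sketch that keeps only those rows and ranks the genes. It assumes the columns are whitespace-separated and that the fifth column is the per-interval count; the function names gene_counts and rank_genes are made up for illustration.

```python
def gene_counts(lines):
    """Return {gene: count} from rows whose type column starts with 'Gene;'."""
    counts = {}
    for line in lines:
        fields = line.split()
        # Skip the header and every non-gene interval (Exon, Upstream, ...).
        if len(fields) < 5 or not fields[3].startswith("Gene;"):
            continue
        gene = fields[3].split(";")[1]   # "Gene;DDX11L1;..." -> "DDX11L1"
        counts[gene] = int(fields[4])    # assumption: 5th column is the count
    return counts


def rank_genes(counts):
    """Sort genes by count, highest first."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# A few rows copied from the output above.
sample = [
    "chr start end type IDs Reads:ALLMUT_FILTERED_Annotated",
    "1 11874 14409 Gene;DDX11L1;Non-coding_transcript 6 6",
    "1 11874 14409 Transcript;NR_046018.2;;DDX11L1;Non-coding_transcript 6 6",
    "1 14362 29370 Gene;WASH7P;Non-coding_transcript 361 361",
]
ranked = rank_genes(gene_counts(sample))
print(ranked)
```

Whether that count really means "variants in this gene" depends on how the tool was run, so it is worth checking a gene or two by hand against the VCF.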
Is there a way to get per-gene mutation counts from this, or should I just count the mutations per gene by writing a script?
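For the scripting route, a minimal Python sketch might look like the following. It assumes the VCF was annotated by a recent snpEff, so each record has an ANN entry in the INFO column with the gene name as the fourth "|"-separated subfield (older snpEff versions write an EFF entry instead, which this sketch does not handle). The function names and the two demo records are made up for illustration.

```python
from collections import Counter


def genes_in_record(info):
    """Return the set of gene names found in the ANN entry of an INFO string."""
    genes = set()
    for entry in info.split(";"):
        if not entry.startswith("ANN="):
            continue
        for ann in entry[4:].split(","):      # one annotation per transcript/effect
            parts = ann.split("|")
            if len(parts) > 3 and parts[3]:   # 4th subfield: gene name
                genes.add(parts[3])
    return genes


def count_variants_per_gene(vcf_lines):
    """Count each variant once per gene it is annotated to."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):              # header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:                     # INFO is the 8th column
            continue
        counts.update(genes_in_record(cols[7]))
    return counts


# Illustrative records only, not real data.
demo = [
    "##fileformat=VCFv4.2",
    "1\t14500\t.\tA\tG\t50\tPASS\tDP=9;ANN=G|intron_variant|MODIFIER|WASH7P|WASH7P",
    "1\t14700\t.\tC\tT\t60\tPASS\tANN=T|exon_variant|MODIFIER|WASH7P|WASH7P,"
    "T|downstream_gene_variant|MODIFIER|DDX11L1|DDX11L1",
]
counts = count_variants_per_gene(demo)
print(counts.most_common())
```

With a real file, the list of lines would be replaced by an open file handle. Note that this counts upstream and downstream annotations toward a gene as well; filtering on the second ANN subfield (the effect type) would restrict the count to, say, exonic variants.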
Thanks,
Ron