Question: Mutation count in multi-sample vcf file
0
gravatar for sara_wasl
13 months ago by
sara_wasl0
sara_wasl0 wrote:

Hello all,

I have a one vcf file of 26 samples, what I need is to annotate this file and count the number of mutations for each gene for each sample of those 26.

I tried to use Annovar, but it seems that it doesn't support multi-sample vcf file.

Any suggestion or help, please?

count mutations annotate vcf • 698 views
ADD COMMENTlink modified 13 months ago by Pierre Lindenbaum124k • written 13 months ago by sara_wasl0
0
gravatar for Pierre Lindenbaum
13 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

I wrote groupbygene: http://lindenb.github.io/jvarkit/GroupByGene.html

your vcf must be annotated with VEP or SNPEFF.

$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar |\
head | column  -t
#chrom  min.POS    max.POS    gene.name  gene.type         samples.affected  count.variations  M10475  M10478  M10500  M128215
chr10   52004315   52004315   ASAH2      snpeff-gene-name  2                 1                 0       0       1       1
chr10   52004315   52004315   ASAH2      vep-gene-name     2                 1                 0       0       1       1
chr10   52497529   52497529   ASAH2B     snpeff-gene-name  2                 1                 0       1       1       0
chr10   52497529   52497529   ASAH2B     vep-gene-name     2                 1                 0       1       1       0
chr10   48003992   48003992   ASAH2C     snpeff-gene-name  3                 1                 1       1       1       0
chr10   48003992   48003992   ASAH2C     vep-gene-name     3                 1                 1       1       1       0
chr10   126678092  126678092  CTBP2      snpeff-gene-name  1                 1                 0       0       0       1
chr10   126678092  126678092  CTBP2      vep-gene-name     1                 1                 0       0       0       1
chr10   135336656  135369532  CYP2E1     snpeff-gene-name  3                 2                 0       2       1       1
ADD COMMENTlink modified 13 months ago • written 13 months ago by Pierre Lindenbaum124k

Just to make sure, here "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" it should be my vcf file after annotated by VEP or SnpEff, right?

ADD REPLYlink written 13 months ago by sara_wasl0
1

it should be my vcf file after annotated by VEP or SnpEff,

yes

ADD REPLYlink written 13 months ago by Pierre Lindenbaum124k

Could you please help me with the annotation, I'm new to that field. Are there clear steps for the annotation?

ADD REPLYlink written 13 months ago by sara_wasl0
1

see snpeff: http://snpeff.sourceforge.net/SnpEff_manual.html

ADD REPLYlink written 13 months ago by Pierre Lindenbaum124k

After annotating with SnpEff and running GroupByGene, I got this error:

[SEVERE][GroupByGene]Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
    at com.github.lindenb.jvarkit.util.vcf.VCFUtils.parseHeader(VCFUtils.java:184)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.read(GroupByGene.java:379)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.doWork(GroupByGene.java:694)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1145)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1303)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.main(GroupByGene.java:710)
[INFO][Launcher]groupbygene Exited with failure (-1)

What's could be the problem?

ADD REPLYlink modified 13 months ago • written 13 months ago by sara_wasl0

Thanks for the help, I try to run GroupByGene using the same command and file just to see the output but it gives this error. Also when I tried my file (I was concerning if the problem from my file but that also happen for the provided file):

[SEVERE][GroupByGene]Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:119)
    at com.github.lindenb.jvarkit.util.vcf.VCFUtils.parseHeader(VCFUtils.java:184)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.read(GroupByGene.java:379)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.doWork(GroupByGene.java:694)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1145)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1303)
    at com.github.lindenb.jvarkit.tools.groupbygene.GroupByGene.main(GroupByGene.java:710)
[INFO][Launcher]groupbygene Exited with failure (-1)

Any suggestion or help, please?

ADD REPLYlink written 13 months ago by sara_wasl0

what was the input file ? how is it formatted ? what is the output of grep -m1 CHROM your_input.vcf ?

ADD REPLYlink written 13 months ago by Pierre Lindenbaum124k

The input file was the one after annotated using SnpEff, This is what I got for grep -m1 CHROM myfile.ann.vcf :

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  LIBRARY_NAME=S2000  2:LIBRARY_NAME=S2000    3:LIBRARY_NAME=S2000    4:LIBRARY_NAME=S2000    5:LIBRARY_NAME=S2000    6:LIBRARY_NAME=S2000    7:LIBRARY_NAME=S2000    8:LIBRARY_NAME=S2000    9:LIBRARY_NAME=S2000    10:LIBRARY_NAME=S2000   11:LIBRARY_NAME=S2000   12:LIBRARY_NAME=S2000   13:LIBRARY_NAME=S2000   14:LIBRARY_NAME=S2000   15:LIBRARY_NAME=S2000   16:LIBRARY_NAME=S2000   17:LIBRARY_NAME=S2000   18:LIBRARY_NAME=S2000   19:LIBRARY_NAME=S2000   20:LIBRARY_NAME=S2000   21:LIBRARY_NAME=S2000   22:LIBRARY_NAME=S2000   23:LIBRARY_NAME=S2000   24:LIBRARY_NAME=S2000   25:LIBRARY_NAME=S2000   26:LIBRARY_NAME=S2000
ADD REPLYlink modified 13 months ago • written 13 months ago by sara_wasl0
1

please post a fragment of your vcf, including the header in something like https://gist.github.com/ or whatever...

ADD REPLYlink modified 13 months ago • written 13 months ago by Pierre Lindenbaum124k

See here: /................./

ADD REPLYlink modified 13 months ago • written 13 months ago by sara_wasl0
1

this works on my machine:

 java -jar dist/groupbygene.jar your.vcf

#chrom  min.POS max.POS gene.name   gene.type   samples.affected    count.variations    10:LIBRARY_NAME=S2000   11:LIBRARY_NAME=S2000   12:LIBRARY_NAME=S2000   13:LIBRARY_NAME=S2000   14:LIBRARY_NAME=S2000   15:LIBRARY_NAME=S2000   16:LIBRARY_NAME=S2000   17:LIBRARY_NAME=S2000   18:LIBRARY_NAME=S2000   19:LIBRARY_NAME=S2000   20:LIBRARY_NAME=S2000   21:LIBRARY_NAME=S2000   22:LIBRARY_NAME=S2000   23:LIBRARY_NAME=S2000   24:LIBRARY_NAME=S2000   25:LIBRARY_NAME=S2000   26:LIBRARY_NAME=S2000   2:LIBRARY_NAME=S2000    3:LIBRARY_NAME=S2000    4:LIBRARY_NAME=S2000    5:LIBRARY_NAME=S2000    6:LIBRARY_NAME=S2000    7:LIBRARY_NAME=S2000    8:LIBRARY_NAME=S2000    9:LIBRARY_NAME=S2000    LIBRARY_NAME=S2000
1   10402   10440   DDX11L1 ann_gene_name   2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENSG00000223972 ann_gene_id 2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENSG00000227232 ann_gene_id 2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000423562 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000438504 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000450305 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000456328 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000488147 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000515242 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000518655 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000538476 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   ENST00000541675 ann_feature_id  2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
1   10402   10440   WASH7P  ann_gene_name   2   2   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
ADD REPLYlink written 13 months ago by Pierre Lindenbaum124k

This is more logical, I'm thinking that the command you provided not correct:

file.vcf |\
java -jar dist/groupbygene.jar

This is what I have been tried and give me the error.

ADD REPLYlink written 13 months ago by sara_wasl0
file.vcf |\
java -jar dist/groupbygene.jar

I didn't wrote this.

ADD REPLYlink modified 13 months ago • written 13 months ago by Pierre Lindenbaum124k

Sorry but that what I understand from your post above with:

$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar
ADD REPLYlink written 13 months ago by sara_wasl0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 897 users visited in the last hour