Question: extracting depth info from vcf
0
gravatar for victoria_aleks
11 months ago by
victoria_aleks30 wrote:

Dear all, I need to know the "coverage" of sites in vcf files, but the files i am working with do not contain DP info, only AD. Most of the tools for vcf parcing use DP parameter. I understand that I can use AD to estimate depth, too. Is there any known ways of how to? Will be happy to any help:)

depth ad vcf • 844 views
ADD COMMENTlink modified 10 months ago • written 11 months ago by victoria_aleks30
0
gravatar for Pierre Lindenbaum
11 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:

Use gatk variantAnnotator https://software.broadinstitute.org/gatk/gatkdocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php with Coverage https://software.broadinstitute.org/gatk/gatkdocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_Coverage.php

 ls *.bam > bam.list
 java -jar GenomeAnalysisTK.jar \
   -R reference.fasta \
   -T VariantAnnotator \
   -I bam.list \
   -V input.vcf \
   -o output.vcf \
   -A Coverage \
   -L input.vcf
ADD COMMENTlink written 11 months ago by Pierre Lindenbaum112k

and what if i dont have neither bams nor reference fasta? I only downloaded vcf files from the source

ADD REPLYlink written 11 months ago by victoria_aleks30

and what if i dont have neither bams

you cannot get the depth without the bams...

nor reference fasta?

a reference sequence is always needed to run gatk. https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference

ADD REPLYlink written 11 months ago by Pierre Lindenbaum112k

ah, sorry, now I see why you're asking: you already have AD...

ADD REPLYlink written 11 months ago by Pierre Lindenbaum112k

Ok here is a solution using vcfilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

First I use awk to insert the new VCF header for DP . Then I loop over the genotypes and I create a new variant.

 curl -s "http://www.icbi.at/svnsimplex/CommonBioCommands/tags/simplex-1.0/CommonBioCommands/testdata/vcf/AMS1_converted_filtered_short_chr1.vcf" |\
 awk '/^#CHROM/ {printf("##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"\">\n");} {print;}' | \
java -jar dist/vcffilterjdk.jar  -e 'List<Genotype> gl = variant.getGenotypes().stream().map(G->{int ad[]=G.getAD();if(ad==null || ad.length==0) return G; int c= Arrays.stream(ad).sum(); return new GenotypeBuilder(G).DP(c).make();}).collect(Collectors.toList());return new VariantContextBuilder(variant).genotypes(gl).make();'
ADD REPLYlink written 11 months ago by Pierre Lindenbaum112k

thank you very much. unfortunately vcffilterjdk constantly gives errors, in my hands :) Is it possible just write a home-made script to extract AD values for each line in vcf-file? I read that its better not to mess with vcf parcing and rather use the ready software. but right now it seems easier just code something by our own...

ADD REPLYlink written 10 months ago by victoria_aleks30

unfortunately vcffilterjdk constantly gives errors, in my hands

ADD REPLYlink written 10 months ago by Pierre Lindenbaum112k

:) basically my question was: does it worth it to write a script myself or better not to mess? anyway, the error is: 2 errors

[SEVERE][InMemoryCompiler]compile.call() failed for custom class VcfFilterJdkCustom887993132 java.lang.RuntimeException: compile.call() failed for custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:230) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:509) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) [SEVERE][VcfFilterJdk]Cannot compile custom class VcfFilterJdkCustom887993132 java.lang.RuntimeException: Cannot compile custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:234) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:509) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) Caused by: java.lang.RuntimeException: compile.call() failed for custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:230) ... 5 more [SEVERE][VcfFilterJdk]Cannot initialize [INFO][Launcher]vcffilterjdk Exited with failure (-1) thank you:)
ADD REPLYlink modified 10 months ago by Pierre Lindenbaum112k • written 10 months ago by victoria_aleks30

does it worth it to write a script myself or better not to mess?

I do understand that.

But it's a pity for the developer (aka 'me') to see that his tool is not used just because 'it doesn't work'. :-)

It's hard to debug without the context. what is exactly your command line please ?

ADD REPLYlink written 10 months ago by Pierre Lindenbaum112k

i see:) honestly i repeated exactly what you wrote. except, first i ran awk command (i checked the resulting file, and DP appeared there. only: i do not understand why do I need incorporate DP in the header if I want to use AD anyway?). then I set this awk-mdified file as the input for vcffilterjdk:

vcffilterjdk awk.test -e 'List<Genotype> gl = variant.getGenotypes().stream().map(G->{int ad[]=G.getAD();if(ad==null || ad.length==0) return G; int c= Arrays.stream(ad).sum(); return new GenotypeBuilder(G).DP(c).make();}).collect(Collectors.toList());return new VariantContextBuilder(variant).genotypes(gl).make();'

I also tried to use -f instead -e; the error was:

[SEVERE][VcfFilterJdk]List<Genotype> (No such file or directory)
java.io.FileNotFoundException: List<Genotype> (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at htsjdk.samtools.util.IOUtil.slurp(IOUtil.java:999)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:436)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183)
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341)
    at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612)
[SEVERE][VcfFilterJdk]Cannot initialize
[INFO][Launcher]vcffilterjdk Exited with failure (-1)

(in case, i tried to pipe awk into vcffilterjdk, as you have shown - didnt help)

I hope it gives some useful info

ADD REPLYlink modified 10 months ago by Pierre Lindenbaum112k • written 10 months ago by victoria_aleks30
1

why do I need incorporate DP in the header if I want to use AD anyway?

because my script produces a new attribute DP and it must be declared in the VCF header if it wasn't declared before.

 new GenotypeBuilder(G).DP(c).make()
ADD REPLYlink modified 10 months ago • written 10 months ago by Pierre Lindenbaum112k

you say

vcffilterjdk awk.test -e '...'

it should be something like

   java -jar dist/vcffilterjdk.jar   -e '...'  file.vcf
ADD REPLYlink modified 10 months ago • written 10 months ago by Pierre Lindenbaum112k

its a "wrapper", as our IT guy likes to use short words:) so, "java -jar ..." is included in "vcffilterjdk"

ADD REPLYlink written 10 months ago by victoria_aleks30

but I never use it and your messages show me that there is an error in the script (thanks for pointing this out!) the script ends with

$*

while I think I should I written it as (with double quotes)

"$@"
ADD REPLYlink written 10 months ago by Pierre Lindenbaum112k

you mean that you want to correct the script?

ADD REPLYlink written 10 months ago by victoria_aleks30

I've just fixed this https://github.com/lindenb/jvarkit/commit/7f811475a7648d24289702f49d53c89fb53761c9#diff-b67911656ef5d18c4ae36cb6741b7965R133

but anyway, I've never said you should use this wrapper, use the java -jar ... syntax please.

ADD REPLYlink written 10 months ago by Pierre Lindenbaum112k
1

now it worked:) as I understand the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example, --depth or --geno-depth) and it reports on 'missing entries'. and i still cannot use gatk, i think, as i have only vcf files, no fasta or bam

is there something that can be done here?:)

ADD REPLYlink modified 10 months ago • written 10 months ago by victoria_aleks30

I'm glad it's working now !

the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example)

piping my example into 'bcftools view' produces no error on my side

you can use another program bioalcidaejdk similar to vcffilterjdk:

java -jar dist/bioalcidaejdk.jar -e 'stream().forEach(V->{V.getGenotypes().stream().forEach(G->{println(V.getContig()+"\t"+V.getStart()+"\t"+G.getSampleName()+"\t"+G.getAlleles()+"\t"+G.getDP());});});' input.vcf
ADD REPLYlink written 10 months ago by Pierre Lindenbaum112k

thank you a lot:) it all finally seem to work now!

ADD REPLYlink written 10 months ago by victoria_aleks30

but also: can I extract exactly AD values, not only DP?? I want to know how many reads contain ref and how many contain alt nucleotides.

ADD REPLYlink modified 10 months ago • written 10 months ago by victoria_aleks30

ok, at the end we had to write our own script:)

ADD REPLYlink written 10 months ago by victoria_aleks30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1718 users visited in the last hour