Dear all, I need to know the "coverage" of sites in vcf files, but the files i am working with do not contain DP info, only AD. Most of the tools for vcf parcing use DP parameter. I understand that I can use AD to estimate depth, too. Is there any known ways of how to? Will be happy to any help:)
Use gatk variantAnnotator https://software.broadinstitute.org/gatk/gatkdocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_VariantAnnotator.php with Coverage https://software.broadinstitute.org/gatk/gatkdocs/3.8-0/org_broadinstitute_gatk_tools_walkers_annotator_Coverage.php
ls *.bam > bam.list
java -jar GenomeAnalysisTK.jar \
-R reference.fasta \
-T VariantAnnotator \
-I bam.list \
-V input.vcf \
-o output.vcf \
-A Coverage \
-L input.vcf
and what if i dont have neither bams nor reference fasta? I only downloaded vcf files from the source
and what if i dont have neither bams
you cannot get the depth without the bams...
nor reference fasta?
a reference sequence is always needed to run gatk. https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference
ah, sorry, now I see why you're asking: you already have AD...
Ok here is a solution using vcfilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html
First I use awk to insert the new VCF header for DP . Then I loop over the genotypes and I create a new variant.
curl -s "http://www.icbi.at/svnsimplex/CommonBioCommands/tags/simplex-1.0/CommonBioCommands/testdata/vcf/AMS1_converted_filtered_short_chr1.vcf" |\
awk '/^#CHROM/ {printf("##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"\">\n");} {print;}' | \
java -jar dist/vcffilterjdk.jar -e 'List<Genotype> gl = variant.getGenotypes().stream().map(G->{int ad[]=G.getAD();if(ad==null || ad.length==0) return G; int c= Arrays.stream(ad).sum(); return new GenotypeBuilder(G).DP(c).make();}).collect(Collectors.toList());return new VariantContextBuilder(variant).genotypes(gl).make();'
thank you very much. unfortunately vcffilterjdk constantly gives errors, in my hands :) Is it possible just write a home-made script to extract AD values for each line in vcf-file? I read that its better not to mess with vcf parcing and rather use the ready software. but right now it seems easier just code something by our own...
:) basically my question was: does it worth it to write a script myself or better not to mess? anyway, the error is: 2 errors
[SEVERE][InMemoryCompiler]compile.call() failed for custom class VcfFilterJdkCustom887993132 java.lang.RuntimeException: compile.call() failed for custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:230) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:509) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) [SEVERE][VcfFilterJdk]Cannot compile custom class VcfFilterJdkCustom887993132 java.lang.RuntimeException: Cannot compile custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:234) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:509) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) Caused by: java.lang.RuntimeException: compile.call() failed for custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:230) ... 5 more [SEVERE][VcfFilterJdk]Cannot initialize [INFO][Launcher]vcffilterjdk Exited with failure (-1) thank you:)
does it worth it to write a script myself or better not to mess?
I do understand that.
But it's a pity for the developer (aka 'me') to see that his tool is not used just because 'it doesn't work'. :-)
It's hard to debug without the context. what is exactly your command line please ?
i see:) honestly i repeated exactly what you wrote. except, first i ran awk command (i checked the resulting file, and DP appeared there. only: i do not understand why do I need incorporate DP in the header if I want to use AD anyway?). then I set this awk-mdified file as the input for vcffilterjdk:
vcffilterjdk awk.test -e 'List<Genotype> gl = variant.getGenotypes().stream().map(G->{int ad[]=G.getAD();if(ad==null || ad.length==0) return G; int c= Arrays.stream(ad).sum(); return new GenotypeBuilder(G).DP(c).make();}).collect(Collectors.toList());return new VariantContextBuilder(variant).genotypes(gl).make();'
I also tried to use -f instead -e; the error was:
[SEVERE][VcfFilterJdk]List<Genotype> (No such file or directory)
java.io.FileNotFoundException: List<Genotype> (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at htsjdk.samtools.util.IOUtil.slurp(IOUtil.java:999)
at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:436)
at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593)
at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183)
at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341)
at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612)
[SEVERE][VcfFilterJdk]Cannot initialize
[INFO][Launcher]vcffilterjdk Exited with failure (-1)
(in case, i tried to pipe awk into vcffilterjdk, as you have shown - didnt help)
I hope it gives some useful info
why do I need incorporate DP in the header if I want to use AD anyway?
because my script produces a new attribute DP and it must be declared in the VCF header if it wasn't declared before.
new GenotypeBuilder(G).DP(c).make()
you say
vcffilterjdk awk.test -e '...'
it should be something like
java -jar dist/vcffilterjdk.jar -e '...' file.vcf
its a "wrapper", as our IT guy likes to use short words:) so, "java -jar ..." is included in "vcffilterjdk"
but I never use it and your messages show me that there is an error in the script (thanks for pointing this out!) the script ends with
$*
while I think I should I written it as (with double quotes)
"$@"
you mean that you want to correct the script?
I've just fixed this https://github.com/lindenb/jvarkit/commit/7f811475a7648d24289702f49d53c89fb53761c9#diff-b67911656ef5d18c4ae36cb6741b7965R133
but anyway, I've never said you should use this wrapper, use the java -jar ...
syntax please.
now it worked:) as I understand the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example, --depth or --geno-depth) and it reports on 'missing entries'. and i still cannot use gatk, i think, as i have only vcf files, no fasta or bam
is there something that can be done here?:)
I'm glad it's working now !
the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example)
piping my example into 'bcftools view' produces no error on my side
you can use another program bioalcidaejdk similar to vcffilterjdk:
java -jar dist/bioalcidaejdk.jar -e 'stream().forEach(V->{V.getGenotypes().stream().forEach(G->{println(V.getContig()+"\t"+V.getStart()+"\t"+G.getSampleName()+"\t"+G.getAlleles()+"\t"+G.getDP());});});' input.vcf
but also: can I extract exactly AD values, not only DP?? I want to know how many reads contain ref and how many contain alt nucleotides.