extracting depth info from vcf
1
0
Entering edit mode
4.7 years ago

Dear all, I need to know the "coverage" of sites in vcf files, but the files i am working with do not contain DP info, only AD. Most of the tools for vcf parcing use DP parameter. I understand that I can use AD to estimate depth, too. Is there any known ways of how to? Will be happy to any help:)

vcf AD depth • 3.9k views
0
Entering edit mode
4.7 years ago
 ls *.bam > bam.list
java -jar GenomeAnalysisTK.jar \
-R reference.fasta \
-T VariantAnnotator \
-I bam.list \
-V input.vcf \
-o output.vcf \
-A Coverage \
-L input.vcf

0
Entering edit mode

and what if i dont have neither bams nor reference fasta? I only downloaded vcf files from the source

0
Entering edit mode

and what if i dont have neither bams

you cannot get the depth without the bams...

nor reference fasta?

a reference sequence is always needed to run gatk. https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference

0
Entering edit mode

0
Entering edit mode

Ok here is a solution using vcfilterjdk http://lindenb.github.io/jvarkit/VcfFilterJdk.html

First I use awk to insert the new VCF header for DP . Then I loop over the genotypes and I create a new variant.

 curl -s "http://www.icbi.at/svnsimplex/CommonBioCommands/tags/simplex-1.0/CommonBioCommands/testdata/vcf/AMS1_converted_filtered_short_chr1.vcf" |\
awk '/^#CHROM/ {printf("##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"\">\n");} {print;}' | \

0
Entering edit mode

thank you very much. unfortunately vcffilterjdk constantly gives errors, in my hands :) Is it possible just write a home-made script to extract AD values for each line in vcf-file? I read that its better not to mess with vcf parcing and rather use the ready software. but right now it seems easier just code something by our own...

0
Entering edit mode
0
Entering edit mode

:) basically my question was: does it worth it to write a script myself or better not to mess? anyway, the error is: 2 errors

[SEVERE][InMemoryCompiler]compile.call() failed for custom class VcfFilterJdkCustom887993132 java.lang.RuntimeException: compile.call() failed for custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:230) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:509) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) [SEVERE][VcfFilterJdk]Cannot compile custom class VcfFilterJdkCustom887993132 java.lang.RuntimeException: Cannot compile custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:234) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:509) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) Caused by: java.lang.RuntimeException: compile.call() failed for custom class VcfFilterJdkCustom887993132 at com.github.lindenb.jvarkit.lang.InMemoryCompiler.compileClass(InMemoryCompiler.java:230) ... 5 more [SEVERE][VcfFilterJdk]Cannot initialize [INFO][Launcher]vcffilterjdk Exited with failure (-1) thank you:)

0
Entering edit mode

does it worth it to write a script myself or better not to mess?

I do understand that.

But it's a pity for the developer (aka 'me') to see that his tool is not used just because 'it doesn't work'. :-)

It's hard to debug without the context. what is exactly your command line please ?

0
Entering edit mode

i see:) honestly i repeated exactly what you wrote. except, first i ran awk command (i checked the resulting file, and DP appeared there. only: i do not understand why do I need incorporate DP in the header if I want to use AD anyway?). then I set this awk-mdified file as the input for vcffilterjdk:

vcffilterjdk awk.test -e 'List<Genotype> gl = variant.getGenotypes().stream().map(G->{int ad[]=G.getAD();if(ad==null || ad.length==0) return G; int c= Arrays.stream(ad).sum(); return new GenotypeBuilder(G).DP(c).make();}).collect(Collectors.toList());return new VariantContextBuilder(variant).genotypes(gl).make();'


I also tried to use -f instead -e; the error was:

[SEVERE][VcfFilterJdk]List<Genotype> (No such file or directory)
java.io.FileNotFoundException: List<Genotype> (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at htsjdk.samtools.util.IOUtil.slurp(IOUtil.java:999)
at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk$CtxWriterFactory.initialize(VcfFilterJdk.java:436) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:593) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1183) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:612) [SEVERE][VcfFilterJdk]Cannot initialize [INFO][Launcher]vcffilterjdk Exited with failure (-1)  (in case, i tried to pipe awk into vcffilterjdk, as you have shown - didnt help) I hope it gives some useful info ADD REPLY 1 Entering edit mode why do I need incorporate DP in the header if I want to use AD anyway? because my script produces a new attribute DP and it must be declared in the VCF header if it wasn't declared before.  new GenotypeBuilder(G).DP(c).make()  ADD REPLY 0 Entering edit mode you say vcffilterjdk awk.test -e '...'  it should be something like  java -jar dist/vcffilterjdk.jar -e '...' file.vcf  ADD REPLY 0 Entering edit mode its a "wrapper", as our IT guy likes to use short words:) so, "java -jar ..." is included in "vcffilterjdk" ADD REPLY 0 Entering edit mode but I never use it and your messages show me that there is an error in the script (thanks for pointing this out!) the script ends with $*


while I think I should I written it as (with double quotes)

"\$@"

0
Entering edit mode

you mean that you want to correct the script?

0
Entering edit mode

but anyway, I've never said you should use this wrapper, use the java -jar ... syntax please.

1
Entering edit mode

now it worked:) as I understand the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example, --depth or --geno-depth) and it reports on 'missing entries'. and i still cannot use gatk, i think, as i have only vcf files, no fasta or bam

is there something that can be done here?:)

0
Entering edit mode

I'm glad it's working now !

the script summated AD for ref and alt and wrote it into DP parameter. but now i try to use vcftools to extract this DP (for example)

piping my example into 'bcftools view' produces no error on my side

you can use another program bioalcidaejdk similar to vcffilterjdk:

java -jar dist/bioalcidaejdk.jar -e 'stream().forEach(V->{V.getGenotypes().stream().forEach(G->{println(V.getContig()+"\t"+V.getStart()+"\t"+G.getSampleName()+"\t"+G.getAlleles()+"\t"+G.getDP());});});' input.vcf

0
Entering edit mode

thank you a lot:) it all finally seem to work now!

0
Entering edit mode

but also: can I extract exactly AD values, not only DP?? I want to know how many reads contain ref and how many contain alt nucleotides.

0
Entering edit mode

ok, at the end we had to write our own script:)