Question: How to extract SNPs between samples and not between samples and reference
0
gravatar for TrentGenomics
15 months ago by
TrentGenomics30 wrote:

Hello,

I have used samtools mpileup, and bcftools call/filter (v1.5) to call SNPs between 2 samples and a de novo reference transcriptome.

Here is my pipeline:

bcftools mpileup -Ou -f ref.fa aln.bam1 aln.bam2 | \
    bcftools call -Ou -mv | \
    bcftools filter -s LowQual -e '%QUAL<20 || DP>100' > var.flt.vcf

I'm interested in extracting SNPs that occur between individual samples (aln.bam1 and aln.bam2) , and not between the samples and the reference (aln.bam 1 and ref.fa, aln.bam2 and ref.fa) . I would like to see how different each sample is different, or similar, from one another. Is this possible? I haven't been able to find much documentation on how one would approach this.

Any info would be greatly appreciated.

Cheers.

rna-seq snp • 752 views
ADD COMMENTlink modified 15 months ago • written 15 months ago by TrentGenomics30

Hello michbrown!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=76923

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 15 months ago by Pierre Lindenbaum113k

Thanks for your reply, Pierre. Currently running vcffilterjs now. Would the command be the same as the examples posted above, if there were more than 2 samples? For example, if there were 4 samples?

Thanks again.

ADD REPLYlink written 15 months ago by TrentGenomics30

of course not. But it's not clear from your question how you would filter out a VCF with N-Samples. However vcffilterjs can run some loops over the genotypes. see the example '''Sample having a unique genotype:'''

ADD REPLYlink written 15 months ago by Pierre Lindenbaum113k

My mistake, I'm quite new to bioinformatics. Thanks for your help so far. I've been running

 $ java -jar src/jvarkit-git/dist/vcffilterjs.jar -e 'variant.getGenotype(0).sameGenotype(variant.getGenotype(1))' input.var.flt.vcf

for ~24 hours now on a filtered .vcf file. What run times should I expect with the vcfffiltersjs program?

Thanks again

ADD REPLYlink modified 15 months ago by Pierre Lindenbaum113k • written 15 months ago by TrentGenomics30

if the vcf input is huge, the processing is slow. you should see an estimation of the remaining time: something like:

[INFO][SAMSequenceDictionaryProgress]Count: 1994978 Elapsed: 3 hours(42.96%) Remains: 5 hours(57.04%) Last: 8:99026525
[INFO][SAMSequenceDictionaryProgress]Count: 2221249 Elapsed: 3 hours(49.37%) Remains: 4 hours(50.63%) Last: 9:138417471
[INFO][SAMSequenceDictionaryProgress]Count: 2200757 Elapsed: 3 hours(48.96%) Remains: 4 hours(51.04%) Last: 9:125920376
[INFO][SAMSequenceDictionaryProgress]Count: 2075688 Elapsed: 4 hours(45.47%) Remains: 4 hours(54.53%) Last: 9:17599369
[INFO][SAMSequenceDictionaryProgress]Count: 2051805 Elapsed: 3 hours(44.30%) Remains: 5 hours(55.70%) Last: 8:140497775
[INFO][SAMSequenceDictionaryProgress]Count: 1942671 Elapsed: 3 hours(41.35%) Remains: 5 hours(58.65%) Last: 8:48889823
[INFO][SAMSequenceDictionaryProgress]Count: 1996074 Elapsed: 4 hours(43.00%) Remains: 5 hours(57.00%) Last: 8:100040432
[INFO][SAMSequenceDictionaryProgress]Count: 2222632 Elapsed: 3 hours(49.39%) Remains: 4 hours(50.61%) Last: 9:139322248
[INFO][SAMSequenceDictionaryProgress]Count: 2202309 Elapsed: 3 hours(49.00%) Remains: 4 hours(51.00%) Last: 9:127025691

if not something is wrong.

Check your have really provided a input vcf, because vcfilterjs might try to wait for a VCF provided on standard input.

A faster alternative is my new tool: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

 $ java -jar src/jvarkit-git/dist/vcffilterjdk.jar -e 'return variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' input..vcf
ADD REPLYlink modified 15 months ago • written 15 months ago by Pierre Lindenbaum113k

Thanks for getting back to me, this is a big help. The tool that you've copied above will return sample [0] not same genotype as sample [1]?

ADD REPLYlink written 15 months ago by TrentGenomics30

no the SAME genotype

for a different , that would be

$ java -jar src/jvarkit-git/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' input..vcf

note the '!'

ADD REPLYlink written 15 months ago by Pierre Lindenbaum113k

Thank you so much, Pierre! I appreciate the help and insight.

ADD REPLYlink written 15 months ago by TrentGenomics30

Again, forgive me for my inexperience. I would like to output the results to a .txt file. Is this as simple as adding '-o somefile.txt' argument to the command? I tried to add an output argument but received an error:

java -jar /scratch/$USER/jvarkit/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' /scratch/$USER/SNPs/NFS_SFS_alignment.var.flt.vcf -o vcffilterjdk_different_genotype_NFS_SFS.txt

[SEVERE][Launcher]Must specify file or stream output type. java.lang.IllegalArgumentException: Must specify file or stream output type. at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.build(VariantContextWriterBuilder.java:423) at com.github.lindenb.jvarkit.util.vcf.VCFUtils.createVariantContextWriter(VCFUtils.java:391) at com.github.lindenb.jvarkit.util.jcommander.Launcher.openVariantContextWriter(Launcher.java:946) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:981) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1002) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:322) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1155) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:336) [INFO][Launcher]vcffilterjdk Exited with failure (-1)

Thanks again for guiding me through this.

ADD REPLYlink written 15 months ago by TrentGenomics30
java -jar /scratch/$USER/jvarkit/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' /scratch/$USER/SNPs/NFS_SFS_alignment.var.flt.vcf > vcffilterjdk_different_genotype_NFS_SFS.vcf
ADD REPLYlink written 15 months ago by Pierre Lindenbaum113k
0
gravatar for Pierre Lindenbaum
15 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum113k wrote:

I'm interested in extracting SNPs that occur between individual samples

using vcffilterjs: http://lindenb.github.io/jvarkit/VCFFilterJS.html

sample[0] same genotype as sample[1]

java -jar src/jvarkit-git/dist/vcffilterjs.jar -e 'variant.getGenotype(0).sameGenotype(variant.getGenotype(1))' input.vcf

sample[0] not same genotype as sample[1]

java -jar src/jvarkit-git/dist/vcffilterjs.jar -e '!variant.getGenotype(0).sameGenotype(variant.getGenotype(1))' input.vcf
ADD COMMENTlink written 15 months ago by Pierre Lindenbaum113k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1604 users visited in the last hour