How to extract SNPs between samples and not between samples and reference
1
0
Entering edit mode
6.8 years ago

Hello,

I have used samtools mpileup, and bcftools call/filter (v1.5) to call SNPs between 2 samples and a de novo reference transcriptome.

Here is my pipeline:

bcftools mpileup -Ou -f ref.fa aln.bam1 aln.bam2 | \
    bcftools call -Ou -mv | \
    bcftools filter -s LowQual -e '%QUAL<20 || DP>100' > var.flt.vcf

I'm interested in extracting SNPs that occur between individual samples (aln.bam1 and aln.bam2) , and not between the samples and the reference (aln.bam 1 and ref.fa, aln.bam2 and ref.fa) . I would like to see how different each sample is different, or similar, from one another. Is this possible? I haven't been able to find much documentation on how one would approach this.

Any info would be greatly appreciated.

Cheers.

SNP RNA-Seq • 2.4k views
ADD COMMENT
0
Entering edit mode

Hello michbrown!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=76923

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Thanks for your reply, Pierre. Currently running vcffilterjs now. Would the command be the same as the examples posted above, if there were more than 2 samples? For example, if there were 4 samples?

Thanks again.

ADD REPLY
0
Entering edit mode

of course not. But it's not clear from your question how you would filter out a VCF with N-Samples. However vcffilterjs can run some loops over the genotypes. see the example '''Sample having a unique genotype:'''

ADD REPLY
0
Entering edit mode

My mistake, I'm quite new to bioinformatics. Thanks for your help so far. I've been running

 $ java -jar src/jvarkit-git/dist/vcffilterjs.jar -e 'variant.getGenotype(0).sameGenotype(variant.getGenotype(1))' input.var.flt.vcf

for ~24 hours now on a filtered .vcf file. What run times should I expect with the vcfffiltersjs program?

Thanks again

ADD REPLY
0
Entering edit mode

if the vcf input is huge, the processing is slow. you should see an estimation of the remaining time: something like:

[INFO][SAMSequenceDictionaryProgress]Count: 1994978 Elapsed: 3 hours(42.96%) Remains: 5 hours(57.04%) Last: 8:99026525
[INFO][SAMSequenceDictionaryProgress]Count: 2221249 Elapsed: 3 hours(49.37%) Remains: 4 hours(50.63%) Last: 9:138417471
[INFO][SAMSequenceDictionaryProgress]Count: 2200757 Elapsed: 3 hours(48.96%) Remains: 4 hours(51.04%) Last: 9:125920376
[INFO][SAMSequenceDictionaryProgress]Count: 2075688 Elapsed: 4 hours(45.47%) Remains: 4 hours(54.53%) Last: 9:17599369
[INFO][SAMSequenceDictionaryProgress]Count: 2051805 Elapsed: 3 hours(44.30%) Remains: 5 hours(55.70%) Last: 8:140497775
[INFO][SAMSequenceDictionaryProgress]Count: 1942671 Elapsed: 3 hours(41.35%) Remains: 5 hours(58.65%) Last: 8:48889823
[INFO][SAMSequenceDictionaryProgress]Count: 1996074 Elapsed: 4 hours(43.00%) Remains: 5 hours(57.00%) Last: 8:100040432
[INFO][SAMSequenceDictionaryProgress]Count: 2222632 Elapsed: 3 hours(49.39%) Remains: 4 hours(50.61%) Last: 9:139322248
[INFO][SAMSequenceDictionaryProgress]Count: 2202309 Elapsed: 3 hours(49.00%) Remains: 4 hours(51.00%) Last: 9:127025691

if not something is wrong.

Check your have really provided a input vcf, because vcfilterjs might try to wait for a VCF provided on standard input.

A faster alternative is my new tool: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

 $ java -jar src/jvarkit-git/dist/vcffilterjdk.jar -e 'return variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' input..vcf
ADD REPLY
0
Entering edit mode

Thanks for getting back to me, this is a big help. The tool that you've copied above will return sample [0] not same genotype as sample [1]?

ADD REPLY
0
Entering edit mode

no the SAME genotype

for a different , that would be

$ java -jar src/jvarkit-git/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' input..vcf

note the '!'

ADD REPLY
0
Entering edit mode

Thank you so much, Pierre! I appreciate the help and insight.

ADD REPLY
0
Entering edit mode

Again, forgive me for my inexperience. I would like to output the results to a .txt file. Is this as simple as adding '-o somefile.txt' argument to the command? I tried to add an output argument but received an error:

java -jar /scratch/$USER/jvarkit/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' /scratch/$USER/SNPs/NFS_SFS_alignment.var.flt.vcf -o vcffilterjdk_different_genotype_NFS_SFS.txt

[SEVERE][Launcher]Must specify file or stream output type. java.lang.IllegalArgumentException: Must specify file or stream output type. at htsjdk.variant.variantcontext.writer.VariantContextWriterBuilder.build(VariantContextWriterBuilder.java:423) at com.github.lindenb.jvarkit.util.vcf.VCFUtils.createVariantContextWriter(VCFUtils.java:391) at com.github.lindenb.jvarkit.util.jcommander.Launcher.openVariantContextWriter(Launcher.java:946) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:981) at com.github.lindenb.jvarkit.util.jcommander.Launcher.doVcfToVcf(Launcher.java:1002) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.doWork(VcfFilterJdk.java:322) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:1155) at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:1341) at com.github.lindenb.jvarkit.tools.vcffilterjs.VcfFilterJdk.main(VcfFilterJdk.java:336) [INFO][Launcher]vcffilterjdk Exited with failure (-1)

Thanks again for guiding me through this.

ADD REPLY
0
Entering edit mode
java -jar /scratch/$USER/jvarkit/dist/vcffilterjdk.jar -e 'return !variant.getGenotype(0).sameGenotype(variant.getGenotype(1));' /scratch/$USER/SNPs/NFS_SFS_alignment.var.flt.vcf > vcffilterjdk_different_genotype_NFS_SFS.vcf
ADD REPLY
0
Entering edit mode
6.8 years ago

I'm interested in extracting SNPs that occur between individual samples

using vcffilterjs: http://lindenb.github.io/jvarkit/VCFFilterJS.html

sample[0] same genotype as sample[1]

java -jar src/jvarkit-git/dist/vcffilterjs.jar -e 'variant.getGenotype(0).sameGenotype(variant.getGenotype(1))' input.vcf

sample[0] not same genotype as sample[1]

java -jar src/jvarkit-git/dist/vcffilterjs.jar -e '!variant.getGenotype(0).sameGenotype(variant.getGenotype(1))' input.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 1873 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6