Finding unique variants in a VCF
3.7 years ago

Hello,

I'm working with a fairly large VCF file of 10 individuals called with freebayes. I would like to find the variants unique to one of the individuals, comparing it against 3 other individuals. I have searched for quite some time and can't find the answer myself. I would appreciate your help.

SNP • 1.8k views

See GATK SelectVariants with a JEXL expression.

e.g.:

!vc.getGenotype("sample1").isHomRef()  && (vc.getGenotype("sample2").isHomRef() && vc.getGenotype("sample3").isHomRef())
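
To make the logic of that expression concrete, here is a minimal plain-Python sketch of the same test (target sample is not hom-ref, every comparison sample is hom-ref). It is not GATK code: the function names and the demo records are made up for illustration, and the sample names simply reuse those from the expression. Note that a missing genotype such as ./. is not hom-ref, so it passes the target check here just as it would pass a bare negated hom-ref test.

```python
def alleles(gt):
    """Split a GT string such as '0/1' or '1|1' into its allele codes."""
    return gt.replace("|", "/").split("/")


def is_hom_ref(gt):
    """True only when every allele is called and equals the reference ('0')."""
    return all(a == "0" for a in alleles(gt))


def private_sites(vcf_lines, target, others):
    """Yield (CHROM, POS, REF, ALT) where target is not hom-ref and all others are."""
    columns = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 of the header line.
            columns = {name: i for i, name in enumerate(fields) if i >= 9}
            continue
        gt_index = fields[8].split(":").index("GT")
        gt = {s: fields[columns[s]].split(":")[gt_index] for s in [target, *others]}
        if not is_hom_ref(gt[target]) and all(is_hom_ref(gt[s]) for s in others):
            yield fields[0], int(fields[1]), fields[3], fields[4]


# Hypothetical three-sample VCF used only to exercise the function.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\tsample3",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:10\t0/0:9",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:12\t0/1:10\t0/0:9",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t0/0:12\t0/0:10\t1/1:9",
]
print(list(private_sites(demo, "sample1", ["sample2", "sample3"])))
# → [('chr1', 100, 'A', 'G')]
```

If missing calls in the target should not count as variants, an extra check that the genotype is actually called would be needed.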

Have you looked at bcftools view -x (--private, used together with -s to name the sample)? https://samtools.github.io/bcftools/bcftools.html#view


To be honest, selecting individuals wasn't that big of a problem; I managed to do it with the vcftools --indv option. The issue is that I don't know how to select the variants unique to 1 individual versus 3 others (with --diff-site from vcftools I managed to compare one individual to another).
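
One way to picture the one-versus-three comparison is as a set difference on sites: collect the sites where the target carries a non-reference allele, then subtract every site carried by any of the other three. The sketch below is plain Python, not a vcftools feature; the function names, the sample names ind1 to ind4, and the demo records are all hypothetical. It assumes missing genotypes count as "not carrying" and compares whole sites (CHROM, POS, REF, ALT), so two samples carrying different ALT alleles at a multi-allelic site would still be treated as sharing it.

```python
def variant_sites(vcf_lines, sample):
    """Return the set of (CHROM, POS, REF, ALT) where sample has a non-reference allele."""
    sites = set()
    col = None
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # column holding this sample's genotypes
            continue
        gt_index = fields[8].split(":").index("GT")
        gt = fields[col].split(":")[gt_index]
        called = gt.replace("|", "/").split("/")
        # '0' is the reference allele and '.' a missing call; anything else is a variant.
        if any(a not in ("0", ".") for a in called):
            sites.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return sites


def unique_to(vcf_lines, target, others):
    """Sites carried by target and by none of the others."""
    lines = list(vcf_lines)  # allow several passes over the input
    shared = set().union(*(variant_sites(lines, s) for s in others))
    return variant_sites(lines, target) - shared


# Hypothetical four-sample VCF used only to exercise the functions.
demo_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3\tind4",
    "chr1\t100\t.\tA\tG\t40\t.\t.\tGT\t0/1\t0/0\t0/0\t./.",
    "chr1\t200\t.\tC\tT\t40\t.\t.\tGT\t1/1\t0/0\t0/1\t0/0",
    "chr2\t50\t.\tT\tC\t40\t.\t.\tGT\t0/0\t1/1\t0/0\t0/0",
]
print(unique_to(demo_vcf, "ind1", ["ind2", "ind3", "ind4"]))
# → {('chr1', 100, 'A', 'G')}
```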


Further to Pierre's comment, another option I've used extensively is SnpSift filter (from snpEff). It's available in Galaxy, so it is quite simple to use in the cloud; we use it locally and pass on full lists of filters and results.

For two samples, I use

(isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] ))

https://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=65063aa2c697f935&render_repository_actions_for=tool_shed&tool_config=%2Fsrv%2Ftoolshed%2Fmain%2Fvar%2Fdata%2Frepos%2F001%2Frepo_1363%2FsnpSift_filter.xml&changeset_revision=2b3e65a4252f


As I understand this filter, the 1st individual [1] is a reference homozygote and the 2nd [0] carries any variant? So will these types of variants be moved to a new file or erased? Also, will long expressions work, like (isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] )) | (isHom( GEN[3] ) & isVariant( GEN[2] ))?


Yep. Have a play with it, but be careful: include a positive and a negative control, etc. Make sure you generate summaries of SNVs common to all samples, then iteratively improve your queries until you're happy with them. It is very easy to make mistakes with wide-reaching consequences.
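
As a rough illustration of the kind of summary and controls meant here, the sketch below counts sites by the exact set of samples carrying a non-reference allele, so that "common to all" and "private to one sample" counts can be checked against hand-built records. It is a minimal plain-Python sketch, not part of SnpSift; the function names, sample names A, B, C, and the control records are hypothetical, and missing genotypes are assumed to count as non-carriers.

```python
from collections import Counter


def carriers(genotypes):
    """Return the frozenset of sample names whose GT has a non-reference allele."""
    return frozenset(
        name
        for name, gt in genotypes.items()
        if any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))
    )


def sharing_summary(records):
    """Count sites per carrier combination; records is an iterable of {sample: GT} dicts."""
    return Counter(carriers(r) for r in records)


# Hand-built control records with known answers.
controls = [
    {"A": "0/1", "B": "0/0", "C": "0/0"},  # positive control: private to A
    {"A": "1/1", "B": "0/1", "C": "1/1"},  # common to all samples
    {"A": "0/0", "B": "0/0", "C": "0/1"},  # negative control: A does not carry it
    {"A": "0/1", "B": "./.", "C": "0/0"},  # private to A, with a missing call in B
]
summary = sharing_summary(controls)
for samples, count in sorted(summary.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(samples), count)
```

Running the same summary on the real file before and after each change to a filter makes it easier to spot a query that silently drops or keeps the wrong class of sites.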
