Question: Finding unique variants in a VCF
tomasz.szmatola wrote (7 weeks ago):

Hello,

I'm dealing with a fairly large VCF file of 10 individuals called with freebayes. I want to find the variants unique to 1 of the individuals and then compare it with 3 other individuals. I've searched for quite some time and can't find the right answer myself. I would appreciate your help.

Question modified 7 weeks ago by Pierre Lindenbaum.

See GATK SelectVariants with a JEXL expression.

For example:

!vc.getGenotype("sample1").isHomRef()  && (vc.getGenotype("sample2").isHomRef() && vc.getGenotype("sample3").isHomRef())
Reply written 7 weeks ago by Pierre Lindenbaum
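The JEXL predicate above keeps sites where sample1 is not homozygous reference while sample2 and sample3 both are. Below is a minimal Python sketch of the same per-record logic, not GATK's implementation. It assumes an uncompressed VCF whose FORMAT column contains GT, and the sample names and data line are made up for illustration. One deliberate difference: it requires the focal sample to actually carry an ALT allele, whereas `!isHomRef()` would also let a no-call (`./.`) through.

```python
def sample_names(header_line):
    """Sample names from the '#CHROM ... FORMAT s1 s2 ...' header line."""
    return header_line.rstrip("\n").split("\t")[9:]

def gt_alleles(sample_field, fmt):
    """Allele indices from the GT subfield, e.g. '0/1' -> ['0', '1']."""
    gt = sample_field.split(":")[fmt.split(":").index("GT")]
    return gt.replace("|", "/").split("/")

def is_hom_ref(alleles):
    return all(a == "0" for a in alleles)

def carries_alt(alleles):
    # Stricter than !isHomRef(): a no-call ('./.') does not count as a variant here.
    return any(a not in ("0", ".") for a in alleles)

def unique_to_focal(line, samples, focal, others):
    """True if `focal` carries an ALT allele and every sample in `others` is 0/0."""
    cols = line.rstrip("\n").split("\t")
    fmt = cols[8]
    gts = {name: gt_alleles(field, fmt) for name, field in zip(samples, cols[9:])}
    return carries_alt(gts[focal]) and all(is_hom_ref(gts[s]) for s in others)

header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\tsample2\tsample3"
samples = sample_names(header)
line = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t0/0:9\t0/0:15"
print(unique_to_focal(line, samples, "sample1", ["sample2", "sample3"]))  # True
```

Samples not listed in `others` (for example the remaining 6 of the 10 individuals) are simply ignored, which matches a "1 versus 3 chosen individuals" comparison.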

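Have you looked at bcftools view -x (--private)? It prints sites where only the subset samples given with -s carry a non-reference allele. https://samtools.github.io/bcftools/bcftools.html#view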

Reply written 7 weeks ago by RamRS

To be honest, selecting individuals wasn't a big problem; I managed it with the vcftools --indv option. The issue is that I don't know how to select the variants unique to 1 individual versus 3 others (with --diff-site from vcftools I managed to compare one individual to another).

Reply written 7 weeks ago by tomasz.szmatola
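Going from a one-vs-one --diff-site comparison to one-vs-three is essentially a set difference: take the sites where the focal individual carries a variant and subtract the union of sites carried by the other three. A minimal Python sketch, assuming one single-sample VCF per individual (e.g. split out of the joint call with vcftools --indv and --recode); the in-memory lines stand in for hypothetical files.

```python
def variant_sites(lines):
    """(CHROM, POS, REF, ALT) keys where the single sample carries an ALT allele."""
    sites = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        gt = cols[9].split(":")[0]  # GT must be the first FORMAT key when present
        if any(a not in ("0", ".") for a in gt.replace("|", "/").split("/")):
            sites.add((cols[0], cols[1], cols[3], cols[4]))
    return sites

def unique_sites(focal, others):
    """Sites in `focal` that none of the `others` carry."""
    return focal - set().union(*others)

# Hypothetical single-sample VCF contents, one list per individual.
focal = variant_sites(["chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1",
                       "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t1/1"])
others = [variant_sites(["chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1"]),
          variant_sites(["chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0"]),
          variant_sites([])]
print(sorted(unique_sites(focal, others)))  # [('chr1', '100', 'A', 'G')]
```

With real files you would pass an open file handle to `variant_sites`. Two caveats: a site missing from another individual's file (not called there) counts as "not carried", and the ALT string must match exactly, so multi-allelic records should be normalised consistently first.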

Further to Pierre's comment, another option I've used extensively is SnpSift filter (from snpEff). It's available in Galaxy, so it's quite simple to use in the cloud; we also use it locally and pass on full lists of filters and results.

For two samples, I use

(isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] ))

https://toolshed.g2.bx.psu.edu/repository/display_tool?repository_id=65063aa2c697f935&render_repository_actions_for=tool_shed&tool_config=%2Fsrv%2Ftoolshed%2Fmain%2Fvar%2Fdata%2Frepos%2F001%2Frepo_1363%2FsnpSift_filter.xml&changeset_revision=2b3e65a4252f

Reply written 7 weeks ago by colindaven
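In SnpSift expressions GEN[0] is the first sample column of the VCF and GEN[1] the second (the index is zero-based), so the expression above keeps sites where the first sample carries a variant and the second is homozygous reference. The Python sketch below mimics one reading of isVariant, isRef and isHom on GT strings to make that explicit; it is an illustration, not SnpSift's code, and SnpSift may treat edge cases such as no-calls differently.

```python
def alleles(gt):
    return gt.replace("|", "/").split("/")

def is_ref(gt):
    """Reading of isRef: every allele is the reference allele (0)."""
    return all(a == "0" for a in alleles(gt))

def is_variant(gt):
    """Reading of isVariant: at least one called non-reference allele."""
    return any(a not in ("0", ".") for a in alleles(gt))

def is_hom(gt):
    """Reading of isHom: all alleles called and identical."""
    a = alleles(gt)
    return "." not in a and len(set(a)) == 1

def two_sample_filter(gen):
    """gen[i] is the GT of the i-th sample column, i.e. GEN[i] (zero-based)."""
    # Mirrors: isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] )
    return is_hom(gen[1]) and is_variant(gen[0]) and is_ref(gen[1])

print(two_sample_filter(["0/1", "0/0"]))  # True: first sample variant, second 0/0
print(two_sample_filter(["0/0", "0/1"]))  # False: roles are reversed
```

Under this reading, isHom( GEN[1] ) adds nothing once isRef( GEN[1] ) holds for a 0/0 call.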

As I understand this filter, the 1st individual [1] is a homozygous reference and the 2nd [0] is any variant? Will these variants be moved to a new file, or erased? Also, will long expressions like (isHom( GEN[1] ) & isVariant( GEN[0] ) & isRef( GEN[1] ) | (isHom( GEN[3] ) & isVariant( GEN[2] )) work?

Reply written 7 weeks ago by tomasz.szmatola
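For the "1 individual versus 3 others" case you don't need to chain several two-sample clauses with `|`: a single clause requiring the focal sample to be variant and each of the others to be reference expresses it directly. A small Python helper (hypothetical, not part of SnpSift) that builds such an expression string from zero-based sample column indices:

```python
def unique_to_expr(focal, others):
    """Build a SnpSift-style filter expression: focal sample variant, others reference.

    Indices are zero-based sample column positions (GEN[0] = first sample).
    """
    terms = [f"isVariant( GEN[{focal}] )"]
    terms += [f"isRef( GEN[{i}] )" for i in others]
    return " & ".join(terms)

expr = unique_to_expr(0, [1, 2, 3])
print(expr)
# isVariant( GEN[0] ) & isRef( GEN[1] ) & isRef( GEN[2] ) & isRef( GEN[3] )
```

The string can then be passed to SnpSift, e.g. `java -jar SnpSift.jar filter "<expression>" input.vcf > unique.vcf`; the records that satisfy the expression go to the output and the input file is left unchanged. The indices refer to positions in the file being filtered, so either use the columns of the 4 individuals in the full 10-sample VCF or subset it first (e.g. with vcftools --indv). If you do mix `&` and `|`, wrapping each clause in its own parentheses avoids relying on operator precedence.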

Yep, have a play with it, but be careful: use positive and negative controls, etc. Generate summaries of the SNVs common to all samples, then iteratively refine your queries until you're happy with them. It's very easy to make mistakes with wide-reaching consequences.

Reply written 7 weeks ago by colindaven
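One way to get such summaries is to count every site by the exact set of samples carrying an ALT allele: the "all samples" bucket gives the shared SNVs, and each single-sample bucket should agree with what your uniqueness filter returns. A minimal Python sketch, assuming the genotypes have already been pulled out of the VCF as one list of GT strings per site; the sample names and genotypes are made up.

```python
from collections import Counter

def carries_alt(gt):
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))

def sharing_summary(records, samples):
    """Count sites by the frozenset of samples that carry an ALT allele."""
    counts = Counter()
    for gts in records:
        carriers = frozenset(s for s, gt in zip(samples, gts) if carries_alt(gt))
        if carriers:  # skip sites where no listed sample carries the variant
            counts[carriers] += 1
    return counts

samples = ["A", "B", "C", "D"]
records = [["0/1", "0/1", "1/1", "0/1"],
           ["0/1", "0/0", "0/0", "0/0"],
           ["0/0", "0/1", "0/0", "0/0"],
           ["1/1", "0/0", "./.", "0/0"]]
counts = sharing_summary(records, samples)
print("common to all:", counts[frozenset(samples)])  # 1
for s in samples:
    print(f"unique to {s}:", counts[frozenset({s})])
```

Note the last site: C has a no-call, so it is counted as unique to A. Missing data can inflate "unique" counts, which is exactly the kind of mistake a known-private positive control helps catch.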