Question: Filter multisample VCF for de novo variants
finswimmer (Germany) wrote, 20 months ago:

Hello,

I have a multisample vcf file (3 samples) and I'd like to get all denovo variants for a specific sample. I tried bcftools:

bcftools view -s SAMPLE_ID -x All-final.vcf

The problem is that some sites can be multiallelic, so the above command would not find, for example, this line (the third sample is the one I'm interested in):

chr5    38528951    rs762238623 GACAC   GAC,G   1204.93 PASS    .   GT:DP:AD:RO:QR:AO:QA:GL 0/1:10:2,6,0:2:75:6,0:212,0:-15.9529,0,-3.87444,-16.555,-5.68062,-22.6541   0/1:40:10,21,3:10:343:21,3:677,105:-51.4565,0,-21.4531,-45.9266,-19.2432,-72.8601   1/2:39:0,19,10:0:0:19,10:622,279:-72.2747,-22.0435,-16.3239,-50.201,0,-47.1907

This is because the requirement of the -x option is not fulfilled:

-x, --private print sites where only the subset samples carry an non-reference allele. Requires --samples or --samples-file.

So, what's the best way here to find all denovo variants in a given sample?

Thanks.

fin swimmer

Tags: snp, bcftools, vcf
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 20 months ago:

Try GATK SelectVariants: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

Generating a VCF of all the variants that are mendelian violations. The optional argument '-mvq' restricts the selection to sites that have a QUAL score of 50 or more

 java -jar GenomeAnalysisTK.jar \
   -T SelectVariants \
   -R reference.fasta \
   -V input.vcf \
   -ped family.ped \
   -mv -mvq 50 \
   -o violations.vcf
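
To make the notion of a Mendelian violation concrete, here is a minimal Python sketch of the genotype check itself. It is not how GATK implements `-mv`; the function names `parse_gt` and `is_mendelian_violation` are made up for illustration, and it ignores genotype quality, sex chromosomes and non-diploid calls.

```python
import re


def parse_gt(gt):
    """Split a VCF GT string such as '0/1' or '1|2' into allele indices.

    Missing alleles ('.') become None.
    """
    return [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]


def is_mendelian_violation(child_gt, father_gt, mother_gt):
    """Return True if the child's genotype cannot be formed from one
    paternal and one maternal allele (hypothetical helper, diploid only)."""
    child, father, mother = (parse_gt(g) for g in (child_gt, father_gt, mother_gt))
    # Assumption: incomplete or non-diploid genotypes are simply not flagged.
    if any(len(g) != 2 or None in g for g in (child, father, mother)):
        return False
    a, b = child
    consistent = (a in father and b in mother) or (b in father and a in mother)
    return not consistent


# Assuming (purely for illustration) that the third sample of the record in
# the question is the child and the first two are the parents:
print(is_mendelian_violation("1/2", "0/1", "0/1"))  # True
```

Allele 2 is carried by neither assumed parent, so the 1/2 genotype cannot be inherited and the site is flagged.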

I've also written: http://lindenb.github.io/jvarkit/VCFTrios.html


Hello Pierre,

Checking for Mendelian violations is not exactly what I asked for, but it is also very useful for other needs.

Thanks a lot.

fin swimmer

Reply written 20 months ago by finswimmer

What's your definition of 'denovo variants' without the context of a trio?

Reply written 20 months ago by Pierre Lindenbaum

Without the context of a trio, my definition of denovo is a non-reference allele that occurs in only one specific sample compared to the other samples in the multisample vcf.

But within the context of a trio you are absolutely right that every Mendelian violation is at least suspicious.

fin swimmer

Reply written 20 months ago by finswimmer

"denovo is a non-reference allele that occurs in only one specific sample compared to the other samples in the multisample vcf."

I would say it's a "rare variant" :-)

Reply written 20 months ago by Pierre Lindenbaum

Ok, if this is the right term :)

Even if my initial problem is solved, I'm still interested in how to filter those rare variants for a given sample within a multisample, multiallelic vcf.

fin swimmer

Reply written 20 months ago by finswimmer
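
One way to handle the multiallelic case is to compare alleles rather than whole sites: collect the ALT allele indices each sample carries and keep those seen only in the sample of interest. The following Python sketch does this for a single tab-separated VCF data line. The function name `private_alt_alleles` is hypothetical, and the example record is the one from the question with the FORMAT column trimmed to GT:DP for brevity.

```python
import re


def private_alt_alleles(vcf_line, target_index):
    """Return the ALT alleles carried by the target sample and by no other sample.

    vcf_line     -- one tab-separated VCF data line (not a header line)
    target_index -- 0-based position of the sample among the sample columns
    """
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    gt_pos = fields[8].split(":").index("GT")

    # For every sample, the set of non-reference allele indices in its genotype.
    carried = []
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        alleles = {int(a) for a in re.split(r"[/|]", gt) if a != "."}
        carried.append(alleles - {0})

    others = set().union(*(c for i, c in enumerate(carried) if i != target_index))
    private = carried[target_index] - others
    # Allele index 1 corresponds to the first ALT entry, index 2 to the second, etc.
    return [alts[i - 1] for i in sorted(private)]


record = "\t".join([
    "chr5", "38528951", "rs762238623", "GACAC", "GAC,G", "1204.93", "PASS", ".",
    "GT:DP", "0/1:10", "0/1:40", "1/2:39",
])
print(private_alt_alleles(record, 2))  # ['G']
```

Here the third sample shares allele 1 (GAC) with the other two samples, but allele 2 (G) occurs only in its 1/2 genotype, so the site is reported even though a site-level test for private variants skips it.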

Still with GATK SelectVariants, using the -select option, something like '-select "AC<1"'; see the GATK documentation on JEXL expressions.

Reply written 20 months ago by Pierre Lindenbaum

Chadi Saad (France) wrote, 7 months ago:

use genmod to annotate your variants with genetic models:
