Filtering Multi-sample VCF file for all except one Genotype
11 days ago
schmince • 0

Hello,

I am relatively new to the field of bioinformatics and am currently working on a small program that should, among other things, filter a multisample VCF file for all genotypes except one of them. Seven genotypes have been sampled, and all variants that belong to one of those genotypes are to be "erased" (or every other variant except those should be copied to a new file).

A few lines from my file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Dom HOR2932 HOR3036 HOR3726 KWSBambina  Rec S42IL_124
chr1H   58025   .   A   G   387.19  .   AC=6;AF=0.429;AN=14;DP=114;ExcessHet=8.2628;MLEAC=6;MLEAF=0.429;QD=29.78    GT:AD:AF:DP:GQ:PL   0/1:4,9:0.6000:15:27:34,0,27    0/1:24,0:.:24:47:47,0,527   0/1:12,0:.:12:23:23,0,263   0/1:8,0:.:8:65:65,0,125 0/1:49,0:.:49:99:212,0,962  0/1:5,0:.:5:14:14,0,104 0/0:1,0:.:1:3:0,3,29
chr1H   58051   .   T   C   82.02   .   AC=4;AF=0.286;AN=14;DP=109;ExcessHet=0.0921;MLEAC=4;MLEAF=0.286;QD=2.93 GT:AD:AF:DP:GQ:PL   1/1:1,17:0.8947:19:33:77,33,0   0/0:26,0:.:26:18:0,18,659   0/0:12,0:.:12:6:0,6,299 0/1:2,3:0.4286:7:3:8,0,3    0/0:39,0:.:39:99:0,117,1169 0/1:2,3:0.6000:5:16:16,0,41 0/0:1,0:.:1:3:0,3,29
chr1H   58057   .   T   C   89.43   .   AC=3;AF=0.214;AN=14;DP=112;ExcessHet=1.1394;MLEAC=3;MLEAF=0.214;QD=17.89    GT:AD:AF:DP:GQ:PL   0/0:19,0:.:19:57:0,57,569   0/0:26,0:.:26:51:0,51,749   0/0:12,0:.:12:6:0,6,299 0/1:7,0:.:7:8:8,0,158   0/1:42,0:.:42:83:83,0,923   0/1:3,2:0.4000:5:13:13,0,46 0/0:1,0:.:1:3:0,3,29

What I have gathered from my research so far is that the QUAL column doesn't help, since I have a multisample VCF.

I thought of filtering on the Phred score of each genotype. There are also a lot of posts talking about bcftools, which I have never used before, so I don't know if that would be the right tool to use.

I don't expect code or anything, I just need an idea to get on the right track.

Thanks!

variant SNP VCF • 724 views

"filter a multisample VCF file for all genotypes except one of them"

Filter for what?


"Seven genotypes have been sampled, and all variants that belong to one of those genotypes are to be "erased" (or every other variant except those should be copied to a new file)"

I think OP wants to remove one genotype for all samples from a multisample VCF.


The seven genotypes being 58025AA, 58025AG, 58051TT, 58051TC, 58051CC, 58057TT, 58057TC


I probably got some of the vocabulary wrong. I thought that "Dom HOR2932 HOR3036 HOR3726 KWSBambina Rec S42IL_124" represented my genotypes.

Anyway, what I want to remove from my file is all variants which "belong" to KWSBambina.

Is that possible? How do I identify which of the variants belong to KWSBambina?


Those are samples. If a sample has a 0/1 or a 1/1 genotype for a variant, it carries that variant.

Your question is ambiguous because you haven't provided an example of what the end result would look like.

You want to remove a sample?

You want to remove a variant that is unique to a certain sample?

You want to remove any variant for which a particular sample is a carrier?

Show us what the inputs and outputs are for a given example.
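
To make the "which samples have the variant" rule concrete, here is a minimal Python sketch that lists the carriers of one record. It assumes a tab-separated VCF with GT as the first FORMAT key (as the VCF specification requires when GT is present), and it treats a missing call (".") as not carrying the variant, which is my assumption. The function names `carries_alt` and `carriers` are made up for illustration.

```python
def carries_alt(sample_field: str) -> bool:
    """True if the GT subfield holds at least one non-reference allele."""
    # GT is taken as the first colon-separated subfield of the sample column.
    gt = sample_field.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    # "0" is the reference allele and "." a missing call; anything else is ALT.
    return any(a not in ("0", ".") for a in alleles)


def carriers(header_line: str, record_line: str) -> list:
    """Names of the samples that carry an ALT allele in one VCF record."""
    # Columns 1-9 are fixed (CHROM ... FORMAT); sample columns start at index 9.
    names = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")[9:]
    return [n for n, f in zip(names, fields) if carries_alt(f)]


# Header and record chr1H:58057 from the question, re-joined with tabs.
header = "\t".join(
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
    "Dom HOR2932 HOR3036 HOR3726 KWSBambina Rec S42IL_124".split()
)
record = "\t".join(
    "chr1H 58057 . T C 89.43 . "
    "AC=3;AF=0.214;AN=14;DP=112;ExcessHet=1.1394;MLEAC=3;MLEAF=0.214;QD=17.89 "
    "GT:AD:AF:DP:GQ:PL 0/0:19,0:.:19:57:0,57,569 0/0:26,0:.:26:51:0,51,749 "
    "0/0:12,0:.:12:6:0,6,299 0/1:7,0:.:7:8:8,0,158 0/1:42,0:.:42:83:83,0,923 "
    "0/1:3,2:0.4000:5:13:13,0,46 0/0:1,0:.:1:3:0,3,29".split()
)
print(carriers(header, record))  # ['HOR3726', 'KWSBambina', 'Rec']
```

For this record KWSBambina is a carrier but not the only one, so the variant is not unique to it.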


OK, so I definitely didn't understand at first what my goal was.

The goal is to remove all the variants which are unique to the KWSBambina sample.

The input is my normal VCF; the output should contain only the variants that are not unique to KWSBambina, copied to a new file.

9 days ago

"The goal is to remove all the variants which are unique to the KWSBambina sample."

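Using jvarkit vcffilterjdk (https://jvarkit.readthedocs.io/en/latest/VcfFilterJdk/). The expression keeps a variant if KWSBambina carries no ALT allele, or if at least one other sample also carries an ALT allele: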

java -jar jvarkit.jar vcffilterjdk -e 'final String sn="KWSBambina";if(!variant.getGenotype(sn).hasAltAllele()) return true; return variant.getGenotypes().stream().filter(G->!sn.equals(G.getSampleName())).anyMatch(G->G.hasAltAllele());' in.vcf
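
For readers who would rather not install jvarkit, the same keep/drop rule can be sketched in plain Python. This is only an approximation under stated assumptions: the input is an uncompressed, tab-separated VCF, GT is the first FORMAT key, and a missing call (".") counts as not carrying the variant, which may differ from how jvarkit/htsjdk treats no-calls. The names `has_alt`, `keep_record` and `filter_vcf` are hypothetical helpers, not part of any library.

```python
from typing import Iterable, Iterator, List


def has_alt(sample_field: str) -> bool:
    """True if the GT subfield contains an allele other than "0" or "."."""
    gt = sample_field.split(":", 1)[0]
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def keep_record(names: List[str], fields: List[str], target: str) -> bool:
    """Keep a record unless the target sample is its only ALT carrier."""
    i = names.index(target)
    if not has_alt(fields[i]):
        return True  # target does not carry the variant at all
    # Target carries it: keep only if some other sample carries it too.
    return any(has_alt(f) for j, f in enumerate(fields) if j != i)


def filter_vcf(lines: Iterable[str], target: str) -> Iterator[str]:
    """Yield header lines unchanged and only the records that pass keep_record."""
    names: List[str] = []
    for line in lines:
        if line.startswith("#"):
            if line.startswith("#CHROM"):
                # Sample names follow the nine fixed columns.
                names = line.rstrip("\n").split("\t")[9:]
            yield line
            continue
        fields = line.rstrip("\n").split("\t")[9:]
        if keep_record(names, fields, target):
            yield line
```

It could be used along the lines of `with open("in.vcf") as src, open("out.vcf", "w") as dst: dst.writelines(filter_vcf(src, "KWSBambina"))`, where both file names are placeholders. For compressed or very large files a dedicated tool is likely the better choice.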

Thanks, that was exactly what I needed.


Don't forget to follow up on your threads. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.
