Filter a multisample VCF by sample group
23 months ago
Theo • 0

Hi,

I have a multisample VCF in which different subsets of samples were sequenced to different mean depths: 60 samples at 8X, 15 samples at 20X, and 1 sample at 2X mean coverage with 45% missing genotypes.

I normally filter my VCF with a minimum and maximum depth set to half and double the mean depth. Here, those thresholds differ between the groups.
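Applying the half/double rule to each group gives a different depth window per group, so a single global min/max DP range cannot suit all of them at once. A trivial sketch of the arithmetic (the class and method names are hypothetical):

```java
public class DepthWindows {
    // Rule from the question: depth window = [mean / 2, mean * 2].
    static double minDepth(double meanDepth) { return meanDepth / 2.0; }
    static double maxDepth(double meanDepth) { return meanDepth * 2.0; }

    public static void main(String[] args) {
        // The three sample groups described above and their mean coverages.
        String[] groups = {"8X group (60 samples)", "20X group (15 samples)", "2X museum sample"};
        double[] means = {8.0, 20.0, 2.0};
        for (int i = 0; i < groups.length; i++) {
            System.out.printf("%s: DP %.0f-%.0f%n", groups[i], minDepth(means[i]), maxDepth(means[i]));
        }
    }
}
```

Running it prints the windows DP 4-16, DP 10-40 and DP 1-4 for the 8X, 20X and 2X groups respectively.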

Is there a way to filter the VCF based on one subset of samples at a time, without splitting the VCF? For example, keep all samples but filter variants so that the 60 samples at 8X have a minimum depth of 4X and a maximum of 16X.
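The per-group rule can be sketched as a plain Java predicate that decides whether a whole site is kept while never touching the sample columns. This assumes the per-sample DP values of one site are already in a map (a missing DP is simply absent, i.e. `null`), and that every sample in the group must pass, in the same spirit as an `allMatch` check. The class, method and sample names (A1, A2, B1) are hypothetical.

```java
import java.util.Map;
import java.util.Set;

public class GroupDepthFilter {
    /**
     * Site-level decision: true when every sample in {@code group} has a DP
     * inside [minDp, maxDp]. Samples outside the group are not checked and are
     * never removed; only whether the variant is kept depends on the group.
     * A missing DP (null here) fails the check.
     */
    static boolean groupWithinDepth(Map<String, Integer> dpBySample, Set<String> group,
                                    int minDp, int maxDp) {
        return group.stream().allMatch(s -> {
            Integer dp = dpBySample.get(s);
            return dp != null && dp >= minDp && dp <= maxDp;
        });
    }

    public static void main(String[] args) {
        Set<String> group8x = Set.of("A1", "A2"); // hypothetical names of 8X samples
        // B1 stands for a 20X sample: it is ignored by the 8X check.
        Map<String, Integer> site1 = Map.of("A1", 6, "A2", 12, "B1", 35);
        Map<String, Integer> site2 = Map.of("A1", 3, "A2", 12, "B1", 35);
        System.out.println(groupWithinDepth(site1, group8x, 4, 16)); // true
        System.out.println(groupWithinDepth(site2, group8x, 4, 16)); // false: A1 has DP 3
    }
}
```

Several groups are handled by chaining calls with `&&`, one call per group with its own window.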

The sample at 2X is an old museum sample and quite important. How can I filter the VCF to keep all samples but retain only the sites where this one sample has no missing genotype?
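Keeping only sites where one sample is genotyped comes down to testing that sample's GT value at each site. A minimal sketch on raw GT strings; it is deliberately strict and treats a half-missing call such as `0/.` as missing too. (Inside a vcffilterjdk expression, the htsjdk genotype object offers no-call tests such as `isNoCall()`, but its handling of partially missing calls may differ from this sketch.) The class and method names are hypothetical.

```java
public class MuseumCallFilter {
    /**
     * True when a VCF GT value is fully called, e.g. "0/1", "1|1" or "0".
     * Any missing allele (".", "./.", "0/.") makes it count as not called.
     */
    static boolean isFullyCalled(String gt) {
        if (gt == null || gt.isEmpty()) return false;
        for (String allele : gt.split("[/|]")) { // split on unphased or phased separators
            if (allele.equals(".")) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // GT of the museum sample at four hypothetical sites.
        String[] museumGt = {"0/1", "./.", "1|1", "0/."};
        for (String gt : museumGt) {
            System.out.println(gt + " -> " + (isFullyCalled(gt) ? "keep site" : "drop site"));
        }
    }
}
```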

I've searched the forums but can't quite find the answers I'm looking for.

23 months ago

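Using vcffilterjdk (https://jvarkit.readthedocs.io/en/latest/VcfFilterJdk/), something like the following (not tested). S1 to S4 are placeholder sample names and the DP thresholds are only examples, so set one sample set and one window per group (e.g. 4 to 16 for the 8X samples). The expression returns true for the variants to keep; all sample columns are retained. Note that htsjdk's getDP() returns -1 when a genotype has no DP, so such genotypes fail a `>` test.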

 java -jar dist/vcffilterjdk.jar -e 'final Set<String> samples1=new HashSet<>(Arrays.asList("S2","S3")); final Set<String> samples2=new HashSet<>(Arrays.asList("S1","S4")); return samples1.stream().map(S->variant.getGenotype(S)).allMatch(G->G.getDP()>10) && samples2.stream().map(S->variant.getGenotype(S)).allMatch(G->G.getDP()>20) ;'  in.vcf.gz
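If you would rather not depend on htsjdk, the per-sample values such a filter tests can be read directly from a plain-text VCF data line: find the key's position in the FORMAT column (column 9) and take the same position in each sample column. A minimal stdlib sketch, assuming an uncompressed, tab-separated line and a sample order taken from the #CHROM header; the class, method and sample names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VcfSampleField {
    /**
     * Returns sample name -> value of one FORMAT key (e.g. "DP" or "GT") for a
     * VCF data line. A sample column lacking the key (VCF allows trailing
     * FORMAT fields to be dropped) maps to ".", the VCF missing value.
     */
    static Map<String, String> formatField(String line, List<String> samples, String key) {
        String[] cols = line.split("\t");
        int idx = List.of(cols[8].split(":")).indexOf(key); // column 9 is FORMAT
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < samples.size(); i++) {
            String[] values = cols[9 + i].split(":");       // sample columns start at column 10
            out.put(samples.get(i), idx >= 0 && idx < values.length ? values[idx] : ".");
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical sample names, normally read from the #CHROM header line.
        List<String> samples = List.of("S1", "S2", "MUSEUM");
        String line = "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:6\t0/0:12\t./.";
        System.out.println(formatField(line, samples, "DP")); // {S1=6, S2=12, MUSEUM=.}
        System.out.println(formatField(line, samples, "GT")); // {S1=0/1, S2=0/0, MUSEUM=./.}
    }
}
```

Missing values come back as ".", so any depth or genotype check built on top of this must treat "." as failing.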