How to subset a GVCF using GATK SelectVariants
5 weeks ago
ttom ▴ 220

Hi All,

I am trying to subset a multi-sample GVCF to a GVCF with a smaller number of samples, and I am not getting the expected results.

First command used:

gatk SelectVariants --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' -R Homo_sapiens_assembly38.fasta --variant combined.g.vcf --sample-name subset_samples.txt -O subset_combined.g.vcf


And the error received was the following, even though all the samples listed in the file subset_samples.txt are present in the input VCF:

   A USER ERROR has occurred: Bad input: Samples entered on command line (through -sf or -sn) that are not present in the VCF
A list of these samples: subset_samples.txt

To ignore these samples, run with --allow-nonoverlapping-command-line-samples


Second command used:

gatk SelectVariants --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' -R Homo_sapiens_assembly38.fasta --variant combined.g.vcf --sample-name subset_samples.txt -O subset_combined.g.vcf --allow-nonoverlapping-command-line-samples


And the issue:

The output VCF still contains all the sample names.

I am not sure what I am missing in the commands to get the right output.

GATK SelectVariants • 271 views

Are you sure it's safe to reduce the number of samples in a g.vcf file (!= vcf)?

5 weeks ago

--sample-name / -sn This argument can be specified multiple times in order to provide multiple sample names, or to specify the name of one or more files containing sample names. File names must use the extension ".args", and the expected file format is simply plain text with one sample name per line. Note that sample exclusion takes precedence over inclusion, so that if a sample is in both lists it will be excluded.

Did you try subset_samples.args instead of subset_samples.txt?
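
A minimal sketch of that suggestion, assuming the file names from the question. It copies the sample list to a file with the ".args" extension and passes that file to --sample-name. The placeholder names SAMPLE_A and SAMPLE_B are hypothetical and only written if no list exists yet; the gatk call is skipped when gatk or the input GVCF is not available.

```shell
#!/bin/sh
# Give the sample list the ".args" extension so that GATK reads it as a
# file of sample names rather than as a single literal sample name.
samples_txt="subset_samples.txt"     # existing list, one sample name per line
samples_args="subset_samples.args"   # same content, extension GATK expects

# Demonstration only: create a placeholder list if none exists.
# SAMPLE_A and SAMPLE_B are hypothetical names, not from the original post.
[ -f "$samples_txt" ] || printf 'SAMPLE_A\nSAMPLE_B\n' > "$samples_txt"

cp "$samples_txt" "$samples_args"

# Run SelectVariants only when gatk is on PATH and the input GVCF exists.
if command -v gatk >/dev/null 2>&1 && [ -f combined.g.vcf ]; then
    gatk SelectVariants \
        -R Homo_sapiens_assembly38.fasta \
        --variant combined.g.vcf \
        --sample-name "$samples_args" \
        -O subset_combined.g.vcf
else
    echo "gatk or combined.g.vcf not found; skipping SelectVariants" >&2
fi
```

With the ".args" file, --allow-nonoverlapping-command-line-samples should no longer be needed as long as every listed sample is in the input.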


Yes, the wrong file extension was the problem. It works with subset_samples.args. Thank you!
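
One way to confirm which samples ended up in the output is to read them off the #CHROM header line, where sample names start at column 10. The sketch below assumes an uncompressed VCF/GVCF; list_samples is a hypothetical helper name, and demo.g.vcf with SAMPLE_A and SAMPLE_B is a made-up header used only for demonstration.

```shell
#!/bin/sh
# list_samples: print the sample names from a VCF/GVCF header, one per line.
# Columns 1-9 of the #CHROM line are fixed fields; samples start at column 10.
list_samples() {
    grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Tiny hypothetical header for demonstration only.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\n' > demo.g.vcf

list_samples demo.g.vcf
```

Running `list_samples subset_combined.g.vcf` and comparing the result against subset_samples.args shows whether the subset was applied.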