Hi All,
I am trying to subset a multi-sample GVCF to a GVCF with a smaller number of samples, and I am not getting the expected results.
First command used:
gatk SelectVariants --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' -R Homo_sapiens_assembly38.fasta --variant combined.g.vcf --sample-name subset_samples.txt -O subset_combined.g.vcf
The error received is shown below, even though all the samples listed in the file subset_samples.txt are present in the input VCF:
A USER ERROR has occurred: Bad input: Samples entered on command line (through -sf or -sn) that are not present in the VCF
A list of these samples: subset_samples.txt
To ignore these samples, run with --allow-nonoverlapping-command-line-samples
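For anyone who wants to double-check the claim that the listed samples are in the input, here is a minimal sketch of how the names in the file can be compared against the VCF header. It assumes an uncompressed VCF and one sample name per line in the list file; vcf_samples and missing_samples are hypothetical helper names, not part of GATK.

```shell
# List the sample columns of a VCF/GVCF: everything after the 9 fixed
# columns on the #CHROM header line, printed one name per line.
# Assumes an uncompressed file.
vcf_samples() {
  grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Print the names from a list file (one per line) that are NOT among the
# sample columns of the VCF. Empty output means every name was found.
# "|| true" is there because grep exits non-zero when it prints nothing.
missing_samples() {
  vcf_samples "$1" | grep -vxF -f - "$2" || true
}

# Example calls with the file names from this post:
#   vcf_samples combined.g.vcf
#   missing_samples combined.g.vcf subset_samples.txt
#   vcf_samples subset_combined.g.vcf
```

Note that the error message lists "subset_samples.txt" itself as the missing sample, which suggests the value given to --sample-name was read as a single sample name rather than as a file of names.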
Second command used:
gatk SelectVariants --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true' -R Homo_sapiens_assembly38.fasta --variant combined.g.vcf --sample-name subset_amples.txt -O subset_combined.g.vcf --allow-nonoverlapping-command-line-samples
And the issue:
The output VCF still has all sample names
I am not sure what I am missing in the commands to get the right output.
Are you sure it's safe to reduce the number of samples in a g.vcf file (!= vcf)?