Question: How to include/keep only the samples in a list in VCF.gz file?
10 months ago by
DanielC • 80 wrote:

Dear Friends,

I have a list of 8000 samples in a file "samples.txt":


I am using bcftools to keep only these samples in the vcf.gz file. The vcf.gz file has 10000 samples, so I want to keep the 8000 samples listed in "samples.txt" and remove the remaining 2000. I ran:

bcftools -S samples.txt vcf.gz -o filtered-vcf.vcf

It gives me this error:

[E::main] unrecognized command -S

Could you please suggest what the issue could be here, and how I can do the above? Thanks very much.

bcftools samples vcf • 682 views
modified 4 months ago • written 10 months ago by DanielC • 80
10 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum • 120k wrote:

The 'view' subcommand is missing:

bcftools view -S samples.txt  -o filtered-vcf.vcf  vcf.gz

Thanks very much, Pierre! I ran this. However, it showed one error saying:

Error: subset called for sample that does not exist in header "TCGA..."

If I am right, this means that the "TCGA.." sample listed in "samples.txt" is not present in the vcf.gz file? So I used "--force-samples" to ignore it, and it runs now.
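
To see exactly which names in the list are absent from the VCF header before falling back on --force-samples, something like the following may help. It is only a sketch using standard tools (gzip, awk, grep); the function names vcf_header_samples and missing_samples are hypothetical, and it assumes the sample list holds one sample name per line.

```shell
# Hypothetical helper: print the sample names from the #CHROM header line,
# one per line (samples start at column 10 of a VCF).
vcf_header_samples() {
    gzip -dc "$1" | awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Hypothetical helper: print names from the sample list ($1) that are not
# present in the header of the compressed VCF ($2).
missing_samples() {
    header_names=$(mktemp)
    vcf_header_samples "$2" > "$header_names"
    # -F fixed strings, -x whole-line match, -v keep only lines that do not match
    grep -vxF -f "$header_names" "$1"
    rm -f "$header_names"
}

# Example call (file names as used in this thread):
#   missing_samples samples.txt vcf.gz
```

If bcftools is available, bcftools query -l vcf.gz prints the same list of header sample names and could replace the first helper. Any names reported as missing can then be corrected in the list, or skipped with --force-samples as described above.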

written 10 months ago by DanielC • 80

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.


written 10 months ago by Pierre Lindenbaum • 120k

I used a similar command to subset samples from a compressed VCF file:

bcftools view -S samplelist.txt input_file.vcf.gz -o newfiltered.vcf.gz

but got this message:

[w::bcf_sr_add_reader] No BGZF EOF markers; file 'input_file.vcf.gz' may be truncated

I had to abort the execution since I don't understand what this message means. Could someone help me, please?

written 4 months ago by mab658 • 30
4 months ago by
DanielC • 80 wrote:

Hi mab658,

This issue basically arises when the VCF file was not completely uploaded to or downloaded from the source, which could be due to a network problem or some other technical issue. Try downloading the file again in full and rerun the command. It should work.
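
One way to check whether a downloaded file is complete is to look for the BGZF end-of-file marker directly. A bgzip-compressed file ends with a fixed 28-byte empty block, and the "No BGZF EOF" message is printed when that block is absent. The sketch below compares the last 28 bytes of a file against that block; the function name has_bgzf_eof is made up for illustration, and the example file name is the one from the question.

```shell
# The 28-byte empty BGZF block that terminates a complete bgzip file, as hex.
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Hypothetical helper: succeed (exit status 0) if the file ends with the
# BGZF EOF block, fail otherwise.
has_bgzf_eof() {
    # Dump the last 28 bytes as hex and strip everything except hex digits.
    tail_hex=$(tail -c 28 "$1" | od -An -tx1 | tr -dc '0-9a-f')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Example call:
#   has_bgzf_eof input_file.vcf.gz && echo "EOF marker present" || echo "EOF marker missing"
```

If the marker is still missing after a fresh download, it may be worth checking how the file was compressed, since a file written with plain gzip rather than bgzip does not carry this block either.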

