Dear biostars,
I have .vcf.gz files containing a few dozen samples. From here I wish to identify the HET and HOM variants to better understand the population I am interested in.
I was told that bcftools is a good option and I have tried following the documentation, but I believe I am misunderstanding some of the syntax. Below are my attempts and the error I keep running into.
First, to check that my vcf.gz files have the fields of interest, here is the output from query.
bcftools query -f '%CHROM %POS %REF %ALT\n' $In | head -3
chrM 285 C CA
chrM 299 CA C
chrM 302 AC A
Then I try to pipe query into +fill-tags to retrieve the HET and HOM genotype counts. Here are the code and the error.
bcftools query -f '%CHROM %POS %REF %ALT\n' $In |\
bcftools +fill-tags -o $Out -- -t AC_Het,AC_Hom
Failed to open -: unknown file type
I have also tried skipping the pipe and writing an intermediate .bcf file instead. Below are the code and the error.
bcftools query -f '%CHROM %POS %REF %ALT\n' $In -o $Temp
bcftools +fill-tags $Temp -Ou -o $Out -- -t AC_Het,AC_Hom
Failed to open X.Indel.tmp.bcf: unknown file type
I am clearly not importing the .bcf file correctly. I would appreciate any help with this issue.
Hi @4galaxy77, thanks, I think your suggestion has helped! The idea was to eventually import the data into R. I thought to use query -> +fill-tags, but based on your description I may have had the tools in the wrong order. :\
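For anyone who lands here later, this is a minimal sketch of the reordered pipeline as I understand it. query writes plain text rather than VCF/BCF, which would explain the "unknown file type" errors when its output is handed to +fill-tags. Running +fill-tags on the original file first and query last avoids that. The function name het_hom_table and the choice of output columns are my own illustration, not something from the bcftools documentation.

```shell
# Hypothetical helper; the name and the column layout are illustrative only.
# Usage: het_hom_table input.vcf.gz output.tsv
het_hom_table() {
    in_vcf="$1"    # VCF/BCF with per-sample genotypes, e.g. the file in $In
    out_tsv="$2"   # tab-separated text that can be read into R
    # Step 1: +fill-tags reads VCF/BCF and adds INFO/AC_Het and INFO/AC_Hom.
    #         -Ou streams uncompressed BCF to the next command.
    # Step 2: query goes last and converts the annotated records to text.
    bcftools +fill-tags "$in_vcf" -Ou -- -t AC_Het,AC_Hom |
        bcftools query \
            -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/AC_Het\t%INFO/AC_Hom\n' \
            -o "$out_tsv" -
}
```

Called as het_hom_table "$In" "$Out", this should give one line per site with the two allele counts in the last columns. At multiallelic sites those columns hold comma-separated values, one per ALT allele.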