6.8 years ago
Famf
Hi there,
I am trying to index some BCF files, which I generated with the following command:
vcftools --recode-bcf --bcf myfile.bcf --max-meanDP 37 --out outputfile
The indexing command I am using is as follows, but I am getting an error message:
bcftools index -c outputfile.recode.bcf
[W::bcf_record_check] Bad BCF record: Invalid FORMAT type 15 (unknown)
I have already googled this error message, but I found nothing that helps me solve it. I would appreciate any help with this.
One should no longer use VCFtools.
Please try:
...or:
Thanks for the reply. I tried one of the options you suggested above, but I get far fewer SNPs than I got with my previous vcftools command.
Actually, before using my current command I tried the command below in bcftools, but I was losing a lot of SNPs. That is, many SNPs whose DP value was lower than the cutoff (1514 in this case) were also removed, and I don't know why.
Just to be sure, can you try:
I get this output:
I don't know what is going on here.
Oh, wait a second. You do not require the -o flag. Just do:
Your previous command may also work, i.e.:
In the case of query, I am getting the same message: query: invalid option -- 'O'
In the case of filter, it works, but I get a file with fewer than 1 million SNPs, and I know there are around 3 million SNPs with DP<1514 in the original file (myfile.bcf), which has ~3.2 million SNPs in total.
Do any of these have a FILTER (the FILTER column in the VCF) other than PASS?
I'm not sure whether I did it properly, but I looked at the first 1000 lines of one of my VCF files, and the FILTER column contains only dots; there is no PASS.
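Since the first 1000 lines may not be representative, the FILTER column can be tallied over the whole file. The following is only a minimal sketch using awk on plain-text VCF; the file name toy.vcf, its records, and the function name count_filter_values are made up for illustration, and a BCF would first have to be decoded to text (for example with bcftools view).

```shell
# Hypothetical toy VCF standing in for the real data (tab-separated).
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=1200\n'
  printf 'chr1\t200\t.\tC\tT\t50\tPASS\tDP=1600\n'
  printf 'chr1\t300\t.\tG\tA\t50\t.\tDP=900\n'
} > toy.vcf

# Print each distinct FILTER value (column 7) with its record count.
count_filter_values() {
  awk -F'\t' '
    /^#/ { next }                 # skip meta-information and header lines
    { n[$7]++ }                   # count each FILTER value
    END { for (v in n) print v "\t" n[v] }
  ' "$1" | LC_ALL=C sort
}

count_filter_values toy.vcf
```

If only "." shows up, no filter has ever been applied to the records, so the FILTER column is unlikely to explain the missing SNPs.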
What about just:
bcftools filter is generally used for other purposes, i.e., specifically looking at or setting the FILTER column value.
The same result as with the filter function.
My VCF files have 40 samples. I understand that DP is the sum of the depth values of all those 40 samples, as the header states:
I already know the number of SNPs that have DP<1514 from the output file obtained with --site-depth in vcftools. So I really don't understand why I am getting a different filtered output with bcftools. I have also manually inspected the output produced by both the filter and view options, and many SNPs that should have been retained under the given expression have been filtered out. It is really a mystery to me.
No, as I understand it, DP will just relate to the first sample in your VCF. If you want the filter to apply to all samples, make use of the indexing:
Please take a look here: https://samtools.github.io/bcftools/bcftools.html#expressions
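To see why a bare DP can be ambiguous, it may help to print the site-level depth (INFO/DP) next to the per-sample depths (FORMAT/DP) for a few records. This is not the command from the thread, just a small awk sketch on plain-text VCF; the file toy_dp.vcf, its three samples, and the function name depth_table are hypothetical.

```shell
# Hypothetical three-sample toy VCF; a real BCF would be decoded to text first.
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=60\tGT:DP\t0/1:10\t0/0:20\t1/1:30\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\tDP=95\tGT:DP\t0/1:5\t0/1:40\t0/0:50\n'
} > toy_dp.vcf

# Columns printed: POS, INFO/DP, FORMAT/DP of the first sample, sum of FORMAT/DP.
depth_table() {
  awk -F'\t' '
    /^#/ { next }
    {
      # Site-level depth from the INFO column (field 8).
      info_dp = "NA"
      n = split($8, kv, ";")
      for (i = 1; i <= n; i++) if (kv[i] ~ /^DP=/) info_dp = substr(kv[i], 4)

      # Position of the DP key inside the FORMAT column (field 9).
      m = split($9, fmt, ":"); idx = 0
      for (i = 1; i <= m; i++) if (fmt[i] == "DP") idx = i

      # Sum the per-sample depths over all sample columns (fields 10 onward).
      sum = 0
      for (s = 10; s <= NF; s++) {
        split($s, f, ":")
        if (idx && f[idx] != ".") sum += f[idx]
      }

      split($10, first, ":")
      print $2 "\t" info_dp "\t" (idx ? first[idx] : "NA") "\t" sum
    }
  ' "$1"
}

depth_table toy_dp.vcf
```

In a table like this, a cutoff such as 1514 only makes sense against the site-level total, so writing the tag with an explicit prefix (INFO/DP or FMT/DP, as described on the linked expressions page) avoids guessing which one is being tested.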
Thank you very much for helping me solve this issue! This is the command I finally used:
Looking at the error message, it seems to me that your VCF is malformed, probably in the FORMAT field. Could you validate your VCF?
Yeah, that could be a possibility. How can I validate my VCF file? Thanks.
https://github.com/EBIvariation/vcf-validator
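Before running a full validator, a quick sanity check on the FORMAT field can be done with standard tools. The sketch below is not a substitute for vcf-validator and would not detect a binary-level BCF typing problem; it only lists FORMAT keys used in records but never declared in the header. The file toy_check.vcf and the function name undeclared_format_keys are invented for illustration.

```shell
# Hypothetical toy VCF in which DP is used in FORMAT but not declared in the header.
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:10\n'
} > toy_check.vcf

# List FORMAT keys that appear in data lines without a ##FORMAT header line.
undeclared_format_keys() {
  awk -F'\t' '
    /^##FORMAT=<ID=/ {
      id = $0
      sub(/^##FORMAT=<ID=/, "", id)   # drop everything before the ID value
      sub(/,.*/, "", id)              # drop everything after the ID value
      declared[id] = 1
      next
    }
    /^#/ { next }
    NF >= 9 {
      n = split($9, k, ":")
      for (i = 1; i <= n; i++) if (!(k[i] in declared)) missing[k[i]] = 1
    }
    END { for (m in missing) print m }
  ' "$1" | LC_ALL=C sort
}

undeclared_format_keys toy_check.vcf
```

An empty result means every FORMAT key used in the records has a header declaration; anything printed is worth checking before converting to BCF.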