Error when indexing a bcf file
5.7 years ago
Famf ▴ 30

Hi there,

I am trying to index some BCF files that I produced with the following command:

vcftools --recode-bcf --bcf myfile.bcf --max-meanDP 37 --out outputfile

The indexing command I am using is as follows, but I am getting an error message:

bcftools index -c outputfile.recode.bcf
[W::bcf_record_check] Bad BCF record: Invalid FORMAT type 15 (unknown)

I have already googled this error message, but I found nothing that helps me solve it. I would appreciate any help with this.


One should no longer use VCFtools.

Please try:

bcftools view -i 'MAX(FMT/DP)>37' -Ob myfile.bcf > myfile_maxDP37.bcf ;

bcftools index myfile_maxDP37.bcf ;

...or:

bcftools norm -m-any myfile.bcf | bcftools view -i 'MAX(FMT/DP)>37' -Ob > myfile_maxDP37.bcf ;

bcftools index myfile_maxDP37.bcf ;
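
Note that these two filters are not equivalent to the original VCFtools command. According to the VCFtools manual, --max-meanDP keeps sites whose mean depth across individuals is at most the given value, whereas -i 'MAX(FMT/DP)>37' keeps sites where the largest per-sample depth exceeds 37. A closer bcftools counterpart might be an expression along the lines of -i 'AVG(FMT/DP)<=37', but that is an assumption that has not been tested in this thread, and the two tools may treat missing values differently. The awk sketch below only illustrates the mean-depth logic on a made-up three-sample VCF; the file, the sample names S1..S3, the depths and the helper name keep_by_mean_dp are all invented for illustration.

```shell
#!/usr/bin/env bash
# Build a made-up VCF: three hypothetical samples (S1..S3), FORMAT = GT:DP.
toy_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:10\t0/0:20\t1/1:30\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\t.\tGT:DP\t0/1:40\t0/1:50\t0/0:60\n'
  printf 'chr1\t300\t.\tG\tA\t50\t.\t.\tGT:DP\t0/0:5\t0/1:80\t0/0:5\n'
} > "$toy_vcf"

# keep_by_mean_dp FILE LIMIT  (hypothetical helper)
# Print CHROM, POS and mean FORMAT/DP for sites whose mean depth is <= LIMIT.
keep_by_mean_dp() {
  awk -F'\t' -v limit="$2" '
    /^#/ { next }                        # skip header lines
    {
      # find the position of DP inside the FORMAT column
      n = split($9, keys, ":"); dp_idx = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DP") dp_idx = i
      if (!dp_idx) next
      sum = 0; cnt = 0
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")
        # ignore samples with a missing depth
        if (vals[dp_idx] != "." && vals[dp_idx] != "") { sum += vals[dp_idx]; cnt++ }
      }
      if (cnt > 0 && sum / cnt <= limit) printf "%s\t%s\t%.1f\n", $1, $2, sum / cnt
    }' "$1"
}

keep_by_mean_dp "$toy_vcf" 37
```

On this toy data the mean-depth rule keeps positions 100 and 300 (mean depths 20 and 30), while a MAX(FMT/DP)>37 rule would select positions 200 and 300 instead, so the two approaches can return quite different site lists.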

Thanks for the reply. I've tried one of the options you suggested above, but I am getting far fewer SNPs than I got with my previous vcftools command.

Actually, before using my current command I tried the bcftools command below, but I was losing a lot of SNPs. That is, many SNPs whose DP value was lower than the cutoff (1514 in this case) were also removed, and I don't know why.

bcftools filter -e "DP>1514" -Ob -o my_filtered_file.bcf myfile.bcf

Just to be sure, can you try:

bcftools query -e'FMT/DP>1514' -Ob -o my_filtered_file.bcf myfile.bcf

I get this output:

query: invalid option -- 'O'

About: Extracts fields from VCF/BCF file and prints them in user-defined format
Usage: bcftools query [options] <a.vcf.gz> [<b.vcf.gz> [...]]
...

I don't know what is going on here.


Oh, wait a second. You do not require the -o flag. Just do:

bcftools query -e'FMT/DP>1514' -Ob myfile.bcf > my_filtered_file.bcf

Your previous command may also work, i.e.:

bcftools filter -e "DP>1514" -Ob myfile.bcf > my_filtered_file.bcf

With query, I get the same message: query: invalid option -- 'O'

With filter, it works, but I get a file with fewer than 1 million SNPs, and I know there are around 3 million SNPs with DP<1514 in the original file (myfile.bcf), which has ~3.2 million SNPs in total.


Do any of these have a FILTER (the FILTER column in the VCF) other than PASS?


I am not sure whether I checked this properly, but I looked at the first 1000 lines of one of my VCF files, and the FILTER column contains only dots; there is no PASS.


What about just:

bcftools view -e "FORMAT/DP > 1514" -Ob myfile.bcf > my_filtered_file.bcf

bcftools filter is generally used for other purposes, i.e., specifically looking at or setting the FILTER column value.


I get the same result as with the filter command.

My VCF files have 40 samples. I understand that DP is the sum of the depth values of all 40 samples, as the header states:

 ##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples">

I already know the number of SNPs that have DP<1514 from the output file obtained with --site-depth in vcftools, so I really don't understand why I am getting a different filtered output with bcftools. I have also manually inspected the output produced by both the filter and view commands, and many SNPs that do not match the given expression have been filtered out. It is really a mystery to me.
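
One way to check this assumption is to print INFO/DP next to the sum of the per-sample FORMAT/DP values for each site; depending on the variant caller, the two may or may not agree. With bcftools, something like bcftools query -f '%CHROM\t%POS\t%INFO/DP[\t%DP]\n' myfile.bcf should list both values per site. The awk sketch below does the same comparison without bcftools, on a made-up VCF; the toy file, its depth values and the helper name compare_dp are invented for illustration and do not come from the real data.

```shell
#!/usr/bin/env bash
# Made-up VCF with an INFO/DP value and three hypothetical samples.
toy_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=60\tGT:DP\t0/1:10\t0/0:20\t1/1:30\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\tDP=150\tGT:DP\t0/1:40\t0/1:50\t0/0:60\n'
  printf 'chr1\t300\t.\tG\tA\t50\t.\tDP=95\tGT:DP\t0/0:5\t0/1:80\t0/0:5\n'
} > "$toy_vcf"

# compare_dp FILE  (hypothetical helper)
# Print CHROM, POS, INFO/DP and the sum of FORMAT/DP over all samples.
compare_dp() {
  awk -F'\t' '
    /^#/ { next }                        # skip header lines
    {
      # pull DP out of the semicolon-separated INFO column
      info_dp = "."
      m = split($8, info, ";")
      for (i = 1; i <= m; i++) if (info[i] ~ /^DP=/) info_dp = substr(info[i], 4)
      # find DP inside the FORMAT column
      n = split($9, keys, ":"); dp_idx = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DP") dp_idx = i
      sum = 0
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")
        if (dp_idx && vals[dp_idx] != ".") sum += vals[dp_idx]
      }
      printf "%s\t%s\t%s\t%d\n", $1, $2, info_dp, sum
    }' "$1"
}

compare_dp "$toy_vcf"
```

If the last two columns differ for many sites in the real file, a filter on INFO/DP and a filter on the summed FORMAT/DP will not select the same SNPs.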


No, as I understand, DP will just relate to the first sample in your VCF. If you want the filter to apply to all samples, make use of the indexing:

bcftools view -e "FORMAT/DP[0-39] > 1514" -Ob myfile.bcf

Please take a look here: https://samtools.github.io/bcftools/bcftools.html#expressions
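
To illustrate how a per-sample comparison differs from a comparison on the summed depth, here is a small awk sketch on a made-up VCF. It assumes, as I read the expressions documentation, that a plain FORMAT/DP comparison matches a site as soon as any one sample satisfies it, while SUM(FORMAT/DP) compares the total over samples. The toy file, the threshold of 100 and the helper name dp_flags are invented for illustration.

```shell
#!/usr/bin/env bash
# Made-up VCF with three hypothetical samples; FORMAT = GT:DP.
toy_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\t.\tGT:DP\t0/1:10\t0/0:20\t1/1:30\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\t.\tGT:DP\t0/1:40\t0/1:50\t0/0:60\n'
  printf 'chr1\t300\t.\tG\tA\t50\t.\t.\tGT:DP\t0/0:5\t0/1:120\t0/0:5\n'
} > "$toy_vcf"

# dp_flags FILE THRESHOLD  (hypothetical helper)
# Per site, print POS, the largest per-sample DP, the summed DP, and whether
# each of them exceeds THRESHOLD (yes/no).
dp_flags() {
  awk -F'\t' -v t="$2" '
    /^#/ { next }                        # skip header lines
    {
      n = split($9, keys, ":"); dp_idx = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DP") dp_idx = i
      top = 0; sum = 0
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")
        d = vals[dp_idx]
        if (d == "." || d == "") continue   # missing depth
        d += 0
        sum += d
        if (d > top) top = d
      }
      printf "%s\t%d\t%d\t%s\t%s\n", $2, top, sum,
             (top > t ? "yes" : "no"), (sum > t ? "yes" : "no")
    }' "$1"
}

dp_flags "$toy_vcf" 100
```

Position 200 is the interesting case in this toy example: no single sample exceeds 100, but the summed depth (150) does, so an exclusion based on the sum would drop it while a per-sample comparison would not.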


Thank you very much for helping me solve this issue! This is the command I finally used:

bcftools view -e "SUM(FORMAT/DP[0-39])>1514" -Ob myfile.bcf > my_filtered_file.bcf
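
For completeness, here is a hedged sketch of the same idea as a small workflow: exclude sites by summed FORMAT/DP, index the resulting BCF (the step from the original question), and count the remaining records. It runs on a made-up VCF with a minimal header, and it skips the bcftools steps when bcftools is not on the PATH. The toy file, the threshold of 100 and the helper name filter_index_count are assumptions for illustration; SUM(FMT/DP) is used without the [0-39] subscript on the assumption that the sum should run over all samples.

```shell
#!/usr/bin/env bash
# Made-up VCF with a minimal header so that it can be written as BCF.
toy_vcf=$(mktemp)
{
  printf '%s\n' '##fileformat=VCFv4.2' \
    '##contig=<ID=chr1,length=1000>' \
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples">' \
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' \
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf 'chr1\t100\t.\tA\tG\t50\t.\tDP=60\tGT:DP\t0/1:10\t0/0:20\t1/1:30\n'
  printf 'chr1\t200\t.\tC\tT\t50\t.\tDP=150\tGT:DP\t0/1:40\t0/1:50\t0/0:60\n'
  printf 'chr1\t300\t.\tG\tA\t50\t.\tDP=90\tGT:DP\t0/0:5\t0/1:80\t0/0:5\n'
} > "$toy_vcf"

# filter_index_count FILE THRESHOLD  (hypothetical helper)
# Exclude sites whose summed FORMAT/DP exceeds THRESHOLD, write a BCF,
# index it, and print the number of records that remain.
filter_index_count() {
  out="$1.filtered.bcf"
  bcftools view -e "SUM(FMT/DP)>$2" -Ob -o "$out" "$1" &&
  bcftools index -f "$out" &&
  bcftools view -H "$out" | wc -l
}

if command -v bcftools >/dev/null 2>&1; then
  filter_index_count "$toy_vcf" 100
else
  echo "bcftools not found on PATH; skipping the filtering step" >&2
fi
```

Comparing the printed record count against the number expected from an independent check (for example the vcftools --site-depth output mentioned above) is a quick way to confirm that the expression selects the intended sites.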

Judging by the error message, it seems to me that your VCF is ill-formatted, probably in the FORMAT field. Could you validate your VCF?


Yes, that could be a possibility. How can I validate my VCF file? Thanks.
