Greetings,
I recently downloaded the latest version (4.2a) of dbNSFP in zip format. As usual, I unzipped the archive to get the per-chromosome files and converted them all to hg19 coordinates by running the script "dbNSFP_sort.pl" provided on the SnpSift dbNSFP website. This is the command I ran for each chromosome:
version=4.2a
cat dbNSFP${version}_variant.chrNUM | ../dbnsfp_sort.pl 7 8 > dbNSFP${version}_hg19_variant.chrNUM
Once I had each chromosome file in hg19 coordinates, I created the single-file version of the database by running:
(head -n 1 dbNSFP${version}_hg19_variant.chr1 ; cat dbNSFP${version}_hg19_variant.chr* | grep -v "^#" ) > dbNSFP${version}_hg19.txt
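For reference, the shell expands chr* in lexical order (chr1, chr10, chr11, ..., chr19, chr2, ...), not in numeric order. Below is a minimal sketch of the same merge with an explicit chromosome order. It assumes the per-chromosome files are named as above and cover chromosomes 1-22, X, Y and M; the function name merge_chroms is only illustrative and is not part of dbNSFP or SnpSift.

```shell
# Sketch: merge per-chromosome files in an explicit order, keeping one header.
# Usage: merge_chroms PREFIX > OUTPUT
# PREFIX is the file name up to and including "chr" (an assumption about naming).
merge_chroms() {
    prefix=$1
    first=1
    # Assumed chromosome list; adjust it to the files actually present.
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do
        f="${prefix}${c}"
        [ -f "$f" ] || continue        # skip chromosomes without a file
        if [ "$first" -eq 1 ]; then
            head -n 1 "$f"             # header line from the first file only
            first=0
        fi
        awk '!/^#/' "$f"               # data lines only
    done
}
```

It would be called as: merge_chroms "dbNSFP${version}_hg19_variant.chr" > "dbNSFP${version}_hg19.txt"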
At this point, I bgzipped the result:
bgzip dbNSFP${version}_hg19.txt
and obtained dbNSFP${version}_hg19.txt.gz. This is where the problem arises. When I run tabix on this file, I get an error like the following for every line:
[E::get_intv] Failed to parse TBX_GENERIC, was wrong -p [type] used? The offending line was...
and then, after a number of lines, the program stops with the message:
[E::hts_idx_push] Unsorted positions on sequence #1: 249212562 followed by 1
tbx_index_build failed: dbNSFP4.2a_hg19.txt.gz
In particular, I ran:
tabix -s 1 -b 2 -e 2 dbNSFP${version}_hg19.txt.gz
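A quick check that might help localize errors of this kind (a diagnostic sketch, not a fix): the function below, with the hypothetical name check_sorted, reports the first data line whose second column is not a plain integer, whose position is lower than the previous one on the same chromosome, or whose chromosome reappears after another chromosome has been seen. It assumes column 1 is the chromosome and column 2 the position, matching -s 1 -b 2 -e 2, and that header lines start with "#".

```shell
# Sketch: report the first line that would break a chromosome/position sort.
# Reads the files given as arguments, or standard input if none are given.
check_sorted() {
    awk -F '\t' '
        /^#/ { next }                                   # skip header lines
        $2 !~ /^[0-9]+$/ {                              # position must be an integer
            printf "line %d: non-numeric position \"%s\"\n", NR, $2
            bad = 1; exit
        }
        $1 != chr {                                     # a new chromosome block starts
            if ($1 in seen) {
                printf "line %d: chromosome %s is not contiguous\n", NR, $1
                bad = 1; exit
            }
            seen[$1] = 1; chr = $1; pos = 0
        }
        ($2 + 0) < pos {                                # position went backwards
            printf "line %d: position %s follows %s on chromosome %s\n", NR, $2, pos, $1
            bad = 1; exit
        }
        { pos = $2 + 0 }
        END { if (!bad) print "OK"; exit (bad ? 1 : 0) }
    ' "$@"
}
```

On the compressed file it could be run as: gzip -dc dbNSFP${version}_hg19.txt.gz | check_sorted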
The error does not seem to occur with the single-chromosome files (even after conversion to hg19 coordinates), so I think there might be something wrong in how I build the single-file version of the database. Does anyone know what the problem could be and what I could do to successfully create a .tbi index for my single-file version of dbNSFP? I need that file for the last step of my pipeline. Thanks a lot in advance.