Question

Deleted:Building of dbNSFP 4.2a in a single file version with hg19 coordinates

3.0 years ago

slowbromain • 0

Greetings,

I recently downloaded the latest version (4.2a) of dbNSFP in zip format. As usual, I unzipped the per-chromosome files and converted them all to hg19 coordinates with the "dbNSFP_sort.pl" script provided on the SnpSift dbNSFP website. This is the command I ran for each chromosome:

version=4.2a
cat dbNSFP${version}_variant.chrNUM | ../dbnsfp_sort.pl 7 8 > dbNSFP${version}_hg19_variant.chrNUM
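
The per-chromosome step above could be wrapped in a loop along these lines. This is only a sketch: the function name `convert_all` is hypothetical, and the chromosome list (1 to 22, X, Y, M) is an assumption about which files the archive contains; missing files are skipped.

```shell
# convert_all VERSION CMD [ARGS...]
# Pipes every per-chromosome file through CMD and writes the hg19 file
# next to it. The chromosome list is an assumption; adjust as needed.
convert_all() {
    ver=$1
    shift
    for chr in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do
        src="dbNSFP${ver}_variant.chr${chr}"
        dst="dbNSFP${ver}_hg19_variant.chr${chr}"
        [ -f "$src" ] || continue        # skip chromosomes that are absent
        "$@" < "$src" > "$dst"           # e.g. ../dbnsfp_sort.pl 7 8
    done
}
```

It would be called as `convert_all "$version" ../dbnsfp_sort.pl 7 8`, which runs the same command as above once per chromosome file.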

Once I had each chromosome file in hg19 coordinates, I created the single-file version of the database by running:

(head -n 1 dbNSFP${version}_hg19_variant.chr1 ; cat dbNSFP${version}_hg19_variant.chr* | grep -v "^#" ) > dbNSFP${version}_hg19.txt

At this point, I bgzipped the result:

bgzip dbNSFP${version}_hg19.txt

and obtained dbNSFP${version}_hg19.txt.gz. This is where the problem arises. When I run tabix on this file, I get an error like the following for every line:

[E::get_intv] Failed to parse TBX_GENERIC,, was wrong -p [type] used? The offending line was...

and then, after a number of lines, the program stops with the message:

[E::hts_idx_push] Unsorted positions on sequence #1: 249212562 followed by 1
tbx_index_build failed: dbNSFP4.2a_hg19.txt.gz

In particular, I ran:

tabix -s 1 -b 2 -e 2 dbNSFP${version}_hg19.txt.gz
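
To narrow down where the merged file breaks the ordering that this tabix call relies on, a check along the following lines may help. It is a minimal sketch, not part of tabix or SnpSift: the function name `check_tabix_order` is hypothetical, and it assumes a tab-separated file with the sequence name in column 1, the position in column 2, and header lines starting with `#`.

```shell
# check_tabix_order [FILE]
# Prints the first data line that is likely to make indexing fail and
# returns 1; prints nothing and returns 0 if no problem is found.
# Reads standard input when FILE is omitted.
check_tabix_order() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        {
            chr = $1 ""                        # force string comparison
            if ($2 !~ /^[0-9]+$/) {
                printf "line %d: non-numeric position [%s]\n", NR, $2
                exit 1
            }
            pos = $2 + 0
            if (chr != prev_chr) {
                # a sequence name may not reappear after another one
                if (chr in seen) {
                    printf "line %d: sequence %s is not contiguous\n", NR, chr
                    exit 1
                }
                seen[chr] = 1
                prev_chr = chr
                prev_pos = 0
            } else if (pos < prev_pos) {
                printf "line %d: position %d follows %d on %s\n", NR, pos, prev_pos, chr
                exit 1
            }
            prev_pos = pos
        }
    ' "${1:--}"
}
```

On the compressed file it could be run as `bgzip -dc "dbNSFP${version}_hg19.txt.gz" | check_tabix_order`. The reported line number counts header lines too, so it can be matched directly against the uncompressed file.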

This does not seem to happen with the single-chromosome files (even after conversion to hg19 coordinates), so I think there might be something wrong with how I build the single-file version of the database. Does anyone know what the problem could be and what I could do to successfully create a .tbi index for my single-file version of dbNSFP? I need that file for the last step of my pipeline. Thanks a lot in advance.

hg19 dbNSFP4.2a dbNSFP tabix bgzip • 958 views
