Dear all,
I'm currently writing a program that needs random access to large files. After looking around a bit I found BGZIP (BGZF), which is widely used in SAMtools.
I tried to use it in my program, but I'm getting an error: "Error: invalid block header"
However, if I set start_pos = 0 it works.
I've also tried to decompress the file with bgzip (compiled from samtools 1.18) and it works fine! Here is the code I'm using:
    BGZF* in_glf_fh;
    unsigned int total_bytes_read = 0;
    // Define chunk start and end positions
    unsigned int start_pos = 2203 * 10000;
    unsigned int end_pos = start_pos + 10000 - 1;
    unsigned int chunk_size = end_pos - start_pos + 1;
    // Open input file
    in_glf_fh = bgzf_open(pars->in_glf, "rb");
    if( in_glf_fh == NULL )
        error("ERROR: cannot open GLF file!");
    // Search start position
    if( bgzf_seek(in_glf_fh, start_pos * pars->n_ind * 3 * sizeof(double), SEEK_SET) < 0 )
        error("ERROR: cannot seek GLF file!");
    // Read data from file
    for(unsigned int c = 0; c < chunk_size; c++) {
        int bytes_read = bgzf_read(in_glf_fh, chunk_data[c], sizeof(double) * pars->n_ind * 3);
        if( (unsigned int) bytes_read != sizeof(double) * pars->n_ind * 3 )
            fprintf(stderr, "Error: %s\n", in_glf_fh->error);
        total_bytes_read += bytes_read;
    }
    bgzf_close(in_glf_fh);
Thanks in advance,
FGV
what are start_pos, end_pos, chunk_size? how do you know if your offset in bgzf_read is not "out of bounds"?

chunk_size is the amount of data I want to read (10000 in this case); start_pos and end_pos is the interval I want to read from the BGZIP file...

cross-posted on the samtools-dev mailing list: http://sourceforge.net/mailarchive/message.php?msg_id=29974208
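
A note on the likely cause, as far as I can tell from the code: bgzf_seek() does not take a plain byte offset into the uncompressed data. It expects a BGZF "virtual offset", where the upper 48 bits are the position of a compressed block in the .gz file and the lower 16 bits are the position inside that block after decompression. A plain offset passed to it gets decoded as pointing into the middle of the compressed file, which would explain "invalid block header", and 0 works because virtual offset 0 is also the start of the file. The sketch below only illustrates the packing; the helper names are made up for this example and are not part of samtools/htslib.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not a library API) illustrating the BGZF
   virtual-offset layout: (compressed block start << 16) | within-block offset. */

/* Pack a virtual offset from the byte position of a block in the
   compressed file and a position inside the uncompressed block. */
uint64_t make_voffset(uint64_t block_start, uint32_t within_block)
{
    assert(within_block < 65536);  /* a BGZF block holds at most 64 KiB */
    return (block_start << 16) | within_block;
}

/* Upper 48 bits: where the compressed block starts in the .gz file. */
uint64_t voffset_block_start(uint64_t voffset)
{
    return voffset >> 16;
}

/* Lower 16 bits: position inside the block once it is uncompressed. */
uint32_t voffset_within_block(uint64_t voffset)
{
    return (uint32_t)(voffset & 0xFFFF);
}
```

For instance, with a hypothetical n_ind of 1, the byte offset 2203 * 10000 * 3 * 8 = 528720000 would be decoded as compressed position 8067 and within-block position 41088, and compressed position 8067 is very unlikely to be the start of a block. Valid virtual offsets normally come from bgzf_tell() while reading or writing the file, so one option is to record them in your own index. Newer htslib releases also provide bgzf_useek() for seeking by uncompressed offset, which, as far as I know, requires a .gzi index to be built or loaded first.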