HTSlib threads when ingesting large bgzipped FASTA
1
0
Entering edit mode
2.3 years ago
Oleksii • 0

Is it possible to use HTSlib thread pool when loading large (genomic) FASTA sequences?

Here's a sample code:

#include <htslib/hts.h>
#include <htslib/sam.h>
#include <htslib/faidx.h>
#include <htslib/bgzf.h>
#include <htslib/thread_pool.h>

char fn[] = "/path/to/bgzipped.fasta.fa.gz";
int nthreads = 4;

faidx_t *faidx = fai_load(fn);

hts_tpool *tpool = hts_tpool_init(nthreads);
bgzf_thread_pool(faidx->bgzf, tpool, 0);

int fetched = 0;

for (int i=0; i<faidx_nseq(faidx); i++) {
  const char *name = faidx_iseq(faidx, i);
  int64_t length = faidx_seq_len(faidx, name);

  char *sequence = faidx_fetch_seq(faidx, name, 0, length-1, &fetched);

  // Do something

  free(sequence);
}

fai_destroy(faidx);
hts_tpool_destroy(tpool);

It won't compile because faidx_t is opaque and compiler doesn't know that this structure has bgzf field.

Does anyone know how to do it correctly?

Update on 13.07.2023:

It is now possible using

fai_thread_pool(faidx, tpool, 0);
htslib samtools bgzip c • 2.1k views
ADD COMMENT
1
Entering edit mode

My first thought is how much of the CPU time is in Deflate and how much is in parsing the data? The thread pool will aid decompression only. A perf record on fetching chr2 from a bgzf compressed human ref showed:

  36.22%  samtools  samtools             [.] fai_retrieve.isra.4
  25.93%  samtools  libdeflate.so.0      [.] deflate_decompress_default
  15.09%  samtools  samtools             [.] bgzf_getc
   4.38%  samtools  [kernel]             [k] 0xffffffff87a00a47
   3.88%  samtools  libc-2.27.so         [.] __ctype_b_loc
   3.75%  samtools  libdeflate.so.0      [.] crc32_x86_pclmul_avx
   3.19%  samtools  samtools             [.] __ctype_b_loc@plt
   2.26%  samtools  [kernel]             [k] 0xffffffff87a00190
   1.73%  samtools  libc-2.27.so         [.] fwrite
   0.94%  samtools  libc-2.27.so         [.] __memmove_sse2_unaligned_erms

So if you could parallelise the decompression step away to nothing it'd still be under 30% quicker. I haven't checked out your change in anger though. Maybe I'm misremembering this and there is parallel parsing as well. (I think not, but... maybe)

In answer to your question though, not that I'm aware of. There isn't a trivial way of controlling the bgzf component of the faidx struct. Although I note that bgzf is the first component, so you could (ghastly!) do bgzf_thread_pool(*(bgzf **)faidx, tpool, 0);.

Obviously that totally breaks the opaque type and isn't not my recommended solution! However it should permit you to compile and test the performance change. If it's significant, then please raise an issue on the htslib github site so we can look into a formal API, in a similar vein to the fai_set_cache_size function.

ADD REPLY
0
Entering edit mode

Thanks very much! (and sorry if I caused an anger)

Only computer at hand is ARM64 Mac (fast SSD, fast memory) and complete ingestion and processing of hg38 (889MB, bgzipped) takes as follows:

  1. no additional threads - 21.14s
  2. 1 additional thread - 15.39s
  3. 2 additional threads - 15.38s

Reading only takes 14.48s for no threads or 8.67s for a single additional thread.

For me, 30% of speedup is great anyway. I can check 5yo Linux PC tomorrow.

Update from i7-7700, 64GB, Ubuntu 20.04. Reading and processing:

  1. no additional threads - 36.62s (reading only - 24.56s)
  2. 1 additional thread - 28.57s (reading only - 15.80s)
  3. 2 additional threads - 28.42s

Do you think it may qualify for an API change?

ADD REPLY
0
Entering edit mode

It won't compile because faidx_t is opaque and compiler doesn't know that this structure has bgzf field

uh ? how is related to your problem ? why cannot it compile ?

ADD REPLY
0
Entering edit mode

Compiler throws an error at faidx->bgzf, because internal structure of faidx is not exported. Question is how to bind thread pool to fasta filehandle otherwise.

ADD REPLY
0
Entering edit mode

ah ok, I see now , thanks

ADD REPLY
1
Entering edit mode
7 months ago
Erik Garrison ★ 2.4k

I've implemented a reentrant/thread-safe single header extension to htslib that allows efficient multithreaded use of a single static fai/bgzf index structure: https://github.com/waveygang/faigz

ADD COMMENT
0
Entering edit mode

That looks interesting and we (the samtools team) did see your comments on GitHub.

ADD REPLY

Login before adding your answer.

Traffic: 2578 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6