Hello,
I wanted to ask what solutions are out there for random access to BAM files over HTTP.
The obvious first answer is samtools/htslib/pysam, but the current version of htslib issues open-ended range GET requests, and those requests lead to inflated egress costs when working on S3 infrastructure.
I described this behavior here:
I was curious whether anybody else has experienced this behavior and maybe has a workaround for it.
IGV/IGVjs creates clean range requests when accessing data via HTTP, but I don't see an option to use this functionality outside of those programs, for example in a pipeline or a command-line tool.
A solution could be to parse the .bai file and derive the byte range to request from its contents. Maybe somebody has some code to share.
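To make the .bai idea concrete, here is a minimal sketch in Python that follows the BAI layout and the `reg2bins` binning scheme from the SAM/BAM specification. The function names (`parse_bai`, `region_byte_range`) are made up for illustration and do not come from htslib or any other library. It assumes the whole index fits in memory and that `ref_id` is the reference's position in the BAM header.

```python
import struct

MAX_BGZF_BLOCK = 65536  # a BGZF block is at most 64 KiB on disk


def parse_bai(data: bytes):
    """Return one (bins, linear) pair per reference sequence.

    bins maps bin number -> list of (chunk_beg, chunk_end) virtual offsets;
    linear is the list of virtual offsets for the 16 kb linear index windows.
    """
    if data[:4] != b"BAI\x01":
        raise ValueError("not a BAI index")
    pos = 4
    (n_ref,) = struct.unpack_from("<i", data, pos)
    pos += 4
    refs = []
    for _ in range(n_ref):
        (n_bin,) = struct.unpack_from("<i", data, pos)
        pos += 4
        bins = {}
        for _ in range(n_bin):
            bin_id, n_chunk = struct.unpack_from("<Ii", data, pos)
            pos += 8
            chunks = []
            for _ in range(n_chunk):
                chunks.append(struct.unpack_from("<QQ", data, pos))
                pos += 16
            bins[bin_id] = chunks
        (n_intv,) = struct.unpack_from("<i", data, pos)
        pos += 4
        linear = list(struct.unpack_from(f"<{n_intv}Q", data, pos))
        pos += 8 * n_intv
        refs.append((bins, linear))
    return refs


def reg2bins(beg: int, end: int):
    """Bins that may hold reads overlapping the 0-based half-open [beg, end)."""
    end -= 1
    bins = [0]
    for shift, first in ((26, 1), (23, 9), (20, 73), (17, 585), (14, 4681)):
        bins.extend(range(first + (beg >> shift), first + (end >> shift) + 1))
    return bins


def region_byte_range(refs, ref_id: int, beg: int, end: int):
    """Closed (first_byte, last_byte) file range covering a region, or None."""
    bins, linear = refs[ref_id]
    window = beg >> 14
    # The linear index lets us skip chunks that end before the region starts.
    min_voff = linear[window] if window < len(linear) else 0
    chunks = [
        (b, e)
        for bin_id in reg2bins(beg, end)
        for (b, e) in bins.get(bin_id, [])
        if e > min_voff
    ]
    if not chunks:
        return None
    # The upper 48 bits of a virtual offset are the compressed file offset.
    first = min(b for b, _ in chunks) >> 16
    last = max(e for _, e in chunks) >> 16
    # `last` is where the final block starts, so pad by one full block.
    return first, last + MAX_BGZF_BLOCK - 1
```

The returned range is deliberately conservative: it spans from the first to the last candidate chunk in one request and pads the end by one maximum-size BGZF block, so the upper bound should still be capped at the file size before it is used.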
Happy about any feedback on this topic.
Best,
Stephan
I will have a look, thanks for the reply, I am happy about any input :)
If you try this out, I would be curious to know if you see the same open range issue with this library. I use this library for moving a fair bit of data and knowing if this issue requires fixing would be helpful.
Hey, I tested bamjs and monitored my traffic; the range requests look good to me. Thanks for the help. If I have more updates, I will keep you in the loop.
Good to know, thanks for following up
Really appreciate your feedback. Do you know if there is a function in bam-js to pull the raw data to the server without parsing the reads? I am trying to create a kind of sub-BAM containing the header and the blocks from the regions of interest.
I don't know. Perhaps you could fork the repository and use the BAI parser to get the desired byte range from the index. Then you can just fetch the raw bytes from the BAM file using a generic fetcher (fetch, axios, etc.).
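As a rough illustration of the "generic fetcher" step, the sketch below sends a closed range request using only the Python standard library. The same `Range: bytes=start-end` header can be set with fetch or axios. The helper names `build_range_request` and `fetch_range` are hypothetical, and the URL is whatever HTTP(S) address the BAM file is served from.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for the closed byte range [start, end]."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    # Both bounds are given, so the server is asked for exactly this range
    # and not for everything from `start` to the end of the file.
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url: str, start: int, end: int) -> bytes:
    """Fetch the raw bytes of [start, end] without decompressing or parsing."""
    with urllib.request.urlopen(build_range_request(url, start, end)) as resp:
        # 206 Partial Content means the range was honored; a plain 200 would
        # mean the server ignored the header and is sending the whole file.
        if resp.status != 206:
            raise RuntimeError(f"range not honored, got HTTP {resp.status}")
        return resp.read()
```

Note that the bytes returned are whole BGZF blocks, and the first read of interest may begin partway into the first block (the low 16 bits of the virtual offset give the position inside the decompressed block). Simply concatenating the header blocks and these blocks is therefore not guaranteed to produce a valid BAM file on its own.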