Is there a single header file somewhere I can use to read BAM files?
3.0 years ago

I am writing a library many people are likely to use. I do not want them to have to install any additional libraries or know where their libraries are installed.

Therefore I want to include a single header for reading BAM files. Are there any single-file headers for reading BAM files that I can include in my project?

Edit: I only need the chromosome, start, (end or length) and strand.

bam c++ c • 966 views

Not sure about the language you are using for your code but it is easy to use a range of Python or Java libraries in your program. If you are using Python then you could use pysam (it uses htslib) and if you are using Java then you could use htsjdk.

Sorry, I updated my comment accordingly. Is there a specific reason why you are not planning to use htslib?


Compile errors, like the ones I showed in the linked question.


pysam can easily be installed via conda or pip; there is no need to compile it from scratch. Writing your own code for standard tasks like reading a BAM file is IMHO not only unnecessary but also a poor investment of resources. htslib (and its analogues in other languages) has been under active development for years by experts in the field. It contains features for quality and integrity control of BAM files that you should exploit rather than dismiss. Do not reinvent the wheel. Use existing code and solutions and build your tool around them, focusing on the novelty of your tool.


But pysam is slow as each record is a Python object which needs to be parsed.


I do not want them to have to install any additional libraries or know where their libraries are installed.

What's wrong with my previous answer (C: Can I read chrom, strand, pos, len from bam files without htslib?)?


Every user who installs my software would have to know where their htslib is. And update their setup.py to reflect the location.


Your users would probably be using conda anyway, so that becomes a non-issue.


Every user who installs my software would have to know where their htslib is

No: because it's a git submodule, the library would be under your main folder.

Oh please, tell me you know how to compile a C/C++ program with make.


MACS2 seems to be able to read bam without any special utilities: https://github.com/taoliu/MACS/blob/33187eae605081c8ddad9313a886bd01d2c654cd/MACS2/IO/Parser.pyx#L732

However, I do not know whether the code is brittle or especially fast. It uses the struct module, so it cannot be compiled down to pure C/C++.


MACS2 likes to break in many places with exceptionally cryptic error messages. You need to just use htslib and get on with it.

3.0 years ago

BAM files are binary structs compressed in a gzip-compatible way. For non-random streaming read access, if you are willing to use the system zlib library to handle the decompression, it is pretty simple to decode BAM records read via gzread() in your own C or C++ code. Refer to the SAM/BAM specification and see for example Velvet's readBAMFile().
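To make the streaming approach concrete, here is a minimal sketch in Python rather than C, using only the standard gzip and struct modules in place of gzread(). It assumes the record layout given in the SAM/BAM specification and returns just the fields asked for in the question (chromosome, start, end, strand). The function name iter_bam_intervals is made up for this illustration, and the sketch ignores the rare case of CIGARs with more than 65535 operations (stored in the CG tag).

```python
import gzip
import struct

# Fixed-size core of a BAM alignment record (32 bytes, little-endian):
# refID, pos, l_read_name, mapq, bin, n_cigar_op, flag, l_seq,
# next_refID, next_pos, tlen
CORE = struct.Struct("<iiBBHHHIiii")
REF_CONSUMING_OPS = {0, 2, 3, 7, 8}  # CIGAR ops M, D, N, =, X
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def _read_exact(fh, n):
    """Read exactly n bytes or raise if the stream ends early."""
    data = fh.read(n)
    if len(data) != n:
        raise EOFError("truncated BAM stream")
    return data


def _read_int32(fh):
    return struct.unpack("<i", _read_exact(fh, 4))[0]


def iter_bam_intervals(source):
    """Yield (chrom, start, end, strand) for every mapped record.

    `source` is a path or a binary file object. Coordinates are
    0-based, half-open, as stored in the BAM file.
    """
    with gzip.open(source, "rb") as fh:
        if _read_exact(fh, 4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        _read_exact(fh, _read_int32(fh))  # skip the SAM header text
        names = []
        for _ in range(_read_int32(fh)):  # reference sequence dictionary
            name = _read_exact(fh, _read_int32(fh))
            names.append(name.rstrip(b"\x00").decode("ascii"))
            _read_int32(fh)  # reference length, not needed here
        while True:
            size_bytes = fh.read(4)
            if not size_bytes:
                break  # clean end of file
            if len(size_bytes) != 4:
                raise EOFError("truncated BAM stream")
            block = _read_exact(fh, struct.unpack("<i", size_bytes)[0])
            ref_id, pos, l_name, _mapq, _bin, n_cigar, flag = CORE.unpack_from(block)[:7]
            if flag & FLAG_UNMAPPED or ref_id < 0:
                continue
            # The CIGAR array follows the NUL-terminated read name.
            cigar = struct.unpack_from("<%dI" % n_cigar, block, CORE.size + l_name)
            ref_len = sum(c >> 4 for c in cigar if (c & 0xF) in REF_CONSUMING_OPS)
            strand = "-" if flag & FLAG_REVERSE else "+"
            yield names[ref_id], pos, pos + ref_len, strand
```

Something like `for chrom, start, end, strand in iter_bam_intervals("reads.bam"): ...` would then stream the file once (the file name is only a placeholder). The same sequence of reads maps fairly directly onto gzread() calls in C.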

If you want to write or do random access or to also accept SAM or especially CRAM files, avoiding external library dependencies is more trouble than it's worth. Then you should just use HTSlib and get on with it :-).

Compilation errors and trouble finding an externally-supplied HTSlib really come down to inflexibility in your project's configuration scripts. Done properly (see GNU autotools and samtools/htslib) there are very few problems.


I just need to stream the file. I guess the error is on my part as I have not used C much, just Cython.

3.0 years ago

I will defer to all of your expertise. I added htslib in a lib folder in my project.


The HTSlib maintainers would love it if you did not do that, as it's just one more place where in future people will be stuck with an outdated version of HTSlib. At least provide configure or make options to use an already-installed HTSlib.


But I am using Cython. How can I use compile flags to point to the system installation of htslib? Users have no way to point to an installed version when they pip install my library.
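One possible way to handle this in a setup.py, sketched under a few assumptions: the environment variable HTSLIB_DIR is a name invented for this example, the function names are hypothetical, and the pkg-config lookup assumes the installed HTSlib ships its htslib.pc file. The helper returns keyword arguments that setuptools.Extension accepts (include_dirs, library_dirs, libraries).

```python
import os
import shlex
import subprocess


def parse_flags(flags):
    """Split compiler/linker flags into setuptools.Extension keyword arguments."""
    args = {"include_dirs": [], "library_dirs": [], "libraries": []}
    keys = {"-I": "include_dirs", "-L": "library_dirs", "-l": "libraries"}
    for token in shlex.split(flags):
        key = keys.get(token[:2])
        if key and len(token) > 2:
            args[key].append(token[2:])
    return args


def htslib_build_args(environ=None):
    """Locate HTSlib: explicit prefix first, then pkg-config, then defaults."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("HTSLIB_DIR")  # hypothetical variable name
    if prefix:
        return {
            "include_dirs": [os.path.join(prefix, "include")],
            "library_dirs": [os.path.join(prefix, "lib")],
            "libraries": ["hts"],
        }
    try:
        out = subprocess.run(
            ["pkg-config", "--cflags", "--libs", "htslib"],
            check=True, capture_output=True, text=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # Fall back to the compiler's default search paths.
        return {"include_dirs": [], "library_dirs": [], "libraries": ["hts"]}
    args = parse_flags(out)
    if "hts" not in args["libraries"]:
        args["libraries"].append("hts")
    return args


# In setup.py (module and file names are placeholders):
# from setuptools import Extension
# ext = Extension("mypkg.bamreader", ["mypkg/bamreader.pyx"], **htslib_build_args())
```

A user with HTSlib in an unusual location could then run something like `HTSLIB_DIR=/opt/htslib pip install .`, while everyone else gets whatever pkg-config or the default search paths find.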


Users are unlikely to ever pip install your program. If you ever get any users they'll probably use their system's package manager or conda. Worry about getting your program properly working before you spend so much time worrying that it's easier for people to install things in arbitrary ways.


But I have htslib installed via conda and still get #include <stdlib.h> errors. I think I will just use pysam, but give a warning that it is slower than the C++ BED reader.


But I am using cython.

Who knew? You said in your question tags that you were using C and/or C++…

If you are using Python and are overly concerned about external dependencies, you might look at the pure Python techniques used in BAMnostic: see https://sourceforge.net/p/samtools/mailman/message/36343993/.