Question: Is there a single header file I can use to read BAM files?
Asked by Click downvote (Germany), 13 months ago:

I am writing a library that many people are likely to use. I do not want them to have to install any additional libraries or know where their libraries are installed.

Therefore, I want to include a single header for reading BAM files. Are there any single-file headers for reading BAM files that I can include in my project?

Edit: I only need the chromosome, start, end (or length), and strand.
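For reference, these four fields map onto a BAM record as follows. The chromosome is the record's reference ID looked up in the header. The start is POS, which BAM stores 0-based. The strand is bit 0x10 of FLAG. The end is not stored at all: it has to be computed from the CIGAR by summing the operations that consume reference bases (M, D, N, =, X). The sketch below shows that calculation on a SAM-style CIGAR string. It is written in Python because the asker works in Cython. The function names are hypothetical, and it assumes a 0-based start as stored in BAM.

```python
import re

# CIGAR operations that consume reference bases (SAM specification):
# M (alignment match), D (deletion), N (skipped region), = (match), X (mismatch).
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_REVERSE = 0x10  # read is reverse complemented


def reference_length(cigar: str) -> int:
    """Number of reference bases spanned by a SAM-style CIGAR string."""
    if cigar == "*":  # CIGAR unavailable
        return 0
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing characters the regex did not match.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def interval(pos0: int, flag: int, cigar: str) -> tuple[int, int, str]:
    """Return (start, end, strand) as a 0-based, half-open interval."""
    end = pos0 + reference_length(cigar)
    strand = "-" if flag & FLAG_REVERSE else "+"
    return pos0, end, strand
```

For example, `interval(100, 16, "10M2D5M")` gives `(100, 117, "-")`. Soft clips (S) and insertions (I) do not move the end coordinate.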

Tags: c++, C, bam (question edited by John Marshall)

I'm not sure which language you are using, but it is easy to use one of a range of Python or Java libraries in your program. If you are using Python, you could use pysam (it uses htslib); if you are using Java, you could use htsjdk.
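The sketch below shows the pysam route for the four fields in the question, assuming pysam is installed. `pysam.AlignmentFile` and the `AlignedSegment` attributes it reads (`reference_name`, `reference_start`, `reference_end`, `is_reverse`, `is_unmapped`) are part of pysam's API. The two helper functions are hypothetical. The per-read logic is kept separate from opening the file, so it can be tried without a BAM file.

```python
from typing import Iterable, Iterator, Optional, Tuple


def intervals(reads: Iterable) -> Iterator[Tuple[str, int, Optional[int], str]]:
    """Yield (chrom, start, end, strand) for every mapped read.

    `reads` is anything yielding pysam.AlignedSegment-like objects.
    Coordinates are 0-based, half-open, as pysam reports them; pysam's
    reference_end can be None when a read has no CIGAR.
    """
    for read in reads:
        if read.is_unmapped:
            continue
        strand = "-" if read.is_reverse else "+"
        yield read.reference_name, read.reference_start, read.reference_end, strand


def bam_intervals(path: str) -> Iterator[Tuple[str, int, Optional[int], str]]:
    """Stream a BAM file in file order with pysam (imported lazily)."""
    import pysam  # assumes pysam is installed, e.g. via conda or pip

    with pysam.AlignmentFile(path, "rb") as bam:
        # Iterating the file object reads records sequentially; no index needed.
        yield from intervals(bam)
```

Each record still becomes a Python object, which is the speed concern raised further down the thread.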

Reply by Sej Modha, 13 months ago.

Pysam requires htslib. See my question "pysam: How do I type an AlignedSegment in Cython?"

Reply by Click downvote, 13 months ago.

Sorry, I updated my comment accordingly. Is there a specific reason why you are not planning to use htslib?

Reply by Sej Modha, 13 months ago.

Compile errors, like the ones I showed in the linked question.

Reply by Click downvote, 13 months ago.

pysam can easily be installed via conda or pip; there is no need to compile anything from scratch. Writing your own code for standard tasks like reading a BAM file is, IMHO, not only unnecessary but a poor investment of resources. htslib (or its equivalents in other languages) has been developed by experts in the field for years now. It includes quality and integrity checks for BAM files that you should exploit rather than dismiss. Do not reinvent the wheel: use existing code and solutions, build your tool around them, and focus on what is novel about your tool.

Reply by ATpoint, 13 months ago.

But pysam is slow, as each record is a Python object that needs to be parsed.

Reply by Click downvote, 13 months ago.

"I do not want them to have to install any additional libraries or know where their libraries are installed."

What's wrong with my previous answer to "Can I read chrom, strand, pos, len from bam files without htslib?"?

Reply by Pierre Lindenbaum, 13 months ago.

Every user who installs my software would have to know where their htslib is installed, and update their setup.py to reflect that location.

Reply by Click downvote, 13 months ago.

Your users would probably be using conda anyway, so that becomes a non-issue.

Reply by Devon Ryan, 13 months ago.

"Every user who installs my software would have to know where their htslib is"

No, because if it's a git submodule, the library will be under your main folder.

Oh please, tell me you know how to compile a C/C++ program with make.

Reply by Pierre Lindenbaum, 13 months ago.

MACS2 seems to be able to read BAM files without any special utilities: https://github.com/taoliu/MACS/blob/33187eae605081c8ddad9313a886bd01d2c654cd/MACS2/IO/Parser.pyx#L732

However, I do not know whether the code is brittle or particularly fast. It uses the struct module, so it cannot be compiled down to pure C/C++.

Reply by Click downvote, 13 months ago.

MACS2 likes to break in many places with exceptionally cryptic error messages. You need to just use htslib and get on with it.

Reply by Devon Ryan, 13 months ago.
Answer by John Marshall (Glasgow, Scotland), 13 months ago:

BAM files are binary structs compressed in a gzip-compatible way. For non-random streaming read access, if you are willing to use the system zlib library to handle the decompression, it is pretty simple to decode BAM records read via gzread() in your own C or C++ code. Refer to the SAM/BAM specification and see for example Velvet's readBAMFile().
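Here is a rough Python illustration of that approach, since the asker works in Cython. The standard `gzip` module takes the place of zlib's `gzread()`; it reads the concatenated gzip members that make up a BGZF file. `struct` decodes the header and the fixed 32-byte core of each record, following the field layout in the SAM/BAM specification. This is a sketch under stated assumptions. It reads sequentially only, skips unmapped records, and reports 0-based half-open intervals. The function names are hypothetical. It does not check BGZF block structure, does not handle CIGARs with more than 65,535 operations (the specification moves those into the CG tag), and has not been tuned for speed.

```python
import gzip
import struct
from typing import BinaryIO, Iterator, List, Tuple

# Fixed-size part of an alignment record following block_size (32 bytes):
# refID, pos, l_read_name, mapq, bin, n_cigar_op, flag, l_seq,
# next_refID, next_pos, tlen
CORE = struct.Struct("<iiBBHHHiiii")
REF_CONSUMING_OPS = {0, 2, 3, 7, 8}  # CIGAR op codes for M, D, N, =, X
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def _read_exact(fh: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError on a truncated stream."""
    buf = b""
    while len(buf) < n:
        chunk = fh.read(n - len(buf))
        if not chunk:
            raise EOFError(f"truncated BAM: wanted {n} bytes, got {len(buf)}")
        buf += chunk
    return buf


def _read_int32(fh: BinaryIO) -> int:
    return struct.unpack("<i", _read_exact(fh, 4))[0]


def read_reference_names(fh: BinaryIO) -> List[str]:
    """Parse the BAM header; return reference names indexed by refID."""
    if _read_exact(fh, 4) != b"BAM\x01":
        raise ValueError("not a BAM stream (bad magic)")
    _read_exact(fh, _read_int32(fh))  # skip the SAM header text
    names = []
    for _ in range(_read_int32(fh)):
        name = _read_exact(fh, _read_int32(fh))  # NUL-terminated
        names.append(name.rstrip(b"\x00").decode("ascii"))
        _read_int32(fh)  # l_ref (reference length), unused here
    return names


def iter_intervals(fh: BinaryIO) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (chrom, start, end, strand) for each mapped record."""
    names = read_reference_names(fh)
    while True:
        head = fh.read(4)
        if not head:
            return  # clean end of stream
        head += _read_exact(fh, 4 - len(head))
        (block_size,) = struct.unpack("<i", head)
        rec = _read_exact(fh, block_size)
        ref_id, pos, l_read_name, _mapq, _bin, n_cigar, flag, *_ = CORE.unpack_from(rec)
        if flag & FLAG_UNMAPPED or ref_id < 0:
            continue
        # The CIGAR follows the read name; each op is a uint32 of length << 4 | code.
        cigar = struct.unpack_from(f"<{n_cigar}I", rec, CORE.size + l_read_name)
        ref_len = sum(op >> 4 for op in cigar if (op & 0xF) in REF_CONSUMING_OPS)
        strand = "-" if flag & FLAG_REVERSE else "+"
        yield names[ref_id], pos, pos + ref_len, strand


def bam_intervals(path: str) -> Iterator[Tuple[str, int, int, str]]:
    """Stream a BAM file; gzip decompresses the concatenated BGZF blocks."""
    with gzip.open(path, "rb") as fh:
        yield from iter_intervals(fh)
```

`iter_intervals` accepts any binary stream of decompressed bytes, so the decoding can be tested separately from the decompression.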

If you want to write or do random access or to also accept SAM or especially CRAM files, avoiding external library dependencies is more trouble than it's worth. Then you should just use HTSlib and get on with it :-).

Compilation errors and trouble finding an externally-supplied HTSlib really come down to inflexibility in your project's configuration scripts. Done properly (see GNU autotools and samtools/htslib) there are very few problems.


I just need to stream the file. I guess the error is on my part, as I have not used C much, only Cython.

Reply by Click downvote, 13 months ago.
Answer by Click downvote (Germany), 13 months ago:

I will defer to your collective expertise. I have added htslib to a lib folder in my project.


The HTSlib maintainers would love it if you did not do that, as it's just one more place where in future people will be stuck with an outdated version of HTSlib. At least provide configure or make options to use an already-installed HTSlib.
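The sketch below shows one way to offer that option from a Cython build. `setup.py` looks for an installed HTSlib through environment variables and falls back to the active conda environment. `HTSLIB_INCLUDE_DIR`, `HTSLIB_LIBRARY_DIR` and `HTSLIB_PREFIX` are hypothetical names chosen for this example. `CONDA_PREFIX` is the variable conda sets when an environment is activated. The lookup order is also an assumption, not an established convention.

```python
import os
from typing import Dict, List, Mapping, Optional


def htslib_build_options(environ: Optional[Mapping[str, str]] = None) -> Dict[str, List[str]]:
    """Collect compiler/linker options for building against an installed HTSlib.

    Lookup order (an assumption for this sketch):
      1. HTSLIB_INCLUDE_DIR / HTSLIB_LIBRARY_DIR (hypothetical variable names)
      2. HTSLIB_PREFIX (hypothetical): uses <prefix>/include and <prefix>/lib
      3. CONDA_PREFIX, which `conda activate` sets
    If nothing is set, the lists stay empty and the compiler's default
    search paths apply.
    """
    env = os.environ if environ is None else environ
    include_dir = env.get("HTSLIB_INCLUDE_DIR")
    library_dir = env.get("HTSLIB_LIBRARY_DIR")
    prefix = env.get("HTSLIB_PREFIX") or env.get("CONDA_PREFIX")
    if prefix:
        include_dir = include_dir or os.path.join(prefix, "include")
        library_dir = library_dir or os.path.join(prefix, "lib")
    return {
        "include_dirs": [include_dir] if include_dir else [],
        "library_dirs": [library_dir] if library_dir else [],
        "libraries": ["hts"],  # link with -lhts
    }


# Example use in setup.py (module and file names are hypothetical):
#   from setuptools import Extension, setup
#   from Cython.Build import cythonize
#   ext = Extension("mylib.bamreader", ["mylib/bamreader.pyx"],
#                   **htslib_build_options())
#   setup(ext_modules=cythonize([ext]))
```

pip passes the caller's environment through to the build, so a user with HTSlib under /usr/local could run `HTSLIB_PREFIX=/usr/local pip install .`. A conda user would need no extra settings once their environment is activated.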

Reply by John Marshall, 13 months ago.

But I am using Cython. How can I use compile flags to point to the system installation of htslib? Users have no way to point to an installed version when they pip install my library.

Reply by Click downvote, 13 months ago.

Users are unlikely to ever pip install your program. If you ever get any users they'll probably use their system's package manager or conda. Worry about getting your program properly working before you spend so much time worrying that it's easier for people to install things in arbitrary ways.

Reply by Devon Ryan, 13 months ago.

But I have htslib installed via conda and still get #include <stdlib.h> errors. I think I will just use pysam, but warn users that it is slower than the C++ BED reader.

Reply by Click downvote, 13 months ago.

"But I am using cython."

Who knew? You said in your question tags that you were using C and/or C++…

If you are using Python and are overly concerned about external dependencies, you might look at the pure Python techniques used in BAMnostic: see https://sourceforge.net/p/samtools/mailman/message/36343993/.

Reply by John Marshall, 13 months ago (edited by Pierre Lindenbaum).