Rust-HTSlib Script Only Outputs BAM Header, Records Are Missing
7 weeks ago
plusone ▴ 50

Hi all,

I am learning rust-htslib and writing a Rust function that processes BAM files with the rust-htslib crate and is exposed to Python through PyO3. My goal is to read an input BAM file (header plus records) and write the entire content to an output BAM file. However, the output BAM file contains only the header; all the records are missing. My code:

use pyo3::prelude::*;
use rust_htslib::{bam, bam::Read};

#[pyfunction]
fn parse_bam(input_path: &str, output_path: &str) -> PyResult<()> {
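    // Open BAM file for reading; report failures to Python instead of panicking
    let mut bam = bam::Reader::from_path(input_path)
        .map_err(|e| PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("{}", e)))?;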

    // Create a header for the output BAM
    let header = bam::Header::from_template(bam.header());

    // Open BAM file for writing
    let mut out = bam::Writer::from_path(output_path, &header, bam::Format::Bam)
        .map_err(|e| PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("{}", e)))?;

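    // Copy every record; propagate read and write errors to Python instead of panicking
    for r in bam.records() {
        let record =
            r.map_err(|e| PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("{}", e)))?;
        out.write(&record)
            .map_err(|e| PyErr::new::<pyo3::exceptions::PyIOError, _>(format!("{}", e)))?;
    }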
    Ok(())
}

#[pymodule]
fn deidentify_utils(m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(parse_bam, m)?)
}

I verified with samtools view input.bam and with pysam that input.bam contains both a header and records, yet output.bam contains only the header.
Is there something I am missing in how rust-htslib handles BAM writing? Any help or suggestions would be greatly appreciated!

Rust BAM rust_htslib • 7.8k views

Cross-posted to: https://stackoverflow.com/questions/79737920/rust-htslib-script-only-outputs-bam-header-records-are-missing

If you get an answer there please come back and post it here.

7 weeks ago
plusone ▴ 50

I figured out why all BAM records were missing. It turned out to be a configuration issue rather than a coding bug.

I’m using maturin + PyO3 to build a Rust-based Python extension module, and uv to manage the Python environment. My workflow was:

maturin develop --uv

This builds the Rust module without error. However, when running Python scripts, uv kept using an old build of the Rust code instead of picking up my changes. uv caches aggressively to avoid re-downloading and re-building dependencies it has already seen, so it kept serving the stale build of my Rust extension no matter what I changed in the code.

The fix was to tell uv when it should invalidate the cache and rebuild. I added the following to my pyproject.toml:

[tool.uv]
# Rebuild if config or any Rust source changes
cache-keys = [
  {file = "pyproject.toml"},
  {file = "rust/Cargo.toml"},
  {file = "**/*.rs"}
]

# Build Rust code in development mode
config-settings = { build-args = "--profile=dev" }

Now, when I run:

uv run test_script.py

uv detects changes in the Rust sources and rebuilds the Rust-based extension automatically. With this in place, the Rust functions are correctly exposed to Python and work as expected.
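To make the idea behind the cache keys concrete, here is a minimal, std-only Rust sketch of a staleness check: a build counts as stale when any `.rs` file was modified after the artifact was built. This is only an illustration of the concept, not how uv implements `cache-keys`; the function names `rust_source_mtimes` and `is_stale` are hypothetical and exist only for this example.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Recursively collect the modification times of all `.rs` files under `dir`.
/// (Hypothetical helper for illustration; not part of uv or maturin.)
fn rust_source_mtimes(dir: &Path, out: &mut Vec<SystemTime>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Descend into subdirectories such as `src/`.
            rust_source_mtimes(&path, out)?;
        } else if path.extension().and_then(|e| e.to_str()) == Some("rs") {
            out.push(fs::metadata(&path)?.modified()?);
        }
    }
    Ok(())
}

/// A build is stale if at least one source file is newer than the artifact.
fn is_stale(artifact_built: SystemTime, source_mtimes: &[SystemTime]) -> bool {
    source_mtimes.iter().any(|&t| t > artifact_built)
}
```

One way to use this would be to pass the modification time of the compiled extension file as `artifact_built` and the times collected from the Rust source directory as `source_mtimes`. If `is_stale` returns true, Python is probably still importing an old build.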
