when I use htslib to write a bam. Error of "truncated file" shows by samtools
1
1
Entering edit mode
8 weeks ago
Aaron ▴ 10

when I use htslib to write a bam. Error of "truncated file" shows by samtools following is the code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <htslib/sam.h>

int main() {
    // Open BAM file for writing
    htsFile *fp = hts_open("output.bam", "wb");
    if (fp == NULL) {
        fprintf(stderr, "Error opening file for writing.\n");
        return 1;
    }
// Create and populate BAM header
bam_hdr_t *hdr = bam_hdr_init();
hdr->n_targets = 1;
hdr->target_len = (uint32_t*)malloc(hdr->n_targets * sizeof(uint32_t));
hdr->target_name = (char**)malloc(hdr->n_targets * sizeof(char*));
hdr->target_name[0] = strdup("chr1");
hdr->target_len[0] = 248956422;
hdr->text = strdup("@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n");
hdr->l_text = strlen(hdr->text);

// Write BAM header
if (sam_hdr_write(fp, hdr) < 0) {
    fprintf(stderr, "Error writing header.\n");
    return 1;
}

// Create and populate BAM alignment record
bam1_t *b = bam_init1();
b->core.tid = 0;     // Reference sequence ID
b->core.pos = 100;   // 0-based leftmost coordinate
b->core.qual = 60;   // Mapping quality
b->core.l_qname = 7; // Length of the query name
b->core.flag = 0;    // Bitwise flag
b->core.n_cigar = 1; // Number of CIGAR operations
b->core.l_qseq = 10; // Length of the query sequence
b->core.mtid = -1;   // Mate reference sequence ID (-1 if no mate)
b->core.mpos = -1;   // Mate position (-1 if no mate)
b->core.isize = 0;   // Insert size

// Calculate required data length correctly
int data_len = b->core.l_qname + b->core.n_cigar * 4 + (b->core.l_qseq + 1) / 2 + b->core.l_qseq;
b->data = (uint8_t*)malloc(data_len);
b->m_data = data_len;

// Set query name
strcpy((char*)b->data, "query1");

// Set CIGAR (1 match operation)
uint32_t *cigar = (uint32_t*)(b->data + b->core.l_qname);
cigar[0] = bam_cigar_gen(b->core.l_qseq, BAM_CMATCH);

// Set query sequence (e.g., "AGCTTAGCTA")
uint8_t *seq = bam_get_seq(b);
bam_seq_seti(seq, 0, 1); // A
bam_seq_seti(seq, 1, 2); // G
bam_seq_seti(seq, 2, 1); // C
bam_seq_seti(seq, 3, 4); // T
bam_seq_seti(seq, 4, 4); // T
bam_seq_seti(seq, 5, 1); // A
bam_seq_seti(seq, 6, 2); // G
bam_seq_seti(seq, 7, 1); // C
bam_seq_seti(seq, 8, 4); // T
bam_seq_seti(seq, 9, 1); // A

// Set quality scores (dummy values here)
uint8_t *qual = bam_get_qual(b);
for (int i = 0; i < b->core.l_qseq; ++i) qual[i] = 30;

// Write BAM alignment record
if (sam_write1(fp, hdr, b) < 0) {
    fprintf(stderr, "Error writing alignment.\n");
    return 1;
}

// Write the EOF marker
if (bgzf_write(fp->fp.bgzf, BAM_EOF, sizeof(BAM_EOF)) != sizeof(BAM_EOF)) {
    fprintf(stderr, "Error writing EOF marker.\n");
    return 1;
}

// Clean up
bam_destroy1(b);
bam_hdr_destroy(hdr);
hts_close(fp);

return 0;
}
bam htslib • 359 views
ADD COMMENT
3
Entering edit mode
8 weeks ago

There are several problems with this code:

  1. You have not set b->l_data, which here should be set to the same value as b->m_data. This is the cause of your truncated output: m_data measures the maximum size of the b->data buffer and l_data measures the amount used by this BAM record. As l_data is zero, sam_write1() has not written your whole record.
  2. You never need to write fp->fp.bgzf or the like to peer into an htsFile, so your code to write the EOF marker is wrong. In fact the EOF marker is automatically written for you by hts_close().
  3. The values in your bam_set_seqi() calls (which somehow have been typoed in your listing) do not correspond to your AGCTTAGCTA sequence. For this function, 1=A 2=C 4=G 8=T. Admittedly this is not well documented.
  4. You have not set b->core.bin, though admittedly that is infrequently used.

But the main problem is that you are very much doing this the hard way. The recommended way to create a sam_hdr_t from existing text is sam_hdr_parse() and the recommended way to cons up a bam1_t from a bunch of fields is bam_set1(). So your code could be much simpler:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "htslib/sam.h"

int main() {
    // Open BAM file for writing
    htsFile *fp = hts_open("output3.bam", "wb");
    if (fp == NULL) {
        fprintf(stderr, "Error opening file for writing.\n");
        return 1;
    }

// Create and populate BAM header
const char text[] = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n";
bam_hdr_t *hdr = sam_hdr_parse(strlen(text), text);
if (hdr == NULL) { … }

// Write BAM header
if (sam_hdr_write(fp, hdr) < 0) {
    fprintf(stderr, "Error writing header.\n");
    return 1;
}

// Create and populate BAM alignment record
bam1_t *b = bam_init1();
if (b == NULL) { … }

// Set CIGAR (1 match operation)
uint32_t cigar[1];
cigar[0] = bam_cigar_gen(10, BAM_CMATCH);

// Set quality scores (dummy values here)
char qual[10];
for (int i = 0; i < 10; ++i) qual[i] = 30;

if (bam_set1(b, 6, "query1", 0, 0, 100, 60, 1, cigar, -1, -1, 0,
             10, "ACAGGACAGA", qual, 0) < 0) { … }

// Write BAM alignment record
if (sam_write1(fp, hdr, b) < 0) {
    fprintf(stderr, "Error writing alignment.\n");
    return 1;
}

// Clean up
bam_destroy1(b);
bam_hdr_destroy(hdr);
hts_close(fp);

return 0;
}
ADD COMMENT
0
Entering edit mode

Thanks very much.

ADD REPLY
0
Entering edit mode

Aaron : Please accept this answer (green check mark) to provide closure to this thread.

ADD REPLY

Login before adding your answer.

Traffic: 1131 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6