Bam files not accepted in dba.count
1
0
Entering edit mode
7 months ago

The script I had working 2 years ago for DiffBind no longer works. Dba.count seems to reject files it accepted before.

   sets_counted_g=dba.count(sets_g)
Computing summits...
Sample: H:/(filepath truncated)/file_sorted.bam125
Re-centering peaks...
Indexing H:/(filepath truncated)/file_sorted.bam
[E::hts_idx_push] NO_COOR reads not in a single block at the end 2 -1
Error in FUN(X[[i]], ...) : failed to build index
file: H:/CH_DATA/ATAC/2020_Second_postprocessing/bam/PoolCH-7_sorted.bam
Parallel execution unavailable: executing serially.


I thought I'd sorted this file by position, but I went ahead and re-sorted it using

samtools sort file_sorted.bam -o file_sorted_byposition.bam


The result still fails to count:

sets_counted_g=dba.count(sets_g)
Computing summits...
Sample: H:/(filepath truncated)/file_sorted_byposition.bam125
Re-centering peaks...
Error in colnames(result)[4:ncol(result)] <- colnames(classes) :
replacement has length zero
Parallel execution unavailable: executing serially


Tagged this with both samtools and Diffbind because it looks like a samtools problem, but the script worked as-is 2 years ago for DiffBind. So maybe DiffBind is involved. Picard ValidateSamFile returned no errors on the file.

samtools diffbind dba.count bam • 701 views
0
Entering edit mode

what is the output of

samtools quickcheck file_sorted_byposition.bam


is file_sorted_byposition.bam indexed with samtools index ?

0
Entering edit mode

Thanks Pierre,

Quickcheck exited without error. I did generate an index for file_sorted_byposition.bam, but DiffBind doesn't call for indexes.

0
Entering edit mode

The same error appears when a sampleSheet with only one line is used.

Reproducible from the vignette data:

library(DiffBind)
library(QuantitativeChIPseqWorkshop)

setwd(paste(system.file("extdata", package="QuantitativeChIPseqWorkshop")))

tam.peaks <- dba(sampleSheet=samples[1,])

tam.counts <- dba.count(tam.peaks)

Computing summits...
Re-centering peaks...
Error in colnames(result)[4:ncol(result)] <- colnames(classes) :    replacement has length zero


Is it the expected behaviour?

0
Entering edit mode

I can't say I'm surprised, as the package is meant to compare samples, so there's not much point in running it with only 1. You might try posting over on the Bioconductor support forum, as the package dev is active there. They may want to catch that error so that it's more informative.

0
Entering edit mode
5 months ago
Rory Stark ★ 1.2k

I can verify that this is a bug and not expected behavior. One of those edge cases in R where something that is usually a matrix gets turned into a vector when there is only one column. I'll address this as a bugfix after the Bioconductor 2.13 release. I'll update this issue here when a fix is checked in.

Regarding index files, DiffBind does expect there to be matching index files for each of the bam files. If they are missing, it will attempt to create new ones; if this fails, so will read counting.