Bam files not accepted in dba.count
1
0
Entering edit mode
4 months ago

The script I had working 2 years ago for DiffBind no longer works. Dba.count seems to reject files it accepted before.

   sets_counted_g=dba.count(sets_g)
Computing summits...
Sample: H:/(filepath truncated)/file_sorted.bam125  
Re-centering peaks...
Indexing H:/(filepath truncated)/file_sorted.bam
[E::hts_idx_push] NO_COOR reads not in a single block at the end 2 -1
Error in FUN(X[[i]], ...) : failed to build index
  file: H:/CH_DATA/ATAC/2020_Second_postprocessing/bam/PoolCH-7_sorted.bam
In addition: Warning message:
Parallel execution unavailable: executing serially.

I thought I'd sorted this file by position, but I went ahead and re-sorted it using

samtools sort file_sorted.bam -o file_sorted_byposition.bam

The result still fails to count:

sets_counted_g=dba.count(sets_g)
Computing summits...
Sample: H:/(filepath truncated)/file_sorted_byposition.bam125 
Re-centering peaks...
Error in colnames(result)[4:ncol(result)] <- colnames(classes) : 
  replacement has length zero
In addition: Warning message:
Parallel execution unavailable: executing serially

Tagged this with both samtools and Diffbind because it looks like a samtools problem, but the script worked as-is 2 years ago for DiffBind. So maybe DiffBind is involved. Picard ValidateSamFile returned no errors on the file.

samtools diffbind dba.count bam • 461 views
ADD COMMENT
0
Entering edit mode

what is the output of

samtools quickcheck file_sorted_byposition.bam

is file_sorted_byposition.bam indexed with samtools index ?

ADD REPLY
0
Entering edit mode

Thanks Pierre,

Quickcheck exited without error. I did generate an index for file_sorted_byposition.bam, but DiffBind doesn't call for indexes.

ADD REPLY
0
Entering edit mode

The same error appears when a sampleSheet with only one line is used.

Reproducible from the vignette data:

library(DiffBind)
library(QuantitativeChIPseqWorkshop)

setwd(paste(system.file("extdata", package="QuantitativeChIPseqWorkshop")))

samples <- read.csv("tamoxifen.csv")

tam.peaks <- dba(sampleSheet=samples[1,])

tam.counts <- dba.count(tam.peaks)

Computing summits... 
Re-centering peaks... 
Error in colnames(result)[4:ncol(result)] <- colnames(classes) :    replacement has length zero

Is it the expected behaviour?

ADD REPLY
0
Entering edit mode

I can't say I'm surprised, as the package is meant to compare samples, so there's not much point in running it with only 1. You might try posting over on the Bioconductor support forum, as the package dev is active there. They may want to catch that error so that it's more informative.

ADD REPLY
0
Entering edit mode
9 weeks ago
Rory Stark ★ 1.1k

I can verify that this is a bug and not expected behavior. One of those edge cases in R where something that is usually a matrix gets turned into a vector when there is only one column. I'll address this as a bugfix after the Bioconductor 2.13 release. I'll update this issue here when a fix is checked in.

Regarding index files, DiffBind does expect there to be matching index files for each of the bam files. If they are missing, it will attempt to create new ones; if this fails, so will read counting.

ADD COMMENT

Login before adding your answer.

Traffic: 1592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6