ATAC seq Normalization and analysis pipeline problems
1
0
Entering edit mode
18 months ago

Hi all, I am following the ATACseq normalization and analysis pipeline outlined in this manuscript Reske et. al, 2020, and I am at the point of processing the MACS2 output in R. I used the following commands to specify my bamfiles, set the discard parameter to the blacklist for mm10, restrict the regions to the standard chromosomes, and set the parameter for the command:

# specify paired end bams
pe.bams.DN3 <- c("plusminus_DN3_1_sorted.sub.filt.noMT.bam", "plusminus_DN3_2_sorted.sub.filt.noMT.bam",
"mutminus_DN3_1_sorted.sub.filt.noMT.bam", "mutminus_DN3_2_sorted.sub.filt.noMT.bam")

# read blacklist
blacklist <- read.table("mm10-blacklist.v2.bed", sep="\t")
colnames(blacklist) <- c("chrom", "start", "end")
blacklist <- GRanges(blacklist)

# define read parameters
standard.chr <- paste0("chr", c(1:19, "X", "Y")) # only use standard chromosomes
param <- readParam(max.frag=1000, pe="both", discard=blacklist, restrict=standard.chr)

# count reads in windows specified by MACS2
peak.counts.DN3 <- regionCounts(pe.bams.DN3, all.peaks.DN3, param=param)

However, when I run the regionCounts command I get the following error:

peak.counts.DN3 <- regionCounts(pe.bams.DN3, all.peaks.DN3, param=param)

Error: BiocParallel errors
1 remote errors, element index: 1
3 unevaluated and other errors
first remote error:
Error in .extractPE(bam.file, where = where, param = param): position vector should be 1-based

When I enter traceback() I get the following:

traceback()

7: stop(.error_bplist(res))

6: .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM,
BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)

5: bplapply(X = seq_along(ddd[[1L]]), wrap, .FUN = FUN, .ddd = ddd,
.MoreArgs = MoreArgs, BPREDO = BPREDO, BPPARAM = BPPARAM,
BPOPTIONS = BPOPTIONS)

4: bplapply(X = seq_along(ddd[[1L]]), wrap, .FUN = FUN, .ddd = ddd,
.MoreArgs = MoreArgs, BPREDO = BPREDO, BPPARAM = BPPARAM,
BPOPTIONS = BPOPTIONS)

3: bpmapply(FUN = .region_counts, bam.file = bam.files, init.ext = ext.data$ext,
MoreArgs = list(where = where, param = param, final.ext = ext.data$final,
outlen = outlen, regions = regions, chosen = chosen),
BPPARAM = BPPARAM, SIMPLIFY = FALSE)

2: bpmapply(FUN = .region_counts, bam.file = bam.files, init.ext = ext.data$ext,
MoreArgs = list(where = where, param = param, final.ext = ext.data$final,
outlen = outlen, regions = regions, chosen = chosen),
BPPARAM = BPPARAM, SIMPLIFY = FALSE)

1: regionCounts(pe.bams.DN3, all.peaks.DN3, param = param)

Is this an issue with my bam files or with BiocParallel? Do you have any suggestions for how to address this problem? Thank you in advance for your help.

Below is my session info:

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] csaw_1.30.1 SummarizedExperiment_1.26.1 Biobase_2.56.0 MatrixGenerics_1.8.1
[5] matrixStats_0.62.0 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4 IRanges_2.30.1
[9] S4Vectors_0.34.0 BiocGenerics_0.42.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.9 rstudioapi_0.14 edgeR_3.38.4 XVector_0.36.0 zlibbioc_1.42.0
[6] BiocParallel_1.30.3 lattice_0.20-45 tools_4.2.1 parallel_4.2.1 grid_4.2.1
[11] crayon_1.5.2 Matrix_1.5-1 GenomeInfoDbData_1.2.8 BiocManager_1.30.18 bitops_1.0-7
[16] codetools_0.2-18 RCurl_1.98-1.8 limma_3.52.4 DelayedArray_0.22.0 compiler_4.2.1
[21] Biostrings_2.64.1 Rsamtools_2.12.0 metapod_1.4.0 locfit_1.5-9.6

Thank you in advance for any help!

csaw R ATAC-seq • 1.6k views
ADD COMMENT
0
Entering edit mode

Hi there! I am encountering the same problem. I used this command start(Gr) <- start(gr)+1 on both my blacklist and all.peaks GRanges object but i am still getting the following error:

peak.counts <- regionCounts(pe.bams, all.peaks, param = param) Error: BiocParallel errors 1 remote errors, element index: 1 2 unevaluated and other errors first remote error: Error: position vector should be 1-based

Your help will be greatly appreciated!

ADD REPLY
0
Entering edit mode

Like in the other conversation, this is the problem. Somewhere there is a zero as start coordinate that is not supposed to be there. Can you show full code?

ADD REPLY
0
Entering edit mode

Thanks for your prompt response!

 all.peaks <- GRanges(all.peaks)
 start(all.peaks) <- start(all.peaks) + 1

pe.bams <- c('ATAC31_sub.sorted.noDups.filt.noMT.bam ', 'ATAC32_sub.sorted.noDups.filt.noMT.bam', 'ATAC33_sub.sorted.noDups.filt.noMT.bam')

#read hg38 blacklist
blacklist <- read.table('/scratch/vgrjus001/Genrich/hg38-blacklist.v2.bed', sep = "\t")
colnames(blacklist) <- c('chrom', 'start', 'end')
blacklist <- GRanges(blacklist)
start(blacklist) <- start(blacklist)+1

#define read parameters
standard.chr <- paste0("chr", c(1:22, "X", "Y"))
param <- readParam(max.frag = 1000, pe = "both", discard = blacklist, restrict = standard.chr)

#count reads in windows specified by MACS2
peak.counts <- regionCounts(pe.bams, all.peaks, param = param)
ADD REPLY
0
Entering edit mode
18 months ago
ATpoint 82k

position vector should be 1-based

The GRanges world is 1-based so start coordinates can be 1 at minimum. Your imported bed files might have zero as start coordinates somewhere. You would need to fix that by adding 1 to the start to convert from zero to one-based coordinate system.

Edit: Ah here, googling the error yields https://github.com/LTLA/csaw/issues/6 with the same explanation from the developer of csaw.

ADD COMMENT
0
Entering edit mode

Great, thank you ATpoint! Now the follow up question is: Do you know if there is a straightforward way to do this conversion, either in R or on the command line?

ADD REPLY
0
Entering edit mode

Just add 1 to the start column, or if it is already GRanges, do something like start(gr) <- start(gr)+1.

ADD REPLY
0
Entering edit mode

Gotcha, but these are actually bam files, not bed files. The regionCounts command in csaw asks for bam files as input; is there a way to add 1 to the position column of bam files?

Alternatively, you could provide regionCounts with a list of BamFile objects, which I would assume are GRanges objects of the bam files, but I may be wrong about that assumption? I tried importing the bam files into R as a GRanges object using the following command:

plusminus_DN3_1_sorted.sub.filt.noMT.bam <- granges(import("plusminus_DN3_1_sorted.sub.filt.noMT.bam"))

but I'm not sure if I did it right as I got the following error when I tried to run the regionCounts command:

Error in getListElement(x, i, ...) : GRanges objects don't support [[, as.list(), lapply(), or unlist() at the moment

ADD REPLY
0
Entering edit mode

The issue is not bam files. It is the peak files. Don’t convert bam to granges, that makes little sense.

ADD REPLY
1
Entering edit mode

Got it, I see what you're saying, sorry I misunderstood at first. There was indeed a 0 in the start column of the blacklist bed file. Problem solved, thank you!!

ADD REPLY

Login before adding your answer.

Traffic: 2685 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6