I'm trying to use the cageR package to analyze some CAGE-seq data (paired-end), however, the clusterCTSS function doesn't seem to work for me.
Here is the code that I used:
cage_bam <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg19", inputFiles = bam_file, inputFilesType = "bamPairedEnd", sampleLabels = c('CTR','A','B','C')) getCTSS(cage_bam, mappingQualityThreshold=20, sequencingQualityThreshold = 20) #also tried this without these thresholds, but it still doesn't work :( cage_bam_ctss <- CTSStagCount(cage_bam) normalizeTagCount(cage_bam, method = "simpleTpm") #also tried this without this normalization step, still doesn't work :( clusterCTSS(object = cage_bam, method = "paraclu") I got the following error message: Filtering CTSSs below threshold... Clustering... -> CTR -> A Error in `[.data.frame`(data, , c("chr", "pos", "strand", s)) : undefined columns selected
I tried changing the method to distclu, but then I got this error message:
Filtering CTSSs below threshold... Clustering... Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") : assay colnames() must be NULL or identical to colData rownames()
If anyone has any experience with this, or knows what I could try to get this to work, your help would be really appreciated!!!
EDITED to add what cage_bam looks like after running get_ctss:
cage_bam S4 Object of class CAGEset ======================================= Input data information ======================================= Reference genome (organism): BSgenome.Hsapiens.UCSC.hg19 Input file type: bamPairedEnd Input file names: /Users/odancu/Downloads/CAGE/BK_CTR-ready.bam, /Users/odancu/Downloads/CAGE/BK_A-ready.bam, /Users/odancu/Downloads/CAGE/BK_B-ready.bam, /Users/odancu/Downloads/CAGE/C-ready.bam Sample labels: CTR, A, B, C ======================================= CTSS information ======================================= CTSS chromosome: chr1, chr1, chr1, ... CTSS position: 15797, 17556, 29322, ... CTSS strand: +, -, +, ... Tag count: -> CTR: 0, 0, 0, ... -> A: 0, 0, 0, ... -> B: 0, 1, 0, ... -> C: 1, 0, 1, ... Normalized tpm: ======================================= Tag cluster (TC) information ======================================= CTSS clustering method: Number of TCs per sample: ======================================= Consensus cluster information ======================================= Number of consensus clusters: Consensus cluster chromosome: Consensus cluster start: Consensus cluster end: Consensus cluster strand: Normalized tpm: ======================================= Expression profiling ======================================= Expression clustering method: Expression clusters for consensus clusters: ======================================= Promoter shifting ======================================= GroupX: GroupY: Shifting scores: KS p-values (FDR adjusted):
I tried applying
clusterCTSS to the zebrafish sample dataset provided with the package, following the vignette, and Paraclu still has issues (albeit different ones), but distclu works well.
clusterCTSS(object = ce, method = "paraclu") Filtering out CTSSs below threshold... Clustering... -> Zf.30p.dome Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) : <RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand clusterCTSS(object = ce, method = "distclu") Filtering out CTSSs below threshold... Clustering... -> Zf.30p.dome -> Zf.high -> Zf.prim6.rep1 -> Zf.prim6.rep2 -> Zf.unfertilized.egg
My original code generated a CAGEset object, but I made a CAGEexp object and tried clusterCTSS again, and while paraclu gave the following error message:
clusterCTSS(object = cage_bam, method = "paraclu") Filtering out CTSSs below threshold... Clustering... -> CTR Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) : <RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand
Distclu worked well. I think the issue might be with how clusterCTSS handles CAGEset objects, or how paraclu handles inputs. While this now works, it's somewhat less than ideal, given that later analyses that I want to do on this clustered data, like running the function
getExpressionProfiles, don't work on CAGEexp objects. If anyone had success using clusterCTSS on CAGEset objects, please let me know!