Question: cageR: errors when trying to use clusterCTSS
0
gravatar for octaviamdl
8 days ago by
octaviamdl0
octaviamdl0 wrote:

Hi everyone!

I'm trying to use the cageR package to analyze some CAGE-seq data (paired-end), however, the clusterCTSS function doesn't seem to work for me.

Here is the code that I used:

cage_bam <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg19", inputFiles = bam_file, inputFilesType = "bamPairedEnd", sampleLabels = c('CTR','A','B','C'))

getCTSS(cage_bam, mappingQualityThreshold=20, sequencingQualityThreshold = 20)

#also tried this without these thresholds, but it still doesn't work :(

cage_bam_ctss <- CTSStagCount(cage_bam)
normalizeTagCount(cage_bam, method = "simpleTpm")

#also tried this without this normalization step, still doesn't work :(

clusterCTSS(object = cage_bam, method = "paraclu")


I got the following error message:

Filtering CTSSs below threshold...

Clustering...

    -> CTR
    -> A
Error in `[.data.frame`(data, , c("chr", "pos", "strand", s)) : 
  undefined columns selected

I tried changing the method to distclu, but then I got this error message:

Filtering CTSSs below threshold...

Clustering...

Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") : 
  assay colnames() must be NULL or identical to colData rownames()

If anyone has any experience with this, or knows what I could try to get this to work, your help would be really appreciated!!!

EDITED to add what cage_bam looks like after running get_ctss:

       cage_bam

    S4 Object of class CAGEset

=======================================
Input data information
=======================================
Reference genome (organism): BSgenome.Hsapiens.UCSC.hg19
Input file type: bamPairedEnd
Input file names: /Users/odancu/Downloads/CAGE/BK_CTR-ready.bam, /Users/odancu/Downloads/CAGE/BK_A-ready.bam, /Users/odancu/Downloads/CAGE/BK_B-ready.bam, /Users/odancu/Downloads/CAGE/C-ready.bam
Sample labels: CTR, A, B, C
=======================================
CTSS information
=======================================
CTSS chromosome: chr1, chr1, chr1, ...
CTSS position: 15797, 17556, 29322, ...
CTSS strand: +, -, +, ...
Tag count:
    -> CTR: 0, 0, 0, ...
    -> A: 0, 0, 0, ...
    -> B: 0, 1, 0, ...
    -> C: 1, 0, 1, ...
Normalized tpm:
=======================================
Tag cluster (TC) information
=======================================
CTSS clustering method: 
Number of TCs per sample:
=======================================
Consensus cluster information
=======================================
Number of consensus clusters:
Consensus cluster chromosome:
Consensus cluster start:
Consensus cluster end:
Consensus cluster strand:
Normalized tpm:
=======================================
Expression profiling
=======================================
Expression clustering method: 
Expression clusters for consensus clusters:
=======================================
Promoter shifting
=======================================
GroupX:
GroupY:
Shifting scores:
KS p-values (FDR adjusted):

UPDATE: I tried applying clusterCTSS to the zebrafish sample dataset provided with the package, following the vignette, and Paraclu still has issues (albeit different ones), but distclu works well.

clusterCTSS(object = ce, method = "paraclu")

Filtering out CTSSs below threshold...
Clustering...
    -> Zf.30p.dome
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) : 
  <RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand

clusterCTSS(object = ce, method = "distclu")

Filtering out CTSSs below threshold...
Clustering...
    -> Zf.30p.dome
    -> Zf.high
    -> Zf.prim6.rep1
    -> Zf.prim6.rep2
    -> Zf.unfertilized.egg

My original code generated a CAGEset object, but I made a CAGEexp object and tried clusterCTSS again, and while paraclu gave the following error message:

clusterCTSS(object = cage_bam, method = "paraclu")

Filtering out CTSSs below threshold...
Clustering...
    -> CTR
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) : 
  <RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand

Distclu worked well. I think the issue might be with how clusterCTSS handles CAGEset objects, or how paraclu handles inputs. While this now works, it's somewhat less than ideal, given that later analyses that I want to do on this clustered data, like running the function getExpressionProfiles, don't work on CAGEexp objects. If anyone had success using clusterCTSS on CAGEset objects, please let me know!

rna-seq cage-seq R • 110 views
ADD COMMENTlink modified 7 days ago • written 8 days ago by octaviamdl0

Hi, could you show what cage_bam looks like after running getCTSS? I believe the fields chr, pos and strand are populated as part of it so the issue might be coming from this step. Are your input files publicly available?

ADD REPLYlink written 7 days ago by lu.ne70

Hi lu.ne! No, my input files aren't publicly available. I've edited my post above to update what it looks like right after running getCTSS, prior to tpm normalization.

ADD REPLYlink modified 7 days ago • written 7 days ago by octaviamdl0
1

It looks OK, I'm afraid I can't really reproduce your error... Are you using CAGEr_1.26.0 (it should be the last version)?

ADD REPLYlink written 7 days ago by lu.ne70

It looks like it's CAGEr_1.24.0 . I'll try to update the version and see if the issue goes away. Thanks! :)

ADD REPLYlink written 7 days ago by octaviamdl0

If it does not work you could try changing the values of mappingQualityThreshold and sequencingQualityThreshold, it might be that too many things are filtered. Otherwise, if you can reproduce the error with a small dummy dataset and share it here I'm sure we could help.

ADD REPLYlink written 7 days ago by lu.ne70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1335 users visited in the last hour