cageR: errors when trying to use clusterCTSS
0
0
Entering edit mode
4.7 years ago
octaviamdl ▴ 10

Hi everyone!

I'm trying to use the cageR package to analyze some CAGE-seq data (paired-end), however, the clusterCTSS function doesn't seem to work for me.

Here is the code that I used:

cage_bam <- new("CAGEset", genomeName = "BSgenome.Hsapiens.UCSC.hg19", inputFiles = bam_file, inputFilesType = "bamPairedEnd", sampleLabels = c('CTR','A','B','C'))

getCTSS(cage_bam, mappingQualityThreshold=20, sequencingQualityThreshold = 20)

#also tried this without these thresholds, but it still doesn't work :(

cage_bam_ctss <- CTSStagCount(cage_bam)
normalizeTagCount(cage_bam, method = "simpleTpm")

#also tried this without this normalization step, still doesn't work :(

clusterCTSS(object = cage_bam, method = "paraclu")


I got the following error message:

Filtering CTSSs below threshold...

Clustering...

    -> CTR
    -> A
Error in `[.data.frame`(data, , c("chr", "pos", "strand", s)) : 
  undefined columns selected

I tried changing the method to distclu, but then I got this error message:

Filtering CTSSs below threshold...

Clustering...

Error in .validate_names(colnames, ans_colnames, "assay colnames()", "colData rownames()") : 
  assay colnames() must be NULL or identical to colData rownames()

If anyone has any experience with this, or knows what I could try to get this to work, your help would be really appreciated!!!

EDITED to add what cage_bam looks like after running get_ctss:

       cage_bam

    S4 Object of class CAGEset

=======================================
Input data information
=======================================
Reference genome (organism): BSgenome.Hsapiens.UCSC.hg19
Input file type: bamPairedEnd
Input file names: /Users/odancu/Downloads/CAGE/BK_CTR-ready.bam, /Users/odancu/Downloads/CAGE/BK_A-ready.bam, /Users/odancu/Downloads/CAGE/BK_B-ready.bam, /Users/odancu/Downloads/CAGE/C-ready.bam
Sample labels: CTR, A, B, C
=======================================
CTSS information
=======================================
CTSS chromosome: chr1, chr1, chr1, ...
CTSS position: 15797, 17556, 29322, ...
CTSS strand: +, -, +, ...
Tag count:
    -> CTR: 0, 0, 0, ...
    -> A: 0, 0, 0, ...
    -> B: 0, 1, 0, ...
    -> C: 1, 0, 1, ...
Normalized tpm:
=======================================
Tag cluster (TC) information
=======================================
CTSS clustering method: 
Number of TCs per sample:
=======================================
Consensus cluster information
=======================================
Number of consensus clusters:
Consensus cluster chromosome:
Consensus cluster start:
Consensus cluster end:
Consensus cluster strand:
Normalized tpm:
=======================================
Expression profiling
=======================================
Expression clustering method: 
Expression clusters for consensus clusters:
=======================================
Promoter shifting
=======================================
GroupX:
GroupY:
Shifting scores:
KS p-values (FDR adjusted):

UPDATE: I tried applying clusterCTSS to the zebrafish sample dataset provided with the package, following the vignette, and Paraclu still has issues (albeit different ones), but distclu works well.

clusterCTSS(object = ce, method = "paraclu")

Filtering out CTSSs below threshold...
Clustering...
    -> Zf.30p.dome
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) : 
  <RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand

clusterCTSS(object = ce, method = "distclu")

Filtering out CTSSs below threshold...
Clustering...
    -> Zf.30p.dome
    -> Zf.high
    -> Zf.prim6.rep1
    -> Zf.prim6.rep2
    -> Zf.unfertilized.egg

My original code generated a CAGEset object, but I made a CAGEexp object and tried clusterCTSS again, and while paraclu gave the following error message:

clusterCTSS(object = cage_bam, method = "paraclu")

Filtering out CTSSs below threshold...
Clustering...
    -> CTR
Error in .SummarizedExperiment.charbound(j, colnames(x), fmt) : 
  <RangedSummarizedExperiment>[,j] index out of bounds: chr pos strand

Distclu worked well. I think the issue might be with how clusterCTSS handles CAGEset objects, or how paraclu handles inputs. While this now works, it's somewhat less than ideal, given that later analyses that I want to do on this clustered data, like running the function getExpressionProfiles, don't work on CAGEexp objects. If anyone had success using clusterCTSS on CAGEset objects, please let me know!

CAGE-Seq RNA-Seq R • 1.4k views
ADD COMMENT
0
Entering edit mode

Hi, could you show what cage_bam looks like after running getCTSS? I believe the fields chr, pos and strand are populated as part of it so the issue might be coming from this step. Are your input files publicly available?

ADD REPLY
0
Entering edit mode

Hi lu.ne! No, my input files aren't publicly available. I've edited my post above to update what it looks like right after running getCTSS, prior to tpm normalization.

ADD REPLY
1
Entering edit mode

It looks OK, I'm afraid I can't really reproduce your error... Are you using CAGEr_1.26.0 (it should be the last version)?

ADD REPLY
0
Entering edit mode

It looks like it's CAGEr_1.24.0 . I'll try to update the version and see if the issue goes away. Thanks! :)

ADD REPLY
0
Entering edit mode

If it does not work you could try changing the values of mappingQualityThreshold and sequencingQualityThreshold, it might be that too many things are filtered. Otherwise, if you can reproduce the error with a small dummy dataset and share it here I'm sure we could help.

ADD REPLY

Login before adding your answer.

Traffic: 1569 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6