Question: Performance of circRNA detection tools
Asked 5.5 years ago by Graslevy (UK):

Hey Guys,

Using simulated data, I have compared two recent methods for identifying circRNAs: CIRI and CIRCexplorer (tophat fusion version). Both show odd behaviors, and I would like to understand why.

First, I generated simulated data (100bp reads) from previously identified circRNA structures in circbase.org, at depths of 2, 10 and 25. I ran both methods on these datasets to assess their performance. The number of circRNAs each method identified is shown below:

Depth   CIRI   CIRCexplorer
2       3195   725
10      4397   121
25      4400   109

As you can see, CIRCexplorer appears less sensitive but very specific. What concerns me, however, is that fewer circRNAs were identified at higher coverage. Is there a reason for this? I used the parameters shown at https://github.com/YangLab/CIRCexplorer. Please advise on suitable parameters that would rectify this behavior.
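Raw counts of identified circRNAs mix true and false positives, so sensitivity and specificity are easier to compare when each tool's output is scored against the simulated truth set. A minimal sketch, assuming each circRNA is keyed by its back-splice junction as (chromosome, start, end) and that truth and predictions use the same coordinate convention. Since a simulation has no well-defined true negatives, precision (TP / (TP + FP)) stands in for "specificity" here. The function name and tuple layout are illustrative only, not part of CIRI or CIRCexplorer.

```python
from typing import Dict, Iterable, Set, Tuple, Union

# (chromosome, start, end) of a back-splice junction; hypothetical key, not a tool format
Junction = Tuple[str, int, int]


def benchmark(truth: Iterable[Junction],
              predicted: Iterable[Junction]) -> Dict[str, Union[int, float]]:
    """Score predicted back-splice junctions against a simulated truth set (exact match)."""
    truth_set: Set[Junction] = set(truth)
    pred_set: Set[Junction] = set(predicted)
    tp = len(truth_set & pred_set)   # simulated circRNAs that were found
    fp = len(pred_set - truth_set)   # reported circRNAs not in the simulation
    fn = len(truth_set - pred_set)   # simulated circRNAs that were missed
    return {
        "TP": tp,
        "FP": fp,
        "FN": fn,
        "sensitivity": tp / len(truth_set) if truth_set else 0.0,
        "precision": tp / len(pred_set) if pred_set else 0.0,
    }


truth = [("chr1", 100, 500), ("chr1", 1000, 2000), ("chr2", 50, 300)]
predicted = [("chr1", 100, 500), ("chr2", 50, 300), ("chr3", 1, 10)]
print(benchmark(truth, predicted))
```

Running this per depth for each tool would show whether CIRCexplorer's drop at depth 10 and 25 is lost true positives or simply fewer calls overall.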

CIRI, however, is not perfect. It currently seems impossible to run CIRI on real data with more than 300 million reads: it halts every time after about 10 hours, apparently because of high memory consumption. Does anyone know a way round this? I should add that CIRI is also less specific, reporting circRNAs that are not in the simulated dataset.
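One possible workaround, offered only as an untested assumption: if CIRI's memory use scales with the size of the SAM file it is given, splitting the SAM by reference sequence and running CIRI once per chromosome could keep each run within the available RAM. This relies on back-splice reads aligning to a single chromosome; whether CIRI accepts per-chromosome SAM files, and whether the per-run results can simply be concatenated, has not been checked against CIRI's documentation. The helper below is hypothetical and only does the splitting.

```python
from typing import Dict, List, TextIO


def split_sam_by_chrom(sam_path: str, out_prefix: str) -> Dict[str, str]:
    """Write one SAM file per reference sequence (RNAME, the 3rd column).

    Header lines ('@...') are copied into every output file. Unmapped records
    (RNAME == '*') are skipped. Relies on the SAM rule that all header lines
    precede the alignment lines. Returns {chromosome: output path}.
    """
    header: List[str] = []
    handles: Dict[str, TextIO] = {}
    paths: Dict[str, str] = {}
    try:
        with open(sam_path) as sam:
            for line in sam:
                if line.startswith("@"):
                    header.append(line)
                    continue
                rname = line.split("\t", 3)[2]
                if rname == "*":
                    continue
                if rname not in handles:
                    path = f"{out_prefix}.{rname}.sam"
                    handles[rname] = open(path, "w")
                    handles[rname].writelines(header)  # each file gets the full header
                    paths[rname] = path
                handles[rname].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

Note that one file handle stays open per reference sequence, so an assembly with thousands of scaffolds may hit the operating system's open-file limit.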

Please find simulated data here: https://www.dropbox.com/s/r5ms1zymngk0oyy/simulated_data.tar.gz?dl=0
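The thread does not say which simulator produced these reads. For anyone who wants to regenerate comparable data, here is a minimal sketch under stated assumptions: "depth" is taken as mean per-base coverage, reads are single-end and error-free, and start positions are uniform around the circle. The function `simulate_circ_reads` is hypothetical and is not the tool used above.

```python
import math
import random
from typing import List, Tuple


def simulate_circ_reads(circ_seq: str, depth: float, read_len: int = 100,
                        seed: int = 0) -> Tuple[List[str], int]:
    """Sample error-free single-end reads uniformly from a circular transcript.

    Hypothetical helper. The number of reads is ceil(depth * len(circ_seq) / read_len).
    Returns the reads and how many of them cross the back-splice junction,
    i.e. wrap past the end of circ_seq back to its start.
    """
    rng = random.Random(seed)
    n = len(circ_seq)
    n_reads = math.ceil(depth * n / read_len)
    # Concatenate enough copies that any read can wrap around the junction,
    # even when the circRNA is shorter than the read.
    unrolled = circ_seq * (read_len // n + 2)
    reads: List[str] = []
    n_spanning = 0
    for _ in range(n_reads):
        start = rng.randrange(n)  # uniform start position on the circle
        reads.append(unrolled[start:start + read_len])
        if start + read_len > n:  # read contains both the last and the first base
            n_spanning += 1
    return reads, n_spanning


reads, n_spanning = simulate_circ_reads("ACGT" * 100, depth=10)
print(len(reads), n_spanning)
```

A real benchmark would also add sequencing errors and linear (non-circular) background reads, which this sketch deliberately leaves out.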

Note: other methods were also assessed (including methods using the STAR aligner). My question is specific to these two methods because of the odd behaviors described above.

Tags: rna-seq, circrna

About half a year ago I also tested CIRI, CIRCexplorer (STAR mapping) and find_circ on real data (Jeck et al. 2013). I would have to go back to my notes to give more insight, but I can confirm this:

It currently appears impossible to run CIRI on real data with over 300 million reads. It halts every time (after about 10 hrs), apparently due to high memory consumption, does anyone know a way round this?

I gave it ~150GB of RAM and it still crashed after running for a couple of days; I have not found a way round it. According to the authors, whom I contacted, CIRI was developed using RNA-seq datasets not enriched for circRNAs, hence the reasonable memory requirements reported in their paper. With circRNA-enriched (RNase R-treated) samples, memory consumption seems to shoot up.

Reply written 5.5 years ago by A. Domingues.

I agree with you. The CIRI authors state that memory consumption is about 20% of the SAM file size, but that is not what I see. For a SAM file of about 135GB, it is currently using up to 70GB of RAM, and it was halted overnight on a node with 100GB of RAM.
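To put the two numbers side by side: under the ~20% rule of thumb quoted from the CIRI authors, a 135GB SAM file should need about 27GB, whereas the job described above reached 70GB. Because the job was still running when that was observed, 70GB is a lower bound on the real peak. The small calculation below only restates those figures; the linear "fraction of SAM size" model is itself an assumption.

```python
def memory_estimate(sam_size_gb: float, ram_fraction: float) -> float:
    """RAM (GB) predicted if memory use is a fixed fraction of the SAM size (assumed model)."""
    return sam_size_gb * ram_fraction


SAM_GB = 135.0          # SAM file size reported above
OBSERVED_RAM_GB = 70.0  # RAM in use when reported (a lower bound on the peak)

expected = memory_estimate(SAM_GB, 0.20)      # the ~20% figure attributed to the CIRI authors
observed_fraction = OBSERVED_RAM_GB / SAM_GB  # the fraction actually seen so far
print(f"expected ~{expected:.0f} GB, observed >= {OBSERVED_RAM_GB:.0f} GB "
      f"({observed_fraction:.0%} of the SAM size)")
```

So the observed usage is already more than 2.5 times the documented expectation, consistent with the reply above that circRNA-enriched samples behave differently.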

It is a good tool, but the memory consumption makes it impossible to use. Quick question: did you ever use CIRCexplorer (tophat fusion version)? I am curious to know why sensitivity drops with increased coverage. Thank you.

Reply written 5.5 years ago by Graslevy.

Quoting: "did you ever use CIRCexplorer (tophat fusion version)? I am curious to know why sensitivity drops with increased coverage"

No. I tried it but kept running into mapping errors. In the meantime, the developers of CIRCexplorer added an option for STAR, and I went with that. I also did not test sensitivity in that way; I was more interested in whether the tools could find the experimentally validated circRNAs (they did). That said, CIRCexplorer found a reasonably large number of circRNAs, comparable to find_circ.
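Checking whether tools recover a list of experimentally validated circRNAs is mostly a coordinate-matching problem, and tools do not always report junction coordinates in the same convention (0- vs 1-based, inclusive vs exclusive ends). A minimal sketch, assuming (chromosome, start, end) tuples and a small hypothetical tolerance `tol` in bp on each end; the function name is illustrative only.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Junction = Tuple[str, int, int]  # (chromosome, start, end); hypothetical layout


def recovered(validated: Iterable[Junction], predicted: Iterable[Junction],
              tol: int = 1) -> List[Junction]:
    """Return validated junctions matched by a prediction within `tol` bp at both ends."""
    by_chrom: DefaultDict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in predicted:
        by_chrom[chrom].append((start, end))
    hits: List[Junction] = []
    for chrom, start, end in validated:
        # Linear scan per chromosome: fine for a few thousand circRNAs.
        if any(abs(start - s) <= tol and abs(end - e) <= tol
               for s, e in by_chrom.get(chrom, ())):
            hits.append((chrom, start, end))
    return hits


validated = [("chr1", 100, 500), ("chr2", 10, 90)]
predicted = [("chr1", 101, 500), ("chr3", 10, 90)]
print(recovered(validated, predicted, tol=1))
```

The same function can compare two tools' outputs (for example CIRCexplorer against find_circ) by passing one tool's list as `validated`.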

Reply written 5.5 years ago by A. Domingues.

Did you simulate the data yourself, or take simulated data from a publication?

Reply written 5.5 years ago by geek_y.

Hello Geek_y, I generated the simulated data from 4500 published circRNAs. CIRI found most of the structures, with up to 125 false positives (structures not in the simulated set). I have linked some of the FASTQ data for anyone to re-analyse, and I can send more privately.

Reply written 5.5 years ago by Graslevy.

In my own analyses of simulated data, depth has little influence on CIRCexplorer's final results. However, if the depth is too low, the bias can be large. I also think you should simulate from experimentally validated circRNAs (for example, circRNAs detected in RNase R-treated RNA-seq). CIRCexplorer is being developed to progressively improve sensitivity while maintaining high specificity.
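Why low depth introduces large bias can be illustrated with a simple sampling model. Assume junction-spanning reads follow a Poisson distribution whose mean scales with depth, that a read only counts if it has at least `anchor` bp on each side of the junction, and that a circRNA is reported once it has at least `min_reads` such reads. Both `anchor` and `min_reads` are hypothetical illustrative values, not CIRCexplorer's or CIRI's actual parameters.

```python
import math


def expected_junction_reads(depth: float, read_len: int = 100, anchor: int = 10) -> float:
    """Mean number of reads crossing a back-splice junction with >= `anchor` bp on each side,
    assuming uniform coverage at `depth` (mean per-base coverage)."""
    return depth * (read_len - 2 * anchor + 1) / read_len


def p_detect(lam: float, min_reads: int) -> float:
    """P(X >= min_reads) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(min_reads))


for depth in (2, 10, 25):
    lam = expected_junction_reads(depth)
    print(depth, round(lam, 2), round(p_detect(lam, min_reads=2), 3))
```

Under this model the detection probability only rises with depth, so sampling alone would not explain fewer CIRCexplorer calls at depth 25 than at depth 2; that pattern would have to come from something the model leaves out, such as alignment or filtering behavior.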

Reply written 5.3 years ago by kepbod.