Using simulated data, I have compared two recent methods for identifying circRNAs, CIRI and CIRCexplorer (TopHat-Fusion version). Both show odd behavior, and I would like to know the reasons for it.
First, simulated data were generated from previously identified structures in circbase.org at depths of 2, 10 and 25 (100 bp reads). Both methods were run on these datasets to assess their performance; the number of circRNAs identified by each is below:
Data        CIRI    CIRCexplorer
depth 2     3195    725
depth 10    4397    121
depth 25    4400    109
As you can see, CIRCexplorer appears less sensitive but very specific. However, the behavior that concerns me is that fewer circRNAs were identified at higher coverage. Is there a reason for this? The parameters used are those shown at https://github.com/YangLab/CIRCexplorer. Please advise on suitable parameters that would rectify this behavior.
CIRI, however, is not perfect either. It currently appears impossible to run CIRI on real data with over 300 million reads: it halts every time (after about 10 hrs), apparently due to high memory consumption. Does anyone know a way around this? I must add that it is also less specific, identifying circRNAs not expected in the simulated dataset.
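For anyone who wants to reproduce the sensitivity/specificity comparison, here is a minimal sketch of how predicted back-splice junctions could be scored against the simulated truth set. It is not the exact script I used. It assumes each tool's output has already been reduced to tab-separated chrom, start, end lines; the function names, the `start_offset` parameter and the toy coordinates are hypothetical. Coordinate conventions can differ between tools (0-based vs 1-based starts), so check this before comparing.

```python
def load_junctions(lines, start_offset=0):
    """Parse tab-separated 'chrom, start, end' lines into a set of tuples.

    start_offset is a hypothetical knob for shifting start coordinates so
    that two tools end up on the same convention (e.g. 0-based to 1-based).
    """
    junctions = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end = line.split("\t")[:3]
        junctions.add((chrom, int(start) + start_offset, int(end)))
    return junctions


def score(predicted, truth):
    """Count true/false positives and false negatives by exact junction match."""
    tp = len(predicted & truth)   # simulated circRNAs that were recovered
    fp = len(predicted - truth)   # calls not present in the simulation
    fn = len(truth - predicted)   # simulated circRNAs that were missed
    sensitivity = tp / (tp + fn) if truth else 0.0
    precision = tp / (tp + fp) if predicted else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Toy coordinates for illustration only, not taken from the simulated data.
    truth = load_junctions(["chr1\t100\t200", "chr1\t300\t400", "chr2\t50\t150"])
    calls = load_junctions(["chr1\t100\t200", "chr2\t50\t150", "chr3\t10\t20"])
    print(score(calls, truth))
```

Exact matching is strict: a call that is off by one base at either end counts as both a false positive and a false negative, so a small tolerance may be preferable in practice.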
Please find simulated data here: https://www.dropbox.com/s/r5ms1zymngk0oyy/simulated_data.tar.gz?dl=0
Note: Other methods were also assessed (including methods using STAR aligner). My question is specific to these two methods, because of the weird behaviors mentioned above.