Question

CIRCexplorer2 only looks for exonic circRNA

1

5.5 years ago

t.blasco95 ▴ 30

I'm comparing circRNAs obtained via microarray analysis and RNA-Seq for a group of human multiple sclerosis samples.

The microarray analysis categorizes the circRNAs as exonic, intronic or sense overlapping (if the circRNA is transcribed from the same gene locus).

For the RNA-Seq analysis I align the reads with STAR and annotate the circRNAs with CIRCexplorer2, as described on the CIRCexplorer2 home page (https://circexplorer2.readthedocs.io/en/latest/). I downloaded the hg19 reference and annotation files from UCSC (FASTA, GTF and genePred).

When I compared the two sets of results, I found that none of the circRNAs categorized as intronic or sense overlapping were detected by CIRCexplorer2.

Has anyone ever had a problem like this? If so, do you know whether it is caused by CIRCexplorer2 itself, or whether different annotation files are needed? Also, if anybody knows of other circRNA prediction tools that avoid this problem, that would help me.

Thanks

RNA-Seq circRNA CIRCexplorer circular • 2.2k views

updated 5.5 years ago by IP ▴ 760 • written 5.5 years ago by t.blasco95 ▴ 30

score 2 · Answer 1 · 2018-10-22

2

5.5 years ago

IP ▴ 760

Hi t.blasco95!

I have used CIRCexplorer2 a couple of times in the past, and it did give me information about both exonic and intronic circRNAs. I have checked my old outputs, and that information is encoded in the 14th column of the output (the one called circType). You can use the bash one-liner below to check it. It will tell you the number of intronic circRNAs (encoded as ciRNA) and of circRNAs in your file:

 cat some_file_known_circrna.bed | gawk '{print $14}' | sort | uniq -c
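
If you want to pull out the intronic entries themselves rather than just count them, a small variation could look like the sketch below. It assumes the annotated output is tab-delimited with circType in column 14, as above; `filter_circtype` and `intronic_only.bed` are made-up names for illustration.

```shell
# Print only the rows whose 14th column (circType) equals the requested type.
# Assumption: the annotated file is tab-delimited.
# Usage: filter_circtype <annotated_file> <type>   (type: ciRNA or circRNA)
filter_circtype() {
    awk -F'\t' -v t="$2" '$14 == t' "$1"
}

# Example call (file names are placeholders):
# filter_circtype some_file_known_circrna.bed ciRNA > intronic_only.bed
```

The resulting file keeps all original columns, so the coordinates can be compared directly with the intronic circRNAs reported by the microarray.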

PS: say hi to the rest of the group :)

1

Thanks! I was just comparing the microarray and RNA-Seq results and didn't notice that CIRCexplorer2 detects intronic circRNAs. A related question: should I understand that CIRCexplorer2 compares the detected back-spliced junctions with CIRCpedia?

written 5.5 years ago by t.blasco95 ▴ 30

1

Hi again :)

No, CIRCexplorer2 takes two inputs:

- The back-spliced junctions identified by STAR (back-spliced junction coordinates)
- A gene annotation file (a file with gene coordinates)

In this step, CIRCexplorer2 assigns your raw signal (the back-spliced read coordinates) to the genes in the genome (the gene annotation file). To do so, it basically compares the alignment coordinates of your reads with the genome annotation. If they match, the number of back-spliced junctions is assigned to the matched transcript.
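
The matching idea can be sketched roughly as follows. This is a simplified illustration, not CIRCexplorer2's actual code. It assumes a refFlat-style genePred annotation (gene name in column 1, transcript name in column 2, chromosome in column 3, comma-separated exon starts and ends in columns 10 and 11), a tab-delimited junction BED file with chromosome, start and end in the first three columns, and the same coordinate convention in both files. The function name `annotate_junctions` is hypothetical.

```shell
# Report junctions whose start equals an exon start and whose end equals an
# exon end of the same transcript.
# Usage: annotate_junctions <annotation_refflat> <junctions.bed>
annotate_junctions() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR {
            # Annotation file: index exon boundaries per chromosome and transcript.
            ns = split($10, s, ",")
            ne = split($11, e, ",")
            for (i = 1; i <= ns; i++)
                if (s[i] != "") tx_at_start[$3, s[i]] = tx_at_start[$3, s[i]] " " $2
            for (i = 1; i <= ne; i++)
                if (e[i] != "") has_end[$3, e[i], $2] = 1
            gene[$2] = $1
            next
        }
        {
            # Junction file: look up transcripts with an exon starting at the
            # junction start, then require an exon end at the junction end.
            n = split(tx_at_start[$1, $2], tx, " ")
            for (i = 1; i <= n; i++)
                if (($1, $3, tx[i]) in has_end)
                    print $1, $2, $3, gene[tx[i]], tx[i]
        }
    ' "$1" "$2"
}
```

Under these assumptions, a junction whose coordinates do not coincide with annotated exon boundaries produces no output line, which is the behaviour to keep in mind when a junction seems to "disappear" at the annotation step.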

Let me know if this helps; if not, we can chat by other means.

cheers,

written 5.5 years ago by IP ▴ 760

1

Hi! Yes, it helps.

Just one more question. Since my gene annotation file only stores the start and end positions of exons, I would never find any circRNA that splits an exon (one whose back-splice site lies inside an annotated exon), would I?

For example, RPPH1 is a single-exon gene with many circRNAs annotated inside it (circBase lists many splice sites inside it that form circRNAs). In fact, in the back_spliced_junction file generated by CIRCexplorer2 I have found some of them, and they will not be annotated as circRNAs in the next step.

Is there any gene annotation file that could solve this problem? Maybe the best solution is just to compare those back-spliced junctions with circRNA databases and classify them as circRNAs.
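
A database comparison of this kind could be sketched as below. It is only a sketch under assumptions: the database has been exported as a tab-delimited BED file on the same genome build (hg19 here) and with the same coordinate convention as the junction file, and matches are required to be exact on chromosome, start and end. `junctions_in_database` and the file names in the example call are hypothetical.

```shell
# Keep the junctions whose chromosome, start and end exactly match a
# database entry.
# Usage: junctions_in_database <database.bed> <back_spliced_junction.bed>
junctions_in_database() {
    awk -F'\t' '
        NR == FNR { known[$1, $2, $3] = 1; next }   # first file: database entries
        ($1, $2, $3) in known                       # second file: print exact matches
    ' "$1" "$2"
}

# Example call (file names are placeholders):
# junctions_in_database circrna_database_hg19.bed back_spliced_junction.bed > known_junctions.bed
```

Exact matching is the strictest choice; junctions that differ from a database entry by even one base are dropped, so an off-by-one difference in coordinate conventions between the two files would have to be corrected first.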

cheers,

written 5.5 years ago by t.blasco95 ▴ 30

1

> Is there any gene annotation file that could solve this problem? Maybe the best solution is just to compare those back-spliced junctions with circRNA databases and classify them as circRNAs.

In this case I would go for de novo identification of circRNAs: you scan the reads for back-spliced reads without using any information from the genome annotation. The pro is that you will identify circRNAs that do not follow the reference annotation. The con is that you will pick up a lot of noise and might end up down a rabbit hole, eventually going back to CIRCexplorer2. Whether to use these tools depends on the scientific question you want to answer.

As far as I am aware, the tools that do this are CIRI and segemehl, but there might be more recent software. You should check that.

I have never used CIRI, but my experience with segemehl is that it has really good sensitivity at the cost of a high false-positive rate; see this paper.

Hope this helps,

written 5.5 years ago by IP ▴ 760