Question: How to extract circRNA sequences based on the detected back-spliced junctions?
3.4 years ago by
zhuofei.xu10
zhuofei.xu10 wrote:

Dear All,

I have used several tools (findcirc, Circexplorer, DCC) to detect circRNAs in mouse tissue, and I have generated a list of back-spliced junctions, one per detected circRNA. An example for some circRNAs is given below:

Chr   Start    End      Strand
chr1  5089009  5098133  -
chr1  5093363  5133262  -
chr1  7120194  7120615  -
chr1  8414203  8448132  +
chr1  8554725  8595542  +
chr1  8554725  8607152  +
chr1  8583205  8595542  +
chr1  8583205  8607152  +
chr1  8624779  8682029  +

I'm wondering if there is a script or tool that can be used to extract the exon sequences of circRNAs detected. These circRNA sequences will be used for scanning miRNA binding sites.

The reference genome is mm10 and the gene annotation I used is gencode.vM14.annotation.gtf.

Any advice is much appreciated. Thank you very much!

Zhuofei

rna-seq • 2.0k views
ADD COMMENTlink modified 12 months ago by davidebarbagallo0 • written 3.4 years ago by zhuofei.xu10

Are you sure that the output is from circexplorer? circexplorer outputs exon information. With that, you could write a short script or use existing utilities to extract the sequences.

ADD REPLYlink written 3.4 years ago by IP700

The output is from DCC. According to your suggestion, I have found the related output from circexplorer and used bedtools getfasta to get the circRNA sequence. Thanks a million!
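
For readers following the same route, here is a minimal sketch of how exon blocks can be derived from a BED12-style record before extracting sequence. It assumes that the first 12 columns of the circexplorer output follow the BED12 layout (chrom, start, end, name, score, strand, thickStart, thickEnd, itemRgb, blockCount, blockSizes, blockStarts); please check this against your own file. The function name `exon_blocks` and the example record are hypothetical.

```python
def exon_blocks(bed12_line):
    """Return (chrom, start, end, strand) for every exon block of a BED12 line.

    Coordinates stay 0-based, half-open, as in BED.
    """
    fields = bed12_line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = fields[0], int(fields[1]), fields[5]
    # blockSizes / blockStarts are comma-separated and may end with a comma
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    return [
        (chrom, chrom_start + off, chrom_start + off + size, strand)
        for size, off in zip(sizes, offsets)
    ]


# Made-up record for illustration only (not taken from a real run)
example = "\t".join(
    ["chr1", "1000", "2000", "circular_RNA/2", "0", "+",
     "1000", "1000", "0,0,0", "2", "100,200,", "0,800,"]
)
for block in exon_blocks(example):
    print(*block, sep="\t")
```

If the file really is BED12-compatible, `bedtools getfasta` with `-split` (concatenate the blocks) and `-s` (respect the strand) should be able to extract the spliced sequence directly from those 12 columns.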

ADD REPLYlink written 3.4 years ago by zhuofei.xu10

I am new to circRNA research. Could you please explain how to extract circRNA sequences from the circexplorer output?

Thanks in advance.

ADD REPLYlink written 2.9 years ago by tofazzal.stat0

Do you know how to program in Python or use bash?

ADD REPLYlink written 2.9 years ago by IP700

Thanks for your reply. Yes I know python.

ADD REPLYlink written 2.9 years ago by tofazzal.stat0

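Then you can use the pysam module. For example:

import pysam as ps

genome_fa = '/path/to/genome/fasta'  # path to the genome FASTA
fastafile = ps.FastaFile(genome_fa)
# The reference name must be a string; coordinates are 0-based, half-open
sequence = fastafile.fetch('chr1', 100, 200)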

Of course, if the circRNA is on the minus strand, you then need the reverse complement of the fetched sequence, which you can get with Biopython or by coding it from scratch.
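
For the minus strand, the transcribed sequence is the reverse complement of what is fetched from the genome. Below is a minimal pure-Python sketch of the "from scratch" option; the function names `reverse_complement` and `strand_aware` are hypothetical, and only A, C, G, T and N (upper or lower case) are handled. With Biopython, `Seq(...).reverse_complement()` does the same job.

```python
# Translation table for DNA bases; N and lowercase (soft-masked) bases are kept
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_aware(seq, strand):
    """Return seq unchanged for '+', reverse-complemented for '-'."""
    if strand == "+":
        return seq
    if strand == "-":
        return reverse_complement(seq)
    raise ValueError(f"unexpected strand: {strand!r}")


print(strand_aware("ATGCCN", "-"))  # NGGCAT
```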

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by IP700

The sample output looks like this:

Chr   Start of junction  End of junction  Circular RNA/Junction reads  score  Strand
chrY  150833             159885           circular_RNA/2               1      +
chrY  256250             258428           circular_RNA/1               0      -
chrY  272139             273067           circular_RNA/2               0      -
chrY  1455672            1456171          circular_RNA/1               1      -
chrY  1490550            1497014          circular_RNA/1               1      -
chrY  2111063            2111271          circular_RNA/4               0      -
chrY  2134780            2159644          circular_RNA/2               0      -
chrX  299512             302131           circular_RNA/1               0      -
chrX  322139             323067           circular_RNA/1               0      -
chrX  1505672            1506171          circular_RNA/1               1      -

That is, in your code, 100 corresponds to the start of the junction and 200 to the end of the junction. Thank you very much.
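
To make that mapping concrete, here is a rough sketch that parses a row in the six-column layout of the sample output and turns it into arguments for `fetch`. It assumes the junction coordinates are 1-based and inclusive, which should be verified for the tool in question; pysam's `fetch` takes 0-based, half-open coordinates, so only the start is shifted. The names `Junction`, `parse_junction` and `fetch_args` are hypothetical. Note that fetching from start to end returns the whole genomic span, introns included, not the spliced exon sequence.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int
    end: int
    reads: int
    strand: str


def parse_junction(line):
    """Parse 'chrY 150833 159885 circular_RNA/2 1 +' into a Junction."""
    chrom, start, end, name, _score, strand = line.split()
    # Assumption: the number after 'circular_RNA/' is the junction read count
    reads = int(name.split("/")[1])
    return Junction(chrom, int(start), int(end), reads, strand)


def fetch_args(junction, one_based=True):
    """Return (chrom, start, end) in 0-based, half-open coordinates."""
    start = junction.start - 1 if one_based else junction.start
    return junction.chrom, start, junction.end


row = parse_junction("chrY 150833 159885 circular_RNA/2 1 +")
print(fetch_args(row))  # ('chrY', 150832, 159885)
```

With an open `pysam.FastaFile`, the result could be passed on as `fastafile.fetch(*fetch_args(row))`.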

ADD REPLYlink written 2.9 years ago by tofazzal.stat0

Hi, this is probably off topic, but I'm not familiar with Python. Do you know whether a database with FASTA sequences of circRNA back-splice junctions exists? If one is available, could you kindly give me the web address where I can retrieve these FASTA sequences? I would be grateful for your help. Best,

Davide

ADD REPLYlink written 12 months ago by davidebarbagallo0

Hi Davide, I have no idea. I encourage you to create a new post with that question.

ADD REPLYlink written 12 months ago by IP700