Question: How can I classify circRNAs as exonic, intronic or intergenic from the output of find_circ?
tofazzal.stat wrote (2.7 years ago):

I have a list of circRNAs identified by the circRNA identification tool find_circ. How can I classify these circRNAs as exonic, intronic or intergenic? Are there any tools or scripts for this? Some lines of output from find_circ are given below:

# chrom start   end name    n_reads strand  n_uniq  best_qual_A best_qual_B

chr4    166006737   166024248   Sy5y_D0_circ_000001 2   -   1   5   40
chr7    101950003   101952188   Sy5y_D0_circ_000002 1   +   1   5   5
chr5    619104  620376  Sy5y_D0_circ_000003 2   +   2   5   40
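The find_circ rows above are BED-like: chromosome, 0-based start, half-open end, then name, read count and strand. Below is a minimal Python sketch for reading them. It assumes the file is tab- or whitespace-separated and that only the first six columns are needed. The CircRNA and parse_find_circ names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class CircRNA(NamedTuple):
    """One find_circ record with BED-style (0-based, half-open) coordinates."""
    chrom: str
    start: int
    end: int
    name: str
    n_reads: int
    strand: str


def parse_find_circ(lines: Iterable[str]) -> list[CircRNA]:
    """Parse find_circ rows, skipping the '#' header line and blank lines.

    Assumes the first six columns are chrom, start, end, name, n_reads and
    strand, as in the header above; the remaining columns are ignored.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # splits on tabs or runs of spaces
        records.append(CircRNA(fields[0], int(fields[1]), int(fields[2]),
                               fields[3], int(fields[4]), fields[5]))
    return records


if __name__ == "__main__":
    sample = [
        "# chrom start end name n_reads strand n_uniq best_qual_A best_qual_B",
        "",
        "chr4\t166006737\t166024248\tSy5y_D0_circ_000001\t2\t-\t1\t5\t40",
        "chr7\t101950003\t101952188\tSy5y_D0_circ_000002\t1\t+\t1\t5\t5",
        "chr5\t619104\t620376\tSy5y_D0_circ_000003\t2\t+\t2\t5\t40",
    ]
    for rec in parse_find_circ(sample):
        # With half-open coordinates the span is simply end - start.
        print(rec.name, rec.chrom, rec.strand, rec.end - rec.start)
```

For the three sample rows this prints spans of 17511, 2185 and 1272 bp.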

Thanks in advance.

Kevin Blighe (Republic of Ireland) wrote (2.7 years ago):

If you want comprehensive annotation for your circular RNAs, I would use BEDTools to overlap your regions with the GENCODE comprehensive GTF annotation. This file contains the coordinates of upwards of 200,000 transcripts (including isoforms), annotated by GENCODE as part of the ENCODE project.

  1. Download the comprehensive gene annotation (GTF) from https://www.gencodegenes.org/releases/current.html for hg38, or from https://www.gencodegenes.org/releases/grch37_mapped_releases.html for hg19.
  2. Overlap your find_circ output with these regions using BEDTools:

bedtools intersect -a find_circ.output.txt -b gencode.v28.annotation.gtf.gz

For more information on BEDTools intersect, see: http://bedtools.readthedocs.io/en/latest/content/tools/intersect.html
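Conceptually, bedtools intersect performs an interval overlap test after both files are in the same coordinate system. GTF coordinates are 1-based and inclusive, whereas BED-style files such as the find_circ output are 0-based and half-open. The Python sketch below illustrates that logic. The helper names are hypothetical, and it ignores strand, as bedtools intersect does unless -s is given.

```python
from __future__ import annotations


def gtf_to_bed_interval(gtf_line: str) -> tuple[str, int, int, str]:
    """Return (chrom, start, end, feature) from a GTF line in BED-style coordinates.

    GTF has 9 tab-separated columns (seqname, source, feature, start, end,
    score, strand, frame, attributes) with 1-based, inclusive coordinates.
    Subtracting 1 from the start gives a 0-based, half-open interval; the
    end value stays the same.
    """
    cols = gtf_line.rstrip("\n").split("\t")
    return cols[0], int(cols[3]) - 1, int(cols[4]), cols[2]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


if __name__ == "__main__":
    circ = ("chr4", 166006737, 166024248)  # Sy5y_D0_circ_000001
    # TLL1 exon from the GENCODE GTF (attributes shortened for brevity).
    exon = 'chr4\tHAVANA\texon\t166007943\t166008048\t.\t+\t.\tgene_name "TLL1";'
    chrom, start, end, feature = gtf_to_bed_interval(exon)
    if chrom == circ[0] and overlaps(circ[1], circ[2], start, end):
        # Like bedtools intersect's default output: the overlapping part of A.
        print(chrom, max(circ[1], start), min(circ[2], end), feature)
```

This prints chr4 166007942 166008048 exon. By default, bedtools intersect reports only the overlapping part of the -a entry, which is why output rows can carry exon coordinates rather than the full circRNA span. Adding -wa reports the original -a entry instead.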

Kevin


Thank you for your response. Running the above command, I got results like the following:

chr4    166007942   166008048   Sy5y_D0_circ_000001 2   -   1   5   40
chr4    166007942   166008048   Sy5y_D0_circ_000001 2   -   1   5   40
chr4    166014435   166014560   Sy5y_D0_circ_000001 2   -   1   5   40

What I want instead is one label per circRNA, like this:

chr4    166006737   166024248     exon
chr7    101950003   101952188     intron
chr5    619104  620376    intergenic

Any suggestions will be appreciated.

Reply written 2.7 years ago by tofazzal.stat

I see. Please try this instead:

bedtools intersect -a circ_rna.bed -b gencode.v28.annotation.gtf -wb

chr4    166007942   166008048   Sy5y_D0_circ_000001 2   -   1   5   40  chr4    HAVANA  exon    166007943   166008048   .   +   .   gene_id "ENSG00000038295.7"; transcript_id "ENST00000061240.6"; gene_type "protein_coding"; gene_name "TLL1"; transcript_type "protein_coding"; transcript_name "RP11-624O16.1-001"; exon_number 7; exon_id "ENSE00003485218.1"; level 2; protein_id "ENSP00000061240.2"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS3811.1"; havana_gene "OTTHUMG00000161112.3"; havana_transcript "OTTHUMT00000363821.1";
chr4    166007942   166008048   Sy5y_D0_circ_000001 2   -   1   5   40  chr4    HAVANA  CDS 166007943   166008048   .   +   2   gene_id "ENSG00000038295.7"; transcript_id "ENST00000061240.6"; gene_type "protein_coding"; gene_name "TLL1"; transcript_type "protein_coding"; transcript_name "RP11-624O16.1-001"; exon_number 7; exon_id "ENSE00003485218.1"; level 2; protein_id "ENSP00000061240.2"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS3811.1"; havana_gene "OTTHUMG00000161112.3"; havana_transcript "OTTHUMT00000363821.1";
chr4    166014435   166014560   Sy5y_D0_circ_000001 2   -   1   5   40  chr4    HAVANA  exon    166014436   166014560   .   +   .   gene_id "ENSG00000038295.7"; transcript_id "ENST00000061240.6"; gene_type "protein_coding"; gene_name "TLL1"; transcript_type "protein_coding"; transcript_name "RP11-624O16.1-001"; exon_number 8; exon_id "ENSE00003496842.1"; level 2; protein_id "ENSP00000061240.2"; transcript_support_level "1"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS3811.1"; havana_gene "OTTHUMG00000161112.3"; havana_transcript "OTTHUMT00000363821.1";

If you want to tidy this output, pipe it into the cut command to keep only the columns you need.
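The find_circ file contributes 9 columns, so the GTF feature type (exon, CDS, UTR and so on) ends up in column 12 of the -wb output, and the circRNA name is in column 4. On the command line, something like cut -f4,12 | sort -u would list each name/feature pair once. The Python sketch below does the same grouping. It assumes tab-separated output with exactly 9 find_circ columns, and features_per_circ is a hypothetical helper.

```python
from __future__ import annotations

from collections import defaultdict


def features_per_circ(wb_lines: list[str]) -> dict[str, set[str]]:
    """Group the GTF feature types found for each circRNA in bedtools -wb output.

    Assumes the -a file had the 9 find_circ columns, so the appended GTF
    columns start at column 10 and the feature type is column 12 (index 11);
    the circRNA name is column 4 (index 3).
    """
    found: dict[str, set[str]] = defaultdict(set)
    for line in wb_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # skip malformed or empty lines
        found[cols[3]].add(cols[11])
    return dict(found)


if __name__ == "__main__":
    circ = ["chr4", "166007942", "166008048", "Sy5y_D0_circ_000001",
            "2", "-", "1", "5", "40"]
    gtf_exon = ["chr4", "HAVANA", "exon", "166007943", "166008048",
                ".", "+", ".", 'gene_name "TLL1";']
    gtf_cds = ["chr4", "HAVANA", "CDS", "166007943", "166008048",
               ".", "+", "2", 'gene_name "TLL1";']
    lines = ["\t".join(circ + gtf_exon), "\t".join(circ + gtf_cds)]
    for name, features in features_per_circ(lines).items():
        print(name, sorted(features))
```

This prints Sy5y_D0_circ_000001 ['CDS', 'exon']. The duplicate exon and CDS rows for the same region collapse into one set per circRNA.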

This will only report the feature types that GENCODE GTF files contain (genes, transcripts, exons, CDS, UTRs, and start/stop codons). The annotation does, however, also cover annotated non-coding RNA genes, not only protein-coding ones. If you want introns and intergenic regions, I suggest different options:
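Whatever tools are used, introns and intergenic regions can be derived from the same GTF by interval arithmetic. The introns of a gene are the gene span minus its merged exons, and intergenic regions are each chromosome minus the merged genes. bedtools provides merge, subtract and complement subcommands for these operations. The Python sketch below shows the underlying logic on toy, hypothetical coordinates. Merging the exons of all isoforms before taking gaps keeps only regions that are intronic in every annotated transcript, which is one of several possible definitions.

```python
from __future__ import annotations


def merge(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps(intervals: list[tuple[int, int]], lo: int, hi: int) -> list[tuple[int, int]]:
    """Return the parts of [lo, hi) not covered by any interval."""
    result: list[tuple[int, int]] = []
    cursor = lo
    for start, end in merge(intervals):
        if end <= lo or start >= hi:
            continue  # entirely outside the window
        start, end = max(start, lo), min(end, hi)
        if start > cursor:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < hi:
        result.append((cursor, hi))
    return result


if __name__ == "__main__":
    # Toy coordinates (hypothetical), all on one chromosome.
    exons = [(100, 200), (300, 400), (350, 500)]  # exons of one gene, all isoforms
    gene = (100, 500)
    genes = [gene, (800, 900)]
    chrom_length = 1000  # in practice, taken from a chromosome-sizes file

    print("introns:", gaps(exons, *gene))  # gaps between merged exons
    print("intergenic:", gaps(genes, 0, chrom_length))
```

This prints introns: [(200, 300)] and intergenic: [(0, 100), (500, 800), (900, 1000)].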

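One issue you will face is that many of these regions overlap introns and exons at the same time, i.e., they are very large circular RNAs (Sy5y_D0_circ_000001, for example, spans 17,511 bp). A single exon/intron/intergenic label may therefore not describe them fully.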
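One way to handle circRNAs that span both exons and introns is to apply an explicit priority rule and also report how much of each circRNA is exonic. That way a long, mostly intronic circRNA is not silently treated like a purely exonic one. The sketch below assumes a hypothetical rule: any exon overlap gives exon, otherwise any gene overlap gives intron, otherwise intergenic. Other conventions are equally defensible, and under this rule a circRNA lying partly inside a gene and partly outside it would be labeled intron.

```python
from __future__ import annotations


def overlap_bp(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify(circ: tuple[int, int],
             exons: list[tuple[int, int]],
             genes: list[tuple[int, int]]) -> tuple[str, float]:
    """Label one circRNA and report the fraction of it covered by exons.

    Assumptions: all intervals are on the circRNA's chromosome, 0-based and
    half-open; exons are already merged so no base is counted twice. The
    priority rule (exon > intron > intergenic) is a hypothetical convention.
    """
    exon_bp = sum(overlap_bp(circ, e) for e in exons)
    if exon_bp > 0:
        label = "exon"
    elif any(overlap_bp(circ, g) > 0 for g in genes):
        label = "intron"  # inside a gene but touching no exon
    else:
        label = "intergenic"
    return label, exon_bp / (circ[1] - circ[0])


if __name__ == "__main__":
    circ1 = (166006737, 166024248)  # Sy5y_D0_circ_000001 on chr4
    # Only the two TLL1 exons shown in this thread; the full overlap may include more.
    tll1_exons = [(166007942, 166008048), (166014435, 166014560)]
    label, frac = classify(circ1, tll1_exons, genes=[])
    print(label, f"{frac:.3f}")

    # Toy intervals (hypothetical) for the other two labels.
    print(classify((250, 280), [(100, 200), (300, 400)], [(100, 400)]))
    print(classify((600, 700), [(100, 200), (300, 400)], [(100, 400)]))
```

Using the two TLL1 exons from this thread, Sy5y_D0_circ_000001 is labeled exon even though those exons cover only about 1.3% of its 17,511 bp. This is exactly the ambiguity described above.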

Kevin

Reply written 2.7 years ago by Kevin Blighe
Powered by Biostar version 2.3.0