Percentage of RNA-seq reads in exonic, intronic, intergenic and splice junction regions
4.5 years ago

Dear all, how can I find out what proportion of my RNA-seq data falls into different genomic features? Specifically, I want to know what percentage of my reads fall in exonic, intronic, intergenic and splice junction regions. I have several analysis files that look like this:

1) a genome annotation file that gives information about mRNA and CDS features:

chr4    GLEAN   mRNA    123284514   123288477   0.999991    -   .   ID=Cotton_A_18927_BGI-A2_v1.0;Name=Cotton_A_18927;source_id=CottonA_GLEAN_10022228;identical_support_id=CUFF67.1103.1;evid_id=Cot030308.1
chr4    GLEAN   CDS 123288376   123288477   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS 123287662   123287826   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0

2) a splice junction file that gives the donor and acceptor splice site coordinates:

chr1    329728  329839  -
chr1    330066  330757  -
chr1    581256  581357

3) a transcriptome assembly that gives information about exons and transcripts:

chr1    StringTie   transcript  328635  330943  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "70.023491"; FPKM "28.098141"; TPM "24.855738";
chr1    StringTie   exon    328635  329729  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "88.673470";
chr1    StringTie   exon    329840  330067  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "22.850203";
chr1    StringTie   exon    330758  330943  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "18.054590";

Any help would be appreciated.

Thank you so much.

rna-seq • 2.0k views
4.5 years ago
Martombo ★ 3.0k

You can use read_distribution.py from RSeQC for this. It takes your alignments (BAM or SAM) together with a gene model in BED format as input.
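
To make the idea concrete, here is a minimal Python sketch of how reads could be binned into the four categories asked about. It is not RSeQC's implementation. The assumptions are mine: everything sits on one chromosome, a read is given as a list of aligned blocks (more than one block means it spans a splice junction), and unspliced reads are assigned by their midpoint. The exon and transcript coordinates are the STRG.1.1 records from the question; the reads are made-up toy values.

```python
from collections import Counter

def contains(intervals, pos):
    """Return True if pos lies inside any closed (start, end) interval."""
    return any(start <= pos <= end for start, end in intervals)

def classify_read(blocks, exons, transcripts):
    """Assign one read to a category (simplifying assumption: midpoint rule)."""
    if len(blocks) > 1:
        # A read aligned in several blocks crosses at least one junction.
        return "splice_junction"
    start, end = blocks[0]
    mid = (start + end) // 2
    if contains(exons, mid):
        return "exonic"
    if contains(transcripts, mid):
        # Inside a transcript span but outside every exon.
        return "intronic"
    return "intergenic"

def percentages(reads, exons, transcripts):
    """Return the percentage of reads in each category."""
    counts = Counter(classify_read(b, exons, transcripts) for b in reads)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}

# Exons and transcript span of STRG.1.1 (chr1), taken from the question.
exons = [(328635, 329729), (329840, 330067), (330758, 330943)]
transcripts = [(328635, 330943)]

# Hypothetical reads, for illustration only.
reads = [
    [(328700, 328775)],                    # inside exon 1
    [(329750, 329825)],                    # between exon 1 and exon 2
    [(100000, 100075)],                    # outside the transcript
    [(329700, 329729), (329840, 329885)],  # spliced across exon 1 / exon 2
]

result = percentages(reads, exons, transcripts)
print(result)
```

For real data you would read the alignments from the BAM file and use an interval index instead of the linear scan in `contains`.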


(Moved to an answer because it seems to solve the OP's question.)


Thank you so much for the response. I have looked at read_distribution.py from RSeQC. If I provide it with an alignment.sam file and gene_annotation.bed, it gives the read distribution only for TSS, TES, exons etc., but it does not tell me what percentage of the data consists of splice junctions.

Let me rephrase what I need: 1) the percentage of reads in splice junctions (SJ), introns, intergenic regions and exons

2) what percentage of splice junctions reside in: 5'UTR, 5'UTR-CDS, CDS, CDS-3'UTR, 3'UTR

Kindly guide me in this regard.

Thank you so much.

4.5 years ago
michael.ante ★ 3.7k

RSeQC also has tools for analysing junctions (for example junction_annotation.py). It counts junctions and classifies them as known, partial-known, or novel. You can use this as a starting point for your junction analysis.
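
As a rough illustration of that kind of classification (a sketch under my own assumptions, not RSeQC's code): derive the annotated introns from the exon records, then call a junction "known" if both of its ends match annotated splice sites, "partial-known" if only one end does, and "novel" otherwise. Coordinates are assumed to be 1-based inclusive intron coordinates on a single chromosome; your junction file may use a different convention, so check for offsets before comparing. The test junctions are hypothetical.

```python
def introns_from_exons(exons):
    """Return 1-based inclusive intron intervals between consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]

def classify_junction(junction, known_introns):
    """Classify a junction by how many of its two ends are annotated."""
    start, end = junction
    known_starts = {s for s, _ in known_introns}
    known_ends = {e for _, e in known_introns}
    matches = (start in known_starts) + (end in known_ends)
    if matches == 2:
        return "known"
    if matches == 1:
        return "partial-known"
    return "novel"

# Exons of STRG.1.1 (chr1), taken from the question.
exons = [(328635, 329729), (329840, 330067), (330758, 330943)]
known_introns = introns_from_exons(exons)

# Hypothetical junctions, for illustration only.
junctions = [(329730, 329839), (329730, 329900), (400000, 400500)]
labels = [classify_junction(j, known_introns) for j in junctions]
print(list(zip(junctions, labels)))
```

The same lookup idea extends to the second part of the question: with CDS and UTR intervals from the annotation, each end of a junction can be assigned to a region and the pairs counted.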


(Moved to an answer because it seems to solve the OP's question.)
