Question: Percentage of RNA-seq reads in exonic, intronic, intergenic and splice junction regions
15 months ago, blooming.daisy333 wrote:

Dear all, how can I find out what proportion of my RNA-seq data falls into different genomic features? Specifically, I want to know what percentage of my data falls in exonic, intronic, intergenic and splice junction regions. I have several analysis files that look like this:

1) a genome annotation file that gives information about mRNA and CDS:

chr4    GLEAN   mRNA    123284514   123288477   0.999991    -   .   ID=Cotton_A_18927_BGI-A2_v1.0;Name=Cotton_A_18927;source_id=CottonA_GLEAN_10022228;identical_support_id=CUFF67.1103.1;evid_id=Cot030308.1
chr4    GLEAN   CDS     123288376   123288477   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0
chr4    GLEAN   CDS     123287662   123287826   .   -   0   Parent=Cotton_A_18927_BGI-A2_v1.0

2) a splice junction file that gives the donor and acceptor splice site coordinates:

chr1    329728  329839  -
chr1    330066  330757  -
chr1    581256  581357

3) a transcriptome assembly that gives information about exons and transcripts:

chr1    StringTie   transcript  328635  330943  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "70.023491"; FPKM "28.098141"; TPM "24.855738";
chr1    StringTie   exon    328635  329729  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "88.673470";
chr1    StringTie   exon    329840  330067  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "22.850203";
chr1    StringTie   exon    330758  330943  1000    -   .   gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "18.054590";

Any help would be appreciated.

Thank you so much.

15 months ago, Martombo (Seville, ES) wrote:

You can use read_distribution.py from RSeQC for this. It expects the gene annotation in BED format as input.
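Since the annotation in the question is GFF/GTF rather than BED, a conversion step is needed first. The following is only a minimal sketch of building one BED12 line from the StringTie exons shown in the question. The function name `exons_to_bed12` is hypothetical, and setting thickStart/thickEnd to the transcript start (no CDS marked) is an assumption; with no CDS marked, CDS/UTR categories would not be meaningful, so a real conversion should take the CDS bounds from the annotation.

```python
def exons_to_bed12(chrom, name, strand, exons, score=0):
    """Build one BED12 line from 1-based, inclusive exon (start, end) pairs."""
    exons = sorted(exons)
    chrom_start = exons[0][0] - 1   # BED starts are 0-based
    chrom_end = exons[-1][1]        # BED ends are exclusive, so equal to the 1-based end
    block_sizes = [end - start + 1 for start, end in exons]
    block_starts = [start - 1 - chrom_start for start, _ in exons]
    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        chrom_start, chrom_start,   # thickStart/thickEnd: assumption, no CDS marked
        "0",                        # itemRgb
        len(exons),
        ",".join(map(str, block_sizes)) + ",",
        ",".join(map(str, block_starts)) + ",",
    ]
    return "\t".join(map(str, fields))


# Exons of STRG.1.1 as listed in the StringTie output in the question
stringtie_exons = [(328635, 329729), (329840, 330067), (330758, 330943)]
bed_line = exons_to_bed12("chr1", "STRG.1.1", "-", stringtie_exons)
print(bed_line)
```

The resulting file would then be passed to read_distribution.py as the reference gene model, together with the alignment file.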

(Moved to an answer because it seems to solve the OP's question.)

written 15 months ago by WouterDeCoster

Thank you so much for the response. I have looked at read_distribution.py from RSeQC. If I provide it with an alignment.sam file and a gene_annotation.bed file, it gives me the read distribution only for TSS, TES, exons, etc., but it does not tell me what percentage of the data consists of splice junctions.

Let me rephrase what I need:

1) the percentage of splice junctions (SJ), introns, intergenic regions and exons

2) what percentage of splice junctions reside in: 5'UTR, 3'UTR-CDS, CDS, CDS-3'UTR, 3'UTR

Kindly guide me in this regard.

Thank you so much.
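One way to picture the four-way breakdown asked for in point 1 is to label each aligned read and then count the labels. The sketch below is purely illustrative and rests on assumptions: reads are given as hypothetical (chrom, 1-based position, CIGAR) tuples, a read counts as a splice junction read whenever its CIGAR contains an `N` operation, a read overlapping both an exon and an intron is called exonic, and the helper names `classify_read` and `percentages` are made up for this example.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def classify_read(chrom, pos, cigar, exons, genes):
    """Label one read; pos is the 1-based leftmost aligned position."""
    ops = CIGAR_RE.findall(cigar)
    if any(op == "N" for _, op in ops):
        return "splice_junction"          # spliced alignment (assumption: N means junction)
    # Reference bases consumed by the alignment
    ref_len = sum(int(n) for n, op in ops if op in "MDN=X")
    start, end = pos, pos + ref_len - 1

    def overlaps(intervals):
        return any(c == chrom and s <= end and start <= e for c, s, e in intervals)

    if overlaps(exons):
        return "exonic"
    if overlaps(genes):
        return "intronic"                 # inside a transcript span but not in an exon
    return "intergenic"


def percentages(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()} if total else {}


# Exons and transcript span of STRG.1.1 from the question (1-based, inclusive)
exons = [("chr1", 328635, 329729), ("chr1", 329840, 330067), ("chr1", 330758, 330943)]
genes = [("chr1", 328635, 330943)]

# Toy reads, invented for illustration only
reads = [
    ("chr1", 328700, "100M"),         # inside exon 1
    ("chr1", 329750, "50M"),          # between exon 1 and exon 2
    ("chr1", 329700, "30M110N70M"),   # spans the first junction
    ("chr1", 100, "100M"),            # outside any transcript
]
labels = [classify_read(c, p, cg, exons, genes) for c, p, cg in reads]
pct = percentages(labels)
print(pct)
```

For real data the linear scan over intervals would be too slow; an interval index or an existing tool would be the more practical choice.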

modified 15 months ago • written 15 months ago by blooming.daisy333
15 months ago, michael.ante (Austria/Vienna) wrote:

RSeQC also has tools for analysing junctions. They count junctions and classify them as known, partially known, or novel. You can use this as a starting point for your junction analysis.
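To make the known / partially known / novel distinction concrete, here is a small sketch of the idea. It is not RSeQC's code. It assumes a single chromosome, 1-based inclusive intron coordinates, and the rule that a junction is "known" when both of its splice sites are annotated, "partial_novel" when only one is, and "complete_novel" when neither is. The function names are hypothetical, and coordinate conventions differ between tools, so offsets may need adjusting for a real junction file.

```python
def introns_from_exons(exons):
    """Derive intron (start, end) pairs from sorted 1-based inclusive exons."""
    exons = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def classify_junction(junction, annotated):
    """Classify one junction against annotated introns on the same chromosome."""
    left_sites = {start for start, _ in annotated}
    right_sites = {end for _, end in annotated}
    start, end = junction
    known_left = start in left_sites
    known_right = end in right_sites
    if known_left and known_right:
        return "known"
    if known_left or known_right:
        return "partial_novel"
    return "complete_novel"


# Exons of STRG.1.1 from the question
annotated = introns_from_exons([(328635, 329729), (329840, 330067), (330758, 330943)])
print(annotated)

# Observed junctions, invented for illustration only
for junction in [(329730, 329839), (329730, 330000), (400000, 400100)]:
    print(junction, classify_junction(junction, annotated))
```

Note that under this rule an exon-skipping junction such as (329730, 330757) is also labelled "known", because both of its splice sites appear in the annotation.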


(Moved to an answer because it seems to solve the OP's question.)

written 15 months ago by WouterDeCoster