Question

How to extract intron counts from total RNA Sequencing?

3

Entering edit mode

20 months ago

PK ▴ 130

Hi all,

Sorry for the very basic question. Since i mapped my sequencing files to genome i could see more than 40% reads mapping to the intronic region. So i would like to extract the counts of introns. So what i did was, i used featurecounts to extract the gene counts with following command

featureCounts -F GTF -p -t gene -g gene_id -p -P -B -C -s 2  -d 1 -D 25000  -T 6 -a Homo_sapiens.GRCh38.86.gtf -o genecounts.csv file.bam

And the for the exons

featureCounts -F GTF -p -t exon -g gene_id -p -P -B -C -s 2  -d 1 -D 25000  -T 6 -a Homo_sapiens.GRCh38.86.gtf -o exoncounts.csv file.bam

Now i have gene body counts and exon counts so my idea is genecounts - exoncounts = introncounts. To verify this i build intron file by merging overlapping exons of the transcripts for a given gene. Then i extracted the counts.

featureCounts -F SAF -p -P -B -C -s 2  -d 1 -D 25000  -T 6 -a Intron.txt -o introncounts.csv file.bam

When i subtracted the genecounts - exoncounts i'm getting different counts compare to introncounts. Why it is like this? I would assume to have same counts like genecounts - exoncounts = counts. The one possible problem what i could think of is that during the exon count the overlapping exons might have been discarded by featurecounts right? What else could be else the problems are? How can i sort this out

Example

ENSG00000279457 515 - Exoncount

ENSG00000279457 1262 - Genecount

ENSG00000279457 938 - Introncount

featurecounts RNAseq htseq STAR • 3.0k views

ADD COMMENT • link 20 months ago by PK ▴ 130

1

Entering edit mode

It's probably easier to add introns to your GTF using AGAT and then quantify on those introns instead of exons with featureCounts.

ADD REPLY • link 20 months ago by rpolicastro 13k

0

Entering edit mode

Actually the tools is missing certain exons in the middle and calling as a intron. This is the file i constructed as a intron

1       12227   12612   ENSG00000223972 DDX11L1         +
1       12721   12974   ENSG00000223972 DDX11L1         + 
1       13052   13220   ENSG00000223972 DDX11L1         +

But when i use AGAT tool

1       havana  intron  12228   12612   .       +   
1       havana  intron  12722   13220   .       +

I have attached a IGV screen shot for your reference. Is that normal? enter image description here

ADD REPLY • link 20 months ago by PK ▴ 130

0

Entering edit mode

It sounds like the third line is an intergenic region, not intronic

ADD REPLY • link 20 months ago by Juke34 8.6k

0

Entering edit mode

It is intronic, Can i ask you why? From my point of view, there is a small exon-ENSE00001799933 missing in the refseq annotation. There are 2 transcripts for the given gene and overlapping exons were mergerd.

ADD REPLY • link 20 months ago by PK ▴ 130

0

Entering edit mode

I don't get how you ended with this result so I checked using Ensembl Homo_sapiens.GRCh38.86.gtf file.
The gene is like that:

1   havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2";
1   havana  transcript  11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    11869   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    12613   12721   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  exon    13221   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
1   havana  transcript  12010   13670   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; tag "basic"; transcript_support_level "NA";
1   havana  exon    12010   12057   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001948541"; exon_version "1"; tag "basic"; transcript_support_level "NA";
1   havana  exon    12179   12227   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001671638"; exon_version "2"; tag "basic"; transcript_support_level "NA";
1   havana  exon    12613   12697   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001758273"; exon_version "2"; tag "basic"; transcript_support_level "NA";
1   havana  exon    12975   13052   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "4"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001799933"; exon_version "2"; tag "basic"; transcript_support_level "NA";
1   havana  exon    13221   13374   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001746346"; exon_version "2"; tag "basic"; transcript_support_level "NA";
1   havana  exon    13453   13670   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; exon_number "6"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-001"; transcript_source "havana"; transcript_biotype "transcribed_unprocessed_pseudogene"; havana_transcript "OTTHUMT00000002844"; havana_transcript_version "2"; exon_id "ENSE00001863096"; exon_version "1"; tag "basic"; transcript_support_level "NA";

There are two transcript and what you show for AGAT seems to correspond to the first one, 3 exons => 2 introns

ADD REPLY • link 20 months ago by Juke34 8.6k

0

Entering edit mode

Here the result with AGAT v1.0.0

1   havana  gene    11869   14409   .   +   .   ID=ENSG00000223972;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2
1   havana  transcript  11869   14409   .   +   .   ID=ENST00000456328;Parent=ENSG00000223972;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000362751;havana_transcript_version=1;tag=basic;transcript_biotype=processed_transcript;transcript_id=ENST00000456328;transcript_name=DDX11L1-002;transcript_source=havana;transcript_support_level=1;transcript_version=2
1   havana  exon    11869   12227   .   +   .   ID=ENSE00002234944;Parent=ENST00000456328;exon_id=ENSE00002234944;exon_number=1;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000362751;havana_transcript_version=1;tag=basic;transcript_biotype=processed_transcript;transcript_id=ENST00000456328;transcript_name=DDX11L1-002;transcript_source=havana;transcript_support_level=1;transcript_version=2
1   havana  exon    12613   12721   .   +   .   ID=ENSE00003582793;Parent=ENST00000456328;exon_id=ENSE00003582793;exon_number=2;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000362751;havana_transcript_version=1;tag=basic;transcript_biotype=processed_transcript;transcript_id=ENST00000456328;transcript_name=DDX11L1-002;transcript_source=havana;transcript_support_level=1;transcript_version=2
1   havana  exon    13221   14409   .   +   .   ID=ENSE00002312635;Parent=ENST00000456328;exon_id=ENSE00002312635;exon_number=3;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000362751;havana_transcript_version=1;tag=basic;transcript_biotype=processed_transcript;transcript_id=ENST00000456328;transcript_name=DDX11L1-002;transcript_source=havana;transcript_support_level=1;transcript_version=2
1   havana  intron  12228   12612   .   +   .   ID=intron_added-1;Parent=ENST00000456328;exon_id=ENSE00002312635;exon_number=3;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000362751;havana_transcript_version=1;tag=basic;transcript_biotype=processed_transcript;transcript_id=ENST00000456328;transcript_name=DDX11L1-002;transcript_source=havana;transcript_support_level=1;transcript_version=2
1   havana  intron  12722   13220   .   +   .   ID=intron_added-2;Parent=ENST00000456328;exon_id=ENSE00002312635;exon_number=3;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000362751;havana_transcript_version=1;tag=basic;transcript_biotype=processed_transcript;transcript_id=ENST00000456328;transcript_name=DDX11L1-002;transcript_source=havana;transcript_support_level=1;transcript_version=2

ADD REPLY • link 20 months ago by Juke34 8.6k

0

Entering edit mode

and (it does not fit in one message)

1   havana  transcript  12010   13670   .   +   .   ID=ENST00000450305;Parent=ENSG00000223972;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  exon    12010   12057   .   +   .   ID=ENSE00001948541;Parent=ENST00000450305;exon_id=ENSE00001948541;exon_number=1;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  exon    12179   12227   .   +   .   ID=ENSE00001671638;Parent=ENST00000450305;exon_id=ENSE00001671638;exon_number=2;exon_version=2;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  exon    12613   12697   .   +   .   ID=ENSE00001758273;Parent=ENST00000450305;exon_id=ENSE00001758273;exon_number=3;exon_version=2;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  exon    12975   13052   .   +   .   ID=ENSE00001799933;Parent=ENST00000450305;exon_id=ENSE00001799933;exon_number=4;exon_version=2;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  exon    13221   13374   .   +   .   ID=ENSE00001746346;Parent=ENST00000450305;exon_id=ENSE00001746346;exon_number=5;exon_version=2;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  exon    13453   13670   .   +   .   ID=ENSE00001863096;Parent=ENST00000450305;exon_id=ENSE00001863096;exon_number=6;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  intron  12058   12178   .   +   .   ID=intron_added-3;Parent=ENST00000450305;exon_id=ENSE00001863096;exon_number=6;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  intron  12228   12612   .   +   .   ID=intron_added-4;Parent=ENST00000450305;exon_id=ENSE00001863096;exon_number=6;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  intron  12698   12974   .   +   .   ID=intron_added-5;Parent=ENST00000450305;exon_id=ENSE00001863096;exon_number=6;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  intron  13053   13220   .   +   .   ID=intron_added-6;Parent=ENST00000450305;exon_id=ENSE00001863096;exon_number=6;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2
1   havana  intron  13375   13452   .   +   .   ID=intron_added-7;Parent=ENST00000450305;exon_id=ENSE00001863096;exon_number=6;exon_version=1;gene_biotype=transcribed_unprocessed_pseudogene;gene_id=ENSG00000223972;gene_name=DDX11L1;gene_source=havana;gene_version=5;havana_gene=OTTHUMG00000000961;havana_gene_version=2;havana_transcript=OTTHUMT00000002844;havana_transcript_version=2;tag=basic;transcript_biotype=transcribed_unprocessed_pseudogene;transcript_id=ENST00000450305;transcript_name=DDX11L1-001;transcript_source=havana;transcript_support_level=NA;transcript_version=2

ADD REPLY • link 20 months ago by Juke34 8.6k

0

Entering edit mode

You guys bring tears in my eyes..Anyway to the point i may not explained properly. for the given gene if transcripts exons overlaps, i merged those exons for the simplification. so ,

1   havana  transcript_1  11869   14409   
1   havana  transcript_1-exon-1    11869   12227  
1   havana  transcript_1-exon-2    12613   12721
1   havana  transcript_1-exon-3    13221   14409 
1   havana  transcript_2  12010   13670 
1   havana  transcript_2-exon-1    12010   12057  
1   havana  transcript_2-exon-2    12179   12227 
1   havana  transcript_2-exon-3    12613   12697  
1   havana  transcript_2-exon-4    12975   13052  
1   havana  transcript_2-exon-5    13221   13374  
1   havana  transcript_2-exon-6    13453   13670

The transcript_1-exon-1 can overlap with transcript_2-exon-1, transcript_2-exon-2 and

transcript_1-exon-2 takes transcript_2-exon-3

transcript_1-exon-3 again overlap with transcript_2-exon-5, transcript_2-exon-6

Among these transcript_2-exon-4 is missing so in my file that is considered and eventually became 1 12721 12974 ENSG00000223972 DDX11L1 +

Is that correct? am i wrong?

ADD REPLY • link 20 months ago by PK ▴ 130

0

Entering edit mode

The first rule of any counting process in bioinformatics is that:

Numbers never add up.

When I do any counting with one method, samtools view -c over a range, then I do it with a different tool, bedtools, then I do with featurecounts, the numbers will be close but never precisely the same. Each tool may apply various correction, makes assumptions etc depending on the input. I would not lose too much sleep over it.

ADD REPLY • link 20 months ago by Istvan Albert 101k

0

Entering edit mode

Thanks for your suggestion, i wanna ask one more question as well. I divided my RNA-Seq data into three parts as Intron counts, Exon counts and Gene counts. and i made DESeq counts for all of them in a WT and KO situation. when we looked the intron counts , My post-doc is saying that introns are much longer than exons. In most of the situation introns can accommodate more reads in horizontal way but not in vertical way like exons (like a peak). He said that, it may not be significant because the reads are distributed across the intron so he wanted me to normalize the intron by length. In general i assumed introns are another feature of gene and it can be treated like exons, since it is a total RNA seq i can have both. What would be the legitimate way to normalize the introns? can you please suggest something..Thanks

ADD REPLY • link 20 months ago by PK ▴ 130

0

Entering edit mode

this is a different question and not related to the main question, and it would best be asked separately. RNA-Seq data, in general, would not normally contain introns at all.

also, the statement about accommodating reads "vertically" vs "horizontally" seems not quite the right interpretation of what happens.

Each copy of the transcript, once fragmented, will produce several reads. A longer sequence would produce more reads, but it is also proportionally longer. Hence the length of a sequence has no effect on the coverage of that sequence. Thus there is no such thing as a longer sequence being covered by fewer reads because it is longer.

Introns are covered at a low (normally zero) rate because they are not expressed and not because they are long.

ADD REPLY • link 20 months ago by Istvan Albert 101k

0

Entering edit mode

Thanks Albert for your reply. Since you partly answered this question i just wanna ask quick follow-up. RNA-Seq data, in general, would not normally contain introns at all.. Actually it is dependent on the library preparation right. If you do poly-A selection it i possible to get few intronic reads since i'm using total RNA there is a high probability of getting introns as well (correct me if i'm wrong). It's not because of gDNA contamination because i could hardly see any reads on intergenic (The library prepared and sequenced in the one of the best institute in the world). What i'm assuming is that it could be possible that i'm fishing the premRNA vs matureRNA. Around 40% of reads goes to introns.

ADD REPLY • link 20 months ago by PK ▴ 130

1

Entering edit mode

I am not a biologist by training, and I will admit to quite a bit of missing knowledge as to when (in a time frame) does the splicing occur exactly and how much the relative concentrations of pre-mRNA are relative to mature mRNA. How much of each would be present at any given time?

If the processing time of a pre-mRNA is much shorter than that of mature mRNA, then the former would be present in a vastly lower abundance at any given moment since the mature mRNA would accumulate - so there is the issue of what information remains and is present when the sample is extracted.

In my understanding total RNA typically means ribosomal RNA + mRNA, and that ribosomal RNA will dominate the sample at some 80 to 90%. I have never done it myself, hence I rely on the memory of what I read, but isn't the purpose of selecting polyA to enrich mRNA relative to rRNA and not enrich pre-mRNA to mature mRNA.

I have never seen systematically covered introns in RNA-Seq data. I have seen strong signals and peaks from intronic and noncoding regions, which are puzzling, but I have never seen introns completely filled in. In those cases, I would expect annotation errors.

I am not saying it is impossible - just that it is unexpected - and one should consider other reasons. As far as I know, typical polyadenylated RNA-seq does not have many reads mapping to introns - but I could be off - I have seen many unexpected features of data.

ADD REPLY • link 20 months ago by Istvan Albert 101k

0

Entering edit mode

Great reply. These cells are not synchronized cells, which means we have no control over the cell division. But they are homogenous in terms of genotype. The reason why people do poly-A selection mostly for pol-II and mature transcripts. But in the total RNA preparation yes, we do deplete the ribosomes and random priming right.

how much the relative concentrations of pre-mRNA are relative to mature mRNA . It is hard to say right depends on the genes. In general heat shock and stress response tend to have lot of premRNA for the quick action.

ADD REPLY • link 20 months ago by PK ▴ 130

0

Entering edit mode

Here is a paper I just ran into (published just a few days ago, Nov 11, 2022), that sounds to contain relevant information

Retained introns in long RNA-seq reads are not reliably detected in sample-matched short reads

https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02789-6

ADD REPLY • link 20 months ago by Istvan Albert 101k

0

Entering edit mode

Wow, the abstract itself scares me. However i will go through the paper.

ADD REPLY • link 20 months ago by PK ▴ 130