Counting intronic reads in bulk RNA-seq
John Ma, 11 months ago

My experience with single-cell RNA-seq shows that including intronic reads improves sensitivity for several genes of interest, which otherwise show zero expression when only exonic reads are considered. While single-cell quantifiers now often have options to count intronic reads, the bulk alignment-based quantifiers (e.g. RSEM or Cufflinks) do not, as far as I can tell from their manuals, count intronic reads.

Currently the only way I can think of to count exon+intron reads in bulk RNA-seq data is to run htseq-count with the -t transcript option. Am I right about this, and are there other ways to do it? Note that I only need gene-level counts.
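As a toy illustration of what switching htseq-count from its default feature type (-t exon) to whole-transcript spans (-t transcript, still aggregating by gene_id) changes: a purely intronic read only becomes countable in the second case. The annotation, the reads and the `count_genes` helper below are hypothetical; this is a simplified sketch of the "union" overlap idea, not htseq-count itself, and it ignores spliced alignments, strandedness and multimappers.

```python
from collections import defaultdict

# Hypothetical toy annotation: (gene_id, feature_type, chrom, start, end),
# 1-based inclusive, mimicking a few GTF columns.
ANNOTATION = [
    ("GENE_A", "transcript", "chr1", 100, 1000),
    ("GENE_A", "exon", "chr1", 100, 200),
    ("GENE_A", "exon", "chr1", 900, 1000),
]

# Hypothetical unspliced alignments: (chrom, start, end), 1-based inclusive.
READS = [
    ("chr1", 150, 199),    # exonic
    ("chr1", 500, 549),    # purely intronic
    ("chr1", 2000, 2049),  # intergenic
]


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_genes(annotation, reads, feature_type):
    """Count reads per gene using only features of `feature_type`.

    A read is counted only if the features it overlaps all belong to one gene
    (roughly the 'union' rule); reads hitting no gene or several genes are skipped.
    """
    features = [(g, c, s, e) for g, t, c, s, e in annotation if t == feature_type]
    counts = defaultdict(int)
    for chrom, r_start, r_end in reads:
        hits = {g for g, c, s, e in features
                if c == chrom and overlaps(s, e, r_start, r_end)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
    return dict(counts)


exon_counts = count_genes(ANNOTATION, READS, "exon")        # exon-only counting
span_counts = count_genes(ANNOTATION, READS, "transcript")  # exon+intron counting
print(exon_counts)  # {'GENE_A': 1}
print(span_counts)  # {'GENE_A': 2}
```

The intronic read at 500-549 is invisible to exon-only counting but is assigned to GENE_A once the whole transcript span is the counted feature.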

Thanks for your answer in advance!

11 months ago

Are you sure this is appropriate for bulk RNA-seq? I thought intronic reads in single-cell data were only counted for nuclear preps.


Correct, and I agree -- reads lying fully within introns are usually not counted for either single-cell or bulk.

Including introns in the transcriptome "index" is what improves accuracy, but only because it improves mapping accuracy (reads derived from intronic sequence are no longer forced onto similar exonic sequence). Reads lying fully within introns are usually only used in single-cell for things like RNA velocity and splicing analysis -- not for typical quantification.

For nuclear preps, purely intronic reads are quantified simply because what we're interested in are the nascent (unspliced) RNA molecules, which make up most of what is present in the nucleus.


swbarnes2 and dsull:

Thanks for the concern. I know that counting intronic reads in bulk RNA-seq is rather unorthodox, and it's something I've never done before. In this use case, however, I do have some justification:

  1. I'm only looking at whether the gene in question (a human GENCODE gene) is expressed in a given tissue type or not, with no intention of doing expression analyses.
  2. I understand that intronic reads in bulk RNA-seq can potentially come from genomic DNA contamination. However, I'm also interested in the capture rate of such reads.
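For the capture-rate question in point 2, one rough way to frame it is to classify each aligned read as exonic, intronic or intergenic and compare the fractions; the intergenic fraction is commonly used as a crude hint of gDNA contamination. This is a minimal sketch assuming hypothetical toy intervals and reads; the `classify` helper is illustrative only, and any read touching an exon is called exonic here.

```python
from collections import Counter

# Hypothetical gene model on one chromosome (1-based, inclusive).
GENE_SPANS = [("chr1", 100, 1000)]
EXONS = [("chr1", 100, 200), ("chr1", 900, 1000)]


def overlaps_any(chrom, start, end, intervals):
    """True if the read overlaps at least one interval on the same chromosome."""
    return any(c == chrom and s <= end and start <= e for c, s, e in intervals)


def classify(read):
    """Priority: exonic > intronic (inside a gene span) > intergenic."""
    chrom, start, end = read
    if overlaps_any(chrom, start, end, EXONS):
        return "exonic"
    if overlaps_any(chrom, start, end, GENE_SPANS):
        return "intronic"
    return "intergenic"


# Hypothetical unspliced alignments: (chrom, start, end).
reads = [("chr1", 150, 199), ("chr1", 500, 549),
         ("chr1", 600, 649), ("chr1", 2000, 2049)]

tally = Counter(classify(r) for r in reads)
fractions = {category: n / len(reads) for category, n in tally.items()}
print(fractions)  # {'exonic': 0.25, 'intronic': 0.5, 'intergenic': 0.25}
```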

That said, this calls into question 10x's decision, since Cell Ranger 7, to include intronic reads in the count matrix by default, given that most scRNA-seq experiments are performed on whole cells rather than nuclei. I suppose their assumption is that, on a microfluidics platform, gDNA contamination should not affect the total count per barcode too much?


Yeah, that's their assumption, and they also assume that we're NOT ONLY interested in quantifying mature transcripts. However, for most analyses, only the mature transcripts are of interest: nascent transcripts exhibit length biases and have different properties from mature transcripts, so they should probably be quantified as a distinct species -- simply summing the two doesn't really seem like the right thing to do.

It also doesn't really make sense that exonic reads would give you zero expression while intronic reads give you detectable expression -- exons are part of both nascent and mature transcripts, so at any step of RNA processing they should be present and captured well (detecting only the introns might therefore be an artifact).

These are just interesting questions to think about, and it seems you're interested in such questions. So, yes, Cufflinks and RSEM don't really give you intron counts. kallisto/bustools can produce counts of reads mapped to introns for bulk data, but if you want alignment (rather than just mapping), I think you can give certain tools (e.g. featureCounts) ONLY the intron regions as the annotation.
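A minimal sketch of the intron-only annotation idea: take a gene's span, subtract the union of the exons of all its transcripts, and write the remainder as featureCounts SAF rows (GeneID, Chr, Start, End, Strand; 1-based inclusive). The gene model is a hypothetical example; real input would be parsed from the GENCODE GTF, and this sketch does not handle introns that overlap exons of other genes.

```python
def merge(intervals):
    """Merge overlapping or adjacent 1-based, inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def intronic_regions(gene_start, gene_end, exons):
    """Return the parts of the gene span not covered by any exon of any transcript."""
    regions = []
    cursor = gene_start  # first position not yet known to be exonic
    for start, end in merge(exons):
        if start > cursor:
            regions.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= gene_end:
        regions.append((cursor, gene_end))
    return regions


def saf_rows(gene_id, chrom, strand, regions):
    """Format regions as tab-separated SAF rows: GeneID, Chr, Start, End, Strand."""
    return [f"{gene_id}\t{chrom}\t{s}\t{e}\t{strand}" for s, e in regions]


# Hypothetical gene with exons pooled from two transcripts.
gene = {"id": "GENE_A", "chrom": "chr1", "strand": "+", "start": 100, "end": 1000,
        "exons": [(100, 200), (150, 250), (600, 700), (900, 1000)]}

introns = intronic_regions(gene["start"], gene["end"], gene["exons"])
saf = ["GeneID\tChr\tStart\tEnd\tStrand"] + saf_rows(
    gene["id"], gene["chrom"], gene["strand"], introns)
print("\n".join(saf))
```

Saved to a file (say, a hypothetical introns.saf), this could be passed as `featureCounts -F SAF -a introns.saf -o intron_counts.txt sample.bam`; featureCounts sums all rows sharing a GeneID into one gene-level count.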


10x does give you a big flashing warning stating what the new default is, and tells you how to rerun without intronic reads if that's not what you wanted.
