Question

many intronic reads and no features counts in 3' RNA-seq sequencing

1

4.2 years ago

concetta ▴ 10

Hi,

I am analyzing RNA-seq data produced with Lexogen kit "Quant-Seq 3' mRNA-Seq FWD-UMI".

My samples are FFPE samples, and for each sample we sequenced 4 million single-end 75 nt reads.

I mapped the reads to the human reference genome hg38 using STAR and found that 80% of reads mapped uniquely to the genome and 30% were multi-mapped.

Subsequently, I deduplicated the mapped reads based on the UMI sequence. Then I checked the read quality distribution using RSeQC and calculated gene counts using htseq-count.

Looking at the RSeQC output, I observed a high number of read tags falling in introns. Why would so many tags map to introns? Could this result be due to the type of sequencing?

Looking at the htseq-count output, I observed many reads counted as __no_feature. For example, in one sample with a total of 3,430,447 deduplicated reads, I have 1,911,232 counts assigned to features and 1,492,974 counts reported as __no_feature. Why do I have such a high number of __no_feature counts, and could it indicate a problem with the analysis or the sequencing?
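To put those numbers in proportion, here is a minimal Python sketch that splits htseq-count output into gene counts and the special `__` counters and reports each as a fraction of all counted reads. It assumes the usual two-column, tab-separated htseq-count output; the function name and the gene names in the toy input are hypothetical. Applied to the figures above, about 55.7% of the deduplicated reads are assigned to features and about 43.5% are __no_feature.

```python
def summarize_htseq_counts(lines):
    """Split htseq-count output lines into gene counts and special counters.

    Assumes two tab-separated columns: feature name, count.
    Special counters (e.g. __no_feature) start with a double underscore.
    """
    assigned = 0
    special = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, value = line.split("\t")
        count = int(value)
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    total = assigned + sum(special.values())
    fractions = {"assigned": assigned / total if total else 0.0}
    for name, count in special.items():
        fractions[name] = count / total if total else 0.0
    return assigned, special, fractions


# Toy input with hypothetical gene names, for illustration only.
example = ["GENE_A\t60", "GENE_B\t20", "__no_feature\t15", "__ambiguous\t5"]
assigned, special, fractions = summarize_htseq_counts(example)
print(assigned, fractions["__no_feature"])

# Fractions for the sample described in the post.
total_dedup = 3_430_447
print(f"assigned:     {1_911_232 / total_dedup:.1%}")
print(f"__no_feature: {1_492_974 / total_dedup:.1%}")
```

In practice the `lines` argument would be the open count file for one sample, which makes it easy to compare the assigned fraction across samples.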

In addition, is there a minimum number of reads that should be assigned to features in order to perform gene expression analysis? For example, are 877,152 counts assigned to features enough?

Thank you!

Concetta

RNA-Seq • 2.2k views

ADD COMMENT • link updated 3.0 years ago by Christopher Walker ▴ 70 • written 4.2 years ago by concetta ▴ 10

0

Did you follow instructions Lexogen has for processing data produced using this kit?

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

Yes, I have followed their instructions.

ADD REPLY • link 4.2 years ago by concetta ▴ 10

0

Have you inspected the alignments to verify that the reads are indeed in introns? Are there specific pileups (those could be previously unknown genes/non-coding RNAs) or general scatter of alignments (low level DNA contamination)?

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

Yes, I checked the read alignments in IGV. I observed that the reads are spread over introns and intergenic regions. I also checked these regions in the UCSC Genome Browser, and they contain annotated transposable elements.

I can exclude DNA contamination because DNase treatment was performed during RNA extraction.

I have another question concerning the minimum number of reads mapped to genes. Is 1 million reads mapped to genes enough to perform gene expression analysis? For example, are 877,152 reads mapped to genes enough?

ADD REPLY • link 4.2 years ago by concetta ▴ 10

0

Hi, is this a stranded library? Have you checked that you used the correct strandedness? I am not sure, but I think you should have used -fr-secondstrand.
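One way to test the strandedness setting empirically is to run htseq-count on the same BAM with `-s yes` and with `-s reverse`, then compare how many reads land in __no_feature. The sketch below only builds the two command lines; the helper name and the file names are hypothetical, and it assumes the standard `-f` and `-s` options of htseq-count. As far as I understand the kit, QuantSeq FWD reads are in the sense orientation, so `-s yes` would be expected to give the higher assigned fraction, but that is worth confirming on your own data.

```python
def htseq_command(bam, gtf, stranded):
    """Build an htseq-count command line for one strandedness setting.

    `stranded` must be one of the values accepted by `-s`.
    """
    if stranded not in {"yes", "no", "reverse"}:
        raise ValueError(f"unexpected strandedness setting: {stranded!r}")
    return ["htseq-count", "-f", "bam", "-s", stranded, bam, gtf]


# Hypothetical file names, for illustration only.
commands = {
    mode: htseq_command("sample.dedup.bam", "genes.gtf", mode)
    for mode in ("yes", "reverse")
}
for mode, cmd in commands.items():
    print(mode, "->", " ".join(cmd))
```

If the wrong setting was used, the run with the other setting should show a clearly lower __no_feature count.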

ADD REPLY • link 4.2 years ago by theodore ▴ 90

0

I am seeing similar issues in FFPE samples. Did you ever resolve this or is this expected in heavily fragmented RNA?

ADD REPLY • link 3.4 years ago by rrdavis ▴ 60

score 0 · Answer 1 · 2021-05-01

I had the same issue and concluded that this seems to be real biological intron retention rather than some kind of technical error. FFPE samples especially tend to have a higher percentage of transcripts with retained introns compared to fresh frozen samples. Here is one paper comparing frozen with FFPE, although I would imagine many others have been published: PLoS One. 2017; 12(1): e0170632. Also see this post from a few years ago with a substantial discussion on the subject.