Many intronic reads and high no_feature counts in 3' RNA-seq data
4.2 years ago
concetta ▴ 10

Hi,

I am analyzing RNA-seq data produced with the Lexogen kit "Quant-Seq 3' mRNA-Seq FWD-UMI".

My samples are FFPE samples, and for each sample we sequenced 4 million single-end 75 nt reads.

I mapped the reads to the human reference genome hg38 using STAR and found that 80% of reads mapped uniquely to the genome and 30% were multi-mapped.

Subsequently, I deduplicated the mapped reads based on the UMI sequence. Then I checked the read quality distribution using RSeQC and calculated gene counts using htseq-count.

Looking at the RSeQC output, I observed a high number of read tags falling in introns. Why would so many tags fall in introns? Could this result be due to the type of sequencing?

Looking at the htseq-count output, I observed many reads counted as __no_feature. For example, in one sample with a total of 3,430,447 deduplicated reads, 1,911,232 counts were assigned to features and 1,492,974 were counted as __no_feature. Why are so many counts classified as __no_feature, and could this indicate a problem with the analysis or the sequencing?
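To put the numbers above in proportion, here is a minimal Python sketch that sums an htseq-count table and reports each category as a fraction of the deduplicated reads. It assumes the usual two-column, tab-separated htseq-count output in which special counters start with `__`. The function names and the `GENE_PLACEHOLDER` row are hypothetical: the placeholder row simply carries the total assigned count quoted in the post, since per-gene counts were not given.

```python
def parse_htseq_counts(lines):
    """Sum gene-level counts and collect the '__' special counters."""
    assigned = 0
    special = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, count = fields[0], int(fields[1])
        if name.startswith("__"):
            special[name] = count
        else:
            assigned += count
    return assigned, special


def category_fractions(assigned, special, total):
    """Return each category as a fraction of `total` input reads."""
    other = total - assigned - sum(special.values())
    fractions = {"assigned": assigned / total}
    for name, count in special.items():
        fractions[name] = count / total
    fractions["other"] = other / total  # e.g. ambiguous or low-quality reads
    return fractions


# Numbers quoted in the post; GENE_PLACEHOLDER is not a real gene row.
example_lines = [
    "GENE_PLACEHOLDER\t1911232\n",
    "__no_feature\t1492974\n",
]
assigned, special = parse_htseq_counts(example_lines)
fr = category_fractions(assigned, special, total=3_430_447)
for name, value in fr.items():
    print(f"{name}: {value:.1%}")
```

For a real run, the lines would come from the htseq-count output file rather than from the hard-coded list.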

In addition, is there a minimum number of reads that should be assigned to features in order to perform gene expression analysis? For example, would 877,152 counts assigned to features be enough?

Thank you!

Concetta

RNA-Seq • 2.2k views

Did you follow Lexogen's instructions for processing data produced with this kit?


Yes, I followed their instructions.


Have you inspected the alignments to verify that the reads are indeed in introns? Are there specific pileups (those could be previously unknown genes/non-coding RNAs) or general scatter of alignments (low level DNA contamination)?


Yes, I checked the read alignments in IGV. The reads are spread over introns and intergenic regions. I checked those regions in the UCSC Genome Browser, and they contain annotated transposable elements.

I can exclude DNA contamination because DNase treatment was performed during RNA extraction.

I have another question concerning the minimum number of reads mapped to genes. Is 1 million reads mapped to genes enough to perform gene expression analysis? For example, are 877,152 reads mapped to genes enough?


Hi, is this a single-stranded library? Have you checked that you used the correct strandedness? I am not sure, but I think you should have used -fr-secondstrand.
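One empirical way to check the strandedness question is to run htseq-count with each value of its `-s` option (`yes`, `reverse`, `no`) and compare how many reads get assigned. The Python sketch below only builds the command lines and compares results that you supply; it does not run anything. The file names `sample.dedup.bam` and `annotation.gtf` and both helper functions are hypothetical placeholders, not part of the original analysis.

```python
def htseq_strand_commands(bam_path, gtf_path):
    """Build one htseq-count command per strandedness setting."""
    return {
        setting: ["htseq-count", "-f", "bam", "-s", setting, bam_path, gtf_path]
        for setting in ("yes", "reverse", "no")
    }


def likely_strand_setting(assigned_by_setting):
    """Pick 'yes' or 'reverse', whichever assigned more reads.

    `assigned_by_setting` maps an `-s` value to the number of reads
    assigned to features in that run. Only the two stranded modes are
    compared; the unstranded run is just a reference point.
    """
    stranded = {k: assigned_by_setting[k] for k in ("yes", "reverse")}
    return max(stranded, key=stranded.get)


# Hypothetical paths: substitute your own BAM and annotation files.
commands = htseq_strand_commands("sample.dedup.bam", "annotation.gtf")
for setting, cmd in commands.items():
    print(setting, "->", " ".join(cmd))
```

If one stranded mode assigns far more reads than the other, that mode most likely matches the library. A large difference in the wrong direction would suggest that part of the __no_feature count comes from the strand setting rather than from the samples.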


I am seeing similar issues in FFPE samples. Did you ever resolve this, or is it expected with heavily fragmented RNA?

3.0 years ago

I had the same issue and concluded that this seems to be real biological intron retention rather than some kind of technical error. FFPE samples especially tend to have a higher percentage of transcripts with retained introns compared to fresh frozen samples. Here is one paper comparing frozen with FFPE, although I would imagine many others have been published: PLoS One. 2017; 12(1): e0170632. Also see this post from a few years ago with a substantial discussion on the subject.
