Hi, I’m trying to analyze differential gene expression between wild type and a particular mutant. In total, I performed 3 independent trials, each prepared with the TruSeq stranded kit with ribosomal depletion.
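If the trials end up pooled into a single assembly (one of the options raised in question 1), this design of two conditions × three trials can be written as a Trinity samples file. Its rows are tab-delimited: condition, replicate, left reads, right reads. The sketch below assumes paired-end reads. The FASTQ naming scheme, the `trimmed` directory, and the memory/CPU values are hypothetical placeholders. `--SS_lib_type RF` is the Trinity setting for dUTP-based stranded libraries such as TruSeq Stranded.

```python
from pathlib import Path

# Hypothetical design and file layout: paired-end FASTQs named
# <condition>_trial<N>_R1.fastq.gz / _R2.fastq.gz inside fastq_dir.
CONDITIONS = ["wt", "mut"]
TRIALS = [1, 2, 3]


def samples_table(conditions, trials, fastq_dir="trimmed"):
    """Build rows for a Trinity samples file: condition, replicate, left FASTQ, right FASTQ."""
    rows = []
    for cond in conditions:
        for trial in trials:
            rep = f"{cond}_trial{trial}"
            rows.append((cond, rep,
                         f"{fastq_dir}/{rep}_R1.fastq.gz",
                         f"{fastq_dir}/{rep}_R2.fastq.gz"))
    return rows


def write_samples_file(rows, path="samples.txt"):
    """Write the rows as a tab-delimited file and return its path."""
    Path(path).write_text("".join("\t".join(row) + "\n" for row in rows))
    return path


def trinity_command(samples_path, max_memory="50G", cpu=8):
    """Build one pooled Trinity command from a samples file (resource values are placeholders)."""
    return ["Trinity", "--seqType", "fq",
            "--samples_file", samples_path,
            "--SS_lib_type", "RF",  # dUTP-based stranded libraries such as TruSeq Stranded
            "--max_memory", max_memory,
            "--CPU", str(cpu),
            "--output", "trinity_out_dir"]


if __name__ == "__main__":
    # Print the table and the command instead of running anything
    for row in samples_table(CONDITIONS, TRIALS):
        print("\t".join(row))
    print(" ".join(trinity_command("samples.txt")))
```

Calling `write_samples_file(samples_table(CONDITIONS, TRIALS))` and passing the command list to `subprocess.run(..., check=True)` would launch the assembly. The point of pooling is that every sample is later quantified against the same reference, so transcript IDs stay comparable across all six libraries.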

After obtaining the raw reads for each trial, I performed quality control with fastp. A/T is overrepresented in the fastp report, so I suspect the polyA tails were sequenced. I therefore built two de novo transcriptome assemblies with Trinity, one from polyA-trimmed reads and one from untrimmed reads. I then assessed their quality using read representation, TransRate, and assembly metric statistics to identify the better assembly.

I am now trying to quantify the transcripts for each setup so that I can do differential expression analysis. Note that I am running all of these steps independently for each trial; that is, I complete every step for one trial before repeating them for the next.
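One quick way to check the A/T signal before choosing between the trimmed and untrimmed assemblies is to count how many reads end in a poly-A run or start with a poly-T run. This is a minimal stdlib sketch. The run length of 10 is an arbitrary assumption, not a fastp default, and it only looks at read ends rather than reproducing fastp's overrepresentation analysis.

```python
import gzip


def read_sequences(path):
    """Yield the sequence line of each record in a (optionally gzipped) FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # FASTQ records are 4 lines; line 2 is the sequence
                yield line.strip()


def has_poly_run(seq, min_len=10):
    """True if the read ends in a poly-A run or starts with a poly-T run of at least min_len."""
    return seq.endswith("A" * min_len) or seq.startswith("T" * min_len)


def poly_fraction(seqs, min_len=10):
    """Fraction of reads carrying a terminal poly-A / leading poly-T run."""
    total = hits = 0
    for seq in seqs:
        total += 1
        hits += has_poly_run(seq, min_len)
    return hits / total if total else 0.0
```

For example, `poly_fraction(read_sequences("wt_trial1_R1.fastq.gz"))` (hypothetical file name) gives one number per file. Comparing it before and after trimming shows how much of the signal the trimming actually removed.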

My questions are:

  1. At which step should I compare the 3 independent trials? For example, should I build one Trinity assembly from the reads of all three trials? Or should I run differential expression analysis separately for each trial and then perform some sort of statistical analysis/PCA at the end?

  2. Is there anything I can do about a low TransRate assembly score (around 0.00012), even though the reads map well when I align them back to the assembly with bowtie2?
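A low TransRate score alongside a good bowtie2 alignment rate is easier to interpret with the shape of the score in mind. As described in the TransRate documentation, the assembly score is the geometric mean of the per-contig scores multiplied by the proportion of reads that positively support the assembly. The simplified sketch below takes that formula as given and does not compute TransRate's contig scores themselves. The contig-score values are made up. The supporting-read fraction here is an assumed input and is not the same quantity as bowtie2's overall alignment rate. The point it illustrates is that a geometric mean is dragged down by many weakly supported contigs even when most reads map somewhere.

```python
import math


def geometric_mean(values):
    """Geometric mean computed in log space; any non-positive value makes it 0."""
    if not values or any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def assembly_score(contig_scores, supporting_read_fraction):
    """Simplified TransRate-style score: geometric mean of contig scores x supporting-read fraction."""
    return geometric_mean(contig_scores) * supporting_read_fraction


if __name__ == "__main__":
    # Hypothetical contig scores: 1,000 well-supported contigs, then 9,000 weak ones added
    good = [0.8] * 1000
    weak = [0.001] * 9000
    print(assembly_score(good, 0.9))
    print(assembly_score(good + weak, 0.9))
```

With the same read fraction, adding the weak contigs moves the score from 0.72 to well below 0.01. This is why it can be worth looking at the per-contig output rather than only the single assembly score.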

Thank you very much in advance!

What organism? Are you sure you need to assemble transcripts?

This is from a mouse cell line, but I'm looking for non-coding RNAs (many of which are not yet annotated) as well as novel transcripts.

