Hi, I’m trying to analyze differential gene expression between the wild type and a particular mutant. In total, I ran three independent trials, all prepared with the TruSeq Stranded kit using ribosomal RNA depletion.
After obtaining the raw reads for each trial, I performed quality control with fastp. The fastp report shows that A/T is overrepresented, which suggests that poly(A) tails may have been sequenced. I therefore assembled both the poly(A)-trimmed and the untrimmed reads de novo with Trinity, and then assessed each assembly using read representation, TransRate, and assembly metric statistics to determine which one is better. I am now trying to quantify the transcripts for each setup so that I can run the differential expression analysis. Note that I am processing each trial independently: I complete all of the steps for one trial before repeating them for the next.
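To put a number on the A/T overrepresentation for each trial, one option is to count reads that end in a long run of A or begin with a long run of T (depending on read orientation, a sequenced tail can appear either way). The sketch below assumes standard 4-line FASTQ records, plain or gzip-compressed; the 10-base run threshold and the function names are hypothetical choices for illustration, not fastp settings.

```python
import gzip
import re
from typing import Iterator, Tuple

# Hypothetical threshold: a run of >= MIN_RUN identical bases counts as a tail.
MIN_RUN = 10


def open_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd line) of each 4-line FASTQ record."""
    with open_fastq(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip().upper()


def count_homopolymer_ends(path: str, min_run: int = MIN_RUN) -> Tuple[int, int, int]:
    """Return (total reads, reads ending in poly(A), reads starting with poly(T))."""
    poly_a_tail = re.compile(f"A{{{min_run},}}$")  # e.g. "A{10,}$"
    poly_t_head = re.compile(f"^T{{{min_run},}}")  # e.g. "^T{10,}"
    total = ends_a = starts_t = 0
    for seq in read_sequences(path):
        total += 1
        if poly_a_tail.search(seq):
            ends_a += 1
        if poly_t_head.search(seq):
            starts_t += 1
    return total, ends_a, starts_t
```

For example, `count_homopolymer_ends("trial1_R1.fastq.gz")` (hypothetical filename) could be run on each trial's reads before and after poly(A) trimming to see how much of the signal the trimming actually removes.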
My questions are:
At which step should I compare the three independent trials? For example, should I build a single Trinity assembly from the reads of all three trials, or should I run the differential gene expression analysis separately for each trial and then perform some kind of statistical analysis or PCA at the end?
Is there anything I can do about a low TransRate assembly score (around 0.00012), even though the reads map back to the assembly well with bowtie2?
Thank you very much in advance!