Hello all!
I am writing this post to explain my concerns about the methodology I am following. My goal is to perform a de novo assembly (there is no reference transcriptome or genome yet for my target organism) from 2 conditions with 5 samples each, and then to run functional annotation and differential expression analysis.
So far, I have run several assemblies (after checking the reads with FastQC, of course) and selected the best one based on its BUSCO and TransRate scores. I then ran htseq-count on my BAM file, which contains all of the mapped reads.
Now I don't know whether it is better to perform functional annotation first or to run the differential expression analysis (using DESeq2) first. In fact, I have no idea what difference the order makes.
I have also noticed in my htseq-count files that many transcripts have only a few reads mapping to them. Should I remove these transcripts from my FASTA file to save processing time and avoid processing meaningless data? Or would it be better to build my final FASTA from the "good" transcripts identified by TransRate?
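To make the filtering question concrete, here is a minimal Python sketch of the kind of filter I have in mind. It assumes the usual two-column htseq-count output (ID, tab, count, with summary lines starting with `__`). The function names and the thresholds (`min_reads=10`, `min_samples=2`) are placeholders I made up for illustration, not recommended values.

```python
import io


def parse_htseq(handle):
    """Read htseq-count output: one 'ID<TAB>count' pair per line."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and the summary counters (e.g. __no_feature).
        if not line or line.startswith("__"):
            continue
        transcript, value = line.split("\t")
        counts[transcript] = int(value)
    return counts


def keep_transcripts(samples, min_reads=10, min_samples=2):
    """Return sorted IDs with >= min_reads in at least min_samples samples.

    `samples` is a list of dicts as returned by parse_htseq.
    Both thresholds are hypothetical placeholders.
    """
    all_ids = set().union(*samples)
    return sorted(
        t for t in all_ids
        if sum(s.get(t, 0) >= min_reads for s in samples) >= min_samples
    )
```

The returned IDs could then be used either to subset the count table before DESeq2 or to subset the FASTA file; which of the two is appropriate is part of what I am asking.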
Finally, I have seen in several posts that people run the differential analysis on htseq-count output at the "gene level". I did it at the transcript level, which seemed to be the only possibility (to do it at the gene level you need a reference transcriptome, right?).
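To be clear about what I mean by "gene level": as far as I understand, it amounts to summing the counts of all transcripts assigned to the same gene, roughly as in the Python sketch below. The `transcript_to_gene` mapping and the function name are hypothetical; I do not know where such a mapping would come from without a reference, which is exactly my question.

```python
from collections import defaultdict


def sum_to_gene_level(transcript_counts, transcript_to_gene):
    """Collapse transcript-level counts into gene-level counts.

    transcript_counts:  dict of transcript ID -> read count
    transcript_to_gene: dict of transcript ID -> gene ID (hypothetical input)
    """
    gene_counts = defaultdict(int)
    for transcript, count in transcript_counts.items():
        # A transcript missing from the mapping is kept under its own ID.
        gene = transcript_to_gene.get(transcript, transcript)
        gene_counts[gene] += count
    return dict(gene_counts)
```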
I hope this is not too many questions and that I have been clear enough.
Thanks for helping me see this more clearly!