5.0 years ago • komal
Hello,
I have 75 oral cancer samples: 3 control and 72 treated. I did RNA-seq analysis on them using the Tuxedo pipeline (HISAT2 -> StringTie -> StringTie --merge -> ballgown), but I am getting very few transcripts: 55049 in total (24301 novel and 30747 known). That's why I used DESeq2 for DGE analysis, which gives 234628 transcripts in total (74040 novel and 160588 known). Can anyone tell me which is the correct way to do the analysis?
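For readers wondering where "known" vs "novel" counts like these come from, here is a minimal Python sketch that classifies the unique transcripts in a merged GTF. It assumes that novel transcripts carry StringTie's default `MSTRG` label (set with `-l`) while annotated ones keep their reference IDs; the function name, the toy GTF lines, and the IDs (`TX_REF_1`, `GENE_A`) are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Matches the transcript_id attribute at the start of GTF column 9 or after a ';'
TX_ID = re.compile(r'(?:^|;\s*)transcript_id "([^"]+)"')

def count_known_novel(gtf_lines, novel_prefix="MSTRG."):
    """Count unique transcripts in a merged GTF as 'known' or 'novel'.

    Assumption: novel transcripts use StringTie's default 'MSTRG' label,
    known transcripts keep their reference transcript IDs.
    """
    counts = Counter()
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Only look at 'transcript' rows so exon rows are not double counted
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        match = TX_ID.search(fields[8])
        if match is None:
            continue
        tx_id = match.group(1)
        if tx_id in seen:
            continue
        seen.add(tx_id)
        counts["novel" if tx_id.startswith(novel_prefix) else "known"] += 1
    return counts

# Toy input (hypothetical IDs): one novel transcript with an exon row, one known transcript
toy_gtf = [
    '1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";',
    '1\tStringTie\texon\t100\t300\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1";',
    '1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "TX_REF_1"; ref_gene_id "GENE_A";',
]
counts = count_known_novel(toy_gtf)
print(counts["known"], counts["novel"])

# On a real file (hypothetical path):
# with open("merged.gtf") as fh:
#     print(count_known_novel(fh))
```

Counting only `transcript` rows matters: a GTF repeats the transcript ID on every exon row, so counting all rows would inflate the totals.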
My understanding is that there is no single correct way; there are several pipelines for this analysis that are more or less accepted by the scientific community.
I agree that there is no single correct way, but there are many incorrect ones. komal, did you aggregate the transcript-level counts from ballgown to the gene level before running DESeq2, e.g. with tximport? DESeq2 is not intended for differential transcript-level analysis, here is why. In general, I really do not understand why people feel the need to use stringtie in their pipelines. The mouse and human genomes/transcriptomes (which I guess you are using) are among the best-annotated sets in genomics. Including new transcripts (or even genes) in an analysis is probably not informative, because doing it properly would require experimental validation, plus extensive replicates and sequencing depth for the transcriptome assembly, none of which a standard RNA-seq analysis provides. I personally vote for quantifying against the established transcriptome, e.g. with salmon, and then keeping the downstream analysis simple, i.e. not focusing too much on novel transcripts unless your project is specifically about them and you can perform downstream validation experiments. Yesterday in the Biostars Slack, I read a comment that sums this up accurately: "Get Over Yourself: the genes you are interested in are probably not that special". Keep things simple!
Okay, thank you for the explanation.
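The approach recommended in the answer (quantify with salmon against the reference transcriptome, then aggregate transcript counts to the gene level, e.g. with tximport, before DESeq2) is normally done in R. As a language-neutral illustration only, here is a minimal Python sketch of the two steps. It assumes salmon's `quant.sf` layout (tab-separated, header `Name, Length, EffectiveLength, TPM, NumReads`) and a transcript-to-gene mapping you would build from your annotation; the function names, toy table, and IDs are hypothetical. It reproduces only the count-summing part of tximport's default mode, not the average transcript length matrix that tximport also returns for DESeq2.

```python
import csv
import io
from collections import defaultdict

def read_salmon_counts(handle):
    """Return {transcript_id: NumReads} from a salmon quant.sf table.

    Assumes the tab-separated header: Name, Length, EffectiveLength, TPM, NumReads.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}

def aggregate_to_genes(tx_counts, tx2gene):
    """Sum transcript-level estimated counts per gene.

    Transcripts missing from tx2gene are skipped in this sketch.
    """
    gene_counts = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene = tx2gene.get(tx_id)
        if gene is not None:
            gene_counts[gene] += count
    return dict(gene_counts)

# Toy quant.sf content and mapping (hypothetical IDs and values)
toy_quant = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TX_A1\t1500\t1350.0\t10.0\t120.0\n"
    "TX_A2\t900\t750.0\t5.0\t30.5\n"
    "TX_B1\t2000\t1850.0\t2.0\t40.0\n"
)
toy_tx2gene = {"TX_A1": "GENE_A", "TX_A2": "GENE_A", "TX_B1": "GENE_B"}

genes = aggregate_to_genes(read_salmon_counts(toy_quant), toy_tx2gene)
print(genes)  # {'GENE_A': 150.5, 'GENE_B': 40.0}
```

The design choice here is that gene-level counts are sums of salmon's estimated counts (`NumReads`), not of TPM values, since DESeq2 expects count-scale input.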