I'm analysing RNA-seq data from two datasets (from healthy samples) and created a unique GTF file to identify new isoforms using StringTie. I then used Salmon to estimate their TPMs, and I have some questions I hope someone can help me with:
1) Besides PCR, how do I know that these putative novel transcripts are not "transcriptional noise"?
2) I used tximport to import my Salmon outputs as follows:
    txi <- tximport(files, type="salmon", txOut=TRUE, countsFromAbundance="scaledTPM")
    cts <- txi$counts
    cts <- cts[rowSums(cts) > 0,]
This generates a matrix with the TPM value per sample. From that matrix I calculated the median across all samples, to have a "general TPM value" as a reference for each novel transcript. Is this approach correct?
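For reference, here is a minimal sketch of the median step. It assumes the TPMs are read from txi$abundance, which is where tximport stores TPM (txi$counts holds estimated counts, rescaled when countsFromAbundance is set). The small matrix and the transcript and sample names are made up and only stand in for the real object:

```r
# Toy stand-in for txi$abundance (rows = transcripts, columns = samples, values = TPM).
# All names and numbers here are hypothetical, for illustration only.
tpm <- matrix(
  c(0.0, 0.0, 0.0,
    1.2, 3.4, 2.0,
    10.0, 0.0, 5.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MSTRG.1.1", "MSTRG.1.2", "MSTRG.2.1"),
                  c("s1", "s2", "s3"))
)

# Drop transcripts with zero abundance in every sample.
tpm <- tpm[rowSums(tpm) > 0, , drop = FALSE]

# Median TPM per transcript across all samples.
median_tpm <- apply(tpm, 1, median)
```

With the real data, tpm would simply be txi$abundance; the rest is unchanged.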
I'm not interested in DGE or DTU, as I don't have any "condition" to compare against; my goal is to identify novel isoforms of my gene of interest. Is there any other feedback you can share?
Thanks for the feedback. I'll definitely focus on the filters you suggest to improve my results.