I'm analysing RNA-seq data from two datasets (both from healthy samples) and created a single GTF file with StringTie to identify new isoforms. I then used Salmon to estimate their TPMs. I have a few questions and hope someone can help:
1) Besides PCR, how do I know that these putative novel transcripts are not "transcriptional noise"?
2) I used tximport to import my Salmon outputs as follows:
    txi <- tximport(files, type="salmon", txOut=TRUE, countsFromAbundance="scaledTPM")
    cts <- txi$counts
    cts <- cts[rowSums(cts) > 0,]
This generates a matrix with one TPM value per transcript per sample. From that matrix I calculated the median across all samples to get a "general TPM value", just as a reference for each novel transcript. Is this approach correct?
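For concreteness, the median step described above can be sketched roughly as follows. This is only a minimal illustration on a toy matrix: the transcript and sample names are hypothetical, and the toy values stand in for whichever tximport slot is used (txi$counts as in the code above, or txi$abundance, which is the slot where tximport stores the TPM values).

```r
# Toy stand-in for a tximport matrix: rows = transcripts, columns = samples.
# All names and values are hypothetical and only illustrate the calculation.
cts <- matrix(
  c(10, 12, 8,
     0,  0, 0,
     3,  0, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MSTRG.1.1", "MSTRG.1.2", "MSTRG.2.1"),
                  c("sample1", "sample2", "sample3"))
)

# Drop transcripts with no signal in any sample (same filter as above).
# drop = FALSE keeps the result a matrix even if only one row survives.
cts <- cts[rowSums(cts) > 0, , drop = FALSE]

# Median across samples for each remaining transcript.
median_per_tx <- apply(cts, 1, median)
print(median_per_tx)
```

The result is a named vector with one median per transcript that passed the filter.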
I'm not interested in DGE or DTU, since I don't have any "condition" to compare against; my goal is to identify novel isoforms of my gene of interest. Is there any other feedback you can share?