Hi all - apologies for any naivety/ambiguity; I'm fairly new to sequencing analysis.
I have read count files (non-coding RNA) in tabular format for multiple samples that I would like to put through differential expression analysis.
The files include transcript annotation/ID, transcript coordinates, raw read count, adjusted read count, and raw RPM/adjusted RPM. The problem is that the transcripts listed differ between samples. I assume this is because certain ncRNAs were present in some samples but not others, so each sample file only contains read counts for the ncRNAs expressed in that sample, and the next sample's list might be an overlapping but different set.
Is there any way that differential expression analysis can be performed on read counts in this format? If not, is there a way they can be transformed to an expression matrix?
My understanding is that I might need an expression matrix with the same list of transcripts across all samples, plus an expression value for each transcript in each sample (zero for non-expressed transcripts). I was thinking that I could build a transcriptome from all the sample files, then make an expression matrix using the compiled list of transcripts, but I don't really know how this should be done.
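A minimal sketch of that merge in Python, using only the standard library. It takes the union of transcript IDs across samples and fills in zero wherever a transcript is missing from a sample. The column names `name` and `read_count` are assumptions for illustration and may not match the actual sRNAbench headers, and the two toy samples are made up. It uses the raw read count column, since count-based DE tools such as DESeq2 and edgeR expect unnormalized counts rather than RPM.

```python
import csv
import io


def read_counts(handle, id_col="name", count_col="read_count"):
    """Return {transcript_id: raw_count} for one tab-separated sample file.

    id_col and count_col are hypothetical column names; adjust them to
    whatever headers the real files use.
    """
    counts = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        # Sum counts in case the same ID appears on more than one line.
        tid = row[id_col]
        counts[tid] = counts.get(tid, 0) + int(float(row[count_col]))
    return counts


def build_matrix(samples):
    """Turn {sample_name: {transcript_id: count}} into (header, rows).

    Every transcript seen in any sample gets a row; samples that lack
    the transcript get a count of 0.
    """
    sample_names = sorted(samples)
    all_ids = sorted(set().union(*(s.keys() for s in samples.values())))
    header = ["transcript_id"] + sample_names
    rows = [
        [tid] + [samples[name].get(tid, 0) for name in sample_names]
        for tid in all_ids
    ]
    return header, rows


# Toy stand-ins for two per-sample files (made-up data).
sample_a = "name\tread_count\ntRNA-1\t10\nsnoRNA-2\t5\n"
sample_b = "name\tread_count\ntRNA-1\t7\npiRNA-3\t2\n"

samples = {
    "sampleA": read_counts(io.StringIO(sample_a)),
    "sampleB": read_counts(io.StringIO(sample_b)),
}
header, rows = build_matrix(samples)

# Write the matrix as tab-separated text (here to a string buffer).
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

With real files, each `io.StringIO(...)` would be replaced by `open(path, newline="")`, and the output buffer by a file opened for writing. The resulting table (one row per transcript, one column per sample) is the general shape that count-based DE tools take as input.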
Full disclosure, I'm terrible with programming and ideally would like to use tools available on Galaxy (https://usegalaxy.org/) for this if possible, but I could probably manage something in R or Python if I had very specific instructions (if someone was feeling generous or happens to have a good protocol).
If anyone is wondering, I got these read count files from sRNAtoolbox, specifically the output files from sRNAbench. The web platform includes a DE module for miRNAs, but I was hoping to further analyze the other non-coding RNAs that are annotated for you using data from RNAcentral. I did try messing with the Docker standalone, but that was a bit of a disaster.
Thanks in advance.