I am working with a dataset containing 50 libraries of small RNAs. I am interested in all kinds of small RNAs (miRNAs, tRNA fragments, piRNAs, etc.). I use an in-house script to obtain a count matrix: the number of reads for each sequence in each sample. I then apply a low-count filter to this matrix, which leaves approximately 50K rows.
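To make the filtering step concrete, here is a minimal sketch of what such a low-count filter could look like. It is not the actual in-house script: the function name `filter_low_counts`, the thresholds `min_count` and `min_samples`, and the toy sequences are all assumptions chosen for illustration.

```python
from typing import Dict, List


def filter_low_counts(
    counts: Dict[str, List[int]],
    min_count: int = 10,    # hypothetical threshold, not taken from the real pipeline
    min_samples: int = 3,   # hypothetical threshold, not taken from the real pipeline
) -> Dict[str, List[int]]:
    """Keep sequences with at least `min_count` reads in at least `min_samples` libraries."""
    kept = {}
    for seq, row in counts.items():
        # Count how many libraries reach the read threshold for this sequence.
        n_pass = sum(1 for c in row if c >= min_count)
        if n_pass >= min_samples:
            kept[seq] = row
    return kept


# Toy matrix: one row per sequence, one column per library (made-up numbers).
toy = {
    "TGAGGTAGTAGGTTGTATAGTT": [120, 98, 143, 110],
    "GCATTGGTGGTTCAGTGGTAGA": [15, 0, 22, 11],
    "ACGTACGTACGTACGTACGT": [1, 0, 2, 0],
}

filtered = filter_low_counts(toy, min_count=10, min_samples=3)
print(sorted(filtered))
```

With 50 libraries, `min_samples` would more plausibly be tied to the size of the smallest experimental group, but that is a design choice rather than a rule.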
For me, the next logical step is to use this matrix as input to DESeq2 to identify which sequences are differentially expressed between the experimental groups. However, I have some doubts about whether DESeq2 is suitable for this kind of data, as I don't know to what extent it is distributed in the same way as gene expression data. Could someone with experience in this kind of analysis tell me whether this pipeline is sound? I have seen articles using this methodology, but I have not found a direct answer from the package developers indicating that DESeq2 can be used with small RNA-seq data.
I see what you mean. Maybe I should rephrase the question, as I am more concerned about the assumptions you have to make to use DESeq2. If I am not mistaken, in RNA-seq expression analysis the negative binomial distribution is commonly used to model the counts of each individual gene.
However, I was wondering if this same distribution can be assumed for expression data of small RNAs. In my understanding, these small RNAs are not counted as individual genes, as in the case of RNA-seq expression, but are analyzed as single sequences or clusters of similar sequences.
Does anyone have experience with, or knowledge of, which statistical distribution is most appropriate for modeling small RNA expression data? Would a Poisson distribution, or some other distribution, be a better fit for small RNAs?
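One way to look at the Poisson versus negative binomial question empirically is to compare the mean and variance of each sequence across replicates of the same group. Under a Poisson model the variance is roughly equal to the mean, while the negative binomial allows variance = mean + dispersion × mean². The sketch below computes a simple method-of-moments dispersion per sequence. It is only a rough diagnostic, not how DESeq2 estimates dispersion, and it assumes the counts have already been normalized for library size. The function name `moment_dispersion` and the toy numbers are made up for illustration.

```python
from statistics import mean, variance
from typing import List, Optional


def moment_dispersion(counts: List[float]) -> Optional[float]:
    """Method-of-moments dispersion for one sequence across replicates.

    Solves var = mean + alpha * mean**2 for alpha and truncates at 0.
    A value of 0 means the data are no more variable than Poisson.
    Returns None when the mean is 0, since alpha is undefined there.
    """
    m = mean(counts)
    if m == 0:
        return None
    v = variance(counts)  # sample variance (n - 1 denominator)
    return max((v - m) / (m * m), 0.0)


# Toy normalized counts for replicates of a single group (made-up numbers).
replicates = {
    "seq_A": [100, 102, 97, 101],  # variance below the mean: Poisson-like
    "seq_B": [100, 250, 40, 310],  # variance far above the mean: overdispersed
}

dispersions = {name: moment_dispersion(row) for name, row in replicates.items()}
for name, alpha in dispersions.items():
    print(name, alpha)
```

If most sequences in the real matrix show a dispersion clearly above zero, a Poisson model would understate the variability, which is the usual motivation for the negative binomial in count-based RNA-seq analysis.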