Dear all,
Background
I have RNA-seq data from diverse tissues and time points of one species. The species has two morphs (we have one genome for each), so I used pangenomic tools to build a pangenome, and then pantranscriptomic software to map reads and quantify expression. Specifically, I used the pangenome graph builder (pggb) to build the pangenome. Then I used the vg toolkit to map reads, and rpvg, a very recent tool, to quantify transcript expression.
What I have
At this point, rpvg gives me files containing (i) length, (ii) effective length, (iii) read counts, and (iv) TPMs for each isoform, with allele specificity (i.e., since I have a pangenome, reads are mapped to the best path).
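To make the input concrete, here is a minimal sketch of how such per-sample tables could be combined into one features-by-samples count matrix. The column names `Name` and `ReadCount`, the sample names, and the toy values are assumptions for illustration only; they should be checked against the header of the actual rpvg output.

```python
import csv
import io

# Assumed column names (hypothetical; verify against your own rpvg header).
NAME_COL = "Name"
COUNT_COL = "ReadCount"


def read_counts(handle, name_col=NAME_COL, count_col=COUNT_COL):
    """Return {feature name: read count} from one tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[name_col]: float(row[count_col]) for row in reader}


def build_count_matrix(tables):
    """Combine per-sample tables into (features, samples, matrix).

    `tables` maps a sample name to an open file-like object.
    A feature that is absent from a sample gets a count of 0.
    """
    per_sample = {s: read_counts(h) for s, h in tables.items()}
    samples = sorted(per_sample)
    features = sorted({f for counts in per_sample.values() for f in counts})
    matrix = [[per_sample[s].get(f, 0.0) for s in samples] for f in features]
    return features, samples, matrix


def write_matrix(features, samples, matrix, out):
    """Write the matrix as TSV: one row per feature, one column per sample."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature"] + samples)
    for feature, row in zip(features, matrix):
        writer.writerow([feature] + row)


# Toy, made-up tables standing in for two per-sample output files.
toy_a = io.StringIO(
    "Name\tLength\tEffectiveLength\tReadCount\tTPM\n"
    "tx1_hapA\t1000\t800\t10\t5.0\n"
    "tx1_hapB\t1000\t800\t4\t2.0\n"
)
toy_b = io.StringIO(
    "Name\tLength\tEffectiveLength\tReadCount\tTPM\n"
    "tx1_hapA\t1000\t800\t7\t3.5\n"
    "tx2_hapA\t500\t300\t3\t1.5\n"
)

features, samples, matrix = build_count_matrix({"tissueA": toy_a, "tissueB": toy_b})

buffer = io.StringIO()
write_matrix(features, samples, matrix, buffer)
```

The resulting TSV has one row per allele-specific isoform and one column per sample, which is the shape of table that could be read into R and passed as `counts` to edgeR's `DGEList`.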
Goal
Now I would like to perform differential expression analysis on these data.
My question
Since I have read counts (and not estimated counts), I have put sleuth and similar software aside. I think edgeR could be a good choice, since I have a table of read counts.
However, I have some concerns:
I know that the edgeR user's guide clearly states that edgeR can be applied to any genomic feature, as long as read counts are used. Still, I would like some confirmation that, even in this specific case of isoform- and allele-specific counts, the main assumptions of the model are not violated (e.g., because the loci overlap). On the other hand, the previous steps of the pipeline have already taken care of assigning each mapped read to one isoform or allele rather than another, so I still think this could work.
Conclusion
Any comments on the use of edgeR in this case, or suggestions of other software that I could use, would be most welcome.
Thank you for reading, Luca