Hi
I was wondering, whether it make sense to run DESeq on the transcript level. I am interested on differentially expressed exons (differential exon usage) and would like to know how exons are being expressed under different conditions.
I have counted the reads with featureCounts on the transcript level:
featureCounts -a ~/genomes/Drosophila_melanogaster/genes.gtf -f -O -t exon -g transcript_name -o featureCounts.out.txt ../bamFiles/sample1.bam
this gives me the list of the exons with the number of reads for each exon.
I would like than to run DESeq2 with the count table of the single exon. But before I am doing that, I was wondering if it make sense at all to run DESeq with the exons, as the distribution of the reads over exon is not the same as for the genes. In featureCounts I also allow a read to be counted in multiple transcripts, if an exon is expressed in more than just one transcript.
thanks
Assa
Wouldn't it be easier to just directly run DEXseq? It uses DESeq2 functions internally, so that would seem to save you a bit of work.
As far as I understand it, DEXseq must have replica to work. Unfortunately I don't have any :-(
You can either tweak the DEXseq source to get around that or reimplement it yourself. I really recommend not doing either, so I won't show how to do them. The results without replicates would be questionable at best, so you're best off not bothering without very good reason.
Yes I know that. But this is what we have at the moment and I need to work with that. It is still better than to work with cufflinks (IMHO), if the statistics allows this kind of analysis.
If you google "DEXseq blind estimation" you might get lucky and find an example. You might consider finding the biologist that came up with this experiment and buying him/her a clue.
Thank Devon again. I do know how to manipulate DEXseq into believing that there is some kind of dispersion, by basically just manually adding one column to the data. And I also know how strongly Simon will be against such a misuse of his tool :-)
I am also very aware of the problematic with single replica, but this is what I have. I know the Biologist and they are do working on some new data/samples, but it will take some time and as usual, they need the results yesterday. This is why I have asked whether or not it is possible to "adjust" DESeq2 into working on the transcript level. (Even though I am quite sure Simon wouldn't like this idea either :D)
I know the results are not statistically as reliable as needed, but it is a start and can help us refine the long list of genes. It is a start-point to get a better look at the data, at least until we have more.
thanks for the help
It's your time to waste :)
The DEXseq source code is only a few hundred lines. Just have a go through methods.R and core.R and you should be able learn enough to get DESeq2 to do the analysis for you.