As part of a group software development project, we have been given 9 sets (3 sample types, each with 3 replicates) of assembled bat transcriptome sequences and their FPKM scores. The three sample types are bats before infection, 8 hours after infection and 24 hours after infection. The aim is to produce a list of the most differentially expressed genes and visual representations of how the samples relate to each other (PCA plots or similar).
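A PCA of the samples, computed on the log-scaled expression matrix, is a common way to produce the relatedness plot. Below is a minimal base-R sketch. The matrix is simulated, and the sample and group names (`pre`, `h8`, `h24`) are hypothetical placeholders for your own column names.

```r
set.seed(1)
groups <- factor(rep(c("pre", "h8", "h24"), each = 3),
                 levels = c("pre", "h8", "h24"))
samples <- paste(groups, rep(1:3, times = 3), sep = "_")

# Simulated log2 expression: 500 genes x 9 samples (stand-in for real data)
log_expr <- matrix(rnorm(500 * 9, mean = 5), nrow = 500,
                   dimnames = list(paste0("gene", 1:500), samples))
# Give the 24 h samples a clear shift in 100 genes so the plot has structure
log_expr[1:100, groups == "h24"] <- log_expr[1:100, groups == "h24"] + 4

# prcomp() treats rows as observations, so transpose to samples x genes
pca <- prcomp(t(log_expr), center = TRUE, scale. = FALSE)

# Percentage of total variance explained by each component
pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

plot(pca$x[, 1], pca$x[, 2], col = as.integer(groups), pch = 19,
     xlab = paste0("PC1 (", pct[1], "%)"),
     ylab = paste0("PC2 (", pct[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = samples, pos = 3, cex = 0.7)
legend("topright", legend = levels(groups), col = 1:3, pch = 19)
```

With real data, replace `log_expr` with your logged FPKM matrix. limma's `plotMDS()` draws a closely related multidimensional scaling plot directly from the same matrix.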
We ran BLAST on the samples and extracted the FPKM scores into lists. We then combined the scores into a single matrix, assigning a score of 0.01 to any gene that was not present in a sample's BLAST results. We imported this matrix into R as a data matrix, log-transformed the FPKM scores and converted the result into an ExpressionSet. We are planning to use limma to analyse the differences between treatment groups. Assuming we do, what is the best way to create a targets matrix (something limma seems to require) for the analysis?
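In limma, the "targets" information is usually just a data frame with one row per sample describing its group. The design matrix that `lmFit()` actually uses is then built from it. Here is a minimal sketch, assuming the nine columns are ordered pre-infection, 8 h, then 24 h. The sample names are hypothetical.

```r
# Hypothetical sample names; replace with the column names of your matrix
samples <- c("pre_1", "pre_2", "pre_3",
             "h8_1", "h8_2", "h8_3",
             "h24_1", "h24_2", "h24_3")

# The "targets" table: one row per sample, recording its treatment group
targets <- data.frame(
  sample = samples,
  condition = factor(rep(c("pre", "h8", "h24"), each = 3),
                     levels = c("pre", "h8", "h24"))
)
rownames(targets) <- targets$sample

# Design matrix with one column per group (no intercept), which makes
# group-vs-group contrasts easy to write later with makeContrasts()
design <- model.matrix(~ 0 + condition, data = targets)
colnames(design) <- levels(targets$condition)
rownames(design) <- targets$sample
print(design)
```

Before fitting, check that the columns of your expression matrix are in the same order as the rows of `targets`. For example, use `stopifnot(identical(colnames(log_fpkm), targets$sample))`, where `log_fpkm` stands in for your own matrix.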
If anyone has a better idea for an analysis that can be done quickly, I would welcome it, but having been left in the lurch by someone, we're running short on time.
Thanks in advance.
You have been given a sub-par setting. FPKM should not be used for differential gene expression analysis, which should instead be based on raw counts that can be transformed as required. In particular, FPKM values are not suitable input for limma; you would need voom-transformed raw counts instead. I recommend asking your supervisor for the raw data and then doing a DE analysis using DESeq or limma.
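For reference, the voom route on raw counts looks roughly like the sketch below. It assumes the Bioconductor packages edgeR and limma are installed. The counts are simulated stand-ins for real read counts, and the group names are hypothetical.

```r
library(edgeR)  # DGEList, filterByExpr, calcNormFactors
library(limma)  # voom, lmFit, makeContrasts, contrasts.fit, eBayes, topTable

set.seed(1)
group <- factor(rep(c("pre", "h8", "h24"), each = 3),
                levels = c("pre", "h8", "h24"))

# Simulated raw counts (genes x samples), NOT real data
counts <- matrix(rnbinom(2000 * 9, mu = 100, size = 10), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000),
                                 paste(group, rep(1:3, 3), sep = "_")))

dge <- DGEList(counts = counts, group = group)
keep <- filterByExpr(dge)                  # drop very lowly expressed genes
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)                # TMM normalisation factors

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

v <- voom(dge, design)                     # log-CPM values plus precision weights
fit <- lmFit(v, design)
contr <- makeContrasts(h8 - pre, h24 - pre, h24 - h8, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))

# Top 10 genes for the second contrast (24 h vs pre-infection)
top <- topTable(fit2, coef = 2, number = 10)
```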
Sadly, I am all too aware that it's suboptimal, but he has said that is what he wants us to use. I've seen a few places suggesting you can use a set of logged scores in limma, like here, but I'm struggling to work out how to do it with what we've been given.
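If you have to stay with FPKM, one commonly used option for values already on a log scale is limma-trend. You fit the logged matrix directly and let `eBayes(trend = TRUE)` model the dependence of variance on mean expression. This is a sketch under that assumption, not an endorsement of FPKM (see the caveats in the answer above). The FPKM values are simulated, and the 0.01 floor only mirrors the one described in the question.

```r
library(limma)

set.seed(1)
group <- factor(rep(c("pre", "h8", "h24"), each = 3),
                levels = c("pre", "h8", "h24"))

# Simulated FPKM matrix (NOT real data); some entries floored at 0.01
# to mimic genes that were absent from a sample's BLAST results
fpkm <- matrix(rlnorm(1000 * 9, meanlog = 2, sdlog = 1), nrow = 1000,
               dimnames = list(paste0("gene", 1:1000),
                               paste(group, rep(1:3, 3), sep = "_")))
fpkm[sample(length(fpkm), 500)] <- 0.01
log_fpkm <- log2(fpkm)

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(log_fpkm, design)
contr <- makeContrasts(h8 - pre, h24 - pre, h24 - h8, levels = design)
# trend = TRUE lets the prior variance depend on average log expression
fit2 <- eBayes(contrasts.fit(fit, contr), trend = TRUE)

# F-test across all three contrasts: genes changing anywhere in the time course
top_any <- topTable(fit2, coef = NULL, number = 20)
# Single comparison: second contrast (24 h vs pre-infection)
top_24 <- topTable(fit2, coef = 2, number = 20)
```

`lmFit()` also accepts an `ExpressionSet`, so the object you already built can be passed in place of `log_fpkm`, provided the design rows match its sample order.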
That is sad indeed. I still think one should use nothing but state-of-the-art approaches in teaching, and therefore I suggest pointing your instructor to this thread. This community is dedicated to teaching and can support your project, but we also have a responsibility to deliver high quality. I hope this feedback will be valuable for everyone involved in your project.