As part of a group software development project, we have been given 9 sets of assembled bat transcriptome sequences (3 sample types, each with 3 replicates) along with their FPKM scores. The three sample types are uninfected bats, bats 8 hours after infection, and bats 24 hours after infection. The aim is to produce a list of the most differentially expressed genes and visual representations of how the samples relate to one another (PCA plots or similar).
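The experimental layout above (3 conditions × 3 replicates) is what a limma targets table and design matrix encode. limma itself runs in R, so the following is only a language-neutral sketch in Python of that structure. The sample names, condition labels and contrast names are hypothetical placeholders; substitute the real ones. The indicator columns are built the way R's `model.matrix(~0 + condition)` lays them out (one column per condition, no intercept), and each contrast is a vector of weights over those columns.

```python
# Hypothetical condition labels; replace with the real group names.
CONDITIONS = ["uninfected", "8h", "24h"]
REPLICATES = 3

def build_targets(conditions, replicates):
    """Return one (sample_name, condition) row per sequenced library."""
    targets = []
    for cond in conditions:
        for rep in range(1, replicates + 1):
            targets.append((f"{cond}_rep{rep}", cond))
    return targets

def design_matrix(targets, conditions):
    """One 0/1 indicator column per condition, no intercept column."""
    return [[1 if cond == c else 0 for c in conditions] for _, cond in targets]

# Hypothetical pairwise comparisons of interest: +1 / -1 weights per condition.
CONTRASTS = {
    "8h_vs_uninfected": {"8h": 1, "uninfected": -1},
    "24h_vs_uninfected": {"24h": 1, "uninfected": -1},
    "24h_vs_8h": {"24h": 1, "8h": -1},
}

def contrast_vector(spec, conditions):
    """Expand a {condition: weight} spec into a weight per design column."""
    return [spec.get(c, 0) for c in conditions]

targets = build_targets(CONDITIONS, REPLICATES)
design = design_matrix(targets, CONDITIONS)
for (sample, cond), row in zip(targets, design):
    print(sample, cond, row)
for name, spec in CONTRASTS.items():
    print(name, contrast_vector(spec, CONDITIONS))
```

The key point is that the column order of the expression matrix must match the row order of the targets table, since each design row describes one sample column.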
We have run BLAST on the samples and extracted the FPKM scores into lists. We then combined the scores into a single matrix, assigning an FPKM of 0.01 to any gene that had no BLAST hit in a given sample. We imported this into R, created a data matrix, log-transformed the FPKM scores and converted the result into an ExpressionSet. We were planning to use limma to analyse the differences between the treatment groups. Assuming we do, what is the best way to create the targets matrix that limma seems to require for the analysis?
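The merge-and-log step described above can be sketched as follows. This is a minimal Python sketch, not the R code actually used: it assumes each sample's FPKM list has been read into a dict of gene ID to FPKM (the gene and sample names in the usage example are made up). Genes missing from a sample get the 0.01 placeholder before the log2 transform. As an added assumption, values below 0.01 are also floored at 0.01 so that a reported FPKM of 0 cannot produce log2(0).

```python
import math

PSEUDO_FPKM = 0.01  # placeholder FPKM for genes with no BLAST hit in a sample

def merge_fpkm(per_sample):
    """per_sample: {sample: {gene: fpkm}} -> (genes, samples, genes x samples matrix)."""
    samples = list(per_sample)
    genes = sorted(set().union(*(d.keys() for d in per_sample.values())))
    matrix = [[per_sample[s].get(g, PSEUDO_FPKM) for s in samples] for g in genes]
    return genes, samples, matrix

def log2_matrix(matrix):
    """log2-transform, flooring at PSEUDO_FPKM so zero FPKMs do not fail."""
    return [[math.log2(max(v, PSEUDO_FPKM)) for v in row] for row in matrix]

# Toy usage with hypothetical gene and sample names.
per_sample = {
    "uninfected_rep1": {"IFIT1": 2.0, "ACTB": 512.0},
    "8h_rep1": {"IFIT1": 64.0},
}
genes, samples, matrix = merge_fpkm(per_sample)
logm = log2_matrix(matrix)
for gene, row in zip(genes, logm):
    print(gene, row)
```

Rows are genes and columns are samples, which is the orientation an ExpressionSet's expression matrix uses.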
If anyone has a better idea for an analysis that can be done quickly, I'd welcome it; having been left in the lurch by someone, we're running short on time.
Thanks in advance.