My work involves identifying differentially expressed genes (DEGs) across 3 conditions using microarray data. I had planned to use existing data from GEO for this purpose. For microarrays, most datasets had all 3 conditions in the same experiment, which suited my purpose well. Now I have read that RNA-seq has an advantage over microarrays for detecting DEGs. However, all 3 conditions are rarely present within the same RNA-seq experiment; typically an experiment has 2 of the 3. The experiments also vary in instrument and library preparation method. Can anyone suggest how I should plan my further work: should I compare microarray and RNA-seq DEGs, or shift entirely to RNA-seq? Also, in most of the papers I have read, the groups used their own samples: they prepared all the conditions themselves and sequenced them. Is it a good idea to work with existing RNA-seq data?
It's not a major issue if the RNA-seq samples differ in library prep and instrument. Provided that you can access the raw FASTQ files, I would do the following:
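- Obtain estimated transcript counts for each sample from the raw reads using Kallisto
- Import the counts into DESeq2 using tximport
- Normalise counts in DESeq2 and fit a design that includes library prep and instrument, alongside condition, as factors likely to affect counts. This should mitigate their effects, if they exist. It only works if these factors are not fully confounded with condition: if, for example, one condition was only ever sequenced on one instrument, the two effects cannot be separated.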
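Whether the model can separate library prep and instrument from condition comes down to the design matrix having full column rank; DESeq2 will not fit a design whose model matrix is not full rank. The sketch below is a minimal illustration in Python (not R, and not DESeq2's own code), assuming hypothetical sample metadata with made-up library preps, instruments and studies. It builds a treatment-coded design matrix, similar in spirit to R's `model.matrix`, and checks its rank. In the two-study example, library prep and instrument change together, so they cannot both stay in the model.

```python
from fractions import Fraction


def design_matrix(samples, factors):
    """Treatment-coded design matrix, similar in spirit to R's
    model.matrix(~ f1 + f2 + ...): an intercept column plus one 0/1
    column per non-reference level of each factor."""
    # The alphabetically first level of each factor is the reference level.
    levels = {f: sorted({s[f] for s in samples}) for f in factors}
    rows = []
    for s in samples:
        row = [1]  # intercept
        for f in factors:
            row.extend(1 if s[f] == level else 0 for level in levels[f][1:])
        rows.append(row)
    return rows


def rank(matrix):
    """Matrix rank by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                scale = m[i][c] / m[r][c]
                m[i] = [a - scale * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == n_rows:
            break
    return r


def is_estimable(samples, factors):
    """True if every factor effect can be estimated (full column rank)."""
    x = design_matrix(samples, factors)
    return rank(x) == len(x[0])


def samples_from(studies):
    """Expand hypothetical (library_prep, instrument, conditions) tuples."""
    return [
        {"library_prep": lp, "instrument": inst, "condition": cond}
        for lp, inst, conds in studies
        for cond in conds
    ]


# Hypothetical public studies, each containing 2 of the 3 conditions.
# Replicates are omitted because they do not change the rank.
two_studies = samples_from([
    ("polyA", "HiSeq", ["A", "B"]),
    ("ribo", "NextSeq", ["B", "C"]),
])
three_studies = samples_from([
    ("polyA", "HiSeq", ["A", "B"]),
    ("polyA", "NextSeq", ["B", "C"]),
    ("ribo", "NextSeq", ["A", "C"]),
])

full = ["library_prep", "instrument", "condition"]
print(is_estimable(three_studies, full))  # True
print(is_estimable(two_studies, full))    # False: prep and instrument coincide
print(is_estimable(two_studies, ["library_prep", "condition"]))  # True
```

When a check like this fails, one option is to replace the collinear factors with a single study (batch) factor, which in the two-study case is equivalent to keeping library prep alone.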
There is a tutorial for this general process here: Analyzing RNA-seq data with DESeq2
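For reference, DESeq2's default normalisation for a plain count matrix is the median-of-ratios method: each sample's size factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. A minimal Python sketch of that calculation on a made-up count matrix follows; it is an illustration, not DESeq2's code. When the counts come in through tximport, DESeq2 additionally uses the average transcript lengths as gene- and sample-specific normalisation factors, which this sketch leaves out, and library prep and instrument effects are handled by the design in the model fit, not by this scaling.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: one list per gene, holding that gene's count in each sample.
    Genes with a zero in any sample have a geometric mean of zero and
    are skipped.
    """
    n_samples = len(counts[0])
    kept, log_geo_means = [], []
    for row in counts:
        if all(c > 0 for c in row):
            kept.append(row)
            # log of the gene's geometric mean across samples
            log_geo_means.append(sum(math.log(c) for c in row) / n_samples)
    factors = []
    for j in range(n_samples):
        # log ratio of each gene's count in sample j to its geometric mean
        log_ratios = [math.log(row[j]) - g for row, g in zip(kept, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Made-up counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(counts)
print([round(f, 3) for f in sf])                         # [0.707, 1.414]
print([round(x, 2) for x in normalise(counts, sf)[1]])   # [141.42, 141.42]
```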
> all the 3 conditions were present in the same experiment and was pretty good to use for my purpose
I'm not sure I understand what you're looking for. Did the microarray data help you address the biological question at hand or not?
> Now I read that RNA-seq has an advantage over microarray for the detection of DEG
That depends on the biological question you're interested in. Do you want to detect novel transcripts? Alternative splicing? Very lowly expressed genes? If so, you are probably better off using high-quality (!) RNA-seq data.
If you just want a list of genes with a 2-fold expression change between conditions, then microarrays should serve you well, provided you follow proper processing and analysis procedures.
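On the log2 scale, which is how RMA-processed microarray data are usually reported, a 2-fold change corresponds to an absolute log2 fold change of at least 1, and up- and down-regulation become symmetric. A minimal Python sketch, assuming per-condition mean expression values that are already log2-transformed (the gene names and numbers are made up):

```python
import math


def passes_fold_change(log2_mean_a, log2_mean_b, min_fold=2.0):
    """True if two log2-scale means differ by at least `min_fold`-fold
    in either direction (a 2-fold change is |log2 FC| >= 1)."""
    log2_fc = log2_mean_b - log2_mean_a
    return abs(log2_fc) >= math.log2(min_fold)


# Made-up log2 expression means for three genes: (condition 1, condition 2)
genes = {"gene1": (7.0, 8.5), "gene2": (7.0, 7.6), "gene3": (9.0, 7.8)}
for name, (a, b) in genes.items():
    print(name, round(b - a, 2), passes_fold_change(a, b))
# gene1 1.5 True   (about 2.8-fold up)
# gene2 0.6 False  (about 1.5-fold up)
# gene3 -1.2 True  (about 2.3-fold down)
```

In practice such a cutoff is applied together with a statistical test and multiple-testing correction, not instead of one.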
> Is it a good idea to work with existing RNAseq data?
If it addresses your biological question and meets basic QC criteria, sure.