Question: Integration of Microarray datasets with different platforms and biological groups
asalimih wrote:

Hello,
I've read a lot of posts about integrating microarray datasets but couldn't find a clear answer for this specific scenario.
I have 4 microarray datasets (GSE accessions) that were run on different platforms. Their biological groups are as follows:

  • GSE1: TypeA, TypeB, TypeC, TypeD
  • GSE2: TypeA, TypeE
  • GSE3: TypeE, TypeF
  • GSE4: TypeF, TypeG

As you can see, each dataset has one biological group in common with the next dataset. (If there were no groups in common, there would be no way to distinguish batch effects from biological effects.)
I can't analyse each dataset separately because, for example, I want to perform a differential expression analysis between TypeB and TypeF, so I need to merge all these datasets into one.
Here are my questions:

  1. Is it possible to merge these datasets into one, given that I want to perform a differential expression analysis (DEA)?
  2. Should I normalize them before merging? If yes, which normalization techniques should I use?
  3. How do I merge them and remove the batch effect (for example, if I want to use limma for DEA)?

Any help would be greatly appreciated.
Thanks in advance

written 29 days ago by asalimih
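The chain of shared groups described in the question can be checked mechanically. The sketch below, in Python with only the standard library, searches for a path of groups linking TypeB to TypeF, where two groups count as neighbours when some dataset contains both. The function name `linking_path` is made up for illustration. Finding a path only confirms the bookkeeping; it says nothing about whether differences between platforms can be modelled, so it is a necessary condition at best.

```python
from collections import deque

# Group composition of each dataset, as listed in the question.
datasets = {
    "GSE1": {"TypeA", "TypeB", "TypeC", "TypeD"},
    "GSE2": {"TypeA", "TypeE"},
    "GSE3": {"TypeE", "TypeF"},
    "GSE4": {"TypeF", "TypeG"},
}

def linking_path(datasets, start, goal):
    """Breadth-first search over groups.

    Two groups are adjacent when at least one dataset contains both.
    Returns the shortest chain of groups from start to goal, or None.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for members in datasets.values():
            if path[-1] in members:
                # Visit unseen groups that share a dataset with the current one.
                for nxt in sorted(members - seen):
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return None

path = linking_path(datasets, "TypeB", "TypeF")
print(path)
```

With the four datasets above, the chain runs through TypeA and TypeE, so every step of the TypeB vs TypeF comparison would lean on a different dataset.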

Don't do that. Your results will be confounded because you have different platforms. You can only remove a batch effect if each group has replicates in each batch, which you apparently don't, so there is no way to identify and distinguish batch from biological effect. There are limits to what data analysis can and cannot do. Combining independent datasets at will while still producing valid results is, imho, beyond those limits. What you ask most likely cannot be done with the available data. I know it is frustrating; I have had the same issue many times: high-quality data available for download but not suited to what I wanted to do with them. Forcing them into the wrong analysis would probably only produce artefacts without deeper meaning.

modified 29 days ago • written 29 days ago by ATpoint
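A toy calculation illustrates the confounding described above. It assumes a deliberately simplified case in which TypeB was measured only in GSE1 and TypeF only in GSE3, uses hypothetical log2 values for a single gene, and applies per-batch mean centering as a stand-in for a naive batch correction (not any specific package's method). Because batch and group coincide, the correction removes the group difference along with the batch difference.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (illustrative numbers only).
# Each group appears in exactly one batch, so group and batch are confounded.
samples = [
    {"batch": "GSE1", "group": "TypeB", "value": 8.0},
    {"batch": "GSE1", "group": "TypeB", "value": 8.2},
    {"batch": "GSE1", "group": "TypeB", "value": 7.8},
    {"batch": "GSE3", "group": "TypeF", "value": 10.1},
    {"batch": "GSE3", "group": "TypeF", "value": 9.9},
    {"batch": "GSE3", "group": "TypeF", "value": 10.0},
]

def center_by_batch(samples):
    """Subtract each batch's mean from its samples (a naive batch correction)."""
    batches = {s["batch"] for s in samples}
    batch_means = {
        b: mean(s["value"] for s in samples if s["batch"] == b) for b in batches
    }
    return [{**s, "value": s["value"] - batch_means[s["batch"]]} for s in samples]

def group_difference(samples, group_1, group_2):
    """Mean of group_2 minus mean of group_1."""
    m1 = mean(s["value"] for s in samples if s["group"] == group_1)
    m2 = mean(s["value"] for s in samples if s["group"] == group_2)
    return m2 - m1

raw_diff = group_difference(samples, "TypeB", "TypeF")
corrected_diff = group_difference(center_by_batch(samples), "TypeB", "TypeF")
print(f"raw difference: {raw_diff:.2f}, after batch centering: {corrected_diff:.2f}")
```

The raw difference cannot be attributed to biology or to batch, and after centering nothing is left to test, which is why a confounded design cannot be rescued by correction alone.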

Thanks for your reply. Wouldn't having a biological group in common be of any help?

written 29 days ago by asalimih

No, the differences in platform are generally impossible to overcome in any meaningful way in this case. If you had the same samples on each and every platform, then maybe you could get something that's mildly believable after a lot of hassle, but as @ATpoint said, even a case such as that would be looked at with a lot of skepticism.

written 29 days ago by jared.andrews07

Thanks for your reply. Another question: suppose I want to check whether a gene is significantly upregulated from TypeB to TypeE. I perform a DEA on GSE1 for TypeB vs TypeA and a separate DEA on GSE2 for TypeA vs TypeE. If there is an upregulation from TypeB to TypeA and an upregulation from TypeA to TypeE, could I conclude that there is a significant upregulation from TypeB to TypeE?

written 29 days ago by asalimih

You could maybe use it to justify doing qPCR to validate that, but realistically? No. Definitely not with any valid statistical backing.

written 29 days ago by jared.andrews07

If the platforms were the same, would this analysis be possible (based on the groups I mentioned)?

written 29 days ago by asalimih

Sure, if the samples were all run on the same platform, that'd eliminate the majority of your issues. You could compare the samples any which way you might want. You might still have to deal with batch effects, but there are established methods to help deal with that.

written 29 days ago by jared.andrews07
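As a rough illustration of how a shared group could anchor two batches run on the same platform, the sketch below estimates an additive batch offset from TypeE, which appears in both GSE2 and GSE3, and uses it to compare TypeA with TypeF. All numbers are hypothetical, the helper name `shared_group_offset` is made up, and the purely additive offset is an assumption. A real analysis would more typically include batch as a term in the linear model and would also account for uncertainty, which this sketch ignores.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (illustrative numbers only).
gse2 = {"TypeA": [6.0, 6.2, 5.8], "TypeE": [7.0, 7.1, 6.9]}
gse3 = {"TypeE": [7.5, 7.6, 7.4], "TypeF": [9.5, 9.4, 9.6]}

def shared_group_offset(batch_a, batch_b, shared):
    """Estimate an additive offset of batch_b relative to batch_a
    from the one group that both batches contain."""
    return mean(batch_b[shared]) - mean(batch_a[shared])

# Offset of GSE3 relative to GSE2, estimated from the shared TypeE samples.
offset = shared_group_offset(gse2, gse3, "TypeE")

# Shift the TypeF samples onto the GSE2 scale, then compare with TypeA.
adjusted_f = [value - offset for value in gse3["TypeF"]]
diff_a_to_f = mean(adjusted_f) - mean(gse2["TypeA"])

print(f"estimated batch offset: {offset:.2f}")
print(f"TypeF minus TypeA after adjustment: {diff_a_to_f:.2f}")
```

The whole comparison rests on the handful of TypeE samples that define the offset, so any noise or batch-specific biology in that one group carries straight into the result.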

Not necessarily. I am not much of a microarray guy, but in RNA-seq, from what I've seen myself, results are strongly confounded by the library preparation method even when run on the same Illumina platform. I guess in the array world you also have choices in how you isolate RNA, make cDNA and do the PCR enrichment. That comes down to the same problem, not having replicates to separate batch from true effect leaves a lot of uncertainty that might skew your analysis.

written 29 days ago by ATpoint

Can I consider these platforms as the same: Affymetrix Human Genome U133A+U133B and Affymetrix Human Genome U133 plus?

modified 26 days ago • written 26 days ago by asalimih

It doesn't matter:

That comes down to the same problem, not having replicates to separate batch from true effect leaves a lot of uncertainty that might skew your analysis.

As I said before, I personally vote for not doing what you want to do, because you cannot control the batch effect. It is not only the platform but also the library prep, RNA extraction, etc. And no, they are probably not the same; otherwise the company would not have sold them as three products.

written 26 days ago by ATpoint