Question

What are the requirements to use data generated in other microarray experiments as a control?

1

6.0 years ago

Leite ★ 1.3k

Hello everyone,

I was wondering whether, for example, I can use samples generated on the Illumina HumanHT-12 V4.0 expression BeadChip as healthy controls for samples generated on the Illumina HumanHT-12 V3.0 expression BeadChip, or vice versa.

Is this possible or not? If the answer is no, what are the requirements for using data generated in other microarray experiments as a control?

Best regards,

Leite

R microarray • 3.5k views

ADD COMMENT • link updated 6.0 years ago by JJ ▴ 680 • written 6.0 years ago by Leite ★ 1.3k

1

6.0 years ago

JJ ▴ 680

I agree: every change in conditions can influence your results, which is really a shame, as there are so many experiments out there but it's hard to make use of them. I do, however, think that one can still read certain things out of such data, e.g. which genes are generally expressed in one set of conditions and not in the other. I would, however, be careful with the log2FC values, as andrew.j.skelton73 explained.

What are the requirements to use data generated in other microarray experiments as a control??

As andrew.j.skelton73 explained, you will always have a batch effect, and in a design like this one you cannot correct for it. I once even had an experiment with a notable batch effect based on the day the arrays were produced, although the samples, conditions, etc. were otherwise the same (but there I could luckily correct for it).

ADD COMMENT • link 6.0 years ago by JJ ▴ 680

0

Can you suggest how I can make it balanced?

condition <- factor(c(rep("Control", 4),rep("Test", 2)),levels=c("Control", "Test"))
batch <- factor(c(rep("A", 4),rep("B", 2)),levels=c("A", "B"))

This is where I have to modify it, but I'm not sure how I can change the batch factor.

ADD REPLY • link 6.0 years ago by 1769mkc ★ 1.2k

0

You can't for this experiment: batch and condition are one and the same.
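
As a quick check (a sketch in base R, not part of the original reply), cross-tabulating the two factors from the comment above shows the problem. The names `xt` and `confounded` are introduced here purely for illustration.

```r
# Factors exactly as posted in the comment above
condition <- factor(c(rep("Control", 4), rep("Test", 2)), levels = c("Control", "Test"))
batch     <- factor(c(rep("A", 4), rep("B", 2)), levels = c("A", "B"))

# Cross-tabulate condition against batch
xt <- table(condition, batch)
print(xt)

# If every batch contains samples from only one condition,
# batch and condition carry the same information
confounded <- all(colSums(xt > 0) == 1)
print(confounded)
```

Every Control sample sits in batch A and every Test sample in batch B, so relabelling the batch factor cannot help; only samples of both conditions within each batch would.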

ADD REPLY • link 6.0 years ago by JJ ▴ 680

score 8 · Accepted Answer · 2018-04-25

The short answer is no. For the longer answer, you'll need to go with the age-old rule of thumb: keep all experimental conditions the same except the variable that's relevant to your hypothesis. In sequencing and other high-throughput experiments, technical variability plays a huge role.

Three things to consider when designing a sequencing / high throughput experiment: batches, platform, and sample type placement.

To absorb a given nuisance effect, the nuisance variable needs to be balanced across the levels of your primary term of interest. Take the following design as an example:

> pheno
Sample ID    SampleType     
SAM1         A              
SAM2         A              
SAM3         A              
SAM4         B              
SAM5         B              
SAM6         B

These are all from the same sequencer, with no nuisance variables that need to be accounted for, so the design matrix is simply model.matrix(~0 + SampleType, data = pheno). Now consider a known batch effect such as this:

> pheno.batch
Sample ID    SampleType     Batch
SAM1         A              B1
SAM2         A              B1
SAM3         A              B2
SAM4         A              B2
SAM5         B              B1
SAM6         B              B1
SAM7         B              B2
SAM8         B              B2

We'd consider this experiment to be well balanced: as you can see, there are samples with SampleType A and B in both batch 1 and batch 2. This balance means that variation can be estimated across SampleType and Batch, with the following design matrix: model.matrix(~0 + SampleType + Batch, data = pheno.batch).
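
A minimal sketch of the balanced case in base R, assuming the table above is held in a data frame. The column name `SampleID` (without the space) and the rank check via `qr()` are additions for illustration, not part of the original example.

```r
# Rebuild the balanced example from the table above
pheno.batch <- data.frame(
  SampleID   = paste0("SAM", 1:8),
  SampleType = factor(rep(c("A", "B"), each = 4)),
  Batch      = factor(rep(c("B1", "B2"), times = 2, each = 2))
)

# Every SampleType appears in every Batch
print(table(pheno.batch$SampleType, pheno.batch$Batch))

# Design matrix with SampleType as the term of interest and Batch as a nuisance term
design <- model.matrix(~0 + SampleType + Batch, data = pheno.batch)

# Full rank: the numerical rank equals the number of columns,
# so every coefficient can be estimated
full.rank <- qr(design)$rank == ncol(design)
print(full.rank)
```

Comparing the rank of the matrix with its column count is a cheap way to confirm a design is estimable before fitting any model.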

Next, here's an example of an unbalanced (fully confounded) design, where the design matrix will not be full rank, meaning that at least one term cannot be separated from another. In this case, the SampleType and Batch columns split the samples identically: every A sample is in B1 and every B sample is in B2.

> pheno.batch2
Sample ID    SampleType     Batch
SAM1         A              B1
SAM2         A              B1
SAM3         A              B1
SAM4         A              B1
SAM5         B              B2
SAM6         B              B2
SAM7         B              B2
SAM8         B              B2

When running model.matrix(~0 + SampleType + Batch, data = pheno.batch2), the resulting matrix is not full rank, so downstream model fitting will either fail or report coefficients that cannot be estimated. Now that we understand the difference between a balanced and an unbalanced design, let's take your OP as an example:

> pheno.batch.op
Sample ID    SampleType        Platform
SAM1         Healthy_Control   HT12v4
SAM2         Healthy_Control   HT12v4
SAM3         Healthy_Control   HT12v4
SAM4         Healthy_Control   HT12v4
SAM5         Disease           HT12v3
SAM6         Disease           HT12v3
SAM7         Disease           HT12v3
SAM8         Disease           HT12v3

In this case, Platform is essentially the same as SampleType, so we can't untie that variation. At an abstract level, that means we can't statistically tell the difference between variation coming from the difference in platform and biological variation coming from the sample types. These experiments are highly sensitive to change, and that's not even accounting for library preps, sample preparations, source biological material, kits used, temperature in the room, person who prepped the sample, etc.
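
To make that concrete, here is a small sketch in base R that rebuilds the OP design from the table above and checks its rank. The object name `design.op` and the `qr()` check are assumptions added for illustration.

```r
# Rebuild the OP design from the table above
pheno.batch.op <- data.frame(
  SampleType = factor(rep(c("Healthy_Control", "Disease"), each = 4),
                      levels = c("Healthy_Control", "Disease")),
  Platform   = factor(rep(c("HT12v4", "HT12v3"), each = 4))
)

design.op <- model.matrix(~0 + SampleType + Platform, data = pheno.batch.op)

# Three columns, but only two of them are linearly independent
print(ncol(design.op))
print(qr(design.op)$rank)

# The platform column is an exact copy of the healthy-control column,
# which is why the two effects cannot be separated
identical.cols <- all(design.op[, "PlatformHT12v4"] ==
                      design.op[, "SampleTypeHealthy_Control"])
print(identical.cols)
```

No correction method can split a platform effect from a disease effect here, because within this data the two are the same column.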