DESeq2: "model matrix is not full rank"
2.2 years ago
biogrin ▴ 10

Hi everyone,

My lab has differentiated iPSC lines and I need to do a bioinformatic analysis to understand how close they are to the real organ made of these cells. To do this I've gathered publicly available RNA-seq data from this tissue, both fetal and adult. I'm now doing some exploratory analysis with PCA and dendrograms, but I'm struggling with some issues.

1. Constructing the DESeqDataSet

My coldata table (reduced in sample size) looks like this:

ID      Study      Stage
Sample1   Lab     IPSCs
Sample2   Lab     IPSCs
Sample3   Lab     IPSCs
Sample4   Study1  Fetal
Sample5   Study1  Fetal
Sample6   Study2  Fetal
Sample7   Study2  Fetal
Sample8   Study2  Fetal
Sample9   Study2  Fetal
Sample10  Study2  Fetal
Sample11  Study2  Fetal


I thought the best approach would be to test for Stage (iPSCs/fetal/adult) controlling for the effect of Study (the fact that the samples come from different experiments)

dds <- DESeqDataSetFromMatrix(countdata, coldata, design= ~ Study + Stage)


However, after trying and retrying in different ways, I always run into this error:

Error in checkFullRank(modelMatrix):
the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed.


I don't see any linear combination. I thought the problem was our own samples (since they are the only iPSCs), but I've tried performing this without them and the error persists, so I think I'm really missing something. I've read practically every post from people facing the same problem, and also the tool's vignette, but I still can't figure out what's wrong with my design and how I can solve the issue.

2. Accounting for batch effects for visualization purposes

I'm guessing limma::removeBatchEffect for Study would do the trick, but I would appreciate any hint on this topic too. Should I try sva and see if there are any other batch effects I should take into account?

I'm really sorry if these are really basic questions. I'm very new to bioinformatics and have been trying to find my way on my own, but sometimes I get stuck, especially with anything involving statistics because of my lack of foundation in it.

I really appreciate any input you can give me. Thank you so much in advance!

RNA-Seq DESeq2 design DESeqDataSet batch • 7.6k views

Study is confounded with Stage (Lab has no replicates in Adult or Fetal). This is a typical fully confounded experiment. You might eliminate batch within Fetal and within Adult, but that's it. You would need samples of each Study in each Stage, which you don't have. Your samples will probably cluster anywhere regardless of the true biology because of the batch effects. That is a common limitation: you cannot simply collect random studies and expect to compare them. This is unfortunately not how it works.
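A quick way to see the confounding described above is to cross-tabulate the two columns and check the rank of the design matrix. This is only a minimal base-R sketch, assuming the reduced coldata shown in the question (the full table, including the adult samples, is not in the post); it uses `model.matrix()` and `qr()` as a stand-in for DESeq2's own full-rank check rather than calling DESeq2 itself.

```r
# Reduced coldata copied from the question (only the rows shown there)
coldata <- data.frame(
  ID    = paste0("Sample", 1:11),
  Study = factor(c(rep("Lab", 3), rep("Study1", 2), rep("Study2", 6))),
  Stage = factor(c(rep("IPSCs", 3), rep("Fetal", 8)))
)

# Cross-tabulate: every Study falls into exactly one Stage
tab <- table(coldata$Study, coldata$Stage)
print(tab)

# Build the design matrix for the formula used in the question
mm <- model.matrix(~ Study + Stage, data = coldata)

# A rank lower than the number of columns means "not full rank"
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", rank_mm, "\n")
```

With these rows the matrix has 4 columns but rank 3: the `StageIPSCs` column equals the intercept minus the `StudyStudy1` and `StudyStudy2` columns, which is the linear combination the error message refers to.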


@ATpoint thanks for your input and time. However, as I mentioned, I also tried to perform this with the adult and fetal tissues only, and still got the same error which leads me to think there is something else wrong apart from my own experiment being confounded. Do you have any advice regarding what my approach should be as unfortunately that's the data I have to work with?


No, I do not think there is much you can do with these data. Maybe manually scan for marker genes that you know could be candidates to characterize each stage, check whether the Fetal and Adult samples indeed express those genes highly, then check their expression level in your data, and eventually confirm by qPCR on your RNA.

2.2 years ago
Macspider ★ 3.6k

The issue comes from here:

Sample1   Lab     IPSCs
Sample2   Lab     IPSCs
Sample3   Lab     IPSCs


With Study and Stage both in the design, the model has to estimate a separate effect for each column. In these three samples, the Study column has one element (Lab) and the Stage column has one element (IPSCs), and neither value appears anywhere else, so the combination is still one element (Lab_IPSCs) and the two effects cannot be told apart.

This won't fly with DESeq2, so you must find a better way to set up the design for these three samples. In other words, the information here is redundant.

The remaining samples can be used with that design, because "Fetal" corresponds to more than one study, and so does "Adult".


Hi @Macspider, thanks a lot for taking the time to answer. That's what I thought, but when I try constructing the DESeqDataSet without my lab samples (so only fetal and adult) I still get the same error, which is why I think I'm missing something. Apart from that, would you suggest any approach to incorporate my samples and remove the batch effect that comes from the samples belonging to different experiments?


Did you relevel() the columns after removing the Lab samples?
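For reference, a small base-R sketch of what happens to factor levels when rows are removed. It assumes the Study column from the reduced coldata in the question. Note that `relevel()` only changes which level is the reference, while `droplevels()` is the base-R function that discards levels with no remaining samples. Whether stale levels are the cause of the error here is not something this sketch can settle.

```r
# Study column from the reduced coldata in the question
Study <- factor(c(rep("Lab", 3), rep("Study1", 2), rep("Study2", 6)))

# Remove the Lab samples by subsetting
kept <- Study[Study != "Lab"]
print(levels(kept))        # "Lab" is still listed as a level

# relevel() only moves the chosen level to the front (the reference)
releveled <- relevel(kept, ref = "Study2")
print(levels(releveled))   # "Lab" is still there

# droplevels() removes levels that no longer have any samples
dropped <- droplevels(kept)
print(levels(dropped))
```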

2.2 years ago

Even without the 'Lab' samples, your cell types are still nested within studies. There might be tricks you can use to get around that:

https://bioconductor.riken.jp/packages/3.6/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups
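To illustrate the nesting without the Lab samples, here is a minimal base-R sketch. The adult samples are not shown in the question, so the `Study3` label and the sample counts below are hypothetical and only stand in for "adult samples that come from a different study than the fetal ones".

```r
# HYPOTHETICAL layout: "Study3" and the adult sample counts are made up,
# because the question does not list the adult samples
coldata <- data.frame(
  Study = factor(c(rep("Study1", 2), rep("Study2", 6), rep("Study3", 4))),
  Stage = factor(c(rep("Fetal", 8), rep("Adult", 4)))
)

# Design from the question: Study and Stage together
mm_both <- model.matrix(~ Study + Stage, data = coldata)
rank_both <- qr(mm_both)$rank

# Design with Stage only
mm_stage <- model.matrix(~ Stage, data = coldata)
rank_stage <- qr(mm_stage)$rank

cat("~ Study + Stage: columns", ncol(mm_both), "rank", rank_both, "\n")
cat("~ Stage:         columns", ncol(mm_stage), "rank", rank_stage, "\n")
```

Under this assumed layout, `~ Study + Stage` is still rank deficient because each study contains a single stage, whereas `~ Stage` alone is full rank. The Stage-only design can be fitted, but any Study effect is then absorbed into the Stage comparison rather than removed.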