What should be used as a "baseline" in a pooled CRISPR screen?
7 weeks ago
Aleksandr ▴ 10

Hello folks,

I have a question regarding the analysis of CRISPR screen results. More precisely: what should be used as a "baseline" when I try to identify enriched sgRNAs, a day 0 sample or an unsorted population? Here are more details about the aim of the experiment and how it was designed:

The aim: identify positive and negative regulators of gene X.

The design of the experiment is as follows: we have a cell line with a translational fusion X::GFP (there is also a wt copy of gene X in the genome). It was transfected with a genome-wide sgRNA library (Brunello); let's call this sample day_0. Cas9 was induced with dox, and cells were sorted after 2 and 7 days. For each day, three samples were produced: Unsorted (all cells), High fluorescence (the 25% of cells with the highest GFP signal), and Low fluorescence (the 25% of cells with the lowest GFP signal). Libraries were prepared and sequenced.

Here is some logic of what I expect from the data: Knowing what genes get enriched in a "high GFP population" would give us an idea of what genes might interact with X and activate its transcription. Looking for enriched genes in a "Low GFP" population would give us an idea of what genes might be important for the repression of gene X.

I'm trying to process the data with MAGeCK-VISPR and am having a hard time figuring out what I should use as a "baseline" in this case. I've tried two approaches:

1) Using the "Unsorted" population as a baseline when calculating the enrichment score for each day. I see ~2000 genes enriched (FDR < 0.05) in the high population. Running GSEA on this gene set shows enrichment for some pathways.

2) Using the "day 0" population as a baseline. In this case, I see ~1000 genes enriched in the "high" population, about half as many as with the "unsorted" baseline. When I run GSEA on this gene set, almost no pathways are enriched: almost all terms have FDR > 0.05.
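
To make the two options concrete, here is a minimal sketch of a per-sgRNA log2 fold change computed against each candidate baseline. It is not what MAGeCK does internally: it uses simple total-count scaling and a pseudocount, and the guide names, counts, and function names are all invented for illustration.

```python
import math

def to_cpm(counts):
    """Scale raw read counts to counts per million (simple total-count scaling)."""
    total = sum(counts.values())
    return {guide: n * 1e6 / total for guide, n in counts.items()}

def log2_fold_change(sample, baseline, pseudocount=1.0):
    """Per-sgRNA log2 ratio of normalized sample counts over normalized baseline counts."""
    s, b = to_cpm(sample), to_cpm(baseline)
    return {g: math.log2((s[g] + pseudocount) / (b[g] + pseudocount)) for g in s}

# Invented toy counts. sgB mimics a guide that drops out after Cas9 induction,
# independently of GFP level; sgA mimics a guide concentrated in the High gate.
day0        = {"sgA": 100, "sgB": 100, "sgC": 100, "sgD": 100}
unsorted_d2 = {"sgA": 120, "sgB": 40,  "sgC": 120, "sgD": 120}
high_d2     = {"sgA": 240, "sgB": 40,  "sgC": 60,  "sgD": 60}

lfc_vs_unsorted = log2_fold_change(high_d2, unsorted_d2)
lfc_vs_day0 = log2_fold_change(high_d2, day0)

for guide in day0:
    print(guide, round(lfc_vs_unsorted[guide], 2), round(lfc_vs_day0[guide], 2))
```

In this toy example sgB looks unchanged against the same-day unsorted sample but depleted against day 0, because the day 0 comparison mixes dropout after induction with the sorting effect.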

So, more genes are detected when I use the "unsorted" population as a baseline. I tried to explore this, and what I see is that almost all genes in the "unsorted" population show depletion (negative β-score) when compared to the day 0 sample. So, during these two days, the "base level" for the majority of genes shifts down, which is probably why more genes pop up in the "high" population. Another concern is comparing day 2 and day 7: I feel this would not be the right thing to do, as each day's high/low sets would be compared to their own "Unsorted" population. Also, since the High and Low populations each represent a relatively large proportion of the cells in the "unsorted" population, it becomes difficult to estimate how exactly they contribute to its sgRNA levels.
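
The contribution of the sorted gates to the unsorted pool can be sketched with simple arithmetic. Assuming the unsorted sample is an ideal mixture of a High gate (25% of cells), a Low gate (25%), and the remaining middle 50%, a guide's frequency in the unsorted pool is a weighted average of its frequency in each bin. The function names and the example frequency below are hypothetical.

```python
import math

def unsorted_frequency(high, low, middle, gate=0.25):
    """Frequency of a guide in the unsorted pool, given its frequency in each bin.

    Assumes High and Low each hold `gate` of all cells and the rest is 'middle'.
    """
    return gate * high + gate * low + (1 - 2 * gate) * middle

def max_log2_enrichment(gate=0.25):
    """Upper bound on log2(High / Unsorted) under the mixture assumption.

    The bound is reached when every cell carrying the guide lands in the High gate.
    """
    return math.log2(1 / gate)

# Hypothetical guide found only in the High gate, at a frequency of 0.4%.
high_freq = 0.004
pool_freq = unsorted_frequency(high_freq, low=0.0, middle=0.0)
observed = math.log2(high_freq / pool_freq)

print(observed, max_log2_enrichment())
```

Under these assumptions, the enrichment of any guide in a 25% gate relative to the unsorted pool cannot exceed 4-fold (log2 of 2), so a compressed effect size against the unsorted baseline is expected rather than a sign of a failed screen.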

I have been reading different papers, and it looks like there is no defined standard for this. In some papers, the sorted populations are compared to the unsorted populations. In some studies, there is a fourth sorted population that shows no change in fluorescence levels at all. In some cases, the High population is compared to the Low, or to Low+Unchanged.
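
One property of the direct High vs Low comparison is worth spelling out: for plain log ratios the baseline cancels, since log2(H/B) - log2(L/B) = log2(H/L). The sketch below checks this on invented frequencies; model-based scores such as MAGeCK's β-scores, or ratios with pseudocounts, need not satisfy it exactly.

```python
import math

def log2_ratio(numerator, denominator):
    """Per-sgRNA log2 ratio of two frequency tables (no pseudocount, so no zeros allowed)."""
    return {g: math.log2(numerator[g] / denominator[g]) for g in numerator}

# Invented per-sgRNA frequencies for one time point.
high     = {"sgA": 0.60, "sgB": 0.10, "sgC": 0.30}
low      = {"sgA": 0.15, "sgB": 0.10, "sgC": 0.75}
unsorted = {"sgA": 0.30, "sgB": 0.10, "sgC": 0.60}
day0     = {"sgA": 0.25, "sgB": 0.25, "sgC": 0.50}

direct = log2_ratio(high, low)

# High-minus-Low difference of baseline-relative scores, for each baseline.
via_baseline = {}
for name, base in (("unsorted", unsorted), ("day0", day0)):
    h, l = log2_ratio(high, base), log2_ratio(low, base)
    via_baseline[name] = {g: h[g] - l[g] for g in high}

for g in high:
    print(g, round(direct[g], 3),
          round(via_baseline["unsorted"][g], 3),
          round(via_baseline["day0"][g], 3))
```

The three columns agree for every guide, which is why a High vs Low contrast sidesteps the baseline question, at the cost of no longer telling whether the difference comes from the High side, the Low side, or both.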

Thank you for reading this long post. I hope you can give me some advice on what you think would be the most appropriate approach for this particular setup.

CRISPR MAGECK FACS • 348 views

Do the results make biological sense based on your domain knowledge, irrespective of the method you are using?

A small remark: I am analyzing this data for a research group, so I don't have the expertise to judge what is biologically relevant. However, the group told me that they can see some genes and pathways that are known to regulate this gene. The problem is that I get a lot of hits among HDACs, and they are present in so many terms that I suspect they are also pushing some of those terms up.

Isn't the day 0 population the same as unsorted cells before induction? If not, how is that sample different?

The unsorted population is the population after induction, collected after 2 (or 7) days, whereas day 0 is the population before induction. So yes, day 0 is unsorted in a way. But when I compare the unsorted sample against day 0, I see a general depletion of sgRNAs.
