Analysis of shRNA/CRISPR screens in 2021
ATpoint (11 months ago)

I am looking for hands-on experience with the analysis of shRNA (or CRISPR, as it's basically the same) screens. What tools does one use in 2021; is Mageck still the standard for this?

We performed a screen with about 2000 barcodes: an average of 5 shRNAs per gene, plus 60 positive-control (killing) barcodes and 60 negative-control (non-targeting) barcodes. There are three cell lines, each with an input control harvested 1 day after infecting the cells with the shRNA virus pool. Each sample and each input is in triplicate, so there are 18 libraries in total. I can do QC and everything else without a dedicated tool; I am looking for dedicated differential analysis software.
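To make this layout concrete, here is a minimal Python sketch that writes a 0/1 design matrix for the 18 libraries. The cell line names are hypothetical placeholders, and the column scheme (a baseline column, indicators for the two non-reference cell lines, and one harvest-vs-input column per cell line) is just one possible parameterization, not a prescribed format; check the Mageck documentation for the exact layout `mageck mle` expects.

```python
CELL_LINES = ["lineA", "lineB", "lineC"]  # hypothetical names; lineA is the reference
CONDITIONS = ["input", "harvest"]
REPLICATES = [1, 2, 3]


def build_design():
    """Return (header, rows) for a 0/1 design matrix covering all 18 libraries."""
    ref = CELL_LINES[0]
    others = [c for c in CELL_LINES if c != ref]
    header = (["sample", "baseline"]
              + [f"is_{c}" for c in others]              # cell-line offsets vs. the reference line
              + [f"harvest_{c}" for c in CELL_LINES])    # harvest vs. input, one column per cell line
    rows = []
    for line in CELL_LINES:
        for cond in CONDITIONS:
            for rep in REPLICATES:
                row = [f"{line}_{cond}_r{rep}", 1]       # every sample gets the baseline
                row += [int(line == c) for c in others]
                row += [int(line == c and cond == "harvest") for c in CELL_LINES]
                rows.append(row)
    return header, rows


if __name__ == "__main__":
    header, rows = build_design()
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(x) for x in row))
```

With this parameterization, each `harvest_*` coefficient compares a cell line with its own input, and the difference between two `harvest_*` coefficients compares the cell lines with each other.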

The aim is both to compare each cell line with its control and to compare the cell lines with each other. There are no batch effects, so correcting for them is not required. Support for normalization to the control barcodes would be a plus, but it is also fine if I have to do it externally with the controlGenes argument of DESeq2::estimateSizeFactors.
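If the normalization is done externally, the idea behind `controlGenes` in `DESeq2::estimateSizeFactors` is median-of-ratios computed on a subset of rows only. A minimal stdlib Python sketch of that idea, assuming the non-targeting barcodes serve as the control rows (this reimplements the concept; it is not DESeq2's code):

```python
import math
import statistics


def size_factors(counts, control_rows):
    """Median-of-ratios size factors computed on control rows only.

    counts: dict mapping barcode -> list of raw counts (one entry per sample)
    control_rows: barcodes used for normalization (assumed: non-targeting shRNAs)
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for barcode in control_rows:
        row = counts[barcode]
        if any(c == 0 for c in row):
            continue  # geometric mean is zero/undefined; such rows are skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # size factor per sample = median ratio to the per-row geometric mean
    return [math.exp(statistics.median(lr)) for lr in log_ratios]
```

Normalized counts are then the raw counts of each sample divided by that sample's size factor; targeting barcodes (`g1` in the test) do not influence the factors at all.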

Any pointers that worked well in your hands? If you used not a dedicated shRNA/CRISPR tool but something like DESeq2 or edgeR, how did you combine the statistics of multiple shRNAs/guides targeting the same gene? Thanks!
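One simple option for combining per-shRNA statistics, sketched in Python under the assumption that a per-shRNA z-score already exists (e.g. a Wald z from a DESeq2/edgeR fit), is unweighted Stouffer's method. It treats the shRNAs of a gene as independent, which off-target effects can violate; Mageck's default test takes a different route and aggregates guide ranks with robust rank aggregation (RRA).

```python
import math
from collections import defaultdict


def stouffer_by_gene(shrna_z, shrna_to_gene):
    """Combine per-shRNA z-scores into one z-score and p-value per gene."""
    grouped = defaultdict(list)
    for shrna, z in shrna_z.items():
        grouped[shrna_to_gene[shrna]].append(z)
    result = {}
    for gene, zs in grouped.items():
        z_comb = sum(zs) / math.sqrt(len(zs))  # unweighted Stouffer
        # two-sided p from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
        p = math.erfc(abs(z_comb) / math.sqrt(2))
        result[gene] = (z_comb, p)
    return result
```

Four shRNAs that each have z = -2 combine to z = -4, so consistent knockdown across shRNAs is rewarded, while a single strong outlier is diluted.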

Tags: crispr, shrna

dsull (11 months ago)

I've used Mageck for CRISPR screens and it works great.

A few things:

• By default, it doesn't allow mismatches between reads and the library, but I've still always had good (>= ~80%) mapping rates. I've had better mapping results with paired-end reads (if one read fails to align because of a mismatch, its mate might still succeed).
• You may need to use cutadapt to remove technical (non-guide) nucleotides from your reads. Mageck tries to figure this out automatically, but that doesn't always work, especially if your reads have adapters on both ends or the adapter length varies between reads.
• When running mageck mle, you can experiment with different designs (e.g. include cell lines and/or treatment vs. control in your design matrix).
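A rough Python sketch of what the first two points amount to: locate a constant flanking sequence, take the following bases as the guide, and count only exact matches. The flank `CACCG` and the 20 nt guide length are hypothetical placeholders (substitute your vector's actual sequence); in practice cutadapt plus `mageck count` handles this. Searching for the flank instead of trimming a fixed number of bases is what tolerates variable-length adapters.

```python
def count_guides(reads, library, flank="CACCG", guide_len=20):
    """Count exact matches of library guides in reads.

    library: dict mapping guide id -> guide sequence
    flank: hypothetical constant sequence directly 5' of the guide
    Returns (counts per guide id, fraction of reads mapped).
    """
    seq_to_id = {seq: gid for gid, seq in library.items()}
    counts = {gid: 0 for gid in library}
    mapped = 0
    for read in reads:
        pos = read.find(flank)
        if pos == -1:
            continue  # flank not found: read cannot be assigned
        start = pos + len(flank)
        candidate = read[start:start + guide_len]
        gid = seq_to_id.get(candidate)  # exact match only, no mismatches allowed
        if gid is not None:
            counts[gid] += 1
            mapped += 1
    rate = mapped / len(reads) if reads else 0.0
    return counts, rate
```

A single sequencing error inside the guide makes the read unassignable here, which is why a mate read in paired-end data can rescue some of these reads.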

The main downside of Mageck is that there are many options and ways to run the analysis with mageck mle, and it's difficult to figure out which one is ideal. For example, is it better to use permutation p-values or Wald p-values? Should you use the control sgRNAs for normalization? (With normalization to non-targeting sgRNAs, you may get inflated false positives, because non-targeting sgRNAs don't behave the same as sgRNAs targeting non-essential loci: targeting sgRNAs cause DNA double-strand breaks, which non-targeting ones do not. I'm not sure whether the same applies to shRNAs.)

You just have to run it and check whether anything looks off (e.g. a heavily skewed beta score distribution, too few genes passing your FDR threshold when you expect more, positive controls not behaving as expected, etc.).
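Two of these checks are easy to script. A minimal stdlib sketch, assuming you have per-gene beta scores plus per-barcode log fold changes (LFCs) for the positive (killing) and non-targeting controls; the function names are illustrative, not part of Mageck:

```python
import statistics


def skewness(values):
    """Moment-based sample skewness (g1); 0 for a symmetric distribution."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def control_auc(pos_lfc, neg_lfc):
    """Probability that a random positive control has a lower LFC than a random
    non-targeting control (ties count as 0.5)."""
    wins = 0.0
    for p in pos_lfc:
        for q in neg_lfc:
            wins += 1.0 if p < q else (0.5 if p == q else 0.0)
    return wins / (len(pos_lfc) * len(neg_lfc))
```

An AUC close to 1 means the killing controls are clearly depleted relative to the non-targeting ones; an AUC near 0.5 suggests the screen did not separate them.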

I recommend using a dedicated tool because it has been peer-reviewed, validated by multiple labs, etc. Don't use DESeq2/edgeR or make up your own workflow: IMHO, reinventing a wheel that labs at top institutes work on full time never ends up working well.

As for comparing Mageck with other tools, I'm not sure; I haven't come across any benchmarking papers that I find convincing. Different tools will always produce different results and make different assumptions about your data. Mageck seems to work well (based on what we currently know), and although there are always better ways to analyze data in theory, Mageck gets the job done. The best thing to do is to validate your screen extensively (if not biologically, then technically: check known essential genes and known non-essential genes, run GO analysis, etc. to see whether things behave as expected).
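The essential/non-essential check can be reduced to two numbers. A minimal sketch, assuming you supply your own reference gene sets (none are bundled here) and a list of hit genes at your chosen FDR:

```python
def reference_set_check(hits, essential, nonessential):
    """Compare hits against user-supplied essential / non-essential gene sets.

    Returns (recall of essentials, fraction of classifiable hits that are
    non-essential). The second number is only a rough false-discovery proxy.
    """
    hits, essential, nonessential = set(hits), set(essential), set(nonessential)
    tp = len(hits & essential)      # hits that are known essentials
    fp = len(hits & nonessential)   # hits that are known non-essentials
    recall = tp / len(essential) if essential else float("nan")
    classified = tp + fp
    fdr_proxy = fp / classified if classified else float("nan")
    return recall, fdr_proxy
```

Low recall of known essentials or many known non-essentials among the hits is a sign that something in the analysis (or the screen) needs a closer look.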

Just my thoughts!

Also important to note: although Mageck was published a while back, it is still actively maintained, so it keeps incorporating newer methods as people discover better ways to do things. I think this matters a lot for a tool.

Thanks for this informative and extensive answer!

Do you have a take on choosing permutation- or Wald-based FDRs? They give quite different results; from what I've seen in my data, permutation seems to be much more conservative. From what the authors have written in the Mageck Google Group, cutoffs for the permutation-based FDR can be as high as 0.25.
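For intuition on why the two FDR columns diverge: an FDR is typically an adjusted version of the p-values, so a more conservative p-value source yields fewer hits at any cutoff, including 0.25. A sketch of Benjamini-Hochberg adjustment, assuming (not asserting) that it is comparable to how the FDR columns are derived from each set of p-values:

```python
def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value to the smallest, keeping results monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Genes with an adjusted value of at most 0.25 would be called hits at that cutoff.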

In one dataset I had, the Wald test was much more conservative and I could barely get any hits. Looking at my z-score distribution, most genes had small z-scores, and the z-scores were skewed towards negative values. In the Wald test, the z-score (beta divided by its standard error) is tested against a standard normal distribution. I suspect my standard errors were too high (I don't remember how Mageck calculates the errors across both replicates and sgRNAs, but I had biological duplicates and only 4 guides per gene).
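The effect of an inflated standard error on the Wald test is easy to see numerically. In this sketch the beta and the two standard errors are made-up illustrative values, not taken from any dataset:

```python
import math


def wald_test(beta, se):
    """Wald z = beta / se, with a two-sided p-value from the standard normal."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Same (illustrative) effect size, two different standard errors
for se in (0.4, 0.8):
    z, p = wald_test(-0.8, se)
    print(f"se={se}: z={z:.2f}, p={p:.4f}")
```

Doubling the standard error takes the same beta from z = -2 (p ≈ 0.046) to z = -1 (p ≈ 0.32), which is how overestimated errors with few guides and replicates can wipe out hits.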

However, when I used the permutation test, the results looked fine (there the null distribution comes from shuffling sgRNAs rather than from the standard normal distribution).
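A minimal sketch of a shuffling-based null, assuming a gene score equal to the mean LFC of its guides: for each gene, the null is built by drawing the same number of guides at random from all guides. This illustrates the idea only; it is not Mageck's exact procedure, and the gene score is a simplifying assumption.

```python
import random
import statistics


def permutation_pvalues(guide_lfc, guide_to_gene, n_perm=1000, seed=0):
    """Empirical one-sided (depletion) p-values for gene-level mean LFC."""
    rng = random.Random(seed)
    per_gene = {}
    for guide, gene in guide_to_gene.items():
        per_gene.setdefault(gene, []).append(guide_lfc[guide])
    all_lfc = list(guide_lfc.values())
    observed = {gene: statistics.fmean(v) for gene, v in per_gene.items()}
    pvals = {}
    for gene, vals in per_gene.items():
        k = len(vals)
        hits = 0
        for _ in range(n_perm):
            # null score: mean LFC of k guides drawn at random from all guides
            if statistics.fmean(rng.sample(all_lfc, k)) <= observed[gene]:
                hits += 1
        pvals[gene] = (hits + 1) / (n_perm + 1)  # add-one avoids p = 0
    return observed, pvals
```

Because the null is built from the observed guides themselves, it adapts to the spread of the data instead of assuming that z-scores follow a standard normal.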

In another dataset (one that had no replicates, but six guides per gene), the Wald test worked just fine.