Question

Analysis of shRNA/CRISPR screens in 2021

2.8 years ago

ATpoint 82k

I am looking for hands-on experience with the analysis of shRNA screens (or CRISPR screens, as the analysis is basically the same). Which tools do people use in 2021? Is Mageck still the standard?

We performed a screen with about 2000 barcodes: on average 5 shRNAs per gene, plus 60 positive-control (killing) and 60 negative-control (non-targeting) barcodes. There are three cell lines, each with an input control harvested 1 day after infecting the cells with the shRNA virus soup. Each sample and each input is in triplicate, so 18 libraries in total. I can do QC and everything else without a dedicated tool; what I am looking for is dedicated DE software.

The aim is both to compare each cell line with its own input control and to compare the cell lines with each other. There are no batch effects, so batch correction is not a requirement. Support for normalization to the control genes would be a plus, but it is also fine if I have to do it externally, e.g. via the controlGenes argument of DESeq2::estimateSizeFactors.

Any pointers to tools that worked well in your hands? If you used something like DESeq2 or edgeR rather than a dedicated shRNA/CRISPR tool, how did you combine the statistics for multiple shRNAs/guides targeting the same gene? Thanks!

crispr shrna • 2.1k views

updated 2.7 years ago by dsull ★ 5.8k • written 2.8 years ago by ATpoint 82k

score 3 · Answer 1 · 2021-07-26

I've used Mageck for CRISPR screens and it works great.

A few things:

- By default it doesn't allow mismatches between read and library, but I've still always had good mapping rates (>= ~80%). I've had better mapping results with paired-end reads, because if one read fails to align because of a mismatch, the second read might still succeed.
- You may need to use cutadapt to remove technical nucleotides from your reads. mageck tries to figure out the trimming automatically, but that doesn't always work, especially if you have adapters on both ends of your reads or the adapter length varies between reads.
- When running mageck mle, you can play around with designs (e.g. put cell lines and/or treatment/control in your design matrix).
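
To make the last point a bit more concrete, here is a minimal sketch of how a design matrix for a setup like the one in the question (three cell lines, input vs. final, three replicates each) could be generated. The layout (a "Samples" column, then a "baseline" column of all 1s, then one 0/1 column per effect) follows the design matrix format described in the mageck mle documentation. The sample names, the column names and the choice of encoding (a cell line offset plus one dropout effect per cell line) are my assumptions, not something from the original screen, and running each cell line separately would be an equally reasonable alternative.

```python
import csv
import io


def build_design_matrix(cell_lines, n_reps=3):
    """Return the rows (header first) of a MAGeCK-MLE-style design matrix.

    Sample names such as "A_input_1" are hypothetical; they would have to
    match the sample columns of your own count table.
    """
    others = cell_lines[1:]  # the first cell line acts as the reference
    header = (["Samples", "baseline"]
              + [f"line_{c}" for c in others]
              + [f"dropout_{c}" for c in cell_lines])
    rows = [header]
    for line in cell_lines:
        for condition in ("input", "final"):
            for rep in range(1, n_reps + 1):
                row = [f"{line}_{condition}_{rep}", 1]  # baseline is always 1
                # offset between the input libraries of the cell lines
                row += [int(line == c) for c in others]
                # input -> final effect, estimated separately per cell line
                row += [int(line == c and condition == "final")
                        for c in cell_lines]
                rows.append(row)
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


design = build_design_matrix(["A", "B", "C"])
print(to_tsv(design))
```

The resulting text could be saved to a file and handed to mageck mle through its design matrix option (-d). With this encoding, each dropout_* column gets its own beta score per gene, so differences between cell lines would show up as differences between those betas.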

The main downside of mageck is that mageck mle offers a bunch of options and ways to do the analysis, and it's difficult to figure out which one is ideal. E.g., is it better to use permutation p-values or Wald p-values? Should you use the control sgRNAs for normalization? (If you normalize to non-targeting sgRNAs, you may get inflated false positives, because non-targeting sgRNAs don't behave the same as sgRNAs targeting non-essential loci; I'm not sure whether the same applies to shRNAs.)
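
To illustrate why the normalization choice matters, here is a small sketch of median-of-ratios size factors (the idea behind DESeq2::estimateSizeFactors, not mageck's own code) computed once from all guides and once from control guides only. The guide names and counts are made up. In a small focused library where many guides drop out, the two estimates can disagree quite a lot.

```python
import math
from statistics import median


def size_factors(counts, rows=None):
    """Median-of-ratios size factors, optionally based on a subset of guides.

    counts: dict mapping guide id -> list of raw counts (one per sample).
    rows:   guide ids to base the estimate on (None means all guides).
    """
    ids = list(counts) if rows is None else list(rows)
    # keep guides with non-zero counts in every sample so the logs are defined
    usable = [g for g in ids if all(c > 0 for c in counts[g])]
    if not usable:
        raise ValueError("no guide has non-zero counts in all samples")
    n_samples = len(counts[usable[0]])
    log_ratios = [[] for _ in range(n_samples)]
    for g in usable:
        logs = [math.log(c) for c in counts[g]]
        log_geo_mean = sum(logs) / n_samples  # log of the geometric mean
        for j in range(n_samples):
            log_ratios[j].append(logs[j] - log_geo_mean)
    # per sample: median ratio to the per-guide geometric mean
    return [math.exp(median(col)) for col in log_ratios]


# Toy data (hypothetical): sample 2 is sequenced twice as deep as sample 1,
# and all targeting guides are depleted in sample 2.
counts = {
    "ctrl_1": [100, 200],
    "ctrl_2": [50, 100],
    "ctrl_3": [80, 160],
    "geneX_1": [100, 20],
    "geneX_2": [100, 40],
    "geneY_1": [100, 10],
    "geneY_2": [60, 12],
}
controls = ["ctrl_1", "ctrl_2", "ctrl_3"]

all_sf = size_factors(counts)
ctrl_sf = size_factors(counts, rows=controls)
print("all guides:   ", [round(s, 3) for s in all_sf])
print("controls only:", [round(s, 3) for s in ctrl_sf])
```

In this toy case only the control-based factors recover the twofold depth difference. That says nothing about whether non-targeting controls are the right reference in a real screen, which is exactly the caveat above.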

You just have to run it and see if anything looks funky (e.g. a super skewed beta score distribution, too few genes meeting your FDR threshold when you expect more, positive controls that don't look as expected, etc.).
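
A rough sketch of what such a sanity check could look like once the gene-level results are loaded. The record layout (keys "gene", "beta", "fdr") and the gene names are hypothetical, so a real mageck mle gene summary would have to be parsed into this shape first.

```python
from statistics import mean, median, pstdev


def screen_sanity_summary(genes, positive_controls, fdr_cutoff=0.05):
    """Summarize a few quick diagnostics for gene-level screen results.

    genes: list of dicts with the (hypothetical) keys "gene", "beta", "fdr".
    positive_controls: set of gene names expected to drop out.
    """
    betas = [g["beta"] for g in genes]
    mu, sd = mean(betas), pstdev(betas)
    # population skewness: strongly non-zero values hint at a lopsided distribution
    skew = mean(((b - mu) / sd) ** 3 for b in betas) if sd > 0 else 0.0
    hits = [g["gene"] for g in genes if g["fdr"] <= fdr_cutoff]
    pos = [g["beta"] for g in genes if g["gene"] in positive_controls]
    return {
        "n_genes": len(genes),
        "n_hits": len(hits),
        "beta_median": median(betas),
        "beta_skewness": skew,
        "positive_control_median_beta": median(pos) if pos else None,
    }


# Toy results (made up for illustration)
genes = [
    {"gene": "KILL1", "beta": -2.0, "fdr": 0.001},
    {"gene": "KILL2", "beta": -1.5, "fdr": 0.01},
    {"gene": "G1", "beta": -0.1, "fdr": 0.8},
    {"gene": "G2", "beta": 0.0, "fdr": 0.9},
    {"gene": "G3", "beta": 0.1, "fdr": 0.7},
    {"gene": "G4", "beta": 0.2, "fdr": 0.6},
]
summary = screen_sanity_summary(genes, {"KILL1", "KILL2"})
for key, value in summary.items():
    print(key, value)
```

None of these numbers has a universal threshold. The point is only to have them side by side when comparing different mageck mle settings on the same data.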

I recommend using a dedicated tool because it has been peer-reviewed, validated by multiple labs, etc. Don't use DESeq2/edgeR or make up your own workflow: trying to re-invent a wheel that labs at the top institutes work on full time never ends up working well, IMHO.

As for comparing Mageck with other tools, I'm not sure; I haven't come across any benchmarking papers that I find reliable. Different tools will always produce different results and make different assumptions about your data. There are always better ways, in theory, to analyze data, but based on what we currently know Mageck seems to work well and gets the job done. The best thing to do is to extensively validate your screen (if not biological validation, do extensive technical validation: check known essential genes, known non-essential genes, do GO analysis, etc. to see if things are behaving as expected).
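
As one example of such a technical check, here is a minimal sketch that measures how well known essential genes separate from known non-essential genes by their score (an AUC computed by pairwise comparison). The gene names and scores are invented, and the two reference gene sets are something you would have to supply yourself.

```python
def separation_auc(scores, essential, nonessential):
    """Probability that a random essential gene scores lower than a random
    non-essential gene (ties count as 0.5).

    A value of 1.0 means perfect separation, 0.5 means none.
    """
    ess = [scores[g] for g in essential if g in scores]
    non = [scores[g] for g in nonessential if g in scores]
    if not ess or not non:
        raise ValueError("need at least one scored gene in each reference set")
    wins = 0.0
    for e in ess:
        for n in non:
            if e < n:
                wins += 1.0
            elif e == n:
                wins += 0.5
    return wins / (len(ess) * len(non))


# Toy gene-level scores (e.g. beta scores; more negative = stronger dropout)
scores = {
    "E1": -2.1, "E2": -1.4, "E3": -0.2,
    "N1": 0.1, "N2": -0.3, "N3": 0.0,
    "OTHER": -0.9,
}
auc = separation_auc(scores, essential=["E1", "E2", "E3"],
                     nonessential=["N1", "N2", "N3"])
print(f"essential vs. non-essential AUC: {auc:.3f}")
```

Comparing this number across analysis settings (or across tools) gives at least one objective handle on which variant behaves more as expected.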

Just my thoughts!