MAGeCK RRA lists negative controls among the highest fold changes in the gene summary
12 months ago
liz.b • 0

I've been running MAGeCK RRA with a list of negative control sgRNAs, using --norm-method control and --control-sgrna negative_ctrls.txt.

In my gene_summary file, these negative controls appear as the genes with the highest and lowest fold changes. For one set of samples, the RRA log contains many 'Skipping gene ... for permutation ...' messages, but for another set of samples it apparently does not.

My colleague said the negative controls should not appear in the gene summary. Are they correct? Is there something I need to change to get the negative controls out of the results? Is this likely to be a code issue or an issue of poor quality data?

Thank you!

python crispr mageck • 1.6k views

Please provide code and output examples; textual descriptions are hard to debug. IIRC, when I used MAGeCK RRA for shRNA screening, it did not return controls in the output. But I also wanted to see the stats for the negative controls, so I duplicated the negative controls and added a "_1" to the duplicates. I provided the original names to the normalization parameters (--norm-method control --control-sgrna), so normalization and the permutation distribution were based on them. The duplicated ones would still come back in the output, so I could double-check that most controls were not called as significant. But yes, "normally" they should not be in the results.
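The duplication trick described above could be sketched roughly as follows. This is only an illustration on a toy, tab-separated count table: the file names, the `ctrl_1`/`neg01` identifiers, and the choice to suffix both the sgRNA ID and the gene name with "_1" are assumptions, not details confirmed by the reply.

```shell
# Toy count table and control list (hypothetical names and values).
workdir=$(mktemp -d)
count_table="$workdir/toy.count.txt"
ctrl_list="$workdir/controls.txt"
augmented="$workdir/toy.count.with_dupes.txt"
printf 'sgRNA\tGene\tctrl\ttreat\n' > "$count_table"
printf 'toy_guide_1\tTOYGENE\t120\t95\n' >> "$count_table"
printf 'ctrl_1\tneg01\t210\t198\n' >> "$count_table"
printf 'ctrl_1\n' > "$ctrl_list"

# Print every row unchanged; for rows whose sgRNA ID is in the control list,
# also print a copy with "_1" appended to the sgRNA ID and gene name
# (assumption: both columns are suffixed).
awk -F'\t' -v OFS='\t' '
  NR == FNR { ctrl[$1]; next }
  { print }
  FNR > 1 && ($1 in ctrl) { $1 = $1 "_1"; $2 = $2 "_1"; print }
' "$ctrl_list" "$count_table" > "$augmented"

cat "$augmented"
```

The original IDs in the control list would still be the ones passed to --control-sgrna, so only the "_1" copies are scored like ordinary genes.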


Sorry! Will include code next time!

FYI, my current workaround is to stop using RRA and use MLE on direct sample-to-sample comparisons instead.


I'm having the same problem: in my gene_summary.txt, these negative controls appear.

I used the following script:

#!/bin/bash
set -e

LIBRARY="librairies/Human_GeCKOv2_Library_combine_2.csv"
CTLSGRNA="librairies/CONTROL_SGRNA.csv"
OUTDIR="mageck_output_mix"
mkdir -p "$OUTDIR"

mageck count \
  -l "$LIBRARY" \
  --sample-label alphaA1,alphaA2,unselA,alphaB1,alphaB2,unselB \
  --fastq \
  Data/Hap1-GeckoA-AlphaVirus-S1_R1_001.fastq.gz \
  Data/Hap1-GeckoA-AlphaVirus-S2_R1_001.fastq.gz \
  Data/Hap1-GeckoA-Unselected_R1_001.fastq.gz \
  Data/Hap1-GeckoB-AlphaVirus-S1_R1_001.fastq.gz \
  Data/Hap1-GeckoB-AlphaVirus-S2_R1_001.fastq.gz \
  Data/Hap1-GeckoB-Unselected_R1_001.fastq.gz \
  --norm-method control \
  --control-sgrna "$CTLSGRNA" \
  --pdf-report \
  --output-prefix "$OUTDIR/geckoAB_RRA_control"

The file CONTROL_SGRNA.csv contains the following (with no errors in the output):

HGLibB_57029,ACGGAGGCTAAGCGTCGCAA,NonTargetingControlGuideForHuman_0001
HGLibB_57030,CGCTTCCGCGGCCCGTTCAA,NonTargetingControlGuideForHuman_0002
HGLibB_57031,ATCGTTTCCGCTTAACGGCG,NonTargetingControlGuideForHuman_0003
HGLibB_57032,GTAGGCGCGCCGCTCTCTAC,NonTargetingControlGuideForHuman_0004
HGLibB_57033,CCATATCGGGGCGAGACATG,NonTargetingControlGuideForHuman_0005
HGLibB_57034,TACTAACGCCGCTCCTACAG,NonTargetingControlGuideForHuman_0006
HGLibB_57035,TGAGGATCATGTCGAGCGCC,NonTargetingControlGuideForHuman_0007
HGLibB_57036,GGGCCCGCATAGGATATCGC,NonTargetingControlGuideForHuman_0008
HGLibB_57037,TAGACAACCGCGGAGAATGC,NonTargetingControlGuideForHuman_0009
HGLibB_57038,ACGGGCGGCTATCGCTGACT,NonTargetingControlGuideForHuman_0010
HGLibB_57039,CGCGGAAATTTTACCGACGA,NonTargetingControlGuideForHuman_0011
HGLibB_57040,CTTACAATCGTCGGTCCAAT,NonTargetingControlGuideForHuman_0012
HGLibB_57041,GCGTGCGTCCCGGGTTACCC,NonTargetingControlGuideForHuman_0013
HGLibB_57042,CGGAGTAACAAGCGGACGGA,NonTargetingControlGuideForHuman_0014
...

I also tried this whitespace-separated version:

HGLibB_57029 ACGGAGGCTAAGCGTCGCAA NonTargetingControlGuideForHuman_0001
HGLibB_57030 CGCTTCCGCGGCCCGTTCAA NonTargetingControlGuideForHuman_0002
HGLibB_57031 ATCGTTTCCGCTTAACGGCG NonTargetingControlGuideForHuman_0003
HGLibB_57032 GTAGGCGCGCCGCTCTCTAC NonTargetingControlGuideForHuman_0004
HGLibB_57033 CCATATCGGGGCGAGACATG NonTargetingControlGuideForHuman_0005
HGLibB_57034 TACTAACGCCGCTCCTACAG NonTargetingControlGuideForHuman_0006
HGLibB_57035 TGAGGATCATGTCGAGCGCC NonTargetingControlGuideForHuman_0007
HGLibB_57036 GGGCCCGCATAGGATATCGC NonTargetingControlGuideForHuman_0008
HGLibB_57037 TAGACAACCGCGGAGAATGC NonTargetingControlGuideForHuman_0009
HGLibB_57038 ACGGGCGGCTATCGCTGACT NonTargetingControlGuideForHuman_0010
HGLibB_57039 CGCGGAAATTTTACCGACGA NonTargetingControlGuideForHuman_0011
HGLibB_57040 CTTACAATCGTCGGTCCAAT NonTargetingControlGuideForHuman_0012
HGLibB_57041 GCGTGCGTCCCGGGTTACCC NonTargetingControlGuideForHuman_0013
HGLibB_57042 CGGAGTAACAAGCGGACGGA NonTargetingControlGuideForHuman_0014
HGLibB_57043 CGAGTGTTATACGCACCGTT NonTargetingControlGuideForHuman_0015

And I had previously tested the format recommended by MAGeCK's wiki:

NonTargetingControlGuideForHuman_0001
NonTargetingControlGuideForHuman_0002
NonTargetingControlGuideForHuman_0003
NonTargetingControlGuideForHuman_0004
NonTargetingControlGuideForHuman_0005
NonTargetingControlGuideForHuman_0006
NonTargetingControlGuideForHuman_0007
NonTargetingControlGuideForHuman_0008
NonTargetingControlGuideForHuman_0009
NonTargetingControlGuideForHuman_0010
NonTargetingControlGuideForHuman_0011
NonTargetingControlGuideForHuman_0012
NonTargetingControlGuideForHuman_0013
NonTargetingControlGuideForHuman_0014
...

But none of them work:

INFO  @ Tue, 20 May 2025 14:13:32: 0 out of 1000 control sgRNAs are found in count table. 
ERROR @ Tue, 20 May 2025 14:13:32: Not enough control sgRNAs found in the count table. Please check your control sgRNA list.

What is the right format?

To make sure I'm using the right method: is it normal for the Human_GeCKOv2_Library_combine_2.csv library supplied by MAGeCK to contain NonTargetingControlGuideForHuman entries? Should these lines be removed?


The file provided to --control-sgrna should contain only the sgRNA IDs, one per line in a single column. Currently, you're providing the gene IDs. The identifiers to use are the ones in the column containing HGLibB_57029. And no, you should not remove the control lines from the library file.
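As a rough sketch, a one-column ID file can be derived from the library itself. The example below runs on a toy library that follows the id,sequence,gene layout shown earlier in the thread; the `toy_guide_1` row and the temporary file names are made up for illustration, and the real library may have a header or extra columns that need handling.

```shell
# Toy library in the id,sequence,gene layout (first row is a made-up placeholder).
workdir=$(mktemp -d)
library="$workdir/library.csv"
ctrl_ids="$workdir/CONTROL_SGRNA.txt"
cat > "$library" <<'EOF'
toy_guide_1,AAAACCCCGGGGTTTTAAAA,TOYGENE
HGLibB_57029,ACGGAGGCTAAGCGTCGCAA,NonTargetingControlGuideForHuman_0001
HGLibB_57030,CGCTTCCGCGGCCCGTTCAA,NonTargetingControlGuideForHuman_0002
EOF

# Keep only column 1 (the sgRNA ID) of rows whose gene column
# marks a non-targeting control.
awk -F',' '$3 ~ /^NonTargetingControlGuideForHuman_/ { print $1 }' \
  "$library" > "$ctrl_ids"

cat "$ctrl_ids"
```

The resulting file holds one sgRNA ID per line and nothing else, which is the shape described above for --control-sgrna.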


OK, thanks a lot, that's the first column.

EDIT: I just tried with a .txt file containing:

HGLibB_57029
HGLibB_57030
HGLibB_57031
HGLibB_57032
HGLibB_57033
HGLibB_57034
HGLibB_57035
HGLibB_57036
HGLibB_57037
HGLibB_57038
HGLibB_57039
HGLibB_57040
HGLibB_57041
HGLibB_57042
HGLibB_57043
HGLibB_57044
HGLibB_57045
HGLibB_57046
HGLibB_57047
HGLibB_57048
HGLibB_57049
HGLibB_57050
HGLibB_57051
HGLibB_57052
HGLibB_57053
HGLibB_57054
HGLibB_57055
HGLibB_57056
...

with 1000 sgRNA IDs (HGLibB_....), but it still doesn't work. I am using MAGeCK 0.5.9.5 in a conda environment.

Perhaps I haven't understood what the right format should be. My script contains:

#!/bin/bash
set -e

# Don't forget to activate conda env: conda activate mageck_py310
LIBRARY="librairies/Human_GeCKOv2_Library_combine_2.csv"
CTLSGRNA="librairies/CONTROL_SGRNA.txt"
OUTDIR="mageck_output_mix"
mkdir -p $OUTDIR

mageck count \
  -l $LIBRARY \
  --sample-label alphaA1,alphaA2,unselA,alphaB1,alphaB2,unselB \
  --fastq \
  Data/Hap1-GeckoA-AlphaVirus-S1_R1_001.fastq.gz \
  Data/Hap1-GeckoA-AlphaVirus-S2_R1_001.fastq.gz \
  Data/Hap1-GeckoA-Unselected_R1_001.fastq.gz \
  Data/Hap1-GeckoB-AlphaVirus-S1_R1_001.fastq.gz \
  Data/Hap1-GeckoB-AlphaVirus-S2_R1_001.fastq.gz \
  Data/Hap1-GeckoB-Unselected_R1_001.fastq.gz \
  --norm-method control \
  --control-sgrna $CTLSGRNA \
  --pdf-report \
  --output-prefix $OUTDIR/geckoAB_RRA_control

EDIT2: the warning concerns the number of sgRNAs found in the count table. I misread it.


Then run without providing --control-sgrna and check the resulting counts table to see what's going on.
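One way to inspect the counts table is to check how many IDs from the control list match its first column exactly. The sketch below uses toy data and assumes the count table is tab-separated with a header row and the sgRNA ID in column 1; the file names and the `toy_guide_1` row are hypothetical.

```shell
# Toy count table and control list (hypothetical values).
workdir=$(mktemp -d)
count_table="$workdir/toy.count.txt"
ctrl_list="$workdir/controls.txt"
printf 'sgRNA\tGene\tunselA\talphaA1\n' > "$count_table"
printf 'toy_guide_1\tTOYGENE\t120\t95\n' >> "$count_table"
printf 'HGLibB_57029\tNonTargetingControlGuideForHuman_0001\t210\t198\n' >> "$count_table"
printf 'HGLibB_57029\nHGLibB_57030\n' > "$ctrl_list"

# Count control IDs that match column 1 of the count table exactly.
found=$(awk -F'\t' '
  NR == FNR { want[$1]; next }
  FNR > 1 && ($1 in want) { n++ }
  END { print n + 0 }
' "$ctrl_list" "$count_table")

# List control IDs that never appear in column 1 of the count table.
missing=$(awk -F'\t' '
  NR == FNR { seen[$1]; next }
  !($1 in seen)
' "$count_table" "$ctrl_list")

total=$(grep -c . "$ctrl_list")
echo "$found of $total control IDs found in the count table"
echo "missing: $missing"
```

If nothing matches, comparing a few IDs from each file side by side usually shows where the naming differs.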

11 months ago

Most likely, the negative control sgRNAs you specified use the wrong IDs or are in the wrong format.

If provided appropriately, you should see messages in the log like:

INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg09 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg05 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg10 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg04 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg01 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg02 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg08 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg06 for permutation ... 
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg07 for permutation ...

And:

INFO  @ Wed, 28 Sep 2022 16:48:26:   Total # control sgRNAs: 51
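A quick way to look for these messages is to search the log. The sketch below writes a toy log built from the lines quoted above and then counts the relevant entries; the log file name is a placeholder, and the exact message wording may differ between MAGeCK versions.

```shell
# Toy log containing a few of the messages quoted above.
workdir=$(mktemp -d)
log_file="$workdir/toy.log"
cat > "$log_file" <<'EOF'
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg09 for permutation ...
INFO  @ Wed, 28 Sep 2022 16:48:26:   Skipping gene neg05 for permutation ...
INFO  @ Wed, 28 Sep 2022 16:48:26:   Total # control sgRNAs: 51
EOF

# Number of genes skipped for permutation (0 if the message never appears).
skipped=$(grep -c 'Skipping gene .* for permutation' "$log_file" || true)

# Number of control sgRNAs reported in the log, if the line is present.
n_ctrl=$(sed -n 's/.*Total # control sgRNAs: *\([0-9][0-9]*\).*/\1/p' "$log_file")

echo "skipped genes: $skipped, control sgRNAs reported: ${n_ctrl:-none}"
```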

And yes, the control guides/genes should not be in the summary files if done properly. I've found it useful to compare the normalized counts for the negative control guides/genes between timepoints to assess how consistent they are.
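A minimal sketch of that comparison, on toy data: it assumes a tab-separated normalized count table with the sgRNA ID in column 1 and the two samples to compare in columns 3 and 4. The file names, guide IDs, counts, and the pseudocount of 1 are illustrative choices, not part of the original analysis.

```shell
# Toy normalized count table and control list (hypothetical values).
workdir=$(mktemp -d)
norm_table="$workdir/toy.count_normalized.txt"
ctrl_list="$workdir/controls.txt"
ctrl_lfc="$workdir/ctrl_lfc.tsv"
printf 'sgRNA\tGene\tunselA\talphaA1\n' > "$norm_table"
printf 'toy_guide_1\tTOYGENE\t100\t799\n' >> "$norm_table"
printf 'ctrl_1\tneg01\t99\t199\n' >> "$norm_table"
printf 'ctrl_2\tneg02\t49\t49\n' >> "$norm_table"
printf 'ctrl_1\nctrl_2\n' > "$ctrl_list"

# For each control guide, report both counts and log2((col4 + 1) / (col3 + 1)).
awk -F'\t' -v OFS='\t' '
  NR == FNR { ctrl[$1]; next }
  FNR == 1  { print "sgRNA", $3, $4, "log2FC"; next }
  $1 in ctrl {
    printf "%s\t%s\t%s\t%.2f\n", $1, $3, $4, log(($4 + 1) / ($3 + 1)) / log(2)
  }
' "$ctrl_list" "$norm_table" > "$ctrl_lfc"

cat "$ctrl_lfc"
```

Control guides that behave consistently should cluster around a log2 ratio of 0; a wide spread suggests the controls are less stable than expected.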


You may also be interested in CRISPRball, a package I recently got accepted into Bioconductor for visualizing and comparing MAGeCK results. At worst, it'll help you make pretty pictures.
