I've been running MAGeCK RRA and providing a list of negative control sgRNAs using --norm-method control and --control-sgrna negative_ctrls.txt.
In my gene_summary file I see these negative controls appearing as the genes with the highest and lowest fold changes. For one set of samples I ran RRA on, the log file includes a lot of 'Skipping gene ... for permutation ...' messages, but for another set of samples these messages did not appear.
My colleague said the negative controls should not appear in the gene summary. Are they correct? Is there something I need to change to get the negative controls out of the results? Is this likely to be a code issue or an issue of poor quality data?
Thank you!
Please provide code and output examples; textual descriptions are hard to debug. IIRC, when I used MAGeCK RRA for shRNA screening it did not return controls in the output. But I also wanted to see the stats for the negative controls, so I duplicated the negative controls and added a "_1" to the duplicates. I provided the original names to the normalization parameters --norm-method control --control-sgrna, so normalization and the permutation distribution were based on them. The duplicated ones would still come back in the output, so I could double-check that most controls were not called as significant. But yes, "normally" they should not be in the results.
Sorry! Will include code next time!
FYI, my current workaround is to stop using RRA and instead use MLE for direct sample-to-sample comparisons as well.
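For readers who want to try the duplication trick described above, here is a minimal sketch of one way it might be done on a count table. It assumes the table rows are laid out as sgRNA ID, gene, then one count per sample, and that the "_1" suffix is appended to both the sgRNA ID and the gene label; the thread does not say which of the two was renamed, so treat that as a guess. The function name `duplicate_controls` and all values are hypothetical.

```python
def duplicate_controls(rows, control_ids, suffix="_1"):
    """Return the rows plus a renamed copy of every row whose sgRNA ID is a control."""
    out = []
    for row in rows:
        out.append(row)
        if row[0] in control_ids:
            # Assumption: rename both the sgRNA ID (column 1) and the gene label (column 2)
            out.append([row[0] + suffix, row[1] + suffix] + row[2:])
    return out


# Toy count table with made-up IDs and counts: sgRNA ID, gene, sample counts
table = [
    ["sgA_1", "GENE_A", "120", "95"],
    ["ctrl_1", "NTC", "100", "102"],
]
result = duplicate_controls(table, {"ctrl_1"})
for row in result:
    print("\t".join(row))
```

The original IDs (here `ctrl_1`) would still be the ones listed in the file given to --control-sgrna, so only the renamed copies are treated as ordinary guides.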
I'm having the same problem: in my gene_summary.txt, these negative controls appear.
I used the following script:
The file CONTROL_SGRNA.csv contains (with no errors in the output):
Or this one:
And I had previously tested the format recommended by the MAGeCK wiki here:
But none of them work.
What is the right format?
To make sure I'm using the right method, is it normal to have NonTargetingControlGuideForHuman in the Human_GeCKOv2_Library_combine_2.csv library supplied by Mageck? Should these lines be removed?
The file provided to --control-sgrna should contain only the sgRNA IDs, in a single column. Currently, you're providing the gene IDs. The identifiers should come from the column that contains values like HGLibB_57029. And no, you should not remove them from the library file.
OK, thanks a lot, that's the first column.
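As an illustration of the format described above (sgRNA IDs only, one per line), here is a minimal sketch of pulling the control IDs out of a library file. It assumes the library is a headerless CSV with the sgRNA ID in the first column and the gene label in the third, and that control rows carry a gene label containing NonTargetingControlGuideForHuman. The function name `control_sgrna_ids` and the toy rows (including the sequences) are made up for the example.

```python
import csv
import io


def control_sgrna_ids(library_lines, control_gene="NonTargetingControlGuideForHuman"):
    """Collect sgRNA IDs (column 1) of rows whose gene label (column 3) marks a control."""
    ids = []
    for row in csv.reader(library_lines):
        # Assumption: columns are sgRNA ID, sequence, gene label
        if len(row) >= 3 and control_gene in row[2]:
            ids.append(row[0])
    return ids


# Toy library standing in for the real CSV; sequences are invented
library = io.StringIO(
    "HGLibB_00001,ACGTACGTACGTACGTACGT,A1BG\n"
    "HGLibB_57029,TTTTACGTACGTACGTAAAA,NonTargetingControlGuideForHuman_0001\n"
)
ids = control_sgrna_ids(library)
print("\n".join(ids))  # one ID per line, the layout expected for --control-sgrna
```

With a real library you would pass an open file handle instead of the `io.StringIO` object and write the printed lines to the text file given to --control-sgrna.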
EDIT: I just tried with a .txt file containing:
with 1000 sgRNA IDs (HGLibB_....), but it still doesn't work. I am using Mageck 0.5.9.5 in a conda environment.
Perhaps I haven't understood what the right format should be. My script contains:
EDIT2: the warning concerns the number of sgRNAs found in the count table. I misread it.
Then run without providing --control-sgrna and check the resulting counts table to see what's going on.
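One way to do that check is to compare the control ID list against the first column of the count table. The sketch below assumes a tab-separated count table with a header line and the sgRNA ID in the first column; the function name `check_controls` and the toy data are hypothetical.

```python
def check_controls(count_table_lines, control_ids):
    """Split control IDs into those present in and absent from the count table."""
    lines = iter(count_table_lines)
    next(lines, None)  # skip the header line
    # Assumption: tab-separated rows with the sgRNA ID in the first column
    seen = {line.split("\t")[0].strip() for line in lines if line.strip()}
    found = [c for c in control_ids if c in seen]
    missing = [c for c in control_ids if c not in seen]
    return found, missing


# Toy count table with made-up counts
counts = [
    "sgRNA\tGene\tsample1\tsample2",
    "HGLibB_00001\tA1BG\t120\t95",
    "HGLibB_57029\tNonTargetingControlGuideForHuman_0001\t100\t102",
]
found, missing = check_controls(counts, ["HGLibB_57029", "HGLibB_99999"])
print(f"{len(found)} found, {len(missing)} missing: {missing}")
```

If most IDs land in `missing`, the control file and the count table are probably using different identifiers (for example gene names instead of sgRNA IDs), which would be consistent with the warning mentioned in EDIT2.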