CRISPR/Cas9 screen analysis: reads R1 and R2 mixed
15 months ago

Hi! I want to analyse data from a CRISPR/Cas9 screen (control vs. treatment) and I'm using MAGeCK (https://sourceforge.net/projects/mageck/). The sequencing was performed on an Illumina platform (paired-end).

The problem is that, in the R1 FASTQ files, half of the reads containing the sgRNA are actually R2 reads (and the R2 files contain R1 reads as well). Should I include these reads in the sgRNA count?

crispr screen reads illumina sequencing • 593 views

in R1 fastq files the half of the reads containing the sgRNA are R2 (in R2 files there are R1 reads as well)

How did that happen? Always best to go back and get original data files in cases where you are in doubt.
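
Before that, a quick tally of the read-number field in the headers would show whether a file really is mixed. This is a minimal sketch, assuming standard Illumina headers (a space followed by `1:N:0:...` or `2:N:0:...`) and unwrapped 4-line FASTQ records; the file name and the reads in it are made up for illustration.

```shell
# Hypothetical example file standing in for a real R1 FASTQ (made-up reads).
cat > example_R1.fq <<'EOF'
@SIM:1:FCX:1:1:100:200 1:N:0:ATCACG
ACGTACGTACGT
+
IIIIIIIIIIII
@SIM:1:FCX:1:1:101:201 2:N:0:ATCACG
TTGCAATTGCAA
+
IIIIIIIIIIII
@SIM:1:FCX:1:1:102:202 1:N:0:ATCACG
GGCCGGCCGGCC
+
IIIIIIIIIIII
EOF

# Look only at the first line of each 4-line record and count how often
# the read number (first token of the header comment) is 1 or 2.
awk 'NR % 4 == 1 { split($2, f, ":"); n[f[1]]++ }
     END { for (k in n) print "read " k ": " n[k] }' example_R1.fq \
  | sort > read_number_tally.txt

cat read_number_tally.txt
```

For this toy file the tally is `read 1: 2` and `read 2: 1`. A real R1 file that is not mixed should report only `read 1`.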


I don't know, because the sequencing was commissioned, but I think it could be due to the ligation step: https://seekdeep.brown.edu/illumina_paired_info.html


Are you saying that you have short inserts (these being sgRNAs), so R1 and R2 are likely to overlap (i.e. there is no mixing per se)? You could use just the R1 reads, or look into a tool like bbmerge.sh from BBTools, which can merge R1/R2 reads. You can then trim adapters and count the consensus sequences produced (or align them to a reference and then count).


No, the reads really are mixed: the R1 files contain R2 reads.


If they are truly mixed and you want to separate the reads, and the reads have standard Illumina headers, you can do something like:

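# GNU grep: match header lines only; --no-group-separator stops -A from inserting "--" lines between records
grep -A 3 --no-group-separator "^@.* 1:N:0" original.fq > R1.fq
grep -A 3 --no-group-separator "^@.* 2:N:0" original.fq > R2.fq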
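
An awk variant of the same split may be a bit more robust, since it reads the file strictly as 4-line records and only inspects the first line of each one. This is a sketch under the same assumptions (standard Illumina headers, unwrapped 4-line FASTQ). The file names and reads are hypothetical, and unlike the `1:N:0` pattern it does not look at the filter flag, so it keeps reads flagged `Y` as well.

```shell
# Hypothetical mixed FASTQ: two R1 records and one R2 record (made-up reads).
# The last quality line starts with "@" on purpose.
cat > mixed_example.fq <<'EOF'
@SIM:1:FCX:1:1:100:200 1:N:0:ATCACG
ACGTACGTACGT
+
IIIIIIIIIIII
@SIM:1:FCX:1:1:100:200 2:N:0:ATCACG
TTGCAATTGCAA
+
IIIIIIIIIIII
@SIM:1:FCX:1:1:101:201 1:Y:0:ATCACG
GGCCGGCCGGCC
+
@IIIIIIIIIII
EOF

# Decide the output file on line 1 of each 4-line record, then write all
# four lines of that record to it. Quality lines are never parsed as headers.
awk '
  NR % 4 == 1 {
    split($2, f, ":")   # $2 is the header comment, e.g. "1:N:0:ATCACG"
    out = (f[1] == "1") ? "split_R1.fq" : (f[1] == "2") ? "split_R2.fq" : "split_unknown.fq"
  }
  { print > out }
' mixed_example.fq
```

With the toy input, `split_R1.fq` receives two records and `split_R2.fq` one. Records whose header has no recognisable read number would land in `split_unknown.fq` rather than being silently dropped.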