CRISPR/Cas9 screen analysis: reads R1 and R2 mixed

0

3.8 years ago

Swimming bird ▴ 20

Hi! I want to analyse data from a CRISPR/Cas9 screen (control vs. treatment) using MAGeCK (https://sourceforge.net/projects/mageck/). The sequencing was performed on an Illumina platform (paired-end).

The problem is that, in the R1 FASTQ files, half of the reads containing the sgRNA are R2 reads (and the R2 files contain R1 reads as well). Should I include these reads in the sgRNA count?

crispr screen reads illumina sequencing • 1.6k views

ADD COMMENT • link 3.8 years ago by Swimming bird ▴ 20

1

in R1 fastq files half of the reads containing the sgRNA are R2 (in R2 files there are R1 reads as well)

How did that happen? Always best to go back and get original data files in cases where you are in doubt.

ADD REPLY • link 3.8 years ago by GenoMax 141k

0

I don't know, because the sequencing was commissioned, but I think it could be due to the ligation step: https://seekdeep.brown.edu/illumina_paired_info.html

ADD REPLY • link 3.8 years ago by Swimming bird ▴ 20

1

Are you saying that you have short inserts (these being sgRNA), so R1/R2 are likely to overlap (i.e. there is no mixing per se)? You could use just the R1 reads, or look into a tool like bbmerge.sh from BBTools that can merge R1/R2 reads. You can then trim adapters and count the consensus sequences produced (or align to a reference and then count).

ADD REPLY • link 3.8 years ago by GenoMax 141k

0

No, there is mixing: the R1 files contain R2 reads.

ADD REPLY • link 3.8 years ago by Swimming bird ▴ 20

1

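If they are truly mixed and you want to separate the reads, and they have standard Illumina headers, you can do something like:

# GNU grep: --no-group-separator stops grep writing "--" lines between non-adjacent matches.
# The leading space ties the pattern to the header's read-number field.
grep -A 3 --no-group-separator " 1:N:0" original.fq > R1.fq
grep -A 3 --no-group-separator " 2:N:0" original.fq > R2.fq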
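
A record-aware alternative to the grep commands above, as a minimal sketch rather than a tested pipeline: it walks the file four lines at a time and routes each record by the read number in its header, so a quality line that happens to resemble a header cannot be mistaken for one. It assumes strict 4-line FASTQ records and Illumina (CASAVA 1.8+) headers where the read number follows the first space; `split_mixed_fastq` is a hypothetical helper name, not part of MAGeCK or BBTools. It also prints a per-read-number tally to stderr, which is a quick way to check whether a file really contains both read 1 and read 2 records.

```shell
#!/usr/bin/env bash
# Sketch: split a FASTQ file whose records mix read 1 and read 2.
# Assumptions (not from the original thread): 4-line records, and headers like
#   @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
# split_mixed_fastq is a hypothetical helper name.

split_mixed_fastq() {
    local in_fq=$1 out_r1=$2 out_r2=$3

    # Create/empty the outputs so they exist even if no record matches.
    : > "$out_r1"
    : > "$out_r2"

    awk -v r1="$out_r1" -v r2="$out_r2" '
        NR % 4 == 1 {
            # Header line: the second space-separated field begins with the read number.
            split($2, tag, ":")
            key = tag[1]
            if (key == "1")      dest = r1
            else if (key == "2") dest = r2
            else               { dest = ""; key = "other" }
            n[key]++
        }
        # Write all four lines of the record to the file chosen from its header.
        dest != "" { print > dest }
        END {
            for (k in n)
                printf("read %s: %d records\n", k, n[k]) > "/dev/stderr"
        }
    ' "$in_fq"
}
```

Usage would look like `split_mixed_fastq original.fq R1.fq R2.fq`. Records whose header carries neither read number are counted as "other" and written to neither file. The two outputs are not guaranteed to stay in matching order, so if paired processing follows, a re-pairing step (for example repair.sh from BBTools) may be needed.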

ADD REPLY • link 3.8 years ago by GenoMax 141k
