Question

CRISPR screen: trimming with MAGeCK gives a low mapping rate

7.8 years ago

morovatunc ▴ 550

Hi,

I would like to generate a count table from my CRISPR screen FASTQ files. I have 248 genes, each with its own unique sgRNA sequence. I know that the full sequence should be "known sequence 1 + sgRNA sequence + known sequence 2".

Known sequences 1 and 2 are in the array sequence column, so I know their content and length. To generate the count table, I have to trim the 5' and 3' sequences and count the sgRNA sequences against my library.

MAGeCK takes three inputs for trimming: the 5' trim length, the sgRNA sequence library (a CSV file), and an adapter sequence (which it says is optional). I assumed that known sequence 1 would be removed by the 5' trimming, the sgRNA would be handled by the sgRNA sequence library, and known sequence 2 would be the adapter. (This could be problematic.)

When I use MAGeCK for trimming and mapping, my mapping rate is very low (~8%). I think this is caused by the lengths I give to the trimming step. I checked individual reads in the FASTQ file and saw that the full sequence is present and conserved in the reads, but the reads have additional sequence at their 5' and 3' ends. So when I enter a fixed length for trimming, it might fall short, and the read is then counted as unmapped.

Could you help me find out where the problem is, or how I can solve it?
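One way to test this suspicion is to look at where known sequence 1 actually starts in each read. The following is a minimal Python sketch, not part of MAGeCK; `FLANK5` and the toy reads are hypothetical placeholders, and in practice the reads would come from the FASTQ file. If the flank starts at several different positions, no single 5' trim length can be right for every read.

```python
from collections import Counter


def flank_offsets(reads, flank):
    """Tally the position where `flank` first occurs in each read.

    A position of -1 means the flank was not found in that read.
    """
    return Counter(read.find(flank) for read in reads)


# Hypothetical stand-in for "known sequence 1" and a few toy reads.
FLANK5 = "CACCG"
toy_reads = [
    "AACACCGGATTACAGTTT",    # flank starts at position 2
    "CACCGGATTACAGTTT",      # flank starts at position 0
    "TTTTCACCGGATTACAGTTT",  # flank starts at position 4
    "GGGGGGGG",              # flank absent
]

offsets = flank_offsets(toy_reads, FLANK5)
for position, n_reads in sorted(offsets.items()):
    print(position, n_reads)
```

A spread of positions in this tally would support the idea that a fixed trim length is the cause of the low mapping rate.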

Thank you very much,

Best,

Tunc.

crispr screen trimming mapping • 3.9k views

updated 7.6 years ago by Biostar 20 • written 7.8 years ago by morovatunc ▴ 550

If you know sequence 1 and sequence 2, then perhaps the better option would be to use bbduk.sh from the BBMap suite. You can provide the two sequences in a file (or via the literal= option), so you don't need to depend on length-based trimming. You can also specify which side the sequences should be trimmed on. A comprehensive thread describing BBDuk is available.
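To illustrate the idea of anchoring on the known sequences instead of on fixed lengths, here is a rough Python sketch. It is not BBDuk or MAGeCK, only an illustration of the principle. The flank sequences, the toy library, and the function name `count_guides` are all hypothetical, and the sketch assumes every sgRNA in the library has the same length and that only exact matches count.

```python
import re
from collections import Counter


def count_guides(reads, flank5, flank3, library):
    """Count sgRNAs found between two known flanks, wherever they sit in the read.

    `library` maps sgRNA sequence -> sgRNA name. All sgRNAs are assumed
    to share one length (an assumption of this sketch).
    """
    lengths = {len(seq) for seq in library}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes a single sgRNA length")
    guide_len = lengths.pop()

    # flank5, then exactly guide_len bases, then flank3
    pattern = re.compile(
        re.escape(flank5) + "([ACGT]{%d})" % guide_len + re.escape(flank3)
    )

    counts = Counter()
    for read in reads:
        match = pattern.search(read)
        if match and match.group(1) in library:
            counts[library[match.group(1)]] += 1
    return counts


# Hypothetical flanks, library, and reads for illustration only.
toy_library = {"GATTACA": "geneA_sg1", "TTGCATG": "geneB_sg1"}
toy_reads = [
    "AACACCGGATTACAGTTTAA",
    "CACCGGATTACAGTTT",
    "TTTTCACCGTTGCATGGTTTC",
    "GGGGGGGG",
]

guide_counts = count_guides(toy_reads, "CACCG", "GTTT", toy_library)
mapped = sum(guide_counts.values())
print(dict(guide_counts), "mapped:", mapped, "of", len(toy_reads))
```

Because the match is anchored on the flanks, extra bases at the 5' or 3' end of a read do not affect which sgRNA is counted.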

7.8 years ago by GenoMax 141k