I would like to generate a count table from my CRISPR screen FASTQ files. I have 248 genes, each with its own unique sgRNA sequence. I know that the full sequence should be "known sequence 1 + sgRNA sequence + known sequence 2".
Known sequences 1 and 2 are in the array sequence column, so I know their content and length. To generate the count table, I have to trim the 5' and 3' sequences and count the sgRNA sequences against my library.
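To make the goal concrete, here is a minimal sketch of the counting I have in mind, written in plain Python. It is not how MAGeCK works internally; it only locates the two known sequences in each read and looks up whatever lies between them. The flank sequences, the sgRNA sequences, the gene names, and the function name `count_sgrnas` are all hypothetical placeholders, and only exact matches are counted.

```python
from collections import Counter

def count_sgrnas(reads, library, flank5, flank3):
    """Count sgRNAs found between flank5 and flank3 (exact matches only).

    reads   : iterable of read sequences (strings)
    library : dict mapping sgRNA sequence -> sgRNA/gene identifier
    Returns (Counter of identifiers, number of reads that did not map).
    """
    counts = Counter()
    unmapped = 0
    for read in reads:
        start = read.find(flank5)
        if start == -1:              # 5' known sequence not present
            unmapped += 1
            continue
        start += len(flank5)         # first base after known sequence 1
        end = read.find(flank3, start)
        if end == -1:                # 3' known sequence not present
            unmapped += 1
            continue
        guide = read[start:end]
        if guide in library:
            counts[library[guide]] += 1
        else:                        # insert is not in the library
            unmapped += 1
    return counts, unmapped

# Hypothetical placeholder data, not my real library or flanks.
FLANK5 = "CACCG"
FLANK3 = "GTTTT"
LIBRARY = {
    "ACGTACGTACGTACGTACGT": "geneA",
    "TTGCATGCATGCATGCATGC": "geneB",
}
example_reads = [
    "GG" + FLANK5 + "ACGTACGTACGTACGTACGT" + FLANK3 + "AA",
    "GGG" + FLANK5 + "ACGTACGTACGTACGTACGT" + FLANK3 + "A",
    "G" + FLANK5 + "TTGCATGCATGCATGCATGC" + FLANK3 + "AAA",
    "GGGGGGGGGGGGGGGG",
]
counts, unmapped = count_sgrnas(example_reads, LIBRARY, FLANK5, FLANK3)
```

Because the flanks are searched for rather than assumed to sit at a fixed position, this kind of lookup does not depend on how many extra bases precede known sequence 1.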
MAGeCK takes three inputs for the trimming: the 5' trim length, the sgRNA library file (CSV), and an adaptor sequence (which it says is optional). My assumption was that known sequence 1 would be removed by the 5' trimming, the sgRNA would be matched against the library file, and known sequence 2 would be supplied as the adaptor. (This assumption could be the problem.)
When I use MAGeCK for trimming and mapping, my mapping rate is very low (~8%). I think this is caused by the lengths that I give to the trimming step. I checked individual reads in the FASTQ file and saw that the full sequence is present and conserved in the reads, but the reads have additional sequence on both their 5' and 3' ends. So when I enter a trim length, it might fall short of the actual start of the sgRNA, and the read is then considered unmapped.
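The manual check of individual reads could be automated along these lines. This is only a rough sketch under my own assumptions: the flank sequence and the reads are hypothetical placeholders, and the helper name `sgrna_start_offsets` is made up for illustration. It tallies, for each read, the position at which the sgRNA would begin, meaning the first base after known sequence 1.

```python
from collections import Counter

def sgrna_start_offsets(reads, flank5):
    """Tally the 0-based position where the sgRNA starts in each read.

    The start is the index right after the first occurrence of flank5.
    Reads without flank5 are tallied under the key None.
    """
    offsets = Counter()
    for read in reads:
        pos = read.find(flank5)
        if pos == -1:
            offsets[None] += 1
        else:
            offsets[pos + len(flank5)] += 1
    return offsets

# Hypothetical placeholder data, not my real flank or reads.
FLANK5 = "CACCG"
GUIDE = "ACGTACGTACGTACGTACGT"
sample_reads = [
    "TT" + FLANK5 + GUIDE + "GTTTT",    # sgRNA starts at position 7
    "TT" + FLANK5 + GUIDE + "GTTTT",    # sgRNA starts at position 7
    "TTT" + FLANK5 + GUIDE + "GTTTT",   # sgRNA starts at position 8
    "GGGGGGGGGGGG",                     # flank not found
]
offsets = sgrna_start_offsets(sample_reads, FLANK5)
for position, n_reads in offsets.most_common():
    print(position, n_reads)
```

If nearly all reads share one start position, that number would be the 5' trim length to try. If the start position differs between reads, a single fixed trim length cannot fit all of them, which would be consistent with the low mapping rate.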
Could you help me find out where the problem is, or how I can solve this issue?
Thank you very much,