Question

Advice on aligning error prone NGS reads to a limited number of templates

0

Entering edit mode

16 months ago

Kristy • 0

Hello,

I am looking for advice on aligning error prone NGS reads to a limited number of templates.

I have sequencing data from a 2 x 250bp NovaSeq run (350M reads in each direction) and am interested in a 26bp region near the end of R2. This 26bp region corresponds to my CRISPRi oligos which I need to identify.

I have managed to trim the R2 sequence to the 26bp region I am interested in using cutadapt. The mean Phred score for this region is around 25 according to FastQC. See example:

Example of Per base sequencing quality of R2

I have 280 possible sequences that this region could correspond to (my CRISPR oligo library).

I have attempted to use bowtie 1 for alignment, using the oligo library as my “index” but only ~30% of reads mapped to oligos using this method. (I chose bowtie 1 as bowtie 2 is optimised for reads of >50bp and my reads are 26bp.)

I want to leverage the small size of my index (only 280 possibilities) to improve the % of mapped reads.

I have tried to do this manually by calculating the Levenshtein distance between every read and all 280 guides, filtering for the top 2 matches and manually choosing thresholds for an acceptable best call. My final filters are a Levenshtein distance of no more than 6 for the best match and a difference in Levenshtein distances between the best and second best read of at least 4. These were determined empirically by examining a small subset of my data. This process is very slow and the thresholds unvalidated. However, the advantage is that it allows me to accept alignments with many errors, as long as it is sufficiently better than the second-best match.

I am looking for advice on whether any validated tools or pipelines are available to find the best possible match of a limited number of templates to an error prone region. Or for suggestions on how to improve the above process.

Thank you in advance for your help!

Kristy

NGS alignment bowtie • 922 views

ADD COMMENT • link updated 16 months ago by GenoMax 141k • written 16 months ago by Kristy • 0

0

Entering edit mode

am interested in a 26bp region near the end of R2.

Then cut these bases out before you align. You can do that with bbduk.sh like this:

bbduk.sh -Xmx2g in=your.fastq.gz out=26bp.fq.gz forcetrimleft=NN

Adjust trimleft number to keep the 26 bases you want.

You can then easily extend this to do the alignments as well

bbduk.sh -Xmx2g in=your.fastq.gz out=stdout.fq forcetrimleft=NN | bbmap.sh -Xmx4g in=stdin.fq ref=your_targets.fa out=align.sam ambig=all

ADD REPLY • link 16 months ago by GenoMax 141k

0

Entering edit mode

Thank you for this suggestion GenoMax!

I introduced stagger to the beginning of my reads so fixed length trimming will not work in this case. But the sequence based trimming from cutadapt seems to be working very well.

I have not tried bbmap for alignment, that you for the suggestion. I will have a go and see how it performs.

ADD REPLY • link 16 months ago by Kristy • 0

score 0 · Answer 1 · 2022-12-09

0

Entering edit mode

16 months ago

barslmn ★ 2.1k

You might want to take a look at CRISPResso2. http://crispresso2.pinellolab.org/help

ADD COMMENT • link 16 months ago by barslmn ★ 2.1k

0

Entering edit mode

Thanks for your suggestion barslmn.

I considered CRSPResso, however it was designed for analysing genome-editing experiments while this is a gene knockdown experiment (CRISPR interference). My understanding is that CRISPResso is looking for edits in the region of interest, while I am looking to assign reads to guides. I will see if I can find any examples of it being adapted for CRISPRi work.

ADD REPLY • link 16 months ago by Kristy • 0

score 0 · Answer 2 · 2022-12-12

0

Entering edit mode

16 months ago

GenoMax 141k

while I am looking to assign reads to guides.

Then try https://github.com/afombravo/2FAST2Q

ADD COMMENT • link 16 months ago by GenoMax 141k