How to identify a ~100 bp deletion from NGS data
2.3 years ago
haewon

Dear Biostars,

I generated knockout cell lines using CRISPR-Cas9 with two sgRNAs targeting sites about 100 bp apart, and then identified knockout clones by PCR. Instead of sequencing all 75 clones individually, I pooled them and sent the pool for targeted sequencing. I mapped the reads only to the region of interest (not to the whole genome) using bwa mem with default parameters. However, instead of reporting a deletion, bwa clipped the part of each read it could not align. So my questions are:

1. Can you suggest a mapper that can detect a 100 bp deletion? Suggested parameters for detecting it would also be very helpful.

2. Is there any software that can identify how many different variants are present in my samples?


Use bwa mem against the entire genome and then run a deletion caller such as Manta, Delly, or any tool of your choice. bwa is a mapper, not a deletion caller, so yes, it will clip unaligned bases. These clipped reads are in turn the starting point for the variant callers to actually call the variants. What does targeted sequencing mean here? NGS with a custom panel? The question, though, is why you did not simply screen the clones by PCR in a 96-well format for presence of the deletion (standard 3-primer set: primers 1 and 2 bind outside the region, and primer 3 is a reverse primer inside the region that only amplifies together with primer 1 when there is no deletion), and then send promising clones for Sanger sequencing with a primer close to the expected breakpoints.
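To illustrate why clipped reads are useful to a caller: when bwa mem splits a read across a deletion, it may report the second piece as a supplementary alignment and list it in the primary record's SA tag. The minimal Python sketch below recovers the deletion from such a pair. The helper names (`parse_sa_tag`, `deletion_from_split`) and the example coordinates are hypothetical; it assumes both pieces are on the same contig and the forward strand with no internal indels, and it is not how Manta or Delly work internally.

```python
import re

# A CIGAR operation is a length followed by one operation letter (SAM spec).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def leading_clip(cigar):
    """Read bases clipped (soft S or hard H) before the first aligned base."""
    clip = 0
    for n, op in parse_cigar(cigar):
        if op not in "SH":
            break
        clip += n
    return clip


def ref_length(cigar):
    """Reference bases consumed by the alignment (M, D, N, =, X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


def parse_sa_tag(sa):
    """Parse an SA:Z value of the form 'rname,pos,strand,CIGAR,mapQ,NM;...'."""
    records = []
    for entry in sa.strip(";").split(";"):
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append({"rname": rname, "pos": int(pos), "strand": strand,
                        "cigar": cigar, "mapq": int(mapq), "nm": int(nm)})
    return records


def deletion_from_split(pos_a, cigar_a, pos_b, cigar_b):
    """Infer (first_deleted_base, length) from two pieces of one read.

    Assumes both pieces are on the '+' strand of the same contig and contain
    no internal indels, so ref_pos = pos + (read_index - leading_clip).
    The exact breakpoint can shift if the junction has microhomology.
    """
    # Order the two pieces by where they start within the read.
    (pos1, cig1), (pos2, cig2) = sorted(
        [(pos_a, cigar_a), (pos_b, cigar_b)],
        key=lambda piece: leading_clip(piece[1]))
    # The reference-minus-read offset jumps by exactly the deleted length.
    length = (pos2 - leading_clip(cig2)) - (pos1 - leading_clip(cig1))
    if length <= 0:
        return None  # overlapping or rearranged pieces, not a simple deletion
    return pos1 + ref_length(cig1), length
```

For a hypothetical 248 bp read with primary alignment `1001 48M200S` and SA entry `chr1,1149,+,48S200M,60,0;`, `deletion_from_split` returns `(1049, 100)`: bases 1049 to 1148 are missing from the read.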


1. Targeted sequencing here means NGS (MiSeq) of a PCR amplicon. Because I know approximately what my sequences are, I thought I didn't need to map the reads to the entire genome.
2. I have already screened my clones by PCR. Since the deletion is large (~100 bp), it was easy to distinguish KO clones from wild type, so I already know there is a ~100 bp deletion. I wanted to run NGS because I want to know the exact sequence at the breakpoint, and since I pooled 75 clones, I hope to find out how many different variants are present in the pool.
3. bwa mem gives alignments like 200H48M, so I wasn't sure I could use these mapped reads to detect the deletion, because the rest of each read is clipped off. What I expected from the mapper was something like 50M100D50M, which still retains the sequence.
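If the reads are aligned with a tool that does report the gap inside the CIGAR (the `50M100D50M` form), counting distinct deletions across the pool can be done by walking each CIGAR. A minimal Python sketch, assuming you have already extracted (position, CIGAR) pairs from the SAM/BAM; the reads and the 20 bp `min_len` threshold are hypothetical. `N` is counted too, on the assumption that a splice-aware aligner may report the gap as a skipped region.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def large_deletions(pos, cigar, min_len=20):
    """Return (first_deleted_base, length) for each D or N op of at least min_len.

    pos is the 1-based leftmost aligned reference base (SAM column 4).
    """
    found = []
    ref = pos
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op in "DN" and n >= min_len:
            found.append((ref, n))
        if op in "MDN=X":  # these ops advance along the reference
            ref += n
    return found


def count_alleles(alignments, min_len=20):
    """Count reads per distinct set of large deletions (read-level, not clone-level)."""
    return Counter(tuple(large_deletions(pos, cigar, min_len))
                   for pos, cigar in alignments)


reads = [(1001, "50M100D50M"), (1001, "50M100D50M"),
         (1001, "48M97D52M"), (1001, "200M")]
for allele, n in count_alleles(reads).most_common():
    print(allele or "no large deletion", n)
```

The counts are per read, so they approximate allele frequencies in the pool rather than the number of clones. An identical deletion next to microhomology may be placed at slightly different positions, so left-aligning the calls before counting may be needed.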

Anyway, thanks for the tool suggestions. I'll take a look and try them.


I'm pretty sure bwa mem won't find a 100 bp deletion; the algorithm just isn't designed for that. You could try a splice-aware aligner, since those are designed to look for large gaps.
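A back-of-the-envelope score comparison shows why clipping wins. The bwa mem manual lists default scores of match 1 (`-A`), gap open 6 (`-O`), gap extension 1 (`-E`; a gap of length k costs O + k*E) and clipping penalty 5 (`-L`), and it does not clip when the best score reaching the read end exceeds the best local score minus the clipping penalty. The Python sketch below applies only that rule to perfectly matching flanks; it ignores mismatches, band width and Z-dropoff, so it is a simplification, not bwa's actual algorithm.

```python
# bwa mem default scores as listed in its manual (assumed unchanged here).
MATCH, GAP_OPEN, GAP_EXT, CLIP = 1, 6, 1, 5


def gapped_score(left, right, del_len):
    """Score of <left>M<del_len>D<right>M with perfectly matching flanks."""
    return (left + right) * MATCH - (GAP_OPEN + del_len * GAP_EXT)


def clipped_score(left, right):
    """Score of aligning only the longer flank and clipping the rest."""
    return max(left, right) * MATCH


def prefers_clipping(left, right, del_len):
    """Simplified clipping rule: clip unless the end-to-end score is larger
    than the local score minus the clipping penalty."""
    return gapped_score(left, right, del_len) <= clipped_score(left, right) - CLIP


for flanks in [(50, 50), (48, 200), (124, 124)]:
    print(flanks, gapped_score(*flanks, 100), prefers_clipping(*flanks, 100))
```

Under this model a 100 bp deletion costs 106, so the gapped alignment is only preferred when the shorter flank is longer than 101 bp; a 48 bp flank, as in the 200H48M case, falls far short.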


Any recommendations for a splice-aware aligner? I searched and only found RNA-seq aligners such as STAR and TopHat2, and I couldn't figure out how to use these tools with DNA-seq data.


Why don't you think those will work?


You might try BLAT against a FASTA of your target regions. It will handle the gaps, and you can then parse and consolidate the output to see where the breakpoints are. I know it seems odd not to use an NGS aligner, but with a tiny reference BLAT should run fast, and it will find the deletions.
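The parse-and-consolidate step could look roughly like the minimal Python sketch below, which reads one record of BLAT's default PSL output and reports gaps that skip reference bases without skipping read bases. The function name and the 20 bp threshold are hypothetical; the sketch assumes the standard 21-column PSL layout and ignores deletions that come with a small inserted sequence at the junction (those also show a query gap).

```python
def psl_deletions(psl_line, min_len=20):
    """Return (query, target, del_start, del_len) for large target-only gaps.

    PSL columns 19-21 (blockSizes, qStarts, tStarts) are comma-separated
    lists with a trailing comma; coordinates are 0-based.
    """
    f = psl_line.rstrip("\n").split("\t")
    q_name, t_name = f[9], f[13]

    def ints(col):
        return [int(x) for x in col.rstrip(",").split(",")]

    sizes, q_starts, t_starts = ints(f[18]), ints(f[19]), ints(f[20])
    hits = []
    for i in range(len(sizes) - 1):
        q_gap = q_starts[i + 1] - (q_starts[i] + sizes[i])
        t_gap = t_starts[i + 1] - (t_starts[i] + sizes[i])
        # A deletion in the read: the target skips bases but the query does not.
        if t_gap >= min_len and q_gap == 0:
            hits.append((q_name, t_name, t_starts[i] + sizes[i], t_gap))
    return hits
```

Apply it to every non-header line of the PSL file (or run blat with `-noHead`) and tally the returned tuples, for example with `collections.Counter`, to see how many reads support each breakpoint.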


I had never thought of it that way, and I realized that BLAST is quite good at handling large deletions. Thanks for the suggestion.