How do I detect deletions?
2.8 years ago
gt ▴ 30

Hi there, I have been working on a project with DNA-seq samples from E. coli. I am trying to see whether there are any indels or deletions on a small contig in the .fa file, and I viewed one of my BAM files in IGV. Below is a screenshot.

[Image: IGV screenshot of the BAM file]

In the IGV manual, it says the red pairs are deletions. I have tried to use bcftools mpileup and bcftools call to identify these deletions but am having no success. I have also tried freebayes and HaplotypeCaller and still have no success. Any help is appreciated.

bcftools DNA-seq calling variant • 3.5k views

What is your definition of a deletion? Which size do you expect?

2.7 years ago
bernatgel ★ 3.4k

Hi gt

In the image you included, you can see that the red pairs have an insert size larger than expected. This MIGHT be due to the presence of a deletion in your sample: the reads would be at the expected distance in your sample, but they map further apart on the reference because of the extra stretch of genome that your sample is missing. The fact that there are overlapping reads with the expected insert size suggests that the deletion would be heterozygous.
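To make the insert-size argument concrete, here is a minimal Python sketch that flags pairs whose insert size sits far above the bulk of the library. The function name `flag_long_inserts`, the median/MAD cutoff, and the example numbers are all assumptions made for illustration; this is not how IGV itself decides which pairs to color red.

```python
from statistics import median

def flag_long_inserts(insert_sizes, n_mads=5.0):
    """Return indices of pairs whose insert size is far above the median.

    The cutoff (median + n_mads * MAD) is an illustrative choice,
    not IGV's own rule for coloring pairs.
    """
    med = median(insert_sizes)
    mad = median(abs(x - med) for x in insert_sizes)
    # Use at least 1 bp of spread so a perfectly uniform library still gets a cutoff.
    cutoff = med + n_mads * max(mad, 1.0)
    return [i for i, x in enumerate(insert_sizes) if x > cutoff]

# Made-up insert sizes: most pairs near 300 bp, two pairs spanning a putative deletion.
sizes = [300, 310, 295, 305, 290, 1500, 1480, 302]
long_pairs = flag_long_inserts(sizes)

# Rough size of the putative deletion: how much longer the flagged pairs
# are than a typical pair.
extra = median(sizes[i] for i in long_pairs) - median(sizes)
print(long_pairs, extra)
```

With real data the insert sizes would come from the TLEN column of the properly oriented pairs in the BAM file, and the difference between the flagged pairs and the typical insert size gives only a rough estimate of the deletion length.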

Variant callers (such as freebayes) can detect only small deletions (up to ~30 - 50 bp, depending on the tool) and for larger events you'll need to use a structural variant caller (Lumpy, for example, but many others exist).

I'd suggest first telling IGV to show the soft-clipped reads (https://software.broadinstitute.org/software/igv/Preferences). If a real deletion is present, you'll see an accumulation of soft-clipped reads marking the exact breakpoints of the deletion (and the soft-clipped bases themselves should map to the other side of the deletion).

UPDATE: An example to clarify the soft-clipped reads

In this image (from this paper)

[Image: Example of deletion in IGV with soft-clipped reads]

there are two samples with deletions* in the same region (CDKN2A). The top one has two nested deletions. These stripes of multicolored bases are the soft-clipped parts of the reads (not shown by default in IGV; I reaaaally recommend activating them). In the places where there's an accumulation of soft-clipped reads, we can see many reads that match perfectly up to a point and then do not match at all (the soft-clipped part). This marks the exact position of the breakpoint, and the soft-clipped sequence should be the same for all reads and match the sequence on the far side of the opposite breakpoint.
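The same idea can be checked outside IGV. Below is a minimal Python sketch that takes the position and CIGAR string of each alignment and counts where the soft clips pile up. The helper name `clip_breakpoints` and the example reads are made up for illustration; positions follow the SAM convention (1-based leftmost aligned base).

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def clip_breakpoints(pos, cigar):
    """Return (side, reference position) for each soft-clipped end of one alignment."""
    # Hard clips (H) can sit outside soft clips, so drop them first.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    points = []
    if ops and ops[0][1] == "S":
        # Clip at the read start: the breakpoint is the first aligned base.
        points.append(("start", pos))
    if len(ops) > 1 and ops[-1][1] == "S":
        # Clip at the read end: the breakpoint is the last aligned base.
        points.append(("end", pos + ref_len - 1))
    return points

# Hypothetical alignments around a deletion of reference bases 1060..2199.
reads = [
    (1000, "60M40S"),
    (1010, "50M50S"),
    (1025, "35M65S"),
    (2200, "45S55M"),
    (2200, "30S70M"),
    (1500, "100M"),
]

counts = Counter(bp for pos, cigar in reads for bp in clip_breakpoints(pos, cigar))
for (side, position), n in counts.most_common():
    print(side, position, n)
```

Positions where many reads share the same clip coordinate are breakpoint candidates. To confirm a deletion you would still want to check that the clipped bases match the reference on the other side of the gap, as described above.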

Is it a bit more clear?

Bernat

(*) Actually, in this case these are really translocations, but the image you'll see for deletion will be basically the same.


Yes, this does, thank you. I'm still a little confused by the last paragraph. Could you give an example?

2.8 years ago
MSRS ▴ 580

If you have FASTQ files (raw reads) and a reference (.gb file), then run snippy. You will get the SNPs and indels relative to the reference in a txt file.


It looks like snippy uses freebayes for the variant calling which I have already tried :(

2.8 years ago
xiaoguang ▴ 140

I didn't see any indels in the IGV figure. A deletion should show up as a black line within a single read.
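The black line corresponds to a D operation in the read's CIGAR string, which only happens when the deletion is short enough for the aligner to span it within one read. A minimal Python sketch for listing such deletions follows; the function name `deletions_in_cigar` and the example CIGAR strings are assumptions for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def deletions_in_cigar(pos, cigar):
    """Return (reference start, length) for each deletion inside one alignment.

    pos is the 1-based leftmost aligned position (SAM POS field).
    """
    found = []
    ref_pos = pos
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op == "D":
            found.append((ref_pos, n))
        if op in "MDN=X":
            # Only these operations advance along the reference.
            ref_pos += n
    return found

print(deletions_in_cigar(100, "50M3D47M"))
print(deletions_in_cigar(100, "10S40M2D30M1I20M"))
```

Reads like these are what small-variant callers work from. A larger deletion will not appear as a D operation and has to be inferred from insert sizes and soft clips instead.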
