Estimate large insertions size in WGS data.
0
0
Entering edit mode
14 months ago
meetmet ▴ 10

Hello,

I'm currently doing my master’s internship in a genomic diagnostic lab. I was tasked with building a Snakemake workflow to detect large insertions in whole-genome sequencing (WGS) data within fragile sites.

My question is regarding the final output of my Snakemake workflow, which is a BED file containing all putative insertions (INS) larger than 50 base pairs. I would like to add an additional annotation to this BED file for the inserted sequence size at each INS breakpoint, but I’m unsure if my algorithm makes sense.

Here’s my idea:

  • For each breakpoint, use samtools to extract discordant reads with mates mapping to another chromosome.
  • Check if there is a consensus chromosome origin for these discordant mate reads.
  • If a consensus chromosome is identified, extract the forward and reverse out-mapping discordant mate reads on this chromosome.
  • Estimate the insertion length by subtracting the position of the rightmost forward read from the position of the leftmost reverse read.

Does this approach make sense?

                                                        ILLUSTATION

Reference genome:

-------------------------------------| bp |---------------------------------------

r1______________--------

r2 _________________________ --------

Actual inserted sequence:

_start

r1-2 __ Rev |---------------


r2-2 __________________________--------------| Forv

_stop

Fomula: size_ins = stop - start

Of course the insertes sequence should come from the same genome.

Cheers.

NGS Diagnostic SV WGS • 505 views
ADD COMMENT
1
Entering edit mode

Can you format the Illustration a lot better than what is there now? This may be one instance where uploading an image may be significantly better than trying to format this as code in Biostars edit window.

ADD REPLY

Login before adding your answer.

Traffic: 3392 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6