Question: Identify insertions with IGV
marongiu.luigi (Germany, Mannheim, UMM) wrote 9 weeks ago:

Dear all,

is it possible to visualize insertions in a sequence?

I have prepared a simulated sequence of the mitochondrial genome from release hg38 by placing non-human sequence right in the middle of it (position 8284). I then aligned the simulated genome to the mitochondrial index and visualized the alignment with the Integrative Genomics Viewer (IGV). However, I don't see any sign of insertions in the figure.

[image]

Is there a way to highlight the insertion point? Maybe by showing only clipped reads or the reads that map only on one mate?

Thank you.

igv visualization insertions • 292 views
written 9 weeks ago by marongiu.luigi

IGV is meant to visualize your alignment. It is not a variant caller. Appropriate tools for SV identification exist, e.g. lumpy

written 9 weeks ago by WouterDeCoster

In IGV, insertions are represented with I. I can see a bunch of purple I in your snapshot. Please refer to: http://software.broadinstitute.org/software/igv/AlignmentData for more info.

Insertions
In a gapped alignment, IGV indicates insertions with respect to the reference with a purple I, or a red I for insertions greater than a user-activated, user-specified cutoff. Hover over the insertion symbol to view the inserted bases.
modified 9 weeks ago • written 9 weeks ago by Sej Modha

Sorry, I forgot to mention that -- in order to differentiate the simulated sequence from the original -- I also generated random mutations in the sequence. The Is might simply show that. One might also argue that the coloured reads mark the insertion point, but there are other regions with such colouring (not shown in the figure), so it is not a specific marker.

written 9 weeks ago by marongiu.luigi

Sorry, I forgot to mention that -- in order to differentiate the simulated sequence from the original -- I also generated random mutations in the sequence. The Is might simply show that

So is this real data salted with simulated reads or just plain simulated reads?

written 9 weeks ago by genomax

The procedure was this: I split the mitochondrial FASTA file from GRCh38 into two pieces and merged the non-human sequence in between. Then I used EMBOSS to introduce random mutations and ART to generate paired-end FASTQ reads. I then used BWA MEM to align them to the mitochondrial index (prepared with BWA index from the original GRCh38 mitochondrial FASTA).
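For readers who want to reproduce the first step, here is a minimal sketch of splicing one sequence into another after a given 1-based position. The function name `insert_after` and the toy sequences are hypothetical and only for illustration; reading and writing the actual FASTA files, and the EMBOSS, ART and BWA steps, are not shown.

```python
def insert_after(reference: str, insert: str, position: int) -> str:
    """Return `reference` with `insert` placed after 1-based base `position`."""
    if not 0 <= position <= len(reference):
        raise ValueError("position outside the reference")
    # Slicing at `position` keeps bases 1..position on the left-hand side.
    return reference[:position] + insert + reference[position:]


# Toy data standing in for the mitochondrial and viral sequences (hypothetical).
toy_reference = "ACGTACGTAC"
toy_insert = "nnnn"
print(insert_after(toy_reference, toy_insert, 4))  # ACGTnnnnACGTAC
```

With the real sequences, the call would be `insert_after(mito_seq, viral_seq, 8284)`, where both variable names are placeholders for strings loaded from the FASTA files.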

written 9 weeks ago by marongiu.luigi

merged the non human sequence in between.

What was the length of this sequence? When you refer to insertions, do you mean single-bp insertions or something longer, like the full non-human sequence you inserted?

written 9 weeks ago by genomax

I placed a stretch of 4000 bases from Parvovirus B19 after base 8284 of the mitochondrion, then introduced 500 mutations with msbar.

written 9 weeks ago by marongiu.luigi

Take a look at the "Detecting structural variants" section on this IGV help page.

written 9 weeks ago by genomax

The figure I get after colouring by INSERT SIZE (and by INSERT SIZE AND PAIR ORIENTATION) is this: [image]

With a bit of imagination, one could argue that there is a purple blob in the centre of the genome, where the insertion point should be. This is the enlargement: [image] Would this be enough to say that IGV suggests a large insertion event?

written 9 weeks ago by marongiu.luigi

Depends on your context: if it's a "somatic" insertion, it could be .... By the way, your alignment is full of insertions (first picture). Are these still simulated reads?

modified 9 weeks ago • written 9 weeks ago by Titus

Yes. Since there are 3 types of mutations in msbar (insertions, deletions, substitutions), there should in theory be about 500/3 insertion points.

written 9 weeks ago by marongiu.luigi

Your artificial insertion is too big to be picked up by IGV, and also too big to affect insert size, as it is probably larger than the simulated insert size. In this scenario, what you would see is an increase in pairs with one mate mapped and the other unmapped close to the insertion point. You could argue there is an insertion larger than your sequencing insert size, but without further data, you can't say how much larger.
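One way to look for that signal outside IGV is to count mapped reads whose mate is unmapped, binned along the reference. The sketch below parses SAM text lines (for example the output of `samtools view`) and relies only on the standard SAM flag bits 0x4 (read unmapped) and 0x8 (mate unmapped). The function name `orphan_positions`, the bin size and the toy records are assumptions made for illustration.

```python
from collections import Counter

READ_UNMAPPED = 0x4  # SAM flag bit: this read is unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit: the mate of this read is unmapped


def orphan_positions(sam_lines, bin_size=100):
    """Count mapped reads with an unmapped mate, keyed by 1-based bin start."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos = int(fields[1]), int(fields[3])
        if (flag & MATE_UNMAPPED) and not (flag & READ_UNMAPPED):
            counts[(pos - 1) // bin_size * bin_size + 1] += 1
    return counts


# Toy SAM records (hypothetical): QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
toy_sam = [
    "\t".join(["r1", "73", "chrM", "8250", "60", "100M", "=", "8250", "0", "*", "*"]),
    "\t".join(["r1", "133", "chrM", "8250", "0", "*", "=", "8250", "0", "*", "*"]),
    "\t".join(["r2", "99", "chrM", "500", "60", "100M", "=", "700", "300", "*", "*"]),
    "\t".join(["r3", "89", "chrM", "8290", "60", "100M", "=", "8290", "0", "*", "*"]),
]
print(dict(orphan_positions(toy_sam)))  # {8201: 2}
```

A bin with many more such reads than its neighbours is a candidate region, though on its own it does not reveal the size of the inserted sequence.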

modified 9 weeks ago • written 9 weeks ago by h.mon

Do you mean highlighting the insertion point in the coverage bar? Why don't you use another way to find the insertion point (based on coverage, insertion rate with IGVtools, or a variant caller) and then check it in IGV?

written 9 weeks ago by Titus

I thought IGV might show reads that have peculiar behaviour such as those with soft clips or a single mate mapped. If there are other tools, I will be happy to use them...

written 9 weeks ago by marongiu.luigi

You have to enable "show soft clipped bases" in IGV preferences.

written 9 weeks ago by genomax

Yes, I did. The figure already includes the clipped reads.

written 9 weeks ago by marongiu.luigi

h.mon (Brazil) wrote 9 weeks ago:

I see evidence of the "transgene" insertion: all those identical soft-clipped bases centered at the position you inserted the non-human sequence. Pay attention: 1) all reads are soft-clipped at the same reference position, 2) as far as I can tell, all soft-clipped bases are identical between different reads.

Look at the picture below. The big red arrow indicates the insertion point, and the darkened rectangles indicate the inserted sequence (which I was able to determine as parvovirus by blasting them, even before you told us it was parvovirus).

image3721

However, keep in mind that this visual inspection works well because you have a small, simple reference genome with no duplications, and a small, simple insertion with no other copies of it elsewhere in the reference genome. As WouterDeCoster pointed out above, there are better methods to identify structural variation events in more complex scenarios.
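For a simple case like this one, the pile-up of soft clips at a single reference position can also be tallied directly from SAM text rather than by eye. The sketch below is a rough illustration, not a structural variant caller: it parses the CIGAR string of each mapped read and reports, for every soft clip, the reference base immediately to the left of the clip junction, so that left-clipped and right-clipped reads spanning the same insertion vote for the same coordinate. The function name `clip_breakpoints` and the toy records are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def clip_breakpoints(sam_lines):
    """Count soft-clip junctions, keyed by the reference base left of the junction."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":  # unmapped read or no alignment
            continue
        # Hard clips can flank soft clips; drop them so S is first or last.
        ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
        if not ops:
            continue
        if ops[0][1] == "S":
            counts[pos - 1] += 1  # clip ends just before the first aligned base
        if len(ops) > 1 and ops[-1][1] == "S":
            ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
            counts[pos + ref_len - 1] += 1  # clip starts after the last aligned base
    return counts


# Toy SAM records (hypothetical) around an insertion after base 8284.
toy_sam = [
    "\t".join(["r1", "0", "chrM", "8225", "60", "60M40S", "*", "0", "0", "*", "*"]),
    "\t".join(["r2", "0", "chrM", "8285", "60", "30S70M", "*", "0", "0", "*", "*"]),
    "\t".join(["r3", "0", "chrM", "100", "60", "100M", "*", "0", "0", "*", "*"]),
]
print(clip_breakpoints(toy_sam).most_common(1))  # [(8284, 2)]
```

The clipped bases themselves (the part of SEQ covered by the S operations) could then be extracted and searched with BLAST, as described in the answer above.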

modified 9 weeks ago • written 9 weeks ago by h.mon

OK then, IGV is not the tool for checking insertion sites. I will use other tools. If there are other suggestions besides lumpy, I will be happy to check them. Thank you.

written 9 weeks ago by marongiu.luigi