Question

Find mutations that are spatially clustered?

4.1 years ago

Rubal ▴ 350

I have been creating rainfall plots following the tutorial from https://bernatgel.github.io/karyoploter_tutorial//Examples/Rainfall/Rainfall.html

Does anyone know how I could extract only the mutations with a high degree of spatial clustering from the GRanges object? For example, I would like to create a new object listing all mutations that are within 1kb of another mutation. The output would be in this format, for input into another program:

Chr start end ref alt
1 10 10 A T
1 50 50 C G
2 181 181 A T
2 280 280 T A

I also have the raw list of all variants in the format above that becomes a GRanges object, so it might be easier just to use a function that works directly on that to find clustered variants? I imagine a tool must already exist to identify variants that are spatially clustered in the genome, but I have not yet found one.

Thanks in advance for suggestions.

genome variants clustering GRanges • 1.2k views

ADD COMMENT • link updated 4.1 years ago by bernatgel ★ 3.4k • written 4.1 years ago by Rubal ▴ 350

This is most probably not an exact answer, but you can look at Andreas Wagner's paper (https://www.genetics.org/content/176/4/2451). The Perl script to find clusters of SNPs is available here: https://www.ieu.uzh.ch/wagner/publications-software.html

ADD REPLY • link 4.1 years ago by microfuge ★ 1.9k

score 2 · Accepted Answer · 2020-03-13

4.1 years ago

bernatgel ★ 3.4k

Hi Rubal,

If you are already using kpPlotRainfall, you can take advantage of the distances it computes. You should call it and assign the result to kp again

kp <- kpPlotRainfall(kp, data=your.data, [...])

and then you can get the distances with

distances <- kp$latest.plot$computed.values$distances

The distances are in log10, so if you want just the variants that are closer than 1kb to the next variant (log10(1000) = 3), you can subset with

your.data[distances<3,]

And that should do it.
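As a quick sanity check of the log10 threshold, here is a small base-R sketch. The distances in `bp` are made-up values for illustration, not output from kpPlotRainfall.

```r
# Made-up distances in base pairs (illustration only)
bp <- c(40, 999, 1500, 4950)

# Filtering on log10(distance) < 3 selects the same elements as distance < 1000
log.filter <- log10(bp) < 3
bp.filter  <- bp < 1000
print(identical(log.filter, bp.filter))
```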

Hope this helps

ADD COMMENT • link 4.1 years ago by bernatgel ★ 3.4k

This is great but when I try your.data[distances<3,] I get the error:

Error in your.data[distances < 3, ] : 
  'list' object cannot be coerced to type 'double'

I've tried doing this first:

distances <- as.numeric(unlist(distances))

This seems to do the trick. Does that seem valid to you?

Thanks very much

ADD REPLY • link 4.1 years ago by Rubal ▴ 350

There is also something conceptual I don't understand: the code can output a single mutation for a chromosome, but if mutations are spatially clustered then by definition there must be at least 2 mutations that are close together. Is this just giving the first mutation if 2 mutations are < 1kb apart? I would like to have all mutations that are < 1kb apart. Maybe I misunderstand what is happening?

ADD REPLY • link 4.1 years ago by Rubal ▴ 350

Hi @Rubal,

The list has one GRanges per chromosome. Your unlist approach is valid.

Also yes, the mutations that it returns are the ones that are closer than 1kb to the NEXT mutation. In some cases missing the last mutation of a cluster of several mutations could be acceptable. If you want them, you could add the next position to the list with something like

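distances <- kp$latest.plot$computed.values$distances
distances <- as.numeric(unlist(distances))
# Indices (not a logical vector) of mutations closer than 1kb to the next one
close.muts <- which(distances < 3)
# Also keep the following mutation of each pair, without running past the end
close.muts <- sort(unique(c(close.muts, pmin(close.muts + 1, length(distances)))))
# Assumes your.data is in the same order as the unlisted distances
your.close.muts <- your.data[close.muts,]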
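
If it is easier to work directly on the raw variant table instead of the GRanges, a minimal base-R sketch could look like the one below. It is not part of karyoploteR: `find_clustered` and `max.dist` are hypothetical names, the toy table is made up, and it assumes single-base variants (start equal to end) so that the distance between two variants is simply the difference of their start positions.

```r
# Toy variant table in the Chr/start/end/ref/alt layout from the question
# (values are made up for illustration).
variants <- data.frame(
  Chr   = c("1", "1", "1", "2", "2", "2"),
  start = c(10, 50, 5000, 181, 280, 90000),
  end   = c(10, 50, 5000, 181, 280, 90000),
  ref   = c("A", "C", "G", "A", "T", "C"),
  alt   = c("T", "G", "A", "T", "A", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep every variant that lies within `max.dist` bp of
# another variant on the same chromosome.
find_clustered <- function(v, max.dist = 1000) {
  v <- v[order(v$Chr, v$start), ]
  pos <- as.numeric(v$start)
  # Gap to the previous and to the next variant on the same chromosome
  # (Inf for the first and last variant of each chromosome)
  to.prev <- ave(pos, v$Chr, FUN = function(x) c(Inf, diff(x)))
  to.next <- ave(pos, v$Chr, FUN = function(x) c(diff(x), Inf))
  # A variant is kept if its nearest neighbour on either side is close enough
  v[pmin(to.prev, to.next) < max.dist, ]
}

clustered <- find_clustered(variants)
print(clustered)
```

Because the gap is checked on both sides, every member of a cluster is kept, including the last one, so no extra "+1" step is needed. The result can be written out for another program with write.table(clustered, sep="\t", quote=FALSE, row.names=FALSE).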