Question

Method of Checking for Mutation Patterns

7.1 years ago

L. A. Liggett ▴ 120

I am trying to think of an approach to understand mutation biases in my data, if they exist, but I can't think of a good method. I see particular base-change biases in my genome sequencing data that match similar published data; for example, C mutates to T more often than C mutates to A. It is relatively straightforward to get this information by looking at the output VCF file.
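For the simple per-base spectrum, a minimal sketch might tally SNVs into the six strand-collapsed classes. It assumes an uncompressed VCF read as text lines, and the function name `substitution_spectrum` is hypothetical. G and A reference bases are folded onto the complementary strand, so a G>A call is counted as C>T.

```python
from collections import Counter

# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_spectrum(vcf_lines):
    """Count SNVs by the six strand-collapsed classes (C>A, C>G, C>T, T>A, T>C, T>G)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3].upper(), fields[4].upper()
        if len(ref) != 1 or ref not in COMPLEMENT or alt not in COMPLEMENT:
            continue  # skip indels, multi-allelic ALTs such as "T,G", and N bases
        if ref in "GA":  # fold a purine REF onto the complementary pyrimidine strand
            ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        counts[f"{ref}>{alt}"] += 1
    return counts


example = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t10\t.\tC\tT",
    "chr1\t20\t.\tG\tA",   # same class as C>T after folding
    "chr1\t40\t.\tAT\tA",  # indel, skipped
]
print(substitution_spectrum(example))  # Counter({'C>T': 2})
```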

However, now I would like to look for more complex patterns. For instance, perhaps C often changes to T, but the majority of the time only in the context ACA -> ATA, because the surrounding bases influence the error rate. Similarly, any number of surrounding bases might influence the error rate, such that perhaps AAACAAA -> AAATAAA is the most prevalent C -> T change.
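One way to tabulate such patterns is to give every SNV a context label and count the labels. This is a minimal sketch under several assumptions: SNVs only, 0-based positions into a reference string already in memory (subtract 1 from the 1-based VCF POS), and hypothetical helper names (`context_key`, `count_contexts`). Increasing `flank` moves from trinucleotide contexts such as ACA to longer ones such as AAACAAA. Labels are folded onto the C/T strand, the convention used in mutational-signature work, because without strand information a C>T on one strand cannot be told apart from a G>A on the other.

```python
from collections import Counter

# Translation table for complementing uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_key(reference: str, pos: int, ref: str, alt: str, flank: int = 1) -> str:
    """Label an SNV at 0-based `pos` as e.g. 'A[C>T]A' (flank=1) or 'AAA[C>T]AAA' (flank=3).

    The label is folded onto the pyrimidine (C/T) strand, so a G>A in TGT
    and a C>T in ACA on the opposite strand get the same key.
    """
    if reference[pos] != ref:
        raise ValueError(f"REF {ref!r} does not match reference base {reference[pos]!r} at {pos}")
    if pos - flank < 0 or pos + flank >= len(reference):
        raise ValueError("variant is too close to the sequence end for this flank size")
    left = reference[pos - flank:pos]
    right = reference[pos + 1:pos + 1 + flank]
    if ref in "GA":  # purine REF: describe the event from the other strand
        left, right = revcomp(right), revcomp(left)
        ref, alt = revcomp(ref), revcomp(alt)
    return f"{left}[{ref}>{alt}]{right}"


def count_contexts(reference, variants, flank=1):
    """Tally context keys for (pos, ref, alt) tuples with 0-based positions."""
    return Counter(context_key(reference, p, r, a, flank) for p, r, a in variants)


# Toy reference and variants (positions are 0-based indices into the string).
reference = "GACAAGTGTC"
variants = [(2, "C", "T"), (7, "G", "A"), (5, "G", "T")]
counts = count_contexts(reference, variants)
print(counts["A[C>T]A"], counts["A[C>A]T"])  # 2 1
```

Comparing each context's count against how often that context occurs in the sequenced region (rather than raw counts alone) would be needed before calling a context enriched.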

So I am looking for guidance or suggestions on how to proceed with an analysis like this. I know some labs have performed this type of analysis and published the results, but I can't think of how to do it myself.

sequencing • 1.8k views

updated 7.1 years ago by Charles Yin ▴ 180 • written 7.1 years ago by L. A. Liggett ▴ 120

score 1 · Answer 1 · 2017-04-04

7.1 years ago

igor 13k

I don't know if there is already a tool that specifically does this. Since you already have a list of variants, you can use something like bedtools getfasta to retrieve the surrounding sequence for each one. That gives you the genomic context that you would then have to summarize.
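bedtools getfasta takes BED intervals, so the VCF positions first need turning into windows. A minimal sketch, assuming an uncompressed VCF read as text lines; the function name `vcf_to_bed_windows` and the name-column format are made up for illustration. The main pitfall is the coordinate change: VCF POS is 1-based, while BED is 0-based and half-open.

```python
def vcf_to_bed_windows(vcf_lines, flank=1):
    """Yield one BED line per SNV covering the variant base +/- `flank` bases.

    VCF POS is 1-based, while BED intervals are 0-based and half-open,
    so the variant base at POS is [POS-1, POS) and the window is
    [POS-1-flank, POS+flank).
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _vid, ref, alt = line.rstrip("\n").split("\t")[:5]
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep simple SNVs; drops indels and multi-allelic ALTs like "T,G"
        pos = int(pos)
        start, end = pos - 1 - flank, pos + flank
        if start < 0:
            continue  # window would run off the start of the chromosome
        # 4th (name) column records the variant, usable as a FASTA header via -name.
        yield f"{chrom}\t{start}\t{end}\t{chrom}:{pos}:{ref}>{alt}"


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tC\tT\t50\tPASS\t.",
    "chr1\t200\t.\tCA\tC\t50\tPASS\t.",
]
for bed_line in vcf_to_bed_windows(example, flank=1):
    print(bed_line)  # chr1, 98, 101, chr1:100:C>T (tab-separated)
```

After writing these lines to a file (`windows.bed` is a placeholder name), something like `bedtools getfasta -fi ref.fa -bed windows.bed -name -tab` should report one sequence per variant. Check `bedtools getfasta -h` for your version, since the header produced by `-name` may differ between releases.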

Oh cool, I wasn't aware of bedtools getfasta. That is helpful.

7.1 years ago by L. A. Liggett ▴ 120

score 1 · Answer 2 · 2017-04-04

7.1 years ago

Charles Yin ▴ 180

You may try non-negative matrix factorization (NMF, sometimes abbreviated NNMF) to extract mutational signatures from your mutation patterns. The signatures are based on the k-mer context of each mutation (in this paper, the substituted base plus one flanking base on each side, giving 96 trinucleotide mutation classes). A good reference is: Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J., & Stratton, M. R. (2013). Deciphering signatures of mutational processes operative in human cancer. Cell Reports, 3(1), 246-259.
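As a rough illustration of the factorization step only, here is a bare-bones NMF using the Lee and Seung multiplicative updates in NumPy. It assumes you have already built a non-negative channels x samples count matrix `V` (for example, 96 trinucleotide classes per sample); the function name `nmf` and the toy signatures are made up. The framework described in the paper adds steps this sketch omits, such as resampling and choosing the number of signatures, and scikit-learn's `sklearn.decomposition.NMF` is a more robust off-the-shelf alternative.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-10):
    """Factor a non-negative matrix V (channels x samples) as V ~ W @ H.

    W (channels x k): each column is a 'signature' over mutation channels.
    H (k x samples): exposure of each sample to each signature.
    Uses Lee & Seung multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Normalise each signature to sum to 1 and rescale exposures so W @ H is unchanged.
    scale = W.sum(axis=0)
    return W / scale, H * scale[:, None]


if __name__ == "__main__":
    # Toy catalogue: 6 substitution classes x 4 samples, built from two
    # made-up signatures so the "right answer" is known.
    true_W = np.array([
        [0.05, 0.40],  # C>A
        [0.05, 0.10],  # C>G
        [0.80, 0.10],  # C>T
        [0.02, 0.10],  # T>A
        [0.06, 0.20],  # T>C
        [0.02, 0.10],  # T>G
    ])
    true_H = np.array([[100, 0, 50, 20],
                       [0, 80, 50, 60]])
    V = true_W @ true_H
    W, H = nmf(V, k=2, n_iter=2000)
    print(np.round(W, 2))  # columns should resemble true_W, possibly in swapped order
```

NMF separates signatures by how their mix varies across samples, so with only a few samples (columns) the result is poorly constrained.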

You know, this was one of the papers I was remembering, but I didn't remember that they linked to an explanation of how to run the analysis. I see that now; hopefully this will help me solve my problem. Thanks.

7.1 years ago by L. A. Liggett ▴ 120