Tool/method to calculate the sequence context of variants?
3.0 years ago
Ian 6.0k

Hi,

Before I go trying to reinvent the wheel, I am wondering whether anyone knows of a handy tool to answer the following.

I am trying to replicate a figure that shows the sequence context of specific variants (e.g. C to A SNVs). For all C to A variants, the figure shows how often the variant lies between each pair of flanking bases: AA, AC, AG, AT, CA, CC, CG, CT, etc.

This relates to Figure 3 of https://doi.org/10.1038/s41598-020-61807-4, if you have access.

The long way round is to extract the three-base sequence for every variant, but is there a tool that generates this type of information?
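For reference, the "long way round" is not much code. Below is a minimal Python sketch, assuming the reference genome is already loaded into a dict of chromosome name to sequence string, and that variants are (chrom, 1-based position, REF, ALT) tuples as in a VCF. The function names and toy data are hypothetical. It counts the 5'/3' flanking-base pair for one substitution type, which is the quantity described in the figure above. Calls recorded on the opposite strand (e.g. G>T for C>A) are not folded in here.

```python
from collections import Counter
from itertools import product


def trinucleotide_context(reference, chrom, pos):
    """Return the 3-base sequence centred on a 1-based position, or None at a sequence edge."""
    seq = reference[chrom]
    i = pos - 1  # convert 1-based VCF position to a 0-based string index
    if i < 1 or i + 1 >= len(seq):
        return None  # no flanking base on one side
    return seq[i - 1:i + 2].upper()


def flank_pair_counts(reference, variants, ref="C", alt="A"):
    """Tally 5'/3' flanking-base pairs (AA, AC, ... TT) for SNVs of one type."""
    # Pre-seed all 16 pairs so that unseen contexts are reported as 0
    counts = Counter({a + b: 0 for a, b in product("ACGT", repeat=2)})
    for chrom, pos, v_ref, v_alt in variants:
        if (v_ref, v_alt) != (ref, alt):
            continue  # only the substitution type of interest
        ctx = trinucleotide_context(reference, chrom, pos)
        if ctx is None or ctx[1] != v_ref:
            continue  # skip edge positions and REF/genome mismatches
        counts[ctx[0] + ctx[2]] += 1
    return counts


# Toy example (hypothetical data)
reference = {"chr1": "AACGTCAGCT"}
variants = [("chr1", 3, "C", "A"), ("chr1", 6, "C", "A"), ("chr1", 9, "C", "T")]
counts = flank_pair_counts(reference, variants)
```

Here the two C>A calls sit in the contexts ACG and TCA, so the AG and TA pairs each get one count and the C>T call is ignored.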

Thank you.

3.0 years ago
Ian 6.0k

I have since found that my question has been answered before: "Extract SNPs flanking sequences based on VCF and genome Fasta files".

However, I am going with my original plan:
1. bedtools flank to create intervals flanking each variant
2. bedtools getfasta to extract those flanking bases from the genome FASTA
3. a Python script to obtain the flanking nucleotides, add each unique pairing to a dictionary, and tally the occurrences
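The pairing-and-tallying step could look something like the sketch below. It assumes the flanks were made with one base on each side and extracted in tab-delimited form, e.g. `bedtools flank -i variants.bed -g genome.txt -b 1 > flanks.bed` followed by `bedtools getfasta -fi genome.fa -bed flanks.bed -tab > flanks.tab` (file names are hypothetical). It also assumes each variant contributes two consecutive records, left flank then right flank; that ordering is an assumption worth checking against your own output, as is what happens for variants at the very start or end of a chromosome.

```python
def tally_flank_pairs(getfasta_tab_lines):
    """Pair consecutive left/right flank bases and tally each pairing in a dict.

    Assumes `bedtools getfasta -tab` style lines ("name<TAB>sequence") where
    each variant yields two consecutive records: left flank, then right flank.
    """
    bases = []
    for line in getfasta_tab_lines:
        line = line.strip()
        if not line:
            continue
        _name, seq = line.rsplit("\t", 1)
        bases.append(seq.upper())  # ignore soft-masking (lowercase) in the FASTA

    if len(bases) % 2:
        raise ValueError("expected an even number of flank records")

    counts = {}
    for left, right in zip(bases[0::2], bases[1::2]):
        pair = left + right
        counts[pair] = counts.get(pair, 0) + 1
    return counts


# Toy input mimicking three variants' flank records (hypothetical data)
example = [
    "chr1:1-2\tA", "chr1:3-4\tG",
    "chr1:4-5\tT", "chr1:6-7\tA",
    "chr2:9-10\ta", "chr2:11-12\tg",
]
counts = tally_flank_pairs(example)
```

With a real file, `tally_flank_pairs(open("flanks.tab"))` works the same way, since any iterable of lines is accepted. Note that this counts every variant together; to reproduce a per-substitution figure you would first filter the input BED to, say, C to A variants.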

3.0 years ago
heskett ▴ 110

Yes, there are plenty of tools that calculate mutational signatures (by Alexandrov and others). They build these 3-mer tables for all variants and may even produce the plot you're describing by default. I haven't used this one, but it came up first on Google:
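For context, the 3-mer tables used in mutational-signature work follow the SBS-96 convention: each SNV is reported with a pyrimidine (C or T) reference, so a purine-reference call is reverse-complemented, giving 6 substitution types x 16 flanking pairs = 96 channels. The sketch below illustrates that convention only; it is not the code of any particular tool, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_label(context, alt):
    """Return an SBS-96 style label such as 'A[C>A]G' for a 3-base context and ALT base."""
    context = context.upper()
    alt = alt.upper()
    if context[1] in "AG":  # purine reference: report on the opposite strand
        context = revcomp(context)
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def sbs96_channels():
    """All 96 channel labels: 6 pyrimidine substitutions x 16 flanking pairs."""
    subs = [("C", a) for a in "AGT"] + [("T", a) for a in "ACG"]
    return [f"{l}[{r}>{a}]{rr}" for r, a in subs for l in "ACGT" for rr in "ACGT"]
```

For example, `sbs96_label("CGT", "T")` collapses a G>T call onto the opposite strand and returns the same label as `sbs96_label("ACG", "A")`, namely `A[C>A]G`.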

https://www.nature.com/articles/s41598-020-58107-2


I'll look into that, thanks.
