Flanking sequences around cancer variants
5.6 years ago
Gene_MMP8 ▴ 240

I want to analyze the flanking sequences around cancer variants. I have already downloaded the variants from the COSMIC database. I am now planning to extract the flanking sequences from the reference build those variants are reported against (hg38 in this case) using BioMart. However, I doubt whether flanking sequences taken from the reference properly represent what we would expect in a cancer genome. For example:
In AAGCT, the variant is G, and AA and CT are the flanking sequences. What if, in the original cancer BAM file containing the variant G at that position, AA and CT are themselves mutated to AG and CA? In other words, if I want to study flanking sequences of cancer variants, is it a good idea to extract these sequences from the reference build? How much variation in the data am I losing just by doing this?

genome snp cancer_variants • 1.2k views

What is the question that you want to answer?

5.6 years ago

In other words, if I want to study flanking sequences of cancer variants, is it a good idea to extract these sequences from the reference build?

There are two options to deal with this:

  1. Make a consensus sequence from your variant file and the reference genome, then run your analysis on that. It's pretty easy to do with GATK.
  2. In whatever code you write to extract the flanking sequence, check whether any other variants in your VCF file also lie within the flank, and adjust the sequence accordingly. This is more annoying, but potentially more informative, since you can easily track how often it happens.

How much variation in the data am I losing just by doing this?

This is a lot tougher to answer without more information: how many variants you have, whether you are looking only at SNPs or at indels as well, what sort of analysis you are running on the flanking sequence, and so on. More details will yield more and better answers.
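One way to get a first estimate yourself is to count how many variants have another variant from the same sample within the flank length. The sketch below assumes you can reduce your variant table to `(sample, chrom, pos)` tuples; that layout and the function name `fraction_with_neighbor` are assumptions for illustration, and two records at the same position in the same sample count as neighbors of each other.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def fraction_with_neighbor(variants, k):
    """Fraction of variants with another same-sample variant within k bp.

    variants -- iterable of (sample, chrom, pos) tuples (hypothetical layout)
    k        -- flank length in bases
    """
    by_group = defaultdict(list)
    for sample, chrom, pos in variants:
        by_group[(sample, chrom)].append(pos)

    total = 0
    affected = 0
    for positions in by_group.values():
        positions.sort()
        for pos in positions:
            lo = bisect_left(positions, pos - k)
            hi = bisect_right(positions, pos + k)
            # hi - lo includes the variant itself, so > 1 means a neighbor exists
            if hi - lo > 1:
                affected += 1
            total += 1
    return affected / total if total else 0.0


variants = [
    ("S1", "chr1", 100),
    ("S1", "chr1", 102),
    ("S1", "chr1", 500),
    ("S2", "chr1", 101),  # different sample, so not a neighbor of the S1 calls
]
print(fraction_with_neighbor(variants, 2))  # 0.5
```

If the fraction is small for your flank length, reference-derived flanks lose little; if it is large, adjusting the flanks per sample matters more.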


Thanks for your comment. It makes sense to build a consensus sequence and extract the flanking sequences from that. We should take steps to ensure that cancerous variants are not included in the flanking sequences. As for the variation question, I will reply to this thread once I have more information.
