Question: Flanking sequences around cancer variants
9 months ago, banerjeeshayantan wrote:

I want to analyze the flanking sequences around cancer variants. I have already downloaded cancer variants from the COSMIC database. Now I am planning to extract the flanking sequences from the reference build used by the variant database (hg38 in this case) using BioMart. However, I am not sure whether flanking sequences taken from the reference properly represent what we would expect in a cancer genome. For example:
Take AAGCT, where G is the variant and AA and CT are the flanking sequences. What if, in the original cancer BAM file containing the variant G at that position, AA and CT are themselves mutated to AG and CA? In other words, if I want to study flanking sequences of cancer variants, is it a good idea to extract these sequences from the reference build? How much variation in the data am I losing just by doing this?

snp cancer_variants genome • 313 views

What is the question that you want to answer?

written 9 months ago by ATpoint
9 months ago, jared.andrews07 (St. Louis, MO) wrote:

In other words, if I want to study flanking sequences of cancer variants, is it a good idea to extract these sequences from the reference build?

There are two options for dealing with this:

  1. Build a consensus sequence from your variant file and the reference genome, then run your analysis on that. It's pretty easy to do with GATK.
  2. In whatever code you write to pull the flanking sequence, check whether any other variants in your VCF file also lie within the flank, and adjust the sequence if so. This is more annoying, but potentially more informative, since you can easily track how often it occurs.
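
As a rough illustration of option 2, here is a minimal Python sketch. It is not a tested pipeline: the `Snv` record and the `flank_with_variants` function are hypothetical names made up for this example, the toy reference string is invented (including the assumption that the reference base under the variant G is A), and positions are 0-based. It only handles single-base substitutions and assumes every variant passed in comes from the same sample. COSMIC pools variants from many samples, so in practice you would group by sample first.

```python
from typing import Iterable, NamedTuple, Tuple


class Snv(NamedTuple):
    """Hypothetical minimal record for a single-base substitution."""
    pos: int  # 0-based position on the reference sequence
    ref: str  # reference base
    alt: str  # alternate base


def flank_with_variants(
    reference: str, target: Snv, variants: Iterable[Snv], k: int
) -> Tuple[str, str, int]:
    """Return (reference context, sample context, number of patched flank bases).

    The context is the target position plus up to k bases on each side.
    The sample context carries the target's ALT allele and any other
    substitution from `variants` that falls inside the window.
    """
    start = max(0, target.pos - k)
    end = min(len(reference), target.pos + k + 1)
    ref_context = reference[start:end]
    bases = list(ref_context)
    n_patched = 0
    for v in [target, *variants]:
        if not (start <= v.pos < end):
            continue  # variant lies outside the window
        if reference[v.pos] != v.ref:
            raise ValueError(f"REF allele mismatch at position {v.pos}")
        bases[v.pos - start] = v.alt
        if v.pos != target.pos:
            n_patched += 1  # a neighbouring variant changed the flank
    return ref_context, "".join(bases), n_patched


# Toy data mirroring the AAGCT example from the question.
reference = "CCAAACTGG"
target = Snv(4, "A", "G")
others = [Snv(3, "A", "G"), Snv(6, "T", "A"), Snv(8, "G", "T")]

ref_ctx, sample_ctx, n = flank_with_variants(reference, target, others, k=2)
print(ref_ctx, sample_ctx, n)  # prints: AAACT AGGCA 2
```

Summing the third return value over all of your variants, or counting how many variants have it greater than zero, gives a quick estimate of how often the reference flank differs from the sample flank. Indels would need extra handling because they shift coordinates.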

How much variation in the data am I losing just by doing this?

This is a lot tougher to answer without more info: how many variants you have, whether you are looking only at SNPs or at indels as well, what sort of analysis you are running on the flanking sequence, and so on. More info will yield more and better answers.


Thanks for your comment. It makes sense to build a consensus sequence and extract the flanking sequence from it. We should take steps to ensure that cancerous variants are not included in the flanking sequences. For the variation question, I will reply to this thread once I have more information.

written 9 months ago by banerjeeshayantan