Question: How to get mutation frequency from exome sequencing data?
mangfu100 wrote:

Hi all.

I am trying to understand mutation frequency so that I can annotate my exome sequencing data.

I have searched the Internet but could not work out how to calculate it.

To make matters worse, I don't understand what mutation frequency actually means (Wikipedia gives a definition, but it is too formal for me to follow) or why this information is important when analyzing mutations.

Could anyone briefly explain its basic meaning and the equation used to calculate it?

modified 4.8 years ago by ethan.kaufman • written 4.8 years ago by mangfu100
ethan.kaufman wrote:

Frequency is just a count.  It is usually normalized to some fixed unit of time or space to enable comparison with other counts.  Mutation frequency can conceivably refer to many things depending on the context:

  • Number of mutations per sample/per Mb/per gene, etc
  • Number of samples in which a particular mutation is observed
  • Percent of reads that support a particular mutation (often called "variant allele frequency" or VAF)
  • Number of mutant alleles in an individual or population (usually called "allele frequency")

Really, you need to define for yourself what it is you want to calculate.  The calculation itself should then be straightforward.
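To make two of the definitions above concrete, here is a minimal Python sketch. The function names and the input layout (a dict mapping each sample name to its list of mutation identifiers) are assumptions for illustration, not part of any particular tool.

```python
from collections import Counter


def samples_per_mutation(calls_by_sample):
    """Count in how many samples each mutation is observed.

    calls_by_sample: dict of sample name -> iterable of mutation IDs
    (hypothetical layout; any hashable ID works).
    """
    counts = Counter()
    for mutations in calls_by_sample.values():
        # set() so a mutation listed twice in one sample counts once
        counts.update(set(mutations))
    return counts


def variant_read_fraction(alt_reads, total_reads):
    """Fraction of reads at a site that support the mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


calls = {
    "S1": ["chr1:100A>T", "chr2:5G>C"],
    "S2": ["chr1:100A>T"],
}
recurrence = samples_per_mutation(calls)
print(recurrence["chr1:100A>T"])      # number of samples carrying it
print(variant_read_fraction(12, 48))  # read-level fraction at one site
```

The two numbers answer different questions (how common a mutation is across samples versus how strongly the reads in one sample support it), which is why the definition has to be chosen first.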

written 4.8 years ago by ethan.kaufman

Thank you for your comments.

What I would like to calculate is the number of mutations per Mb.

In that case, how do I calculate the Mb?

Does it simply refer to the sum of the lengths of chromosomes 1 to 22, or only to the length of the exome?

written 4.8 years ago by mangfu100

In your case you would use the exome length.  If you have a BED file for the captured regions, then this should be pretty easy.  If not, you can compute the genome coverage from the BAM file with bedtools (the genomecov subcommand) and then add up the regions that have depth above a minimum threshold.
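As a rough sketch of the BED-file route, the Python below sums the captured length while merging overlapping regions so that no base is counted twice. It assumes a standard BED layout (chromosome, 0-based start, exclusive end in the first three columns); the function name is hypothetical.

```python
from collections import defaultdict


def total_covered_bp(bed_lines):
    """Sum the bases covered by BED intervals, merging overlaps.

    Assumes columns: chrom, start (0-based), end (exclusive).
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current block
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


example = ["chr1\t100\t200", "chr1\t150\t250", "chr2\t0\t50"]
print(total_covered_bp(example) / 1_000_000, "Mb")
```

For a real file, passing an open file handle (for example `open("targets.bed")`, a placeholder path) works the same way, since the function only iterates over lines.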
 

written 4.8 years ago by ethan.kaufman

Thanks for your reply.

Fortunately, I have a BED file for the exome sequencing.

My BED file has four columns, as follows:

GENE START END EXON_NAME

As you suggested, is it right to sum (END-START) over all exons and then divide the number of mutations I found by that total?

written 4.8 years ago by mangfu100

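That should be a good enough approximation, yes, assuming all the mutations called are within the exome regions and the regions do not overlap.  In other words, mutations per Mb = number of mutations / (total length in bp / 1,000,000).  Minor point: the length of each region is END-START+1 if the coordinates are 1-based and inclusive; in a standard BED file (0-based start, exclusive end) it is END-START.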
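A small Python sketch of that calculation for the four-column layout described above (GENE START END EXON_NAME). The function name and the `one_based_inclusive` switch are hypothetical; the switch only controls whether 1 is added to each region length, and the sketch assumes the regions do not overlap.

```python
def mutations_per_mb_from_exons(exon_lines, n_mutations,
                                one_based_inclusive=False):
    """Mutations per Mb from lines of 'GENE START END EXON_NAME'.

    one_based_inclusive=False treats coordinates as BED-style
    (length = END - START); True uses END - START + 1.
    """
    total_bp = 0
    for line in exon_lines:
        fields = line.split()
        # Skip blank lines and a header row such as 'GENE START END ...'
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        start, end = int(fields[1]), int(fields[2])
        total_bp += end - start + (1 if one_based_inclusive else 0)
    if total_bp == 0:
        raise ValueError("no usable regions found")
    return n_mutations * 1_000_000 / total_bp


exons = [
    "GENE START END EXON_NAME",
    "TP53 100 600 exon1",
    "TP53 1000 1500 exon2",
]
print(mutations_per_mb_from_exons(exons, n_mutations=2))
```

Over a whole exome the two coordinate conventions differ by one base per region, so the choice has only a small effect on the final rate.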

written 4.8 years ago by ethan.kaufman

Thanks!

Your comments will be very helpful in my research.

I will try it :)

written 4.8 years ago by mangfu100