Question: Getting total sequenced megabases for each sample
9 days ago, A (3.8k) wrote:

Hello

I have the total number of mutations per sample (whole genome sequencing)

I want to convert that to total number of mutations per giga base

I googled for that

People say I should divide the number of mutations by the total sequenced megabases for each sample

But I am not sure what the total sequenced megabases for each sample is here, or how I can get it

About my samples I have this information

"Target sequencing depth was 50x coverage for tumours and 30x coverage for normal samples, with 94% of the known genome being sequenced to at least 8× coverage while achieving a PHRED quality of at least 30 for at least 80% of mapping bases"

Can you help me?

vcf genome • 94 views

I want to convert that to total number of mutations per giga base

Decide if you want to do that using the size of the genome or size of the sequenced data you have. Since this is targeted sequencing you are not sequencing the entire genome. Perhaps a 50 Mb chunk at 30x so a total of 1500 Mb of raw sequence.

modified 2 days ago • written 9 days ago by genomax (87k)

Sorry, what is the size of the genome here?

Is the size actually 2820 Mb?

written 9 days ago by A (3.8k)

Thank you

I am not sure this is targeted sequencing, because they say whole genome sequencing, and a previous paper on the same data says

A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.

So should I divide the total number of mutations in each sample by 1500 or 2820?

modified 9 days ago • written 9 days ago by A (3.8k)

The size of the whole genome is 2820 Mb, since your new paragraph does not say anything about targeting a portion. But the total amount of bases sequenced is more than 8 (the minimum depth quoted for 94% of the genome, not the average) × 2820 Mb = 22,560 Mb (you may know exactly how much if you have all the data in hand).

So you could express this value in two ways:

  1. mutations over size of the genome (technically 2820 is haploid size, sequencing was done for diploid genome).
  2. mutations over number of bases sequenced
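
As a rough illustration of the two options above, here is a minimal Python sketch. The function names, the example mutation count and the 30x mean depth are made up for illustration; only the 3000 Mb genome size and the 94% figure come from this thread.

```python
GENOME_MB = 3000.0        # approximate haploid genome size used in this thread
COVERED_FRACTION = 0.94   # fraction sequenced to at least 8x, from the data description


def mutations_per_gb_of_genome(n_mutations, covered_mb):
    """Option 1: mutations divided by the covered genome size, reported per Gb."""
    return n_mutations / (covered_mb / 1000.0)


def mutations_per_gb_sequenced(n_mutations, covered_mb, mean_depth):
    """Option 2: mutations divided by total bases sequenced (size x depth), per Gb."""
    sequenced_mb = covered_mb * mean_depth
    return n_mutations / (sequenced_mb / 1000.0)


covered_mb = GENOME_MB * COVERED_FRACTION   # about 2820 Mb
n_mutations = 5000                          # hypothetical count for one sample
print(mutations_per_gb_of_genome(n_mutations, covered_mb))
print(mutations_per_gb_sequenced(n_mutations, covered_mb, mean_depth=30))
```

The two numbers differ by a factor equal to the depth, so whichever denominator you pick, state it alongside the result.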
modified 8 days ago • written 8 days ago by genomax (87k)

@genomax and @RamRS

As described for my data:

A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.

94% of the genome has been sequenced, so 3 Gb × 0.94 = 2.82 Gb, and mutations per gigabase would be the mutation count divided by 2.82

But this is for the whole cohort

On a per-sample basis, how do I know the mutations per gigabase for each individual sample? The coverage for each sample may be different within the cohort

Thank you for any thoughts

modified 3 days ago • written 3 days ago by A (3.8k)

Like genomax said, it's your call to make.

written 3 days ago by RamRS (28k)

But this is for whole cohort

Are you sure? Your paragraph says this:

A single library was created for each sample

so the coverage is per sample more than likely.
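
If you do have the alignments, one way to get a per-sample denominator is to count the positions that reach the 8x threshold in that sample. The sketch below assumes a tab-separated per-base depth table with columns chromosome, position, depth (the layout that `samtools depth` writes); the function name, the tiny inline table and the mutation count are made up for illustration.

```python
import io


def covered_megabases(depth_lines, min_depth=8):
    """Count positions with depth >= min_depth and return the total in Mb."""
    covered = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        if int(fields[2]) >= min_depth:
            covered += 1
    return covered / 1e6


# Tiny made-up table standing in for one sample's per-base depth file
example = io.StringIO("chr1\t1\t12\nchr1\t2\t7\nchr1\t3\t8\nchr1\t4\t30\n")
covered_mb = covered_megabases(example)

n_mutations = 2  # hypothetical mutation count for this sample
print(n_mutations / (covered_mb / 1000.0))  # mutations per Gb of covered genome
```

For a real sample you would pass an open file handle instead of the inline table and repeat this once per sample.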

written 3 days ago by genomax (87k)

2 points:

  1. You want total mutations per giga base but "people" say you should divide by total sequenced mega-base?
  2. 94% of known genome would be 94% of the length of the organism's genome, no? Would that give you the total number of bases?
written 9 days ago by RamRS (28k)

I googled for gigabase but only found results for megabase

I also read that, for example, some exome-seq kits capture 50 Mb of the genome, so is that true for my data too?

written 9 days ago by A (3.8k)

giga is mega × 1000, so 1 Gb = 1000 Mb, and a rate per gigabase is 1000 × the same rate per megabase. This is from a basic math perspective, I'm not sure if I'm missing something.

If genome size is 3000 Mb (3 Gb), 94% of that is 2820 Mb, or 2.82 Gb. per-Gb would be x/2.82, not x/2820.
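
A quick Python check of the x/2.82 versus x/2820 point, using a made-up mutation count x (the variable names are only for illustration):

```python
import math

x = 5640                 # hypothetical number of mutations in one sample
covered_mb = 2820.0      # 94% of a 3000 Mb genome, in megabases
covered_gb = covered_mb / 1000.0   # the same length in gigabases

per_mb = x / covered_mb  # mutations per megabase
per_gb = x / covered_gb  # mutations per gigabase

# The per-Gb rate is 1000 times the per-Mb rate
print(per_mb, per_gb, math.isclose(per_gb, per_mb * 1000.0))
```

So dividing by 2820 gives mutations per Mb, and dividing by 2.82 gives mutations per Gb.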

written 9 days ago by RamRS (28k)

I found this

If "94% of the known genome being sequenced to at least 8× coverage", I should take 3000 Mb × 0.94 = 2820 Mb, so I should be dividing by 2820.

Does this differ for gigabases?

modified 9 days ago • written 9 days ago by A (3.8k)