Question: Getting total sequenced megabases for each sample
A3.8k wrote:

Hello

I have the total number of mutations per sample (whole-genome sequencing)

I want to convert that to total number of mutations per giga base

People say I should divide the number of mutations by the total sequenced megabases for each sample

But I am not sure what the total sequenced megabases for each sample is here, or how I can get it

About my sample I have this information

"Target sequencing depth was 50x coverage for tumours and 30x coverage for normal samples, with 94% of the known genome being sequenced to at least 8× coverage while achieving a PHRED quality of at least 30 for at least 80% of mapping bases"

Can you help me?

vcf genome • 94 views
written 9 days ago by A3.8k

I want to convert that to total number of mutations per giga base

Decide if you want to do that using the size of the genome or size of the sequenced data you have. Since this is targeted sequencing you are not sequencing the entire genome. Perhaps a 50 Mb chunk at 30x so a total of 1500 Mb of raw sequence.

Sorry, what is the size of the genome here?

Is the size actually 2820 Mb?

Thank you

I am not sure this is targeted sequencing, because they say whole-genome sequencing, and a previous paper on the same data says:

A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.

So should I divide the total number of mutations in each sample by 1500 or 2820?

The whole-genome size to use is 2820 Mb (94% of a roughly 3000 Mb genome), since your new paragraph does not say anything about targeting a portion. But the total amount of bases sequenced is more than 8 (the stated minimum depth, not the average) × 2820 = 22,560 Mb (you may know exactly how much if you have all the data in hand).

So you could express this value in two ways:

1. mutations over size of the genome (technically 2820 is haploid size, sequencing was done for diploid genome).
2. mutations over number of bases sequenced
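
The two options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example and the 3000 Mb genome size is an approximation; the 94% fraction and the 8x minimum depth are taken from the quoted methods text.

```python
# Assumed approximate haploid human genome size, in megabases (Mb).
GENOME_MB = 3000.0
# From the quoted methods text: 94% of the genome covered at >= 8x.
COVERED_FRACTION = 0.94
MIN_DEPTH = 8


def per_genome_mb(n_mutations, covered_mb=GENOME_MB * COVERED_FRACTION):
    """Option 1: mutations per Mb of covered genome."""
    return n_mutations / covered_mb


def per_sequenced_mb(n_mutations, sequenced_mb):
    """Option 2: mutations per Mb of sequence actually generated."""
    return n_mutations / sequenced_mb


# Covered genome size: about 2820 Mb.
covered_mb = GENOME_MB * COVERED_FRACTION
# Lower bound on sequenced bases: at least 8x over the covered part,
# about 22,560 Mb. The real total is higher and is sample specific.
min_sequenced_mb = MIN_DEPTH * covered_mb
```

Option 1 gives a number that is comparable between samples and studies; option 2 depends on how deeply each sample happened to be sequenced.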

@genomax and @RamRs

A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.

94% of the genome has been sequenced, so 3 * 0.94 = 2.82 Gb is the denominator for mutations per gigabase

But this is for whole cohort

On a per-sample basis, how do I get `mutation per giga base for each individual sample`? Coverage may differ between samples within the cohort

Thank you for any thoughts


Like genomax said, it's your call to make.


But this is for whole cohort

Are you sure? Your paragraph says this:

A single library was created for each sample

so the coverage is per sample more than likely.

2 points:

1. You want total mutations per giga base but "people" say you should divide by total sequenced mega-base?
2. 94% of known genome would be 94% of the length of the organism's genome, no? Would that give you the total number of bases?
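
If per-sample coverage is available, the covered size can be counted for each sample instead of assuming 94% for all of them. The sketch below is one possible way to do it, not something taken from the paper: it assumes a per-position depth table with three tab-separated columns (chromosome, position, depth), which is the layout `samtools depth` writes for a single BAM file. The function names and the 8x threshold default are choices made for this example.

```python
def callable_megabases(depth_lines, min_depth=8):
    """Count positions with depth >= min_depth and return the total in Mb.

    depth_lines: iterable of 'chrom<TAB>pos<TAB>depth' strings.
    """
    n_positions = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        if int(depth) >= min_depth:
            n_positions += 1
    return n_positions / 1e6


def mutations_per_mb(n_mutations, depth_lines, min_depth=8):
    """Mutations divided by the sample's own covered megabases."""
    covered_mb = callable_megabases(depth_lines, min_depth)
    if covered_mb == 0:
        raise ValueError("no positions reach the minimum depth")
    return n_mutations / covered_mb
```

For a real sample the lines would come from an open file handle, for example `with open("sample.depth.txt") as fh: mutations_per_mb(n, fh)`, where the file name is hypothetical.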

I googled for gigabase but only found results for megabase

I also read that some exome-seq kits capture 50 Mb of the genome. Is that true for my data too?


giga is mega * 1000, so the unit per-giga = 1/(mega * 1000) = 0.001 * per-mega. In other words, the same mutation count gives a per-gigabase number that is 1000 times the per-megabase number. This is from a basic math perspective; I'm not sure if I'm missing something.

If the genome size is 3000 Mb (3 Gb), 94% of that is 2820 Mb, or 2.82 Gb. Per-Gb would be x/2.82, not x/2820.
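
The unit conversion can be checked with a few lines of Python. The mutation count of 5640 is a made-up example value; the 2820 Mb denominator is the figure discussed above. Because a gigabase is 1000 times larger than a megabase, the same count yields a per-Gb value 1000 times the per-Mb value.

```python
MB_PER_GB = 1000  # 1 gigabase = 1000 megabases

n_mutations = 5640   # hypothetical mutation count for one sample
covered_mb = 2820    # 94% of an assumed 3000 Mb genome

covered_gb = covered_mb / MB_PER_GB       # 2.82 Gb

rate_per_mb = n_mutations / covered_mb    # mutations per Mb
rate_per_gb = n_mutations / covered_gb    # mutations per Gb


def per_mb_to_per_gb(rate_mb):
    """Convert a rate per megabase into a rate per gigabase."""
    return rate_mb * MB_PER_GB
```

Here `rate_per_mb` is 2 and `rate_per_gb` is about 2000, so dividing by 2820 answers the per-megabase question and dividing by 2.82 answers the per-gigabase one.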

I found this

If "94% of the known genome being sequenced to at least 8× coverage", I should take 3000 MB * 0.94 = 2820. I should be dividing by 2820.

Does this differ for gigabase?