Question

Getting total sequenced megabases for each sample

0

3.7 years ago

zizigolu ★ 4.3k

Hello

I have the total number of mutations per sample (whole-genome sequencing)

I want to convert that to total number of mutations per giga base

I googled for that

People say I should divide the number of mutations by the total sequenced megabases for each sample

But I am not sure what the total sequenced megabases for each sample is here, or how I can get it

About my sample I have this information

"Target sequencing depth was 50x coverage for tumours and 30x coverage for normal samples, with 94% of the known genome being sequenced to at least 8× coverage while achieving a PHRED quality of at least 30 for at least 80% of mapping bases"

Can you help me?

genome vcf • 1.3k views

ADD COMMENT • link 3.7 years ago by zizigolu ★ 4.3k

1

I want to convert that to total number of mutations per giga base

Decide whether you want to do that using the size of the genome or the size of the sequenced data you have. Since this is targeted sequencing, you are not sequencing the entire genome. Perhaps a 50 Mb chunk at 30x, for a total of 1500 Mb of raw sequence.

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

Sorry, what is the size of the genome here?

Is the size actually 2820 Mb?

ADD REPLY • link 3.7 years ago by zizigolu ★ 4.3k

0

Thank you

I am not sure this is targeted sequencing, because they say whole-genome sequencing, and a previous paper on the same data says:

A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.

So should I divide the total number of mutations in each sample by 1500 or 2820?

ADD REPLY • link 3.7 years ago by zizigolu ★ 4.3k

0

The size of the whole genome is 2820 Mb, since your new paragraph does not say anything about targeting a portion. But the total number of bases sequenced is more than 8 (the minimum coverage quoted) × 2820 Mb = 22,560 Mb (you can find out exactly how much if you have all the data in hand).

So you could express this value in two ways:

mutations over the size of the genome (technically 2820 Mb is the haploid size; sequencing was done on a diploid genome).
mutations over the number of bases sequenced
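
The two denominators give quite different numbers. A minimal Python sketch, using the 2820 Mb and 22,560 Mb figures quoted in this reply; the mutation count (5640) and the function name are made up purely for illustration and do not come from the data:

```python
# Figures quoted in the reply above; the mutation count is a hypothetical example.
COVERED_GENOME_MB = 2820.0                        # genome size used in this thread (Mb)
MIN_BASES_SEQUENCED_MB = 8 * COVERED_GENOME_MB    # 8x * 2820 Mb = 22,560 Mb (a lower bound)


def mutations_per_mb(n_mutations: int, denominator_mb: float) -> float:
    """Return mutations per megabase for the chosen denominator (in Mb)."""
    if denominator_mb <= 0:
        raise ValueError("denominator_mb must be positive")
    return n_mutations / denominator_mb


n_mutations = 5640  # hypothetical count for one sample

per_mb_of_genome = mutations_per_mb(n_mutations, COVERED_GENOME_MB)
per_mb_sequenced = mutations_per_mb(n_mutations, MIN_BASES_SEQUENCED_MB)

print(f"per Mb of genome:   {per_mb_of_genome:.3f}")   # 2.000
print(f"per Mb of sequence: {per_mb_sequenced:.3f}")   # 0.250
```

Whichever denominator is chosen, it should be stated alongside the result so the rate can be compared with other studies.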

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

@genomax and @RamRs

As described for my data:

A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.

94% of the genome has been sequenced, so the covered genome is 3 Gb × 0.94 = 2.82 Gb, which would be the denominator for mutations per gigabase

But this is for whole cohort

On a per-sample basis, how do I get the mutations per gigabase for each individual sample? The coverage may differ between samples within the cohort

Thank you for any thoughts

ADD REPLY • link 3.7 years ago by zizigolu ★ 4.3k

1

Like genomax said, it's your call to make.

ADD REPLY • link 3.7 years ago by Ram 43k

1

But this is for whole cohort

Are you sure? Your paragraph says this:

A single library was created for each sample

so the coverage is more than likely per sample.
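
If the number of adequately covered bases is known for each sample (for instance from a per-sample coverage summary), every sample can get its own denominator. A rough Python sketch; the sample names, mutation counts, covered-base totals, and the function name are all hypothetical placeholders, not values from this dataset:

```python
def mutations_per_gb(mutation_counts: dict, covered_bases: dict) -> dict:
    """Return mutations per gigabase for each sample.

    mutation_counts: sample name -> number of mutations called
    covered_bases:   sample name -> number of bases covered well enough to call
    """
    rates = {}
    for sample, n_mutations in mutation_counts.items():
        bases = covered_bases[sample]
        if bases <= 0:
            raise ValueError(f"no covered bases recorded for {sample}")
        rates[sample] = n_mutations / (bases / 1e9)  # 1 Gb = 1e9 bases
    return rates


# Hypothetical inputs for two samples with different coverage.
counts = {"sample_A": 5640, "sample_B": 1350}
bases = {"sample_A": 2_820_000_000, "sample_B": 2_700_000_000}

rates = mutations_per_gb(counts, bases)
for sample, rate in rates.items():
    print(f"{sample}: {rate:.1f} mutations per Gb")
```

With these made-up inputs the two samples come out at roughly 2000 and 500 mutations per Gb. If only the cohort-level "94% at 8x or more" statement is available, the same 2.82 Gb denominator would have to be used for every sample.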

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

2 points:

You want total mutations per giga base but "people" say you should divide by total sequenced mega-base?
94% of known genome would be 94% of the length of the organism's genome, no? Would that give you the total number of bases?

ADD REPLY • link 3.7 years ago by Ram 43k

0

I googled for gigabase, but I only found results for megabase

I also read that, for example, some exome-seq kits capture 50 Mb of the genome, so is that true of my data too?

ADD REPLY • link 3.7 years ago by zizigolu ★ 4.3k

1

Giga is mega × 1000, so as a unit, per-giga = 1/(mega × 1000) = 0.001 × per-mega; put another way, the same rate expressed per Gb is numerically 1000 times its per-Mb value. This is from a basic math perspective; I'm not sure if I'm missing something.

If the genome size is 3000 Mb (3 Gb), 94% of that is 2820 Mb, or 2.82 Gb. Per-Gb would be x/2.82, not x/2820.
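
The unit conversion can be checked in a few lines of Python. This is only a sketch: the 3000 Mb genome size and the 94% fraction are the figures from this thread, while the mutation count `x` and the helper name are hypothetical.

```python
import math

MB_PER_GB = 1000  # 1 gigabase = 1000 megabases


def per_mb_to_per_gb(rate_per_mb: float) -> float:
    """Convert a rate in mutations/Mb to mutations/Gb."""
    return rate_per_mb * MB_PER_GB


x = 5640                     # hypothetical mutation count for one sample
covered_mb = 3000 * 0.94     # about 2820 Mb
covered_gb = covered_mb / MB_PER_GB  # about 2.82 Gb

rate_per_mb = x / covered_mb         # about 2 mutations per Mb
rate_per_gb = x / covered_gb         # about 2000 mutations per Gb

# Dividing by Gb directly and converting the per-Mb rate agree.
assert math.isclose(rate_per_gb, per_mb_to_per_gb(rate_per_mb))
print(f"{rate_per_mb:.2f} per Mb = {rate_per_gb:.0f} per Gb")
```

So dividing by 2820 gives a per-Mb rate and dividing by 2.82 gives a per-Gb rate; they describe the same quantity in different units.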

ADD REPLY • link 3.7 years ago by Ram 43k

0

I found this

If “94% of the known genome being sequenced to at least 8× coverage”, then I should compute 3000 Mb × 0.94 = 2820 Mb and divide by 2820.

Does this differ for gigabases?

ADD REPLY • link 3.7 years ago by zizigolu ★ 4.3k