Hello
I have the total number of mutations per sample (whole-genome sequencing).
I want to convert that to the number of mutations per gigabase.
I googled for that.
People say I should divide the number of mutations by the total sequenced megabases for each sample.
But I am not sure what the total sequenced megabases for each sample is here, or how I can get it.
About my samples I have this information:
"Target sequencing depth was 50x coverage for tumours and 30x coverage for normal samples, with 94% of the known genome being sequenced to at least 8× coverage while achieving a PHRED quality of at least 30 for at least 80% of mapping bases"
Can you help me?
Decide whether you want to do that using the size of the genome or the size of the sequenced data you have. Since this is targeted sequencing, you are not sequencing the entire genome. Perhaps a 50 Mb chunk at 30x, so a total of 1500 Mb of raw sequence.
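The two possible denominators can be sketched in a few lines of Python. This is only an illustration: the function name `mutations_per_mb` and the mutation count of 3000 are made up, and the 50 Mb and 30x figures are the guess from the reply above, not measured values.

```python
def mutations_per_mb(n_mutations, denominator_mb):
    """Divide a mutation count by a denominator expressed in megabases."""
    return n_mutations / denominator_mb

target_size_mb = 50   # assumed size of the targeted region, in Mb
depth = 30            # assumed sequencing depth

# Raw sequence produced: region size times depth.
raw_sequence_mb = target_size_mb * depth   # 1500 Mb

n_mutations = 3000    # hypothetical mutation count for one sample

# Option 1: normalise by the size of the region that was sequenced.
per_mb_of_target = mutations_per_mb(n_mutations, target_size_mb)

# Option 2: normalise by the total amount of raw sequence.
per_mb_of_raw = mutations_per_mb(n_mutations, raw_sequence_mb)

print(per_mb_of_target, per_mb_of_raw)
```

The two options differ by exactly the depth factor, so whichever one is chosen should be stated alongside the reported rate.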
Sorry, what is the size of the genome here?
Is the size actually 2820?
Thank you
I am not sure this is targeted sequencing, because they say it is whole-genome sequencing, and a previous paper on the same data says:
A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.
So should I divide the total number of mutations in each sample by 1500 or by 2820?
Size of the whole genome is 2820 Mb, since your new paragraph does not say anything about targeting a portion. But the total amount of bases sequenced is more than 8 (the minimum coverage quoted, not the average) × 2820 = 22,560 Mb (you may know exactly how much if you have all the data in hand).
So you could express this value in two ways:
@genomax and @RamRs
As described, this is the information about my data:
A single library was created for each sample, and 100-bp paired-end sequencing was performed under contracts by Illumina and the Broad Institute to a typical depth of at least 50x for tumors and 30x for matched normals, with 94% of the known genome being sequenced to at least 8x coverage and achieving a Phred quality of at least 30 for at least 80% of mapping bases.
94% of the genome has been sequenced, so 3 × 0.94 = 2.82 gigabases.
But this is for the whole cohort.
On a per-sample basis, how do I know the mutations per gigabase for each individual sample? Coverage may differ between samples within the cohort.
Thank you for any thoughts.
Like genomax said, it's your call to make.
Are you sure? Because your paragraph says this:
so the coverage is more than likely per sample.
2 points:
I googled for gigabase but only found results for megabase.
I also googled and found that, for example, some exome-seq kits capture 50 Mb of the genome. Is that true of my data too?
Giga is mega * 1000, so 1 per gigabase = 1/(1000 megabases) = 0.001 per megabase. Put the other way round, a rate expressed per Gb is 1000 times the same rate expressed per Mb. This is from a basic math perspective, I'm not sure if I'm missing something.
If the genome size is 3000 Mb (3 Gb), 94% of that is 2820 Mb, or 2.82 Gb. Per-Gb would be x/2.82, not x/2820.
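As a rough sketch of the arithmetic above in Python: the genome size (3000 Mb) and the 94% fraction come from this thread, while the sample names and mutation counts are invented for illustration. It assumes every sample shares the same 2.82 Gb denominator, which may not hold if coverage differs between samples.

```python
GENOME_SIZE_MB = 3000      # approximate genome size used in this thread
COVERED_FRACTION = 0.94    # fraction of the genome sequenced to at least 8x

covered_mb = GENOME_SIZE_MB * COVERED_FRACTION   # about 2820 Mb
covered_gb = covered_mb / 1000                   # about 2.82 Gb

def per_mb(n_mutations):
    """Mutations per megabase of covered genome."""
    return n_mutations / covered_mb

def per_gb(n_mutations):
    """Mutations per gigabase of covered genome."""
    return n_mutations / covered_gb

# Hypothetical per-sample mutation counts (not real data).
counts = {"sample_A": 5640, "sample_B": 14100}

for sample, n in counts.items():
    # The per-Gb rate is 1000 times the per-Mb rate.
    print(sample, round(per_mb(n), 3), round(per_gb(n), 1))
```

If per-sample covered bases were available, `covered_gb` could be replaced by a per-sample value instead of the single cohort-wide figure.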
I found this
Does this differ for gigabases?