Hi, I am a student from the Genomic Research Group, CVASU, Chittagong, Bangladesh.
We have an ongoing reference-based genome assembly and annotation project. We have just received FASTQ files with a total size of 310.2 GB. Some publications I have read mention
350 Gb of raw data. I am confused: is that gigabases or gigabytes?
I know 1 Gbp (giga base pairs) = 1,000,000,000 bp.
We have total sequences: 199,893,331
Sequence length: 40-100 (in one part of a FASTQ file)
Should I convert it into Gbp?
How can I calculate genome coverage?
If you have a 100,000,000 bp genome and received 1,000,000,000 bp of sequence data (as a file of reads of equal or variable length) then you have 10x the number of total bases expected to be present in a single copy of the genome.
You can't conclude that you have 10x coverage across the whole genome, since the distribution of reads is never going to be uniform (unless you pre-selected the reads to be so). In some places coverage may be 0 (i.e. no coverage), while in other locations it could be 100x. You will not be able to determine this distribution until you align the reads to the reference (if you have one).
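The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the read count and length range are the ones quoted in the question (which may describe only one of the FASTQ files). It gives the average depth you would expect from the total base count, not the real per-position coverage, which needs an alignment.

```python
def mean_depth(total_bases: int, genome_size_bp: int) -> float:
    """Expected average depth = total sequenced bases / genome size."""
    return total_bases / genome_size_bp


def total_bases_bounds(n_reads: int, min_len: int, max_len: int) -> tuple:
    """Lower and upper bound on total bases when only a length range is known."""
    return n_reads * min_len, n_reads * max_len


# Example from the answer: 1 Gbp of reads over a 100 Mbp genome.
print(mean_depth(1_000_000_000, 100_000_000))  # 10.0

# Numbers quoted in the question: 199,893,331 reads of 40-100 bp.
low, high = total_bases_bounds(199_893_331, 40, 100)
print(f"{low / 1e9:.1f} to {high / 1e9:.1f} Gbp")  # 8.0 to 20.0 Gbp
```

To turn that range into an expected depth, divide by the reference genome size in bp. The exact base count, rather than a range, has to come from the reads themselves.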
A gigabyte/gibibyte is a measurement of data size in computer science. Depending on the type and level of compression, a gigabase of sequence will have variable file sizes (measured in some multiple of bytes, of which the gigabyte is one).
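Since file size in bytes does not tell you the number of bases, the bases have to be counted from the reads. Here is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines) and that gzip-compressed files end in ".gz". The function name is hypothetical, not part of any library.

```python
import gzip
from typing import Tuple


def count_fastq_bases(path: str) -> Tuple[int, int]:
    """Return (number of reads, total bases) for a 4-line-per-record FASTQ file."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if path.endswith(".gz") else open
    n_reads = 0
    n_bases = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # In a 4-line record the sequence is the second line (index 1).
            if i % 4 == 1:
                n_reads += 1
                n_bases += len(line.strip())
    return n_reads, n_bases
```

Calling `count_fastq_bases` on each file and summing the second value gives the total in bp; divide by 1,000,000,000 to get Gbp. For files this large a dedicated tool will be much faster, but the logic is the same.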
GB is gigabyte. Gb is Gigabase.
To avoid this confusion I write Gbp or Gbase.