Hi, I am a student from the genomic research group, CVASU, Chittagong, Bangladesh. We have an ongoing reference-based genome assembly and annotation project. We have just received FASTQ files with a total size of 310.2 GB. I have read some publications that mention 350 Gb of raw data. I am confused: is that gigabases or gigabytes? I know 1 Gbp (giga base pairs) = 1,000,000,000 bp. One part of a FASTQ file reports total sequences: 199893331 and sequence length: 40-100. Should I convert that into Gbp? How can I calculate genome coverage?
If you have a 100,000,000 bp genome and received 1,000,000,000 bp of sequence data (as a file of reads of equal or variable length) then you have 10x the number of total bases expected to be present in a single copy of the genome.
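As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names `fold_of_genome` and `base_range` are made up for this example. Since the question only gives a read count and a read-length range (40-100), the second function can only bound the total number of bases; it does not give the exact figure.

```python
def fold_of_genome(total_bases: int, genome_size: int) -> float:
    """Total sequenced bases divided by the size of one genome copy."""
    return total_bases / genome_size


def base_range(n_reads: int, min_len: int, max_len: int) -> tuple:
    """Lower and upper bound on total bases when only a length range is known."""
    return n_reads * min_len, n_reads * max_len


# Numbers from the answer: 1,000,000,000 bp of reads over a 100,000,000 bp genome.
print(fold_of_genome(1_000_000_000, 100_000_000))

# Numbers from the question: 199893331 reads of 40-100 bp each.
low, high = base_range(199_893_331, 40, 100)
print(low / 1e9, "to", high / 1e9, "Gbp")
```

To get the exact total you would sum the real read lengths rather than use the bounds.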
You can't conclude from this that you have 10x coverage across the genome, since the distribution of reads is never going to be uniform (unless you pre-selected the reads to be so). In some places coverage may be 0 (i.e. no reads at all) while in other locations it could be 100x. You will not be able to determine this distribution until you align the reads to the reference (if you have one).
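Once the reads are aligned, the per-position depth can be summarized to see how uneven it is. The sketch below assumes a three-column, tab-separated depth table (reference name, position, depth), which is the layout `samtools depth` writes. It also assumes zero-depth positions are present in the input (for example by running `samtools depth -a`). The function name `summarize_depth` is hypothetical.

```python
from typing import Dict, Iterable


def summarize_depth(lines: Iterable[str]) -> Dict[str, float]:
    """Summarize per-position depth from 'ref<TAB>pos<TAB>depth' lines."""
    depths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        raise ValueError("no depth records found")
    n = len(depths)
    return {
        "positions": n,
        "mean": sum(depths) / n,
        "min": min(depths),
        "max": max(depths),
        "fraction_zero": sum(1 for d in depths if d == 0) / n,
    }


# Toy input: four positions with very uneven depth.
toy = ["chr1\t1\t0\n", "chr1\t2\t100\n", "chr1\t3\t10\n", "chr1\t4\t10\n"]
print(summarize_depth(toy))
```

For a real genome you would stream the file line by line instead of holding every depth in a list, but the idea is the same: the mean alone hides the positions with zero or very high depth.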
A gigabyte/gibibyte is a measurement of data size in computer science. Depending on the type and level of compression, a gigabase of sequence will have variable file sizes (measured in some multiple of bytes, of which the gigabyte is one).
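Because file size does not tell you the number of bases, the reliable way to get gigabases is to count them from the FASTQ itself. This is a minimal sketch, assuming standard four-line FASTQ records (sequence on the second line of each record, no line wrapping) and treating a `.gz` suffix as gzip compression. The function names are made up for this example.

```python
import gzip
from typing import Iterable, Tuple


def count_bases(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of reads, number of bases) from four-line FASTQ records."""
    n_reads = 0
    n_bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            n_reads += 1
            n_bases += len(line.strip())
    return n_reads, n_bases


def count_fastq_file(path: str) -> Tuple[int, int]:
    """Count reads and bases in a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_bases(handle)


# Toy example: two reads, 4 + 6 bases.
example = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "ACGTAC\n", "+\n", "IIIIII\n"]
reads, bases = count_bases(example)
print(reads, "reads,", bases, "bases,", bases / 1e9, "Gbp")
```

Summing the base counts over all FASTQ files and dividing by 1e9 gives the figure in Gbp, whatever the files happen to weigh on disk.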