10 months ago
kirillkirilenko
I would like to estimate genome size using the popular tutorial on Jellyfish provided here. However, I only have RNA-seq data with high coverage. Is it possible to use this type of data to achieve my goal?
No, the transcriptome is not the genome. While some papers claim that most of the genome is transcribed at some low base level, I doubt that a standard RNA-seq experiment comes even close to picking that up. Exons are about 5% of the mouse/human genome. That might be a little different in other species, but it is safe to assume that you will miss a lot of sequence content by looking only at the transcriptome. Is it so difficult to do Nanopore sequencing of your genome?
Yes, we have Nanopore reads, and we are attempting to assemble the genome, but the resulting assembly is 160 Mb, whereas we were expecting around 250 Mb. It's possible that the issue is that the reads turned out to be short (N50 = 3 kb), with a coverage of approximately 40X. That's why I want to estimate the genome size.
What is the expectation based on?
The expectation is based on the genome sizes of closely related species. None of them has a genome smaller than 200 megabases (Mb), and they are typically around 250 Mb. By the way, the BUSCO score is 89% complete (Diptera gene set) for the unpolished assembly.
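For reference, the k-mer approach that the Jellyfish tutorial is built on can be sketched in a few lines of Python. This is only a rough illustration, not the tutorial's exact procedure: it assumes a two-column histogram (k-mer depth, number of distinct k-mers) like the one `jellyfish histo` writes, and the function name `estimate_genome_size` and the `min_depth` error cutoff are hypothetical choices made for this example. The estimate is the total number of non-error k-mers divided by the depth of the main peak. It works best with accurate genomic reads; noisy long reads tend to blur the peak, and RNA-seq has no single peak because depth follows expression level.

```python
def estimate_genome_size(histo_lines, min_depth=5):
    """Rough genome size estimate from a k-mer depth histogram.

    histo_lines: iterable of "depth count" lines (assumed two-column format).
    min_depth:   hypothetical cutoff; k-mers seen fewer times than this are
                 treated as sequencing errors and ignored.
    Returns (peak_depth, estimated_genome_size_in_bp).
    """
    histogram = []
    for line in histo_lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        histogram.append((int(fields[0]), int(fields[1])))

    # Drop the low-depth error k-mers before looking for the peak.
    solid = [(depth, count) for depth, count in histogram if depth >= min_depth]
    if not solid:
        raise ValueError("no k-mers at or above min_depth")

    # The main peak approximates the k-mer coverage of single-copy sequence.
    peak_depth = max(solid, key=lambda dc: dc[1])[0]

    # Total k-mer occurrences divided by coverage gives the genome size.
    total_kmers = sum(depth * count for depth, count in solid)
    return peak_depth, total_kmers / peak_depth


# Toy histogram, made up purely to show the call.
toy = ["1 1000", "2 200", "5 10", "9 50", "10 100", "11 50"]
peak, size = estimate_genome_size(toy)
print(peak, size)
```

On real data the cutoff should be chosen by looking at the histogram (the valley between the error k-mers and the main peak), and a heterozygous or repeat-rich genome will need more care than this single-peak sketch provides.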