Question: What's the difference between WGS and RefSeq databases?
2.7 years ago by
gbdias60 wrote:

I read the RefSeq documentation in the NCBI Handbook, but the distinction is still not clear to me. I'm aware that WGS represents all assembled contigs from a sequencing project, and that RefSeq supposedly has some curation...

Does that mean WGS is more complete than RefSeq (even if it includes a bunch of unannotated features)?

2.7 years ago by
UK, Hinxton, EMBL-EBI
Denise - Open Targets wrote:

I don't think those two are really comparable, as they mean different things. Annotation is only possible once the sequences are available. RefSeq and others (e.g. the Ensembl gene set) provide the annotation of these sequences, whether they are assembled or not (yet). The genomic sequence comes from Whole Genome Sequencing (WGS) experiments, and we carry out the annotation of genes, transcripts, genetic variants, regulatory regions, etc.
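
If it helps to see the distinction in practice, one rough way to tell which kind of record you are looking at is the accession format. The sketch below is only illustrative: the function name `classify_accession` and the regular expressions are my own simplification of the published NCBI accession conventions (RefSeq uses a two-letter prefix plus underscore; GenBank WGS contigs use a 4- or 6-letter project code, a 2-digit assembly version, and a contig number), and it does not cover every accession type.

```python
import re

# RefSeq accessions: two-letter molecule-type prefix, underscore, identifier,
# optional ".version" suffix (format examples: NC_000001.11, NM_000546.6).
REFSEQ = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")

# GenBank WGS contig accessions (simplified assumption):
#   4 letters + 2-digit assembly version + 6 or more digits, or
#   6 letters + 2-digit assembly version + 7 or more digits.
WGS = re.compile(
    r"^(?:[A-Z]{4}\d{2}\d{6,}|[A-Z]{6}\d{2}\d{7,})(\.\d+)?$"
)


def classify_accession(acc):
    """Return a rough label for an accession string (hypothetical helper)."""
    acc = acc.strip().upper()
    if REFSEQ.match(acc):
        # NZ_ records are RefSeq copies of WGS contigs: the part after
        # the prefix is itself a WGS-style accession.
        if acc.startswith("NZ_") and WGS.match(acc[3:]):
            return "RefSeq (from WGS)"
        return "RefSeq"
    if WGS.match(acc):
        return "WGS"
    return "other"


if __name__ == "__main__":
    for example in ["NC_000001.11", "AAAA01000001.1", "NZ_AAAA01000001.1", "U49845"]:
        print(example, "->", classify_accession(example))
```

The point is just that a RefSeq record and a WGS contig are different kinds of entries: the RefSeq record is a derived, annotated copy, while the WGS record is the submitter's contig.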


Thank you for the explanation. What if I wanted to find all ERVs in a primate genome, for example? Knowing that most of these sequences are not annotated, WGS is the way to go, right? I mean, RefSeq would not include non-annotated, non-protein-coding sequences even if they are assembled in WGS, would it?

gbdias60
