Hello, I am quite confused: from my perspective there were two versions of the genome assemblies, the first from the Genome Reference Consortium and the second from Ensembl. I only recently figured out that Ensembl probably just copies the sequence from the Genome Reference Consortium release builds and does not change the sequence itself. So, for example, downloading these data

ftp://ftp.ensembl.org/pub/release-84/fasta/homo_sapiens/dna/

should give the same sequence as downloading these data

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000001405.31_GRCh38.p5/GCF_000001405.31_GRCh38.p5_assembly_structure/Primary_Assembly/assembled_chromosomes/FASTA/

The real difference seems to be that Ensembl does some post-processing and keeps the data in sync with dbSNP and other types of information, perhaps more visibly because it needs these data for its own tools, which are more public.
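The assumption that both downloads carry the same sequence can be checked rather than taken on trust. Below is a minimal Python sketch that hashes each FASTA record while ignoring header lines and letter case, so different chromosome names or soft-masking (lower-case bases) do not count as differences. The function names `open_text`, `sequence_digests` and `same_sequences` are my own, not part of any Ensembl or NCBI tool, and the file names in the usage note are placeholders.

```python
import gzip
import hashlib


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def sequence_digests(path):
    """Return a sorted list of (md5_hex, length) pairs, one per FASTA record.

    Header lines are ignored and bases are upper-cased, so record names
    and soft-masking do not influence the result.
    """
    digests = []
    md5, length = None, 0
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: store the digest of the previous one.
                if md5 is not None:
                    digests.append((md5.hexdigest(), length))
                md5, length = hashlib.md5(), 0
            elif line and md5 is not None:
                md5.update(line.upper().encode("ascii"))
                length += len(line)
    if md5 is not None:
        digests.append((md5.hexdigest(), length))
    return sorted(digests)


def same_sequences(path_a, path_b):
    """True if both files contain the same set of sequences."""
    return sequence_digests(path_a) == sequence_digests(path_b)
```

Usage would be something like `same_sequences("ensembl_chr21.fa.gz", "ncbi_chr21.fna.gz")` on files downloaded from the two locations (placeholder names). Hard-masked files, where repeats are replaced by N, would not match with this approach, and a file containing extra records (for example patches or alternate loci) would also compare as different.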
However, I would like to know more about the gene build and gene annotation process. What steps does it include? Who are the people behind assembling the sequence and producing the annotations? What kinds of tools do they use? What public sources of funding support their work? Are they performing de novo assemblies of the human and other genomes, or are they only curating sequencing results produced decades ago?
I feel that we are all discussing different tools for working with their data, but I really want to know more about the reference data themselves and how they came to be.