Geneid Genes (geneid.txt.gz) comes from an older gene prediction program that works from the genome sequence alone. It is only relevant when you are working on a particular locus where you suspect that the manually curated gene models (Ensembl and RefSeq) contain errors.
UCSC RefSeq (refGene.txt.gz) contains NCBI RNA reference sequences aligned to the human genome with BLAT, the BLAST-Like Alignment Tool of the UCSC Genome Browser. The track shows known human protein-coding and non-protein-coding genes.
You can use the Table Browser to extract the transcription start sites (TSS) of protein-coding genes. For example, to query the UCSC RefSeq (refGene) table on hg38, navigate to the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) and make the following selections:
Under Select dataset:
assembly: Dec. 2013 (GRCh38/hg38)
group: Genes and Gene Predictions
track: NCBI RefSeq
table: UCSC RefSeq (refGene)
Set the region: to “genome”
Click create next to “filter:”
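On the “Filter on Fields from hg38.refGene” page, find the row labeled “cdsEnd is”, change “ignored” to “!=”, and enter “cdsStart” in the text box next to it, then click submit. This keeps only transcripts whose cdsEnd differs from cdsStart, i.e. protein-coding transcripts; non-coding entries in refGene have cdsStart equal to cdsEnd.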
Set the output format to “Selected fields from primary and related tables”. This will allow you to select fields of interest. Click get output
On the following page, scroll down to the Linked Tables section and select "hgFixed refLink" then click allow selection from checked tables
You can then select the following fields:
name Name of gene
chrom Reference sequence chromosome or scaffold
strand + or - for strand
txStart Transcription start position (or end position for minus strand item)
protAcc protein accession
Click get output
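This should display all protein-coding transcripts with their txStart coordinates and protein accession numbers. Note that txStart is always the leftmost coordinate on the reference sequence, so for transcripts on the minus strand the transcription start site is at txEnd; select the txEnd field as well if you need strand-aware start sites.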
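If you prefer to work from the downloaded refGene.txt.gz file instead of the Table Browser, the same filter can be sketched in Python. This is a minimal sketch, not an official UCSC script: the column order is assumed to follow the refGene table schema (genePred with a leading bin column), the function names are hypothetical, and the sample rows are made up for illustration. It assumes the usual UCSC table convention of 0-based, half-open coordinates, so the reported TSS is the 0-based position of the first transcribed base.

```python
import csv
import gzip

# Assumed column order of refGene.txt.gz (genePred format with a leading bin column).
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand", "txStart", "txEnd", "cdsStart",
    "cdsEnd", "exonCount", "exonStarts", "exonEnds", "score", "name2",
    "cdsStartStat", "cdsEndStat", "exonFrames",
]


def coding_tss(lines):
    """Yield (name, chrom, strand, tss) for rows where cdsStart != cdsEnd."""
    reader = csv.DictReader(
        lines, fieldnames=REFGENE_COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        # Same filter as in the Table Browser steps: skip non-coding transcripts.
        if row["cdsStart"] == row["cdsEnd"]:
            continue
        if row["strand"] == "+":
            tss = int(row["txStart"])      # leftmost base is the first transcribed base
        else:
            tss = int(row["txEnd"]) - 1    # txEnd is exclusive, so step back one base
        yield row["name"], row["chrom"], row["strand"], tss


def coding_tss_from_file(path):
    """Read a gzipped refGene dump (hypothetical local path) and return all TSS rows."""
    with gzip.open(path, "rt") as handle:
        return list(coding_tss(handle))


# Made-up rows for illustration only; these are not real RefSeq records.
SAMPLE_ROWS = [
    "0\tNM_TEST1\tchr1\t+\t100\t500\t150\t450\t1\t100,\t500,\t0\tGENEA\tcmpl\tcmpl\t0,",
    "0\tNM_TEST2\tchr1\t-\t1000\t2000\t1100\t1900\t1\t1000,\t2000,\t0\tGENEB\tcmpl\tcmpl\t0,",
    "0\tNR_TEST3\tchr2\t+\t300\t800\t800\t800\t1\t300,\t800,\t0\tGENEC\tunk\tunk\t-1,",
]

for record in coding_tss(SAMPLE_ROWS):
    print(record)
```

With a real download you would call coding_tss_from_file("refGene.txt.gz") instead of using the sample rows. Add 1 to the reported position if you need 1-based coordinates, as displayed in the Genome Browser.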
If you have any follow-up questions, our public help desk can always be reached at firstname.lastname@example.org. You may also send questions to email@example.com if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.