While downloading the human proteome in FASTA format from the UniProt site, I noticed that it says there is one protein sequence per gene (20,594). However, the protein count shown above that is 81,837, and this made me wonder. I need this file to interpret spectra obtained from a bottom-up proteomics experiment. Doesn't the one-per-gene set give a poor representation of the proteins actually present? Additionally, how is it decided which sequence is displayed when alternative splicing occurs at a gene? Lastly, is there an alternative approach that searches the entire proteome rather than the gene-centred subset?

You could use the "unreviewed" human set (about 186K entries): https://www.uniprot.org/uniprotkb?facets=reviewed%3Afalse%2Cmodel_organism%3A9606&query=Human

Use the Protein Existence filters in the left-hand column to trim this down (e.g. keep only entries with evidence at transcript level).
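If you prefer scripting the same selection rather than clicking facets, a query string can be built for UniProt's REST interface. This is a minimal sketch under stated assumptions: the `stream` endpoint URL, the `reviewed` and `organism_id` query fields, and especially the `existence` field with numeric codes (assumed 1 = protein level, 2 = transcript level, and so on) should all be checked against the UniProt query-field documentation before use. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed UniProt REST streaming endpoint; verify against the UniProt API docs.
UNIPROT_STREAM = "https://rest.uniprot.org/uniprotkb/stream"


def unreviewed_human_query(existence_level=None):
    """Build a query for unreviewed (TrEMBL) human entries.

    existence_level is an assumed numeric Protein Existence code
    (e.g. 2 for "evidence at transcript level"); None means no filter.
    """
    terms = ["reviewed:false", "organism_id:9606"]
    if existence_level is not None:
        # Assumption: the query field is called "existence".
        terms.append(f"existence:{existence_level}")
    return " AND ".join(terms)


def stream_url(query, fmt="fasta"):
    """Return a URL that would stream all matching entries in the given format."""
    return f"{UNIPROT_STREAM}?{urlencode({'query': query, 'format': fmt})}"


if __name__ == "__main__":
    print(stream_url(unreviewed_human_query(existence_level=2)))
```

The query is kept as a plain string so it can be pasted into the website search box to compare the hit count with what the facets report.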

In the human proteome page, https://www.uniprot.org/proteomes/UP000005640, both protein count and gene count are provided. The gene count is only provided for reference proteomes, and is algorithmically computed: for each gene, a single representative protein sequence is chosen from the proteome. Where possible, reviewed (Swiss-Prot) protein sequences are chosen as the representatives. For more detail, I suggest you look at this help page: https://www.uniprot.org/help/gene_centric_isoform_mapping
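To see why the two counts differ, it helps to sketch the idea of "one representative per gene" on a downloaded FASTA file. The sketch below is a deliberately simplified illustration, not UniProt's actual algorithm (the help page above describes the real rules): it reads the `GN=` tag from each header, prefers Swiss-Prot (`sp|`) over TrEMBL (`tr|`) entries, and, as an assumption of this sketch only, breaks ties by keeping the longest sequence. Entries without a `GN=` tag are simply skipped here. The function names are hypothetical.

```python
import re

# Gene name tag in UniProt FASTA headers, e.g. "... GN=TP53 PE=1 SV=4".
GENE_RE = re.compile(r"\bGN=(\S+)")


def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def pick_representatives(records):
    """Keep one sequence per gene, preferring reviewed (sp|) entries.

    Simplified illustration: the longest-sequence tie-break is an assumption,
    not UniProt's gene-centric rule.
    """
    best = {}
    for header, seq in records:
        match = GENE_RE.search(header)
        if match is None:
            continue  # no gene name: skipped in this sketch
        gene = match.group(1)
        # Tuples compare element-wise: reviewed status first, then length.
        rank = (header.startswith("sp|"), len(seq))
        if gene not in best or rank > best[gene][0]:
            best[gene] = (rank, header, seq)
    return {gene: (header, seq) for gene, (_, header, seq) in best.items()}
```

Comparing `len(records)` with `len(pick_representatives(records))` on the full proteome file gives the same kind of gap as the protein count versus the gene count, although the exact numbers will not match UniProt's because the selection rules differ.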

There are use cases for both approaches: some users prefer seeing only one entry per gene, while others prefer using the complete proteome set, with potentially several entries per gene. The latter can be downloaded from the website by clicking the "Protein count" link at https://www.uniprot.org/proteomes/UP000005640, or directly at https://www.uniprot.org/uniprotkb?query=proteome:UP000005640
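For a search database it can be convenient to fetch the complete proteome from a script. The sketch below assumes the UniProt REST `stream` endpoint and its `includeIsoform=true` parameter (intended to add isoform sequences to FASTA output); both are assumptions to confirm in the UniProt API documentation. The helper names are hypothetical, and `download` needs network access.

```python
import shutil
import urllib.request
from urllib.parse import urlencode


def proteome_fasta_url(proteome_id="UP000005640", include_isoforms=False):
    """Build a URL for all entries of a proteome in FASTA format.

    Assumption: UniProt's REST stream endpoint and the includeIsoform flag.
    """
    params = {"query": f"proteome:{proteome_id}", "format": "fasta"}
    if include_isoforms:
        params["includeIsoform"] = "true"
    return "https://rest.uniprot.org/uniprotkb/stream?" + urlencode(params)


def download(url, path):
    """Copy the HTTP response to disk in chunks (not held fully in memory)."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)


def count_entries(lines):
    """Count FASTA records: each record starts with a '>' header line."""
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    url = proteome_fasta_url(include_isoforms=True)
    download(url, "human_proteome.fasta")
    with open("human_proteome.fasta") as fh:
        print(count_entries(fh))
```

Counting the records after download is a quick sanity check against the protein count on the proteome page; with isoforms included, the count should be expected to be higher.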

Please don't hesitate to contact the UniProt helpdesk if you have any additional questions.