Are there any major differences between the GRCh38 (NCBI) and hg38 (UCSC) databases, aside from the fact that GRCh38 uses a 1-based coordinate system, while UCSC uses a 0-based coordinate system? Are there any pros/cons to using one vs. the other? And I am guessing that any identifier conversion software (e.g., BioMart) should choose one database over the other? Also, where does Ensembl come into play? Is the Ensembl database just a subset of the GRCh38 (NCBI) database? Any clarification would be greatly appreciated.
GRCh37/hg19 and GRCh38/hg38 are genome builds (assemblies), not annotations; an annotation describes where features are located in a given genome build. The actual sequences you'll get from NCBI/UCSC/Ensembl will be identical, but their annotations will differ and (importantly) are updated at different frequencies. NCBI's annotation is the RefSeq dataset (the "refGene" track in UCSC), which is essentially a subset of the UCSC and Ensembl annotations. UCSC's annotations are kind of a mess: you'll find genes with the same ID on multiple strands and multiple chromosomes, which makes them a bit useless. Ensembl's annotations typically contain more features than UCSC's (so a bit more noise), but they're otherwise much better put together (e.g., you'll never find a gene ID on different strands or different chromosomes), and their IDs are typically easier to map to other things (e.g., gene names, GO terms, and pathway memberships). Ensembl also updates its annotation fairly often and versions everything nicely, so it's quite convenient to report which version you used in a paper (reproducibility is always a good thing). Given the choice, use the Ensembl annotation.
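If you want to check whether a given annotation has the "same ID in several places" problem, a rough sketch like the one below may help. It assumes a GTF-style file (tab-separated, chromosome in column 1, strand in column 7, and a `gene_id "..."` attribute in column 9). The function name `find_ambiguous_gene_ids` and the example records are made up for illustration and do not come from any real annotation.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the 9th GTF column, e.g. gene_id "ABC";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def find_ambiguous_gene_ids(gtf_lines):
    """Return {gene_id: [(chrom, strand), ...]} for IDs seen on more than
    one chromosome/strand combination. Hypothetical helper, not a library API."""
    placements = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            placements[match.group(1)].add((fields[0], fields[6]))
    return {gid: sorted(locs) for gid, locs in placements.items() if len(locs) > 1}


# Invented records, for illustration only.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX1";',
    'chr5\tsrc\texon\t300\t400\t.\t-\t.\tgene_id "GENE_A"; transcript_id "TX2";',
    'chr2\tsrc\texon\t10\t50\t.\t+\t.\tgene_id "GENE_B"; transcript_id "TX3";',
]
ambiguous = find_ambiguous_gene_ids(example)
```

For a real file you could pass an open file handle instead of the list, since the function only iterates over lines. An empty result would suggest the annotation does not have this particular problem.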
BTW, don't forget that the various sources can use different names for chromosomes (e.g., chr1 in UCSC is just 1 in Ensembl), so don't mix and match them.
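If you do need to move between the two naming styles, a minimal sketch might look like the following. It only handles the primary chromosomes (1-22, X, Y) plus the mitochondrial genome, which is `chrM` in UCSC and `MT` in Ensembl. The function name `ucsc_to_ensembl` is hypothetical, and anything else (unplaced scaffolds, alternate loci) is rejected here on the assumption that those names need a proper lookup table rather than string manipulation.

```python
# Primary chromosome names as Ensembl writes them (no "chr" prefix).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_ensembl(chrom):
    """Convert a UCSC-style primary chromosome name to Ensembl style.

    Hypothetical helper: raises ValueError for names it does not cover.
    """
    if chrom == "chrM":
        return "MT"  # the mitochondrial genome is named differently
    if chrom.startswith("chr") and chrom[3:] in PRIMARY:
        return chrom[3:]
    raise ValueError(f"no simple Ensembl equivalent for {chrom!r}")


converted = [ucsc_to_ensembl(name) for name in ["chr1", "chrX", "chrM"]]
```

Failing loudly on unknown names is a deliberate choice in this sketch: silently passing them through is how mixed naming schemes end up in one analysis.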