I am relatively new to the analysis of whole-transcriptome RNA sequencing data. I am planning to map human RNA sequencing reads against the human reference genome/transcriptome (i.e., generate BAM files from FASTQ files).
I notice that reference genome assemblies are available from a number of sources: UCSC (currently as hg19), Ensembl (currently as GRCh37.73), the 1000 Genomes Project (currently as v37), etc. All of these releases seem to be based on the Genome Reference Consortium's GRCh37 release.
(1) What are the differences between these genome assemblies, given that they share the same underlying GRCh37 release?
(2) What are the differences between the different releases from Ensembl (e.g., GRCh37.70 vs. GRCh37.71)?
(3) For my purpose, aligning raw reads to obtain gene expression data for differential expression analysis, does it matter if I use one GRCh37-based reference assembly for one group of samples and, in the future, a different GRCh37-based assembly (from a different source, or from the same source but a different release) for another group of samples?
(4) Finally, can I use the reference genome assembly from one source or release together with a gene annotation file from another source or release, as long as both are based on GRCh37?
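To make question (4) concrete: one sanity check I could imagine running before mixing sources is to compare the sequence names in the genome FASTA with those in the annotation GTF, since as far as I can tell UCSC names chromosomes like "chr1" while Ensembl uses "1". Below is a minimal sketch in Python, assuming uncompressed plain-text files. The function names and the toy data are hypothetical, made up for illustration, and matching names would of course not prove that the coordinates agree.

```python
import io


def fasta_seq_names(handle):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seq_names(handle):
    """Collect the first column (seqname) of every non-comment, non-blank GTF line."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seq_names(fasta_handle, gtf_handle):
    """Report which sequence names are shared and which appear in only one file."""
    fasta = fasta_seq_names(fasta_handle)
    gtf = gtf_seq_names(gtf_handle)
    return {
        "shared": fasta & gtf,
        "fasta_only": fasta - gtf,
        "gtf_only": gtf - fasta,
    }


# Toy data (hypothetical): a UCSC-style FASTA and an Ensembl-style GTF.
toy_fasta = io.StringIO(">chr1 toy sequence\nACGT\n>chr2\nGGCC\n")
toy_gtf = io.StringIO(
    "#!genome-build GRCh37\n"
    '1\tensembl\tgene\t1\t4\t.\t+\t.\tgene_id "G1";\n'
    '2\tensembl\tgene\t1\t4\t.\t-\t.\tgene_id "G2";\n'
)

report = compare_seq_names(toy_fasta, toy_gtf)
for key in ("shared", "fasta_only", "gtf_only"):
    print(key, sorted(report[key]))
```

With real files, the two StringIO objects would be replaced by open() handles on the FASTA and GTF. An empty "shared" set would suggest the two files use different naming conventions.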