I am trying to match Gencode's annotations to assemblies.
It is my understanding that the sequence of the reference chromosomes changes only with a major version update (e.g. GRCh37 -> GRCh38). In minor versions (such as GRCh38.p2), patches (deltas between the major version and the new minor version) may be added, along with alternate haplotypes and similar sequences.
Gencode releases the following annotation: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_22/gencode.v22.chr_patch_hapl_scaff.annotation.gtf.gz
that matches the following assembly: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_22/GRCh38.p2.genome.fa.gz
If one doesn't want the patches, one can use the primary assembly instead: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_22/GRCh38.primary_assembly.genome.fa.gz
which matches the following annotation: ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_22/gencode.v22.primary_assembly.annotation.gtf.gz
But then, what is this annotation for? ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_22/gencode.v22.annotation.gtf.gz
According to the description, this annotation describes reference chromosomes only. So why isn't this suitable for the primary assembly?
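One way to see which sequences each file actually covers is to compare the sequence names used in an annotation with those present in an assembly. Below is a minimal Python sketch of that check, assuming the files have been downloaded locally. The function names (`open_text`, `gtf_seqnames`, `fasta_seqnames`, `compare_seqnames`) are my own and not part of any Gencode tooling.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def gtf_seqnames(lines):
    """Collect the distinct sequence names (first column) from GTF lines."""
    names = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(lines):
    """Collect sequence names from FASTA headers (text up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_seqnames(gtf_lines, fasta_lines):
    """Report names that appear in only one of the two files."""
    gtf = gtf_seqnames(gtf_lines)
    fasta = fasta_seqnames(fasta_lines)
    return {"only_in_gtf": gtf - fasta, "only_in_fasta": fasta - gtf}
```

With local copies of the files above, one could call `compare_seqnames(open_text("gencode.v22.annotation.gtf.gz"), open_text("GRCh38.primary_assembly.genome.fa.gz"))` and inspect which names fall on only one side.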
Also, it is often suggested not to mix and match Ensembl or Gencode annotations with UCSC assemblies. But given that there are 1:1 correspondences (such as hg38 = GRCh38), that should be doable, as long as one takes care that the chromosome names follow the same convention.
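To illustrate what I mean by matching naming conventions, here is a minimal Python sketch that maps Ensembl-style names ("1", "X", "MT") to UCSC-style names ("chr1", "chrX", "chrM"). It deliberately handles only the primary chromosomes: scaffolds and alternate loci are named differently in the two conventions and would need a proper lookup table, so the sketch returns `None` for them. The function name `ensembl_to_ucsc` is hypothetical.

```python
# Ensembl-style names of the human primary chromosomes (autosomes plus X and Y).
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ensembl_to_ucsc(name):
    """Map an Ensembl-style chromosome name to its UCSC-style equivalent.

    Returns None for anything that is not a primary chromosome or the
    mitochondrial genome, because those names cannot be converted by
    simply adding a prefix.
    """
    if name == "MT":
        return "chrM"
    if name in PRIMARY:
        return "chr" + name
    return None
```

Any record for which this returns `None` would have to be dropped or mapped separately, which is exactly where I suspect the mix-and-match advice comes from.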
Similarly, if we remove the patches and alternate loci from GRCh38.p2, wouldn't we get back to the primary assembly GRCh38, except for differences in scaffolds? And if our annotations of choice describe only the reference chromosomes, then those annotations, originally meant for GRCh38, should also work fine with GRCh38.p2. Isn't that the case?
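The "remove patches and alternate loci" step I have in mind could look like the Python sketch below: it keeps only the FASTA records whose names are in a chosen set. This assumes the record name is the first whitespace-delimited token of the header line. The function name `filter_fasta` and the set of names passed in are my own choices, and whether the result is really identical to the primary assembly is precisely what I am asking.

```python
def filter_fasta(lines, keep):
    """Yield only the FASTA lines that belong to records named in `keep`.

    `lines` is any iterable of text lines (for example an open file) and
    `keep` is a set of sequence names such as {"chr1", "chr2", "chrM"}.
    """
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide whether to keep it.
            fields = line[1:].split()
            keeping = bool(fields) and fields[0] in keep
        if keeping:
            yield line
```

For example, passing the lines of GRCh38.p2.genome.fa together with the set of reference chromosome names would write out only those chromosomes.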
Thank you for your help!