I have a question about using Salmon with the GRCh37 Ensembl reference.
For my analysis I run Salmon for gene-level quantification, using the following reference transcriptome: ftp://ftp.ensembl.org/pub/grch37/release-88/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.cdna.all.fa.gz
To get gene-level estimates I need to provide a mapping between Ensembl transcript IDs and Ensembl gene IDs (as a tabular file). I created this file from the GTF file provided on the Ensembl FTP server: ftp://ftp.ensembl.org/pub/grch37/release-88/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
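A minimal sketch of how such a mapping can be derived from the GTF. It assumes the GTF has "transcript" feature rows whose ninth column contains `gene_id "..."` and `transcript_id "..."` attributes. The function names and the output path are hypothetical, and this is not necessarily the exact procedure used to build the file described above.

```python
import gzip
import re

# Attribute patterns for the ninth GTF column (assumed Ensembl-style formatting).
GENE_RE = re.compile(r'gene_id "([^"]+)"')
TX_RE = re.compile(r'transcript_id "([^"]+)"')


def tx2gene_from_gtf_lines(lines):
    """Return {transcript_id: gene_id} from an iterable of GTF lines (hypothetical helper)."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        # Only "transcript" rows are needed; each one names its parent gene.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        gene = GENE_RE.search(fields[8])
        tx = TX_RE.search(fields[8])
        if gene and tx:
            mapping[tx.group(1)] = gene.group(1)
    return mapping


def write_tx2gene(gtf_path, out_path):
    """Read a gzipped GTF and write a two-column, tab-separated transcript-to-gene table."""
    with gzip.open(gtf_path, "rt") as gtf:
        mapping = tx2gene_from_gtf_lines(gtf)
    with open(out_path, "w") as out:
        for tx, gene in sorted(mapping.items()):
            out.write(f"{tx}\t{gene}\n")
```

Calling `write_tx2gene("Homo_sapiens.GRCh37.87.gtf.gz", "tx2gene.tsv")` would produce the tabular file; the output name `tx2gene.tsv` is only an example.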
I noticed that 16723 transcripts in the reference transcriptome do not have a corresponding Ensembl gene ID in the GTF file. I believe this is because those transcripts were added to GRCh37 later as patches. Using the Ensembl ID History Converter, I can convert the transcript IDs with missing gene IDs to the corresponding transcript IDs of newer releases and then look up their gene IDs.
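A minimal sketch of how the set of unmapped transcripts can be checked, given the FASTA header lines and a transcript-to-gene dictionary. The function names are hypothetical. The `strip_version` option rests on an assumption worth verifying against the actual files: if the FASTA headers carry a version suffix (for example `.2`) while the mapping stores IDs without one, the mismatch alone would make transcripts look unmapped.

```python
def fasta_transcript_ids(lines, strip_version=True):
    """Collect transcript IDs from FASTA header lines (hypothetical helper).

    The ID is taken as the first whitespace-delimited token after '>'.
    With strip_version=True, anything from the first '.' onward is dropped
    (assumption: the suffix is a version number such as '.2').
    """
    ids = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        parts = line[1:].split()
        if not parts:
            continue  # empty header, nothing to record
        tx = parts[0]
        if strip_version:
            tx = tx.split(".")[0]
        ids.append(tx)
    return ids


def unmapped_transcripts(transcript_ids, mapping):
    """Return the transcript IDs that have no entry in the transcript-to-gene mapping."""
    return [tx for tx in transcript_ids if tx not in mapping]
```

Comparing the counts with and without `strip_version` is a quick way to tell whether the missing gene IDs come from an ID-format mismatch or from transcripts that are really absent from the GTF.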
Now my question: should I include in my analysis the transcripts that have no corresponding gene ID in the original GTF file? Or is it incorrect to include them because I would be using information from two different Ensembl releases?
Note: I used GRCh37 rather than the newest Ensembl release to ensure comparability with other analyses I run.
Thanks in advance!