I just started working with pseudoalignment tools and am now trying salmon. I want to obtain a transcript count table from RNA-seq samples.
I built the index from the transcriptome FASTA provided by Ensembl, using the following command:
salmon index -t ensembl_human.fa -i ensembl_human
Then I ran the quantification step, for example:
salmon quant -p 12 -i ensembl_human --gcBias -o sample -1 sample_1.fa.gz -2 sample_2.fa.gz
After obtaining the count table, I noticed that it contains transcripts (identified by Ensembl transcript ID) that are not present in the GTF file from the same source and release as the transcriptome. Many of these transcripts are already annotated by Ensembl and can be found in the Ensembl database with a quick query on their website. I may be missing something very obvious here, but I don't understand how salmon can report those transcripts if they are not in the GTF. I am worried because about 6k transcripts appear only in the count table and not in the GTF.
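For reference, the comparison I am describing can be reproduced roughly as in the Python sketch below. It reads the transcript IDs from the `Name` column of salmon's `quant.sf` and from the `transcript_id` attribute of the GTF, then lists the IDs found only in the count table. The function names are made up for illustration, and stripping the version suffix (`ENST….2` to `ENST…`) is an assumption on my side: I am not sure whether the FASTA headers and the GTF write the version the same way, so I remove it from both before comparing.

```python
import csv
import gzip
import re

# Matches the transcript_id attribute in the 9th GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def strip_version(transcript_id):
    """Drop a trailing version suffix (assumption: IDs look like ENST....N)."""
    return transcript_id.split(".", 1)[0]


def quant_ids(quant_sf):
    """Collect transcript IDs from the Name column of a salmon quant.sf file."""
    with open_text(quant_sf) as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {strip_version(row["Name"]) for row in reader}


def gtf_ids(gtf):
    """Collect transcript IDs from the transcript_id attribute of a GTF file."""
    ids = set()
    with open_text(gtf) as handle:
        for line in handle:
            if line.startswith("#"):  # skip header/comment lines
                continue
            match = _TRANSCRIPT_ID.search(line)
            if match:
                ids.add(strip_version(match.group(1)))
    return ids


def missing_from_gtf(quant_sf, gtf):
    """Return the sorted IDs that are in quant.sf but absent from the GTF."""
    return sorted(quant_ids(quant_sf) - gtf_ids(gtf))
```

With the command above, the table is written to `sample/quant.sf`, so a call would look like `missing_from_gtf("sample/quant.sf", "ensembl_human.gtf")`, where the GTF filename is only a placeholder. If the list shrinks a lot once versions are stripped, the mismatch is mostly a naming difference; if it does not, the IDs really are absent from the GTF.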
I would appreciate it if someone could clarify this issue for me.