Question: Salmon annotation of transcripts in quasi-mapping step
Iara Souza (Brazil) wrote, 9 days ago:

I have just started working with pseudoalignment tools and am now trying salmon. My goal is to obtain a transcript count table from RNA-seq samples.

I built the index from the transcriptome FASTA provided by Ensembl, using the following command:

salmon index -t ensembl_human.fa -i ensembl_human

Then I ran the quantification step on each sample, for example:

salmon quant -p 12 -i ensembl_human --gcBias -o sample -1 sample_1.fa.gz -2 sample_2.fa.gz

After obtaining the count table, I noticed that it contains transcripts (identified by Ensembl transcript ID) that are not present in the GTF file from the same source and release as the transcriptome. Many of these transcripts are already annotated by Ensembl and show up in the Ensembl database with a quick query on their website. I may be missing something very obvious here, but I don't understand how salmon annotates those transcripts if the information is not in the GTF. I am worried because about 6k transcripts appear only in the count table and not in the GTF.

I would appreciate it if someone could clarify this for me.

Devon Ryan (Freiburg, Germany) wrote, 9 days ago:

Salmon doesn't annotate anything and never sees a GTF file. Those sequences are present in the fasta file you gave to salmon, so it's quantifying them. It sounds like you downloaded GTF and transcriptome fasta files from different Ensembl releases.
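
One way to check this is to compare the transcript IDs in the FASTA headers against the `transcript_id` attributes in the GTF. The Python below is only a rough sketch, not part of salmon: the function names are made up for illustration, the example records use invented IDs, and it assumes that the transcript name is the first whitespace-delimited token of each FASTA header and that a version suffix such as `.1` may appear in the FASTA but not in the GTF, so it is stripped before comparing.

```python
import gzip
import re

# Matches the transcript_id attribute in column 9 of a GTF line.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def strip_version(transcript_id):
    """Drop a trailing version suffix, e.g. 'ENST00000000002.3' -> 'ENST00000000002'."""
    return transcript_id.split(".", 1)[0]


def fasta_ids(lines):
    """Collect transcript IDs from FASTA header lines (first token after '>')."""
    ids = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(strip_version(line[1:].split()[0]))
    return ids


def gtf_ids(lines):
    """Collect transcript IDs from the attribute column of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):  # skip header comments
            continue
        match = TRANSCRIPT_ID.search(line)
        if match:
            ids.add(strip_version(match.group(1)))
    return ids


# Tiny invented example: two transcripts in the FASTA, only one in the GTF.
fasta = [
    ">ENST00000000001.1 cdna chromosome:GRCh38:1:100:200:1",
    "ACGT",
    ">ENST00000000002.3 cdna chromosome:GRCh38:1:300:400:1",
    "GGCC",
]
gtf = [
    "#!genome-build GRCh38",
    '1\thavana\ttranscript\t100\t200\t.\t+\t.\t'
    'gene_id "ENSG00000000001"; transcript_id "ENST00000000001"; transcript_version "1";',
]

missing = fasta_ids(fasta) - gtf_ids(gtf)
print(sorted(missing))
```

For real files, pass open file handles instead of the lists, e.g. `with open_text("ensembl_human.fa") as fh: ids = fasta_ids(fh)`, and do the same for your GTF. If the set of IDs found only in the FASTA is large, that is consistent with the two files coming from different releases.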
