Can salmon distinguish similar transcripts?
19 months ago
arezu.lari • 0

I have a problem with my salmon results and would really appreciate your help. I ran salmon in its lightweight mode on my RNA-seq dataset. A transcript of length 1848 on chromosome Y was significantly enriched. This transcript has a counterpart on another chromosome with more than 90 percent sequence similarity. Even my female cases have counts for this transcript after running salmon. Is it possible that the counts actually belong to the other, similar transcript? And how specific is salmon at k=31?

As far as I know, salmon should be quite specific in its "aligning" (I deduce this from the fact that it's often used for isoform expression studies).

19 months ago
Rob 4.9k

Yes; in general the ability to distinguish between similar transcripts should be quite good. Of course, it depends on the exact read evidence. If the reads deriving from these transcripts overlap very few (or no) variants, then there is no specific evidence of the presence of one transcript versus the other, and so the abundance estimates may predict the presence of both. In general, there are a couple of things one might recommend trying. First, make sure you are using the most recent version of the software. Second, try generating posterior samples (e.g. by using --numGibbsSamples or --numBootstraps). This will give you not only point estimates of the abundance of those transcripts, but also some notion of the uncertainty / confidence in those predictions. If there is simply, inherently, a lot of uncertainty about the abundance of these transcripts, then this should show up as spread / variance in the posterior distributions.
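To make the idea of "spread / variance in the posterior distributions" concrete, here is a minimal Python sketch. It assumes the per-transcript posterior samples (from --numGibbsSamples or --numBootstraps) have already been loaded into plain lists; how to read them from salmon's output is not covered here. The function names and the numbers are made up for illustration only. A large coefficient of variation, together with a strongly negative correlation between two similar transcripts, would suggest that reads are being traded between them and that neither estimate should be trusted on its own.

```python
import statistics

def summarize_samples(samples):
    """Return mean, standard deviation and coefficient of variation
    for one transcript's posterior samples."""
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    cv = sd / mean if mean > 0 else float("inf")
    return {"mean": mean, "sd": sd, "cv": cv}

def pearson(xs, ys):
    """Pearson correlation between the samples of two transcripts."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Toy, invented sample values for two near-identical transcripts.
# They are NOT real salmon output; the totals are constant on purpose,
# mimicking reads that cannot be assigned to one transcript or the other.
tx_y = [120.0, 35.0, 98.0, 10.0, 77.0, 142.0]
tx_other = [30.0, 115.0, 52.0, 140.0, 73.0, 8.0]

summary_y = summarize_samples(tx_y)
correlation = pearson(tx_y, tx_other)
print(summary_y, correlation)
```

In this toy case the two transcripts are perfectly anti-correlated across samples, which is the pattern one would expect when the reads carry no evidence that separates them.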

arezu.lari, just so you know, Rob is the developer of salmon.

Thank you very much, Rob. The results with salmon were really better than with HTSeq, and I was just stuck on that issue. The genes were pseudogenes of 18S rRNA, most of which are very similar, so now I can attribute the related logFC and p-value to the pseudogenes in general, not to a specific one.

Thank you again, Rob. I have another question: in the salmon output, is there any file that shows the sequences of the reads that mapped to the transcript? In the output_quant directory there are some log, txt, count, and JSON files, but none of them contain that information. There are also some other files that I do not know how to open.

By default, the mapping information is not written out. If you run salmon with the --writeMappings flag, and give it a file to write the mappings to, then it will write a SAM file with the mappings it generated.

Thanks a lot for your guidance.

And Rob, would you please tell me how I can search for a specific transcript in that SAM file? I could not open it with IGV; I can only view it with samtools, but I cannot search in it.

You cannot load it into a genome browser because the coordinates are transcriptome coordinates, not genome coordinates. If there is by now a tool that does a quick conversion, please share; I have never felt the passion to write one myself so far.

Thank you, you're right. The error in IGV was caused by the coordinates in the SAM file. So I cannot find the sequences of the reads that map to the transcript of interest.
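Even though the file cannot be loaded into a genome browser, a SAM file is tab-separated text and can be searched directly. In the SAM format the third column is the reference name, which for a file written with --writeMappings should be the transcript name. The Python sketch below filters on that column; the function name, the transcript IDs and the example records are hypothetical and only illustrate the idea.

```python
def reads_for_transcript(sam_lines, transcript_id):
    """Yield (read_name, sequence) for every alignment record whose
    reference name (SAM column 3) equals transcript_id."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if fields[2] == transcript_id:
            yield fields[0], fields[9]  # QNAME, SEQ

# Toy records with invented read names and transcript IDs.
toy_sam = [
    "@SQ\tSN:TX_A\tLN:1848\n",
    "read1\t0\tTX_A\t10\t255\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t0\tTX_B\t55\t255\t8M\t*\t0\t0\tTTGGCCAA\tIIIIIIII\n",
    "read3\t16\tTX_A\t90\t255\t8M\t*\t0\t0\tGGGTTTAA\tIIIIIIII\n",
]

hits = list(reads_for_transcript(toy_sam, "TX_A"))
for name, seq in hits:
    print(name, seq)
```

With a real file, an open file handle can be passed in place of the toy list, since the function only iterates over lines. The SAM format allows the sequence column to hold `*` instead of the bases; if that is the case, the read names can still be used to look up the sequences in the original FASTQ files.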