I have a problem with my salmon results and would really appreciate your help. I ran salmon in its lightweight mode on my RNA-seq dataset. A transcript of length 1848 on chromosome Y came out as significantly enriched. This transcript has a counterpart on another chromosome with more than 90 percent sequence similarity. Even my female samples have counts for this transcript after running salmon. Is it possible that these counts actually belong to the other, similar transcript? And how specific is salmon at k=31?
Yes, that is possible. In general, the ability to distinguish between similar transcripts should be quite good, but it depends on the exact read evidence. If the reads deriving from these transcripts overlap very few (or none) of the variants that differ between them, then there is no specific evidence for the presence of one transcript versus the other, and so the abundance estimates may predict the presence of both. There are a couple of things I would recommend trying. First, make sure you are using the most recent version of the software. Second, try generating posterior samples (e.g. by using --numBootstraps). This will let you assess the abundance of those transcripts not only as point estimates, but also with some notion of the uncertainty / confidence in those predictions. If there is simply, inherently, a lot of uncertainty about the abundance of these transcripts, then this should show up as spread / variance in the posterior distributions.
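
To make the "spread / variance" idea concrete, here is a minimal Python sketch of how one might summarize bootstrap replicates per transcript. It assumes you have already extracted the per-transcript bootstrap estimates into plain lists; loading them from salmon's output is not shown. The transcript names and all numbers below are made up for illustration and are not real salmon output, and summarize_bootstraps is a hypothetical helper, not part of salmon.

```python
import statistics


def summarize_bootstraps(samples):
    """Summarize one transcript's bootstrap estimates.

    Returns the mean, the sample standard deviation, and the coefficient
    of variation (sd / mean). A large coefficient of variation means the
    estimate moves around a lot between bootstrap replicates.
    """
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    # The coefficient of variation is undefined when the mean is zero.
    cv = sd / mean if mean > 0 else float("nan")
    return {"mean": mean, "sd": sd, "cv": cv}


# Hypothetical bootstrap read-count estimates (illustrative values only).
bootstraps = {
    "chrY_transcript": [120.0, 15.0, 98.0, 0.0, 140.0, 33.0],
    "similar_transcript": [880.0, 985.0, 902.0, 1000.0, 860.0, 967.0],
}

summaries = {name: summarize_bootstraps(vals) for name, vals in bootstraps.items()}

for name, s in summaries.items():
    print(f"{name}: mean={s['mean']:.1f} sd={s['sd']:.1f} cv={s['cv']:.2f}")
```

In this made-up example the chrY transcript's estimate swings between zero and over a hundred across replicates, which is the kind of pattern you would expect if its reads could be assigned equally well to the similar transcript. A transcript with a nonzero point estimate but this much spread should be interpreted with caution.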