Hi All,
I am wondering if anyone has estimated transcript-level counts from plate-based scRNA-seq datasets, such as Smart-seq, that capture full-length transcripts instead of 3' ends. I tried Salmon, but the estimated counts/TPMs seem to be highly inflated.
For example, for this transcript, the output from a single cell looks like this:
Name               Length  EffectiveLength  TPM      NumReads
ENST00000344843.7  721     250              247.016  471.945
When I looked at the same cell's BAM file in IGV, that transcript had only around 50 reads mapped, yet Salmon's NumReads is 471. This happens for many transcripts.
Why are the values inflated? One possible reason is the default scaling factor, which is the same as for bulk RNA-seq, where total counts tend to be in the millions.
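One way to test the scaling-factor idea is to recall how TPM is usually defined: TPM_i = 1e6 * (N_i / L_i) / sum_j (N_j / L_j), where N_i is the estimated read count and L_i the effective length. Under that definition, TPM always sums to one million within a sample regardless of sequencing depth, while NumReads is the estimated number of reads assigned to the transcript and is not rescaled to library size. The minimal sketch below assumes that standard definition; the function name `tpm` and all numbers other than the first row from the table above are hypothetical.

```python
import math

def tpm(num_reads, eff_lengths):
    """Compute TPM from estimated read counts and effective lengths.

    Assumes the standard definition:
        TPM_i = 1e6 * (N_i / L_i) / sum_j (N_j / L_j)
    """
    rates = [n / l for n, l in zip(num_reads, eff_lengths)]  # reads per effective base
    total = sum(rates)
    return [1e6 * r / total for r in rates]

# Hypothetical toy "cell": the first row mirrors the table above, the rest are made up.
reads = [471.945, 30.0, 100.0]
eff_len = [250.0, 500.0, 1000.0]

small = tpm(reads, eff_len)
# Multiply every count by 1000, as if the same cell were sequenced far deeper.
deep = tpm([n * 1000 for n in reads], eff_len)

print(math.isclose(sum(small), 1e6))                         # TPM sums to one million
print(all(math.isclose(a, b) for a, b in zip(small, deep)))  # depth does not change TPM
```

Because TPM is a within-sample proportion, a low-depth single cell and a deep bulk library both produce TPMs summing to one million, so a large TPM on its own does not indicate inflated counts.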
I would like to know if anyone has estimated transcript counts from scRNA-seq data before. There are tools like Alevin, but they seem to work only with 3'-enriched, droplet-based methods.
Thanks,
Goutham A
Your scale goes up to 51, so you have at most 51 reads at any one position. If you have 2 non-overlapping reads, your maximum coverage will be 1, but you actually have 2 counts. Even if you look at just the first exon, you'll see that there are many non-overlapping reads, so you should have more than 50 counts for that exon alone.
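The distinction between peak coverage and read count can be illustrated with a minimal sketch. Reads are modeled here as hypothetical half-open [start, end) intervals in transcript coordinates, and `max_coverage` is an illustrative helper, not part of IGV or Salmon. This deliberately ignores splicing, multi-mapping, and paired-end fragments, all of which Salmon models.

```python
def max_coverage(reads):
    """Return the maximum per-base depth over half-open [start, end) intervals."""
    events = []
    for start, end in reads:
        events.append((start, 1))   # read begins: depth goes up
        events.append((end, -1))    # read ends: depth goes down
    # Sorting by (position, delta) processes an end before a start at the same
    # position, so back-to-back reads such as [0, 50) and [50, 100) do not overlap.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best

# Two non-overlapping reads: 2 counts, but the coverage track never exceeds 1.
pair = [(0, 50), (60, 110)]
print(len(pair), max_coverage(pair))  # 2 1

# Hypothetical 50 bp reads tiled end to end across a 721 bp transcript.
tiled = [(s, s + 50) for s in range(0, 700, 50)]
print(len(tiled), max_coverage(tiled))  # 14 1
```

The peak of the coverage track therefore gives a lower bound on the read count, not an estimate of it; the count is the number of distinct reads, however they are spread along the transcript.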
Thanks. I had missed that non-overlapping reads are counted independently.