7 weeks ago
rayanelkholdi
Hello everyone,
I have a question about SMARTer technology. I'm planning to use SMART-Seq Total RNA Pico Input with UMIs (ZapR Mammalian), where the very first step is RNA fragmentation. Since fragmentation happens before the first PCR, I don't understand how the UMIs can be deduplicated. If the RNA is fragmented before the UMIs are introduced, it seems to me that different fragments of the same RNA molecule will receive different UMIs and be counted as several original RNA molecules. Or am I wrong? How should I deduplicate in that case?
Thanks in advance!
UMIs are only deduplicated at the level of each RNA fragment, which is what undergoes PCR. The UMIs thus mark each individual RNA fragment that was reverse-transcribed. See Figure 2 in the manual. Counting is then done on the UMI-deduplicated fragments aligned to the reference.
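To make the fragment-level deduplication concrete, here is a minimal sketch, not the kit's or any tool's actual method. It assumes each aligned read is reduced to a hypothetical tuple of (chromosome, start, strand, UMI) and that reads sharing all four values are PCR copies of the same fragment. Real tools such as UMI-tools also tolerate sequencing errors in the UMI, which this sketch ignores.

```python
from collections import defaultdict

def dedup_by_position_and_umi(reads):
    """Collapse reads sharing (chrom, start, strand, umi) into one fragment.

    Returns a dict mapping each unique fragment key to the number of
    reads (PCR copies) that supported it.
    """
    groups = defaultdict(int)
    for chrom, start, strand, umi in reads:
        groups[(chrom, start, strand, umi)] += 1
    return dict(groups)

# Hypothetical toy data: two fragments of the same RNA, each with its own
# UMI, amplified to 3 and 2 PCR copies respectively.
reads = [
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 100, "+", "ACGTAC"),
    ("chr1", 450, "+", "TTGCAA"),
    ("chr1", 450, "+", "TTGCAA"),
]

fragments = dedup_by_position_and_umi(reads)
print(len(reads), "reads ->", len(fragments), "unique fragments")
```

The PCR copies collapse, but the two fragments remain distinct: deduplication recovers fragments, not the original intact RNA molecule.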
Thank you for your answer! So if, for example, I start with one transcript of Protein A (1 RNA molecule) and it gets fragmented into 4 fragments, each of these fragments will get its own UMI, and in the end I will have counted 4 transcripts of Protein A instead of 1?
Those 4 fragments will align to the gene for protein A and will be counted as 1 copy, assuming "gene" level summarization (which is what many use).
Depends how you do the quantification. If you use an EM quantifier such as RSEM or Salmon, the fragments would probably all come out in the wash as one transcript, but remember that these tools compute relative, not absolute, transcript numbers. If you use straightforward count-based quantification, you'll get 4 counts. However, it's highly unlikely that, if a transcript were fragmented into 4 pieces, the sequencing would capture all 4. Most likely you'll only see one of the fragments in the data. I've always been sceptical of claims that UMIs allow you to calculate absolute transcript numbers.
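As a rough illustration of why seeing all 4 fragments is unlikely, here is a small sketch under a strong simplifying assumption: each fragment is captured independently with the same probability. The value 0.1 is purely hypothetical and is not taken from the kit or from any data. Under that model the number of fragments observed follows a binomial distribution.

```python
from math import comb

def capture_distribution(n_fragments, p_capture):
    """Binomial probabilities of observing k = 0..n_fragments fragments,
    assuming each fragment is captured independently with p_capture."""
    return [
        comb(n_fragments, k)
        * p_capture ** k
        * (1 - p_capture) ** (n_fragments - k)
        for k in range(n_fragments + 1)
    ]

# Hypothetical example: 4 fragments, 10% capture probability each.
dist = capture_distribution(4, 0.1)
expected_count = sum(k * prob for k, prob in enumerate(dist))

for k, prob in enumerate(dist):
    print(f"P(observe {k} fragments) = {prob:.4f}")
print(f"Expected fragments observed: {expected_count:.2f}")
```

With these made-up numbers, the chance of counting the transcript 4 times is 0.1^4 = 0.0001, while the chance of missing it entirely is 0.9^4, about 0.66. The real capture efficiency depends on the library and sequencing depth, so treat this only as a way to reason about the argument above.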