I want to use Kallisto for RNA-Seq data analysis. The species I am working with is heterozygous and I want to study allelic expression. I would like to know how Kallisto handles identical transcripts. Is it necessary to generate a non-redundant transcript dataset before running Kallisto?
I think I tried this a few years ago, and the outcome was that if two entries in the transcriptome share exactly the same sequence, they will get exactly the same expression value.
Anyway, it is fairly straightforward to set up a "dummy" test with two identical sequences in the FASTA file (just duplicate one of the existing reference transcripts), run kallisto, and inspect the output.
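A minimal sketch of how such a dummy test file could be built, in case it helps. The function names, the `_dup` suffix, and the file names in the comments are my own placeholders, not anything kallisto requires; the copy gets a different ID so the two entries can be told apart in the output.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records, header, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_with_duplicate(records, target_id, out_path, suffix="_dup"):
    """Write all records plus a renamed copy of the record whose ID is target_id.

    The ID is taken to be the first whitespace-separated token of the header.
    """
    found = False
    with open(out_path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq}\n")
            if header.split()[:1] == [target_id]:
                # Same sequence, new name: this is the "identical transcript".
                out.write(f">{target_id}{suffix}\n{seq}\n")
                found = True
    if not found:
        raise ValueError(f"transcript {target_id!r} not found")


# Hypothetical usage (file and transcript names are placeholders):
#   records = read_fasta("transcripts.fa")
#   write_with_duplicate(records, "tx1", "transcripts_with_dup.fa")
# Then index and quantify as usual, for example:
#   kallisto index -i test.idx transcripts_with_dup.fa
#   kallisto quant -i test.idx -o test_out reads_1.fastq.gz reads_2.fastq.gz
# and compare the rows for tx1 and tx1_dup in test_out/abundance.tsv.
```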
If there are identical transcripts, the reads will be distributed among the identical transcripts (e.g. if there are two identical transcripts and one read maps to them, each transcript will get a count of 0.5).
You can either simply add up the counts of the identical transcripts at the end or you can remove duplicated transcripts from your FASTA file before indexing.
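Both options can be sketched in a few lines of Python. This is only an illustration under some assumptions: records are `(id, sequence)` tuples already read from your FASTA, sequences are compared case-insensitively on the same strand only, and the function names are made up for this example. The `target_id` and `est_counts` columns are the ones kallisto writes to `abundance.tsv`.

```python
import csv


def group_identical(records):
    """Group transcript IDs that share exactly the same sequence.

    Returns a list of ID lists, in order of first appearance.
    """
    groups = {}
    for seq_id, seq in records:
        groups.setdefault(seq.upper(), []).append(seq_id)
    return list(groups.values())


def unique_records(records):
    """Option 1: keep only the first record of each distinct sequence."""
    seen, kept = set(), []
    for seq_id, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            kept.append((seq_id, seq))
    return kept


def sum_counts(abundance_path, groups):
    """Option 2: add up est_counts over each group of identical transcripts.

    The first ID in each group is used as the name of the merged entry.
    """
    representative = {seq_id: group[0] for group in groups for seq_id in group}
    totals = {}
    with open(abundance_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = representative.get(row["target_id"], row["target_id"])
            totals[key] = totals.get(key, 0.0) + float(row["est_counts"])
    return totals
```

Keep the grouping around if you deduplicate before indexing, so you still know which original IDs each retained transcript stands for.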
Correct, and as mentioned in my answer, this is because the quantification is divided between the two entries (e.g. 1 mapping read = each entry gets an expression value of 0.5).
Thank you very much for the reply. I will try this.