Hi,
This may be a silly question, but I want to quantify the expression level of a specific transcript across many single-end fastq files. The transcript in question is a novel one, and I don't think it's in the HG38 transcriptome. Using salmon gives me the abundance of ALL transcripts in ONE fastq file. So,
Is there a way to quantify the expression level of a single transcript across many fastq files (these are all different samples), other than running salmon on each fastq file and then searching for my transcript of interest? How can this be done for a transcript which is not in the reference transcriptome?
Thank you!
Thanks Lieven, that was quite helpful! Also, is there any advantage to using this approach to quantify the differential expression of this novel transcript, compared with genome-guided methods such as Cufflinks, especially given that this is human data?
Yes and no. In general I'm in favour of aligning to the genome and then using featureCounts or a similar tool to do the gene quantification (read counts), since that way you are less biased and more accurate (in a transcriptome some sequences, or parts of them such as UTRs, might still be missing). But given that this is human, I would expect the genome and the transcriptome to be of similar quality.
Personally I would not use Cufflinks, not only because it's more or less deprecated but also because you don't really need it in this case, and it will likely only add confusion and/or noise to your analysis. The main advantage of the salmon approach is speed: salmon runs quite quickly compared to other (true alignment-based) approaches while still being very accurate.
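For the "search each sample for my transcript" part of the salmon route, here is a minimal sketch of how the lookup could be scripted. It assumes salmon has already been run once per fastq file, that each run wrote its usual tab-separated quant.sf (columns Name, Length, EffectiveLength, TPM, NumReads) into its own output directory, and that the novel transcript's sequence was included in the transcript FASTA used to build the index. The function name, the directory layout and the transcript ID novel_tx1 are hypothetical, so adjust them to your setup.

```python
import csv
from pathlib import Path


def transcript_abundance(quant_dirs, transcript_id):
    """Collect TPM and NumReads for one transcript from several salmon runs.

    quant_dirs: iterable of salmon output directories, one per sample
                (assumed layout: <dir>/quant.sf).
    Returns {sample_name: (TPM, NumReads)}; the value is None when the
    transcript is not listed in that sample's quant.sf.
    """
    results = {}
    for d in quant_dirs:
        sample = Path(d).name  # the directory name doubles as the sample name
        with open(Path(d) / "quant.sf", newline="") as fh:
            reader = csv.DictReader(fh, delimiter="\t")
            for row in reader:
                if row["Name"] == transcript_id:
                    results[sample] = (float(row["TPM"]), float(row["NumReads"]))
                    break
            else:
                # Loop finished without a match: transcript absent from the index.
                results[sample] = None
    return results
```

Calling something like transcript_abundance(["quants/sampleA", "quants/sampleB"], "novel_tx1") then gives one (TPM, NumReads) pair per sample, which you can write out as a small table. Note that this only gathers the numbers; for differential expression you would still pass the counts to a dedicated tool.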