Question

Measure expression level of a single (novel) transcript in multiple fastq files

6.1 years ago

c_u ▴ 530

Hi,

This may be a silly question, but I want to quantify the expression level of a specific transcript across many single-end FASTQ files. The transcript in question is a novel one, and I don't think it's in the hg38 transcriptome. Running Salmon gives me the abundance of ALL transcripts in ONE FASTQ file. So:

Is there a way to quantify the expression level of a single transcript, across many fastq files (these are all different samples), other than running salmon on each fastq file and then searching for my transcript of interest?
How can this be done for a transcript which is not in the reference transcriptome?

Thank you!

RNA-Seq

score 7 · Accepted Answer · 2019-06-18

6.1 years ago

lieven.sterck 15k

Not sure what you are hinting at, but simply add the novel transcript to the reference transcriptome, run Salmon (yes, once per FASTQ file, but that should not take long) and extract the data you want.
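A minimal Python sketch of that workflow, assuming uncompressed FASTA inputs, single-end reads and `salmon` on the PATH. The helper names, file names and the `NOVEL1` transcript ID are hypothetical. It concatenates the novel transcript onto the reference, builds the `salmon index` / `salmon quant` command lines (one quant per sample, `-l A` for automatic library-type detection, `-r` for unpaired reads), and then pulls that one transcript's row out of each sample's `quant.sf`.

```python
import csv
import subprocess
from pathlib import Path


def build_reference(transcriptome_fa, novel_fa, out_fa):
    """Stream the reference transcriptome plus the novel transcript into one FASTA."""
    with open(out_fa, "w") as out:
        for src in (transcriptome_fa, novel_fa):
            last = ""
            with open(src) as fh:
                for line in fh:
                    out.write(line)
                    last = line
            # Guard against a missing trailing newline gluing two records together.
            if last and not last.endswith("\n"):
                out.write("\n")
    return out_fa


def salmon_commands(reference_fa, index_dir, fastqs, out_root, threads=4):
    """One 'salmon index' call, then one 'salmon quant' per single-end FASTQ."""
    cmds = [["salmon", "index", "-t", str(reference_fa), "-i", str(index_dir)]]
    for fq in fastqs:
        sample = Path(fq).name.split(".")[0]  # e.g. "s1.fastq.gz" -> "s1"
        cmds.append(["salmon", "quant", "-i", str(index_dir), "-l", "A",
                     "-r", str(fq), "-p", str(threads),
                     "-o", str(Path(out_root) / sample)])
    return cmds


def run_all(cmds):
    """Run each command in order, stopping on the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


def extract_transcript(quant_dirs, transcript_id):
    """Collect (TPM, NumReads) for one transcript from each sample's quant.sf."""
    result = {}
    for d in quant_dirs:
        with open(Path(d) / "quant.sf", newline="") as fh:
            for rec in csv.DictReader(fh, delimiter="\t"):
                if rec["Name"] == transcript_id:
                    result[Path(d).name] = (float(rec["TPM"]), float(rec["NumReads"]))
                    break
    return result
```

Usage would be `build_reference(...)`, then `run_all(salmon_commands(...))`, then `extract_transcript(sample_dirs, "NOVEL1")`. The novel transcript's FASTA header ID must not clash with an existing transcript ID, otherwise the lookup is ambiguous.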

This way you not only avoid biasing the quantification towards your novel transcript (which is what happens if you use it as the only reference: reads from similar transcripts have nowhere else to go), but you also use a simple, straightforward, well-established approach, so there are no worries on that front.
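To make the bias argument concrete, here is a deliberately simplified Python toy. It is not how Salmon assigns reads (Salmon uses selective alignment plus an EM step); it just assigns each read to whichever reference shares the most k-mers with it. The sequences are made up.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_reads(reads, references, k=4, min_shared=1):
    """Toy assignment: each read goes to the reference sharing the most k-mers."""
    ref_kmers = {name: kmers(seq, k) for name, seq in references.items()}
    counts = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        shared = {name: len(read_kmers & ks) for name, ks in ref_kmers.items()}
        best = max(shared, key=shared.get)
        if shared[best] >= min_shared:
            counts[best] += 1
    return counts


# Made-up sequences: the two transcripts share their first 8 bases.
novel = "AAAACCCCGGGGTTTT"
paralog = "AAAACCCCATATATAT"
reads = ["CCCCGGGG",   # from the novel transcript
         "ACCCCATAT",  # from the paralog
         "CCCCATATA"]  # from the paralog

only_novel = assign_reads(reads, {"novel": novel})
full_index = assign_reads(reads, {"novel": novel, "paralog": paralog})
print(only_novel["novel"], full_index["novel"], full_index["paralog"])  # 3 1 2
```

With only the novel transcript indexed, both paralog-derived reads are counted towards it; once the paralog is in the index, they are assigned to the paralog instead.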

Thanks Lieven, that was quite helpful! Also, is there any advantage to using this approach to quantify differential expression of this novel transcript, compared to genome-guided methods like Cufflinks? Especially given that this is human data.

6.1 years ago by c_u ▴ 530

Yes and no. In general I'm in favour of aligning to the genome and then using featureCounts or similar for gene quantification (read counts), as that way you are less biased and more accurate (the transcriptome may still be missing some sequences, or parts of them such as UTRs). But given that this is human, I would expect the genome and transcriptome to be of similar quality.
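For the genome-based route, the novel transcript has to be present in the annotation that featureCounts reads, so you need its exon coordinates (e.g. from whatever assembly step found it). A minimal Python sketch, assuming BAMs already aligned to the genome; the gene/transcript IDs, chromosome and coordinates below are hypothetical. It writes GTF exon records (1-based, inclusive) to append to the reference GTF, builds a single featureCounts call over all BAMs (featureCounts counts `exon` features grouped by `gene_id` by default), and reads one gene's row from the resulting counts table.

```python
def novel_gtf_lines(gene_id, transcript_id, chrom, strand, exons, source="novel"):
    """GTF exon records for one transcript; exons are (start, end), 1-based inclusive."""
    attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return [
        "\t".join([chrom, source, "exon", str(start), str(end), ".", strand, ".", attrs])
        for start, end in sorted(exons)
    ]


def featurecounts_command(gtf, bams, out_txt, threads=4):
    """One featureCounts run over all samples gives one counts table."""
    return ["featureCounts", "-T", str(threads), "-a", str(gtf),
            "-o", str(out_txt), *map(str, bams)]


def read_counts(counts_txt, gene_id):
    """Per-BAM read counts for one gene from a featureCounts output table."""
    with open(counts_txt) as fh:
        lines = [l.rstrip("\n") for l in fh if not l.startswith("#")]
    header = lines[0].split("\t")
    # The first six columns are Geneid, Chr, Start, End, Strand, Length.
    for line in lines[1:]:
        fields = line.split("\t")
        if fields[0] == gene_id:
            return {h: int(v) for h, v in zip(header[6:], fields[6:])}
    raise KeyError(gene_id)
```

One design caveat: if the novel transcript overlaps an annotated gene, reads falling in the shared region overlap two features, and featureCounts does not count such reads by default.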

I would not use Cufflinks personally, not only because it's more or less deprecated, but also because you don't really need it here and it will likely only add confusion and/or noise to your analysis. The main advantage of the Salmon approach is speed: Salmon runs quickly compared to other (true alignment-based) approaches while still being very accurate.

6.1 years ago by lieven.sterck 15k