Question: Merging the output of salmon for RNA-seq counting with paired-end and single-end reads simultaneously
nanoide wrote (12 weeks ago):

Hi all

So, I'm working with some raw RNA-seq reads, both paired-end (2 x 150 bp) and single-end (1 x 75 bp). They come from the same samples, which were sequenced in two separate rounds with different setups. I want to get transcript-level counts for each run using salmon. For the gene-level differential expression analysis with DESeq2, I simply took the raw counts of the two runs of each sample (sample_1_seq1 and sample_1_seq2) and summed them. I did this after checking that the runs were well correlated and that the two experiments looked similar. That way I got one count per gene for each sample.
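Summing the raw counts of technical runs is just a per-gene addition. A minimal Python sketch of that step, using made-up gene names and counts (the helper `sum_runs` is hypothetical, not from any package). This only makes sense for raw integer counts, not for TPM or normalized values; within DESeq2 itself, `collapseReplicates` sums technical replicates in the same way.

```python
from collections import Counter


def sum_runs(run_counts, run_to_sample):
    """Sum per-gene raw counts of technical runs that belong to the same sample."""
    merged = {}
    for run, counts in run_counts.items():
        sample = run_to_sample[run]
        # Counter.update adds counts for matching keys instead of replacing them
        merged.setdefault(sample, Counter()).update(counts)
    return merged


# Hypothetical counts for the two runs of one sample
runs = {
    "sample_1_seq1": {"geneA": 120, "geneB": 30},
    "sample_1_seq2": {"geneA": 80, "geneB": 10, "geneC": 5},
}
mapping = {"sample_1_seq1": "sample_1", "sample_1_seq2": "sample_1"}
print(dict(sum_runs(runs, mapping)["sample_1"]))
```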

With salmon, I got a quant.sf file for each run. It contains the Length, EffectiveLength, TPM and NumReads of each transcript. Does anyone have an idea how I could get one single .sf file, kind of "merging" sample_1_seq1 and sample_1_seq2?
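One naive way to get a single table out of two quant.sf files is to sum NumReads per transcript and recompute TPM from the summed reads. A rough Python sketch, assuming both runs were quantified against the same salmon index so the transcript lists match. Weighting EffectiveLength by NumReads is my own assumption, not something salmon does, so the result is only an approximation of quantifying all reads together.

```python
import csv
import io


def read_quant(handle):
    """Parse a salmon quant.sf table into {transcript: (Length, EffectiveLength, NumReads)}."""
    table = {}
    for rec in csv.DictReader(handle, delimiter="\t"):
        table[rec["Name"]] = (int(rec["Length"]), float(rec["EffectiveLength"]), float(rec["NumReads"]))
    return table


def merge_quants(a, b):
    """Naively merge two quant tables built against the same index.

    NumReads are summed, EffectiveLength is a NumReads-weighted average
    (an assumption) and TPM is recomputed from the merged values.
    """
    merged = {}
    for name, (length, eff_a, reads_a) in a.items():
        _, eff_b, reads_b = b[name]  # KeyError if the transcript sets differ
        reads = reads_a + reads_b
        if reads > 0:
            eff = (eff_a * reads_a + eff_b * reads_b) / reads
        else:
            eff = (eff_a + eff_b) / 2
        merged[name] = [length, eff, reads]
    # TPM_i = 1e6 * (reads_i / eff_i) / sum_j (reads_j / eff_j)
    rates = {n: (r / e if e > 0 else 0.0) for n, (_, e, r) in merged.items()}
    total = sum(rates.values())
    for name, row in merged.items():
        row.append(1e6 * rates[name] / total if total > 0 else 0.0)
    return merged


def write_quant(merged, handle):
    """Write the merged table back out in quant.sf column order."""
    handle.write("Name\tLength\tEffectiveLength\tTPM\tNumReads\n")
    for name, (length, eff, reads, tpm) in merged.items():
        handle.write(f"{name}\t{length}\t{eff:.3f}\t{tpm:.6f}\t{reads:.3f}\n")


# Toy input standing in for two real quant.sf files
header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
seq1 = io.StringIO(header + "tx1\t1000\t850.0\t0\t100\ntx2\t500\t350.0\t0\t50\n")
seq2 = io.StringIO(header + "tx1\t1000\t925.0\t0\t300\ntx2\t500\t425.0\t0\t0\n")
merged = merge_quants(read_quant(seq1), read_quant(seq2))
out = io.StringIO()
write_quant(merged, out)
print(out.getvalue())
```

With real files you would open e.g. `sample_1_seq1/quant.sf` and `sample_1_seq2/quant.sf` (hypothetical paths) instead of the `StringIO` objects.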

Or maybe I could give salmon both sets of reads from the beginning somehow?

Thank you for your advice


There are a few ways to go about doing this technically. However, I would actually suggest _not_ merging the paired-end and single-end runs. Instead, I'd treat the sequencing protocol (SE vs PE) as a technical factor in the design matrix when you want to do differential testing in DESeq2.
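In DESeq2 terms, this means keeping one column per run in the count matrix and using a design formula such as `~ protocol + condition`, so that a systematic PE/SE difference gets its own coefficient (the variable of interest goes last, since `results()` reports the last variable by default). The Python sketch below only illustrates how such a formula expands into a design matrix under R's default treatment coding (alphabetically first level as reference); the `condition` factor, its `control`/`treated` levels and the second sample are hypothetical.

```python
def design_matrix(col_data, factors):
    """Expand an additive formula such as ~ protocol + condition into a
    treatment-coded model matrix (reference = alphabetically first level)."""
    levels = {f: sorted({row[f] for row in col_data}) for f in factors}
    columns = ["(Intercept)"]
    for f in factors:
        columns += [f"{f}{level}" for level in levels[f][1:]]
    matrix = []
    for row in col_data:
        values = [1]
        for f in factors:
            # 1 if this run has the given non-reference level, else 0
            values += [int(row[f] == level) for level in levels[f][1:]]
        matrix.append(values)
    return columns, matrix


# Hypothetical run table: one row per sequencing run, not per sample.
col_data = [
    {"run": "sample_1_seq1", "condition": "control", "protocol": "PE"},
    {"run": "sample_1_seq2", "condition": "control", "protocol": "SE"},
    {"run": "sample_2_seq1", "condition": "treated", "protocol": "PE"},
    {"run": "sample_2_seq2", "condition": "treated", "protocol": "SE"},
]
columns, X = design_matrix(col_data, ["protocol", "condition"])
print(columns)
for row, values in zip(col_data, X):
    print(row["run"], values)
```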

Reply written 12 weeks ago by Rob

I tend to agree with Rob here. However, if you really insist on merging them, I would take the forward reads of the paired-end run (and only those!!), combine them with the single-end reads and run salmon on that combined input. (Not sure, though, what the influence of the different read lengths will be.)
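This maps onto a single salmon run with unmated reads: salmon's `-r` option accepts one or more single-end read files, so the R1 file of the paired-end run and the single-end file can be passed together. A small Python sketch that only builds the command; the index name, file names and output directory are hypothetical, and `-l A` asks salmon to infer the library type automatically.

```python
import shlex


def salmon_unmated_cmd(index, r1_files, se_files, out_dir, lib_type="A"):
    """Build a salmon quant call that treats the PE forward reads (R1 only)
    plus the SE reads as one set of unmated reads."""
    reads = [*r1_files, *se_files]
    return ["salmon", "quant", "-i", index, "-l", lib_type, "-r", *reads, "-o", out_dir]


# Hypothetical paths for one sample.
cmd = salmon_unmated_cmd(
    index="salmon_index",
    r1_files=["sample_1_seq1_R1.fastq.gz"],
    se_files=["sample_1_seq2.fastq.gz"],
    out_dir="quant/sample_1",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The caveat about the different read lengths (150 bp vs 75 bp) still applies to the combined run.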

Reply written 12 weeks ago by lieven.sterck

The thing here is that the experimental design is suboptimal, and your analysis, if you really want to do it properly (at least what I think would be proper), is limited by the weakest link in the chain, which is the 1x75 bp run. I would therefore trim everything to 75 bp, keep only R1, and then check for other potential batch effects using something like PCA or MDS. The latter is probably not necessary if it really is the exact same cDNA pool that you sequenced.
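The trimming step (keep only R1, cut every read to 75 bp) is a hard trim of each read's sequence and quality lines. A plain-Python sketch, assuming gzipped FASTQ with four-line records; the file names are hypothetical, and in practice a dedicated tool such as cutadapt (its `--length` option shortens reads to a fixed length) will be much faster.

```python
import gzip
from itertools import islice


def trim_record(record, length=75):
    """Hard-trim the sequence and quality lines of one FASTQ record."""
    header, seq, plus, qual = record
    return header, seq[:length], plus, qual[:length]


def trim_fastq(in_path, out_path, length=75):
    """Write a gzipped copy of a gzipped FASTQ file with every read cut to `length` bases."""
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            lines = [line.rstrip("\n") for line in islice(fin, 4)]
            if not lines:
                break  # end of file
            if len(lines) < 4:
                raise ValueError("truncated FASTQ record")
            fout.write("\n".join(trim_record(lines, length)) + "\n")


# Hypothetical usage: keep only R1 of the 2 x 150 bp run and cut it to 75 bp.
# trim_fastq("sample_1_seq1_R1.fastq.gz", "sample_1_seq1_R1.trim75.fastq.gz")
```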

Reply written 12 weeks ago by ATpoint

Thank you all for the useful suggestions.

Reply written 12 weeks ago by nanoide