Question: Combine large number of RSEM outputs into a single matrix
3.6 years ago, pr0 wrote:

Hello Biostars,

I used the STAR aligner to align a set of single-cell FASTQ files and then ran RSEM for quantification. Now I have 96 sets of RSEM outputs, and I want to combine them (preferably the TPM columns) into a single matrix.

I searched online but could not find an easy way to do this. The Trinity de novo transcriptome assembly tool may have a Perl script that is applicable, but I didn't understand how to use it for this purpose. RSEM's "rsem-generate-data-matrix" seems simple to use, but the filenames have to be given manually as inputs, which would be very cumbersome in my situation, and the command probably wouldn't accept 96 files.

So, does anybody here know a better way to do this? Is there a ready-made tool for it? I would like to pick the TPMs, if possible.

Any help would be appreciated!

Thanks a lot.

PR

rna-seq • 2.5k views
modified 3.6 years ago • written 3.6 years ago by pr0
 RSEM's "rsem-generate-data-matrix" seems simple to use, but then the filenames have to be manually given as inputs, which would be very cumbersome in my situation and the command wouldn't probably accept 96 files.

I don't know the RSEM output format. If "rsem-generate-data-matrix" does the job, then you need not worry about passing 96 files on the command line if you are working on Linux, as the upper limit on the number of arguments is much higher. You can use wildcards to pass the files as arguments, using some pattern in the filenames. What do your filenames look like?

written 3.6 years ago by Santosh Anand5.1k
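
The wildcard approach from the comment above could look roughly like the sketch below. It assumes the result files follow RSEM's <sample_name>.genes.results naming and sit in one directory; the helper name list_rsem_results is hypothetical. One caveat: according to its usage text, rsem-generate-data-matrix extracts the expected_count column, not TPM, so it may not produce the TPM matrix asked for.

```shell
# List the files a wildcard expands to, so the set can be checked before use.
# Assumption: outputs are named <sample_name>.genes.results (RSEM's default).
list_rsem_results() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/*.genes.results; do
        # Without a match the pattern stays literal, so test for existence.
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Typical use (commented out; the second line needs RSEM installed):
#   list_rsem_results . | wc -l      # should report 96 for this data set
#   rsem-generate-data-matrix *.genes.results > genes.counts.matrix
#   getconf ARG_MAX                  # byte limit on the whole argument list
```

The shell expands the pattern before the program starts, so the tool simply receives 96 file names as ordinary arguments.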

Sorry, posted my reply as an answer by mistake. Apologies.

written 3.6 years ago by pr0

If the row order is the same in all 96 files, you can loop over this command: paste <(awk '{print $2}' $FILE_1) <(awk '{print $2}' $FILE_2). Here $2 is the column holding the TPM values (change it to match your files) and $FILE_1, $FILE_2 are the file names. The example above is for two files, but you can loop over all of them.

modified 3.5 years ago • written 3.5 years ago by Avi70
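
The paste idea above can be extended to any number of files with a loop. The following is only a sketch under stated assumptions: the files are tab-separated with one header line, TPM is column 6 (as in RSEM's documented genes.results and isoforms.results layouts), and every file has the same rows in the same order. The function name paste_tpm is hypothetical.

```shell
# Paste the TPM column of many RSEM result files side by side.
# Assumptions: tab-separated input, header line present, TPM in column 6,
# identical row order in every file.
paste_tpm() {
    local f tmp out
    tmp="$(mktemp)"
    # Start the matrix with the ID column (column 1) of the first file.
    cut -f1 "$1" > "$tmp"
    for f in "$@"; do
        out="$(mktemp)"
        # Take column 6, replacing its header with the file's base name,
        # and append it as a new column to the matrix built so far.
        awk -F'\t' -v name="$(basename "$f")" \
            'NR == 1 { print name; next } { print $6 }' "$f" |
            paste "$tmp" - > "$out"
        mv "$out" "$tmp"
    done
    cat "$tmp"
    rm -f "$tmp"
}
```

It would be called as, for example, paste_tpm *.genes.results > tpm_matrix.tsv. Because nothing here checks the row order, the result is only meaningful if that assumption holds.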

Thanks for the reply. I actually ended up writing an R script to loop through, check for order, and make the matrix.

written 3.5 years ago by pr0
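
The R script mentioned above is not shown in the thread. As an illustration of the "check for order" step only, here is a shell sketch (not the poster's script) that compares the ID column of every file against the first one. It assumes tab-separated files with IDs in column 1; the name check_same_order is hypothetical.

```shell
# Verify that every file lists the same IDs in the same order as the first.
# Assumption: tab-separated files with the gene/transcript ID in column 1.
check_same_order() {
    local ref f rc=0
    ref="$(mktemp)"
    cut -f1 "$1" > "$ref"
    for f in "$@"; do
        # cmp -s prints nothing and exits non-zero if the inputs differ.
        if ! cut -f1 "$f" | cmp -s - "$ref"; then
            echo "ID column differs from first file: $f" >&2
            rc=1
        fi
    done
    rm -f "$ref"
    return "$rc"
}
```

Running check_same_order *.genes.results before building the matrix returns a non-zero status and names the offending files if any of them differ.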
3.6 years ago, pr0 wrote:

Thanks for responding! RSEM outputs files of this format: http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html#output where sample_name varies. Yes, I am using a Linux cluster environment. What would be the best way to pass wildcard arguments?

Thanks again. PR

written 3.6 years ago by pr0
