Combine large number of RSEM outputs into a single matrix
7.5 years ago
pr • 0

Hello Biostars,

I used the STAR aligner to align a set of single-cell FASTQ files and then ran RSEM for quantification. Now I have 96 sets of RSEM outputs, and I want to combine them (preferably the TPM columns) into a single matrix.

I searched online but could not find an easy way to do it. The Trinity de novo transcriptome assembly tool may have a Perl script that is applicable, but I didn't understand how to use it for this purpose. RSEM's "rsem-generate-data-matrix" seems simple to use, but the filenames have to be given manually as inputs, which would be very cumbersome in my situation, and the command probably wouldn't accept 96 files.

Does anybody here know a better way to do this? Is there a ready-made tool for it? I would like to use the TPM values, if possible.

Any help would be appreciated!

Thanks a lot.

PR

RNA-Seq • 5.5k views
 RSEM's "rsem-generate-data-matrix" seems simple to use, but then the filenames have to be manually given as inputs, which would be very cumbersome in my situation and the command wouldn't probably accept 96 files.

I don't know the RSEM output format. If "rsem-generate-data-matrix" does the job, then you need not worry about passing 96 files on the command line if you are working on Linux, as the limit on the number of arguments is much higher than that. You can use a wildcard pattern that matches the filenames to pass them all as arguments. What do your filenames look like?


Sorry, posted my reply as an answer by mistake. Apologies.


If the gene order is the same in all 96 files, you can loop over this command: paste <(awk '{print $2}' "$FILE_1") <(awk '{print $2}' "$FILE_2") where $2 is the column with the TPM values (change it to match your files) and $FILE_1, $FILE_2 are the file names. The example above is for two files, but you can extend it with a loop.
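The paste approach above could be sketched as a small shell function. This is only a sketch under a few assumptions: the inputs are RSEM *.genes.results files (tab-separated, with a header line), TPM is column 6 as in the standard genes.results layout, and the sample name is taken from the file name. The function name combine_tpm is made up for illustration.

```shell
# combine_tpm FILE... : print a gene-by-sample TPM matrix to stdout.
# Assumes tab-separated RSEM genes.results files with TPM in column 6.
# Variables are not declared local, so they are visible to the caller.
combine_tpm() {
    tmp=$(mktemp -d) || return 1
    first=$1
    # Gene IDs (column 1, including the header) become the row labels.
    cut -f1 "$first" > "$tmp/ids"
    cp "$tmp/ids" "$tmp/matrix"
    for f in "$@"; do
        # Stop if this file's gene order differs from the first file.
        cut -f1 "$f" > "$tmp/ids_f"
        if ! cmp -s "$tmp/ids" "$tmp/ids_f"; then
            echo "gene order differs in $f" >&2
            rm -rf "$tmp"
            return 1
        fi
        # Use the sample name as the column header instead of "TPM".
        sample=$(basename "$f" .genes.results)
        awk -F'\t' -v s="$sample" \
            'NR == 1 { print s; next } { print $6 }' "$f" > "$tmp/col"
        # Append the new column to the growing matrix.
        paste "$tmp/matrix" "$tmp/col" > "$tmp/next"
        mv "$tmp/next" "$tmp/matrix"
    done
    cat "$tmp/matrix"
    rm -rf "$tmp"
}
```

It could then be called as combine_tpm *.genes.results > tpm_matrix.tsv. The order check is there because paste joins lines purely by position, so a file with a different gene order would otherwise be merged silently into the wrong rows.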


Thanks for the reply. I actually ended up writing an R script to loop through, check for order, and make the matrix.

7.5 years ago
pr • 0

Thanks for responding! RSEM outputs files in this format: http://deweylab.biostat.wisc.edu/rsem/rsem-calculate-expression.html#output, where sample_name varies. Yes, I am using a Linux cluster environment. What would be the best way to pass wildcard arguments?

Thanks again. PR
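For the wildcard question, a minimal sketch in shell follows. It assumes all gene-level results sit in one directory and end in .genes.results; the function name make_matrix and both of its arguments are hypothetical. It also assumes rsem-generate-data-matrix takes the result files as arguments and writes the matrix to standard output. As far as I recall, that script collects the expected_count column rather than TPM, so it is worth checking its output before relying on it.

```shell
# make_matrix DIR OUT : run rsem-generate-data-matrix on every
# *.genes.results file in DIR and write the matrix to OUT.
make_matrix() {
    dir=$1
    out=$2
    # The shell expands the glob into a sorted list of file names before
    # the command starts, so no file name has to be typed by hand.
    set -- "$dir"/*.genes.results
    if [ ! -e "$1" ]; then
        echo "no *.genes.results files in $dir" >&2
        return 1
    fi
    echo "found $# files" >&2
    if command -v rsem-generate-data-matrix >/dev/null 2>&1; then
        rsem-generate-data-matrix "$@" > "$out"
    else
        echo "rsem-generate-data-matrix not on PATH" >&2
        return 127
    fi
}
```

For example, make_matrix ./rsem_out all_cells.matrix. On Linux, getconf ARG_MAX prints the limit on the total length of the arguments in bytes, and 96 file names should be far below it.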
