Hello everyone, I am having trouble understanding something and would appreciate any help or even a tutorial on this if someone can link it.
I had 20 bulk RNA-seq samples sequenced, and the bioinformatics core gave me 20 quant.genes.sf files produced by the DRAGEN RNA pipeline. I need to generate a count matrix from these for a GSEA analysis. I have read tutorials on how to use tximport, but they start from quant.sf files (transcript-level quantification).
How can I use quant.genes.sf files to generate a count matrix? They have the same columns as a quant.sf file (e.g. Length, EffectiveLength, TPM, NumReads).
Thank you!
What do you have in the rows, genes or transcripts? If the rows are genes, then the `NumReads` column is likely the read counts, so you could use that column.
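Collecting the `NumReads` column from every sample into one gene x sample table is all a count matrix is. A minimal Python sketch, assuming each quant.genes.sf file is tab-separated with a salmon-style header (`Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`), as the thread suggests. The function name `merge_column`, the sample names and the gene values are hypothetical, for illustration only.

```python
import csv
import io


def merge_column(sources, column="NumReads"):
    """Merge one column of several salmon-style .sf tables into a gene x sample matrix.

    sources: dict mapping sample name -> open text handle of a tab-separated
    file whose header contains "Name" and `column` (assumed salmon quant.sf layout).
    Returns (samples, matrix), where matrix[gene] lists values in sample order.
    """
    samples = list(sources)
    per_sample = {}
    for sample, handle in sources.items():
        reader = csv.DictReader(handle, delimiter="\t")
        per_sample[sample] = {row["Name"]: float(row[column]) for row in reader}

    # Every file should describe the same genes; fail loudly if not.
    genes = list(per_sample[samples[0]])
    for sample in samples[1:]:
        if set(per_sample[sample]) != set(genes):
            raise ValueError(f"gene set of {sample!r} differs from {samples[0]!r}")

    matrix = {gene: [per_sample[s][gene] for s in samples] for gene in genes}
    return samples, matrix


# Two tiny in-memory "files" with made-up values, standing in for real quant.genes.sf files.
header = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
s1 = io.StringIO(header + "GENE_A\t1500\t1325.0\t12.5\t340.0\nGENE_B\t900\t725.3\t3.1\t46.2\n")
s2 = io.StringIO(header + "GENE_A\t1500\t1320.1\t15.0\t512.0\nGENE_B\t900\t721.8\t2.4\t38.0\n")

samples, matrix = merge_column({"sample1": s1, "sample2": s2})
print("gene", *samples, sep="\t")
for gene, values in matrix.items():
    print(gene, *values, sep="\t")
```

With real data you would pass handles from `open(path, newline="")` for each of the 20 files instead of the `io.StringIO` stand-ins. Raising an error on mismatched gene sets, rather than silently filling zeros, is a deliberate choice: every file from the same pipeline run should list the same genes.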
Thank you for your response.
I am attaching a photo below showing what the file contains.
So the rows are genes, but the `NumReads` values may not be raw counts since they don't look like integers. Hard to tell from the screenshot.

Edit: As suggested by @cpad0112 below, this seems to be standard salmon output (salmon's `NumReads` are estimated counts, so fractional values are expected), so proceed accordingly. This tutorial may help you: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html. It doesn't need transcript-level information. Try it with quant.genes.sf as per the tutorial.
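Because `NumReads` holds estimated counts, some values will be fractional. If a downstream tool insists on integer counts (DESeq2, for example, expects integers), one common workaround is to round them. A minimal sketch, assuming the matrix is a dict of gene -> per-sample values; the gene names and numbers are made up.

```python
def round_counts(matrix):
    """Round estimated (possibly fractional) counts to integers.

    matrix: dict mapping gene -> list of per-sample estimated counts.
    Note: Python's round() uses round-half-to-even, so 2.5 -> 2.
    """
    return {gene: [int(round(v)) for v in values] for gene, values in matrix.items()}


estimated = {"GENE_A": [340.0, 512.0], "GENE_B": [46.2, 38.7], "GENE_C": [0.4, 2.5]}
print(round_counts(estimated))
# {'GENE_A': [340, 512], 'GENE_B': [46, 39], 'GENE_C': [0, 2]}
```

Rounding only changes the representation; it does not turn estimated counts into observed read counts, so treat it as a convenience for tools that reject non-integers.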
Thank you for your response.
I read through it. The tutorial falls back on quant.sf files and mentions quant.genes.sf only once, with no clear guidance on how to proceed with it. Do you have any other guides or experience with this?
Try `txIn=FALSE` while importing.

I tried this before posting my question. With the following code,
I get this message, and the resulting count matrix has the same counts as the `NumReads` column. Is this count matrix usable for GSEA and other downstream analyses?
Where is the code? By the way, did you try `quantmerge` from salmon?
Unfortunately, I do not have access to Salmon; that step is done by the bioinformatics core group. Thank you for your help, I appreciate it!
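If the matrix is going into the GSEA desktop tool, it has to be saved in one of GSEA's expression formats, such as GCT. A minimal sketch of writing a gene x sample matrix as GCT 1.2, assuming a dict of gene -> per-sample values; the function name `write_gct`, the gene names and the values are hypothetical. Whether raw counts are appropriate GSEA input is a separate question: as far as I know the GSEA documentation expects normalized expression values, so check it before feeding in raw `NumReads`.

```python
import io


def write_gct(matrix, samples, handle):
    """Write a gene x sample matrix in GCT 1.2 format.

    matrix: dict mapping gene -> list of values in the order of `samples`.
    The Description column is filled with "na" placeholders.
    """
    handle.write("#1.2\n")
    handle.write(f"{len(matrix)}\t{len(samples)}\n")  # number of rows, number of samples
    handle.write("\t".join(["NAME", "Description", *samples]) + "\n")
    for gene, values in matrix.items():
        handle.write("\t".join([gene, "na", *(str(v) for v in values)]) + "\n")


out = io.StringIO()
write_gct({"GENE_A": [340, 512], "GENE_B": [46, 38]}, ["sample1", "sample2"], out)
print(out.getvalue(), end="")
```

To write a real file, pass a handle from `open("counts.gct", "w")` (file name hypothetical) instead of `io.StringIO`.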