Count Matrix from quant.genes.sf files
2.5 years ago
Abdullah ▴ 10

Hello everyone, I am having trouble understanding something and would appreciate any help or even a tutorial on this if someone can link it.

I had 20 bulk RNA-seq samples sequenced, and the bioinformatics core gave me 20 quant.genes.sf files produced by the DRAGEN RNA pipeline. I need to generate a count matrix from these for a GSEA analysis. I have read tutorials on how to use tximport, but those use quant.sf files, i.e. transcript-level quantification.

How can I use quant.genes.sf files to generate a count matrix? They have the same columns as a quant.sf file (Name, Length, EffectiveLength, TPM, NumReads).

Thank you!

Rstudio R RNASeq • 4.2k views

What do you have in the rows: genes or transcripts? If they are genes, the `NumReads` column is likely the raw counts, so you could use that column.
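If the rows are genes, the matrix can also be assembled without tximport. Below is a minimal base-R sketch, assuming each quant.genes.sf file is tab-separated with Salmon's header (Name, Length, EffectiveLength, TPM, NumReads) and that every file lists the same genes. `read_sf_matrix` is a hypothetical helper, and the toy values are made up:

```r
# Hypothetical helper: gene x sample matrix from one column of quant.genes.sf files.
# Assumes tab-separated files with the header Name, Length, EffectiveLength, TPM, NumReads.
read_sf_matrix <- function(files, column = "NumReads") {
  stopifnot(!is.null(names(files)))                 # names become sample IDs
  cols <- lapply(files, function(f) {
    df <- read.delim(f, stringsAsFactors = FALSE)
    setNames(df[[column]], df$Name)                 # named vector: gene -> value
  })
  genes <- names(cols[[1]])
  # Files from the same pipeline run should list the same genes; stop otherwise
  ok <- vapply(cols, function(v) setequal(names(v), genes), logical(1))
  if (!all(ok)) stop("files do not contain the same set of genes")
  mat <- do.call(cbind, lapply(cols, function(v) v[genes]))  # align rows by gene
  rownames(mat) <- genes
  mat
}

# Toy example: two made-up samples written to temporary files
toy <- data.frame(Name = c("GeneA", "GeneB"), Length = c(1500, 900),
                  EffectiveLength = c(1350.5, 750.2), TPM = c(600000, 400000),
                  NumReads = c(10.6, 3))
paths <- c(s1 = tempfile(fileext = ".sf"), s2 = tempfile(fileext = ".sf"))
write.table(toy, paths[["s1"]], sep = "\t", quote = FALSE, row.names = FALSE)
toy$NumReads <- c(7, 12.25)
write.table(toy, paths[["s2"]], sep = "\t", quote = FALSE, row.names = FALSE)

counts <- read_sf_matrix(paths)    # column = "TPM" would give a TPM matrix instead
counts
```

Passing `column = "TPM"` returns a TPM matrix of the same shape. The helper stops on mismatched gene lists instead of filling gaps with zeros, because files from one pipeline run should share the same annotation.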


Thank you for your response. Below is a screenshot of what the file contains.


So the rows are genes, but the `NumReads` values may not be raw counts, since they don't look like integers. It is hard to tell from the screenshot.

Edit: As @cpad0112 suggests below, this appears to be standard Salmon output, so proceed accordingly.
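Salmon-style `NumReads` are estimated counts, so fractional values are expected. Count-based tools such as DESeq2 expect integer counts, and rounding is a common workaround. A minimal base-R sketch, assuming rounding is acceptable for your downstream tool; `is_whole` and `as_integer_counts` are hypothetical helpers and the matrix is toy data:

```r
# Hypothetical helper: TRUE if every value is (numerically) a whole number
is_whole <- function(x, tol = 1e-8) all(abs(x - round(x)) < tol, na.rm = TRUE)

# Hypothetical helper: round estimated counts and store them as integers
# (assumption: rounding fractional estimates is acceptable for your analysis)
as_integer_counts <- function(mat) {
  if (!is_whole(mat)) message("NumReads are fractional estimates; rounding to integers")
  out <- round(mat)
  storage.mode(out) <- "integer"   # keeps dimnames, changes the storage type
  out
}

est <- matrix(c(10.6, 3, 7, 12.25), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"), c("s1", "s2")))
is_whole(est)                # FALSE
cnt <- as_integer_counts(est)
cnt
```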


This tutorial may help you: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html. tximport does not strictly need transcript-level information; try it with quant.genes.sf as described in the tutorial.


Thank you for your response.

I read through it. The tutorial uses quant.sf files throughout and mentions quant.genes.sf only once, with no clear instructions on how to proceed with it. Do you have any other guides or experience with this?


Try txIn=FALSE while importing.


I tried this before posting my question, with the following code:

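```r
library(tximport)

# `test`: character vector of paths to the quant.genes.sf files (not shown)
txi = tximport(files = test,
           type = "salmon",
           txIn = FALSE,                  # input is gene-level, not transcript-level
           txOut = FALSE,
           countsFromAbundance = "lengthScaledTPM",
           tx2gene = NULL,
           varReduce = FALSE,
           dropInfReps = FALSE,
           infRepStat = NULL,
           ignoreTxVersion = TRUE,
           ignoreAfterBar = FALSE,
           geneIdCol = "Name",            # gene IDs are in the Name column
           txIdCol = NULL,
           abundanceCol = "TPM",
           countsCol = "NumReads",
           lengthCol = "EffectiveLength",
           importer = NULL,
           existenceOptional = FALSE,
           sparse = FALSE,
           sparseThreshold = 1)           # no trailing comma after the last argument
```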

I get the warning below, and the resulting count matrix has the same values as the `NumReads` column. Is this count matrix usable for GSEA and other downstream analyses?

```
Warning message:
In computeRsemGeneLevel(files, importer, geneIdCol, abundanceCol,  :
  countsFromAbundance other than 'no' requires transcript-level estimates
```

Where is the code? By the way, did you try `salmon quantmerge`?


Unfortunately, I do not have access to Salmon; that step is done by the bioinformatics core group. Thank you for your help, I appreciate it!

2.5 years ago

I am not sure how you want to proceed from here. For ssGSEA, which scores each sample individually, TPMs should be fine. To import the tximport object (`txi` here) into DESeq2 and do downstream analysis, follow this thread: https://github.com/COMBINE-lab/salmon/issues/581, where Mike Love posted code for importing a Salmon tximport object into DESeq2. Since your data are already summarized at the gene level, and the gene lengths are probably the same across samples, I think no transformation is happening there (here is a nice explanation of what tximport does: https://github.com/COMBINE-lab/salmon/issues/98#issuecomment-252635252). Can you try removing `countsFromAbundance = "lengthScaledTPM"` and rerunning the function?
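To see why a length-scaled TPM conversion can leave the counts unchanged, here is a minimal base-R sketch of the idea described in the linked explanation: multiply TPM by each gene's average length across samples, then rescale each sample to its original total count. `length_scaled_tpm` and `tpm_from` are hypothetical helpers, not tximport's own code, and the numbers are made up:

```r
# Hypothetical sketch of the "lengthScaledTPM" idea (not tximport's implementation)
length_scaled_tpm <- function(tpm, counts, len) {
  new_counts <- tpm * rowMeans(len)               # TPM x gene's mean length (recycled by row)
  scale <- colSums(counts) / colSums(new_counts)  # preserve each sample's total count
  sweep(new_counts, 2, scale, `*`)
}

# Hypothetical helper for the toy data: TPM computed from counts and lengths
tpm_from <- function(counts, len) {
  rate <- counts / len
  sweep(rate, 2, colSums(rate), `/`) * 1e6
}

counts <- matrix(c(100, 50, 300, 120), nrow = 2)    # 2 genes x 2 samples (toy)
len    <- matrix(c(1000, 500, 1000, 500), nrow = 2) # same lengths in both samples
len2   <- matrix(c(1000, 500, 1200, 400), nrow = 2) # lengths vary by sample

same   <- length_scaled_tpm(tpm_from(counts, len), counts, len)
varied <- length_scaled_tpm(tpm_from(counts, len2), counts, len2)
all.equal(same, counts)            # TRUE
isTRUE(all.equal(varied, counts))  # FALSE
```

With identical lengths across samples the output reproduces `counts` up to floating-point error; when lengths differ between samples it does not, although each sample's total is still preserved.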


Hey, thank you for this, you were right. For ssGSEA, I can use the TPM values provided. I set `countsFromAbundance = "no"` and then plugged the resulting table directly into ssGSEA. Thank you for your help!
