Error in Transcript to gene level estimation using "Genesum" package
6.3 years ago
Pam ▴ 30

I am using the GeneSum package to estimate gene abundance from transcript abundance. As input, I give it the Sailfish-generated expression file "Quant.sf" and a gene annotation "GTF" file, but I get the following error:

**Parsing input expression file
terminate called after throwing an instance of 'std::invalid_argument'
what():  stoi
Aborted (core dumped)**


Can someone help me solve this issue?

TIA

RNA-Seq • 2.1k views
6.3 years ago
Rob 5.3k

Hi Pam,

The reason for this is that Sailfish (and Salmon) have since changed their default output format (making it simpler to read with standard TSV parsing functions), and I have not yet updated GeneSum to keep pace. That said, I'd actually recommend tximport, which solves the same problem while offering more options than GeneSum. It's also worth noting that Sailfish and Salmon now have built-in support for aggregating their expression estimates to the gene level (though I'd still probably recommend using tximport).
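To illustrate why a header-based TSV is easy to handle with standard parsing functions, here is a minimal Python sketch that reads such a file by column name and sums transcript values per gene. The column names (`Name`, `TPM`, `NumReads`), the toy data, and the `TX2GENE` map are assumptions made for the example, and plain summation is only a simplification; this is not how GeneSum or tximport is implemented.

```python
import csv
import io
from collections import defaultdict

# Toy quant file; the column names are an assumption about the header-based layout.
QUANT_TEXT = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800.0\t10.0\t40.0\n"
    "tx2\t2000\t1800.0\t20.0\t60.0\n"
    "tx3\t500\t300.0\t5.0\t8.0\n"
)

# Hypothetical transcript-to-gene map (in practice derived from the GTF file).
TX2GENE = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}


def sum_to_gene(handle, tx2gene):
    """Sum transcript TPM and read counts per gene, reading columns by name."""
    gene_tpm = defaultdict(float)
    gene_reads = defaultdict(float)
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # skip transcripts that have no gene annotation
        gene_tpm[gene] += float(row["TPM"])
        gene_reads[gene] += float(row["NumReads"])
    return dict(gene_tpm), dict(gene_reads)


gene_tpm, gene_reads = sum_to_gene(io.StringIO(QUANT_TEXT), TX2GENE)
for gene in sorted(gene_tpm):
    print(gene, gene_tpm[gene], gene_reads[gene])
```

Looking fields up by header name means the parser does not depend on column order, whereas a parser that converts fields by fixed position can fail when the layout changes.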


Hi Rob, thanks for the clarification. Will try tximport. Thanks again.


Hi Rob, I have used tximport and now have counts from each sample for DEG analysis. I also need TPM values (summarised per gene) for downstream analysis such as heatmaps. I enabled "scaledTPM" in tximport, but the countsfromabundance column contains just the string "scaledTPM" and no values. Why is that?


Check ?tximport and scan to the section Value. This section describes the object that is returned by tximport().

The returned object is a list that contains three matrices. One of them, "abundance", is the TPM summarized to the gene level.


Hi Michael, thanks a lot, and sorry. Yes, it is clearly mentioned in your paper :-).


Hi Pam,

tximport only generates counts as its output. The "countsfromabundance" field just describes the method used to generate the counts from the input abundances. That said, given counts, computing TPM should be very simple. The "length" of a gene should be represented as the abundance-weighted combination of the lengths of its isoforms, and the count provided by tximport gives the read count for the gene. Perhaps Mike Love (the main tximport author) might even have a function to perform this transformation and get back gene-level TPM. I'll point him here on Twitter!
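The calculation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the toy numbers, and the fallback to a plain mean when all isoform abundances are zero are hypothetical choices for the example, not anything taken from tximport.

```python
def weighted_gene_length(tx_lengths, tx_abundances):
    """Abundance-weighted mean of the isoform lengths of one gene."""
    total = sum(tx_abundances)
    if total == 0:
        # Assumption for this sketch: fall back to the unweighted mean length.
        return sum(tx_lengths) / len(tx_lengths)
    weighted = sum(l * a for l, a in zip(tx_lengths, tx_abundances))
    return weighted / total


def counts_to_tpm(counts, lengths):
    """Convert gene-level counts to TPM given a per-gene length."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    denom = sum(rates.values())
    if denom == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / denom * 1e6 for gene, rate in rates.items()}


# Toy example: geneA has two isoforms, geneB has one.
lengths = {
    "geneA": weighted_gene_length([1000, 2000], [3.0, 1.0]),
    "geneB": weighted_gene_length([500], [2.0]),
}
counts = {"geneA": 250.0, "geneB": 100.0}
tpm = counts_to_tpm(counts, lengths)
print(lengths)
print(tpm)
```

Each count is divided by its gene length to get a per-base rate, and the rates are then rescaled so that they sum to one million across genes.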


Thanks, Rob, for your clarification and for pointing Michael to this.