Question

Error in transcript-to-gene-level estimation using the "GeneSum" package

0

8.1 years ago

Pam ▴ 30

I am using the GeneSum package to estimate gene abundance from transcript abundance. As input, I give it the Sailfish-generated expression file "Quant.sf" and a gene annotation GTF file, but I get the following error:

**Parsing input expression file
terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoi
Aborted (core dumped)**

Can someone help me solve this issue?

TIA

RNA-Seq • 2.5k views

ADD COMMENT • link updated 8.1 years ago by Rob 6.5k • written 8.1 years ago by Pam ▴ 30

score 2 · Accepted Answer · 2016-03-24

2

8.1 years ago

Rob 6.5k

Hi Pam,

The reason for this is that Sailfish (and Salmon) have since changed their default output format (making it simpler to read with standard TSV parsing functions), and I have not yet updated GeneSum to keep pace. However, I'd actually recommend tximport, which solves the same problem while offering more options than GeneSum. It's also worth noting that Sailfish and Salmon now have built-in support for aggregating their expression estimates to the gene level (though I'd still probably recommend using tximport).
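To make the aggregation step concrete, here is a minimal Python sketch of summing transcript-level TPM to the gene level. It is not GeneSum or tximport code. It assumes a tab-separated quantification file whose header row has `Name` and `TPM` columns (check your own file's header, since the format depends on the tool version), and the function name `gene_tpm_from_quant` and the transcript-to-gene dictionary are hypothetical.

```python
import csv
import io
from collections import defaultdict


def gene_tpm_from_quant(quant_text, tx2gene):
    """Sum transcript TPM per gene from header-bearing, tab-separated text.

    quant_text: contents of a quantification file (assumed columns: Name, TPM).
    tx2gene:    dict mapping transcript ID -> gene ID (hypothetical input).
    """
    # DictReader uses the header row for field names, so the header is never
    # mistaken for a data row and parsed as a number.
    reader = csv.DictReader(io.StringIO(quant_text), delimiter="\t")
    gene_tpm = defaultdict(float)
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # skip transcripts missing from the annotation
        gene_tpm[gene] += float(row["TPM"])
    return dict(gene_tpm)


# Made-up example data, for illustration only.
example_quant = (
    "Name\tLength\tTPM\tNumReads\n"
    "tx1\t1000\t10.0\t50\n"
    "tx2\t2000\t20.0\t200\n"
    "tx3\t1500\t5.0\t30\n"
)
example_map = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
result = gene_tpm_from_quant(example_quant, example_map)
```

Reading columns by name rather than by position is one way to stay robust when a tool adds or reorders columns between releases.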

ADD COMMENT • link 8.1 years ago by Rob 6.5k

0

Hi Rob, thanks for the clarification. Will try tximport. Thanks again.

ADD REPLY • link 8.1 years ago by Pam ▴ 30

0

Hi Rob, I have used tximport and now have counts from each sample for DEG analysis. I need TPM values (summarised per gene) for downstream analysis such as heatmaps. I enabled "scaledTPM" in tximport, but in the countsfromabundance column I get just the string "scaledTPM" and not any values?

ADD REPLY • link 8.0 years ago by Pam ▴ 30

2

Check ?tximport and scan to the section Value. This section describes the object that is returned by tximport().

The returned object is a list that contains three matrices, one of which is "abundance"; this is the TPM summarized to the gene level.

ADD REPLY • link 8.0 years ago by Michael Love ★ 2.6k

1

Hi Michael, thanks a lot. I am sorry. Yes, it is clearly mentioned in your paper :-).

ADD REPLY • link 8.0 years ago by Pam ▴ 30

1

Hi Pam,

tximport only generates counts as its output. The "countsfromabundance" field just describes the method used to generate the counts from the input abundances. That said, given counts, computing TPM should be very simple. The "length" of a gene should be represented as the abundance-weighted combination of the lengths of its isoforms, and the count provided by tximport gives the read count for the gene. Perhaps Mike Love (the main tximport author) might even have a function to perform this transformation and get back gene-level TPM. I'll point him here on Twitter!
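The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not tximport code: the function names `abundance_weighted_length` and `tpm_from_counts` and all example numbers are hypothetical, and the fallback to a plain mean length when a gene has zero abundance is my own choice.

```python
def abundance_weighted_length(tx_lengths, tx_abundances):
    """Gene 'length' as the abundance-weighted mean of its isoform lengths."""
    total = sum(tx_abundances)
    if total == 0:
        # Assumption: with no expressed isoform, use the unweighted mean.
        return sum(tx_lengths) / len(tx_lengths)
    weighted = sum(l * a for l, a in zip(tx_lengths, tx_abundances))
    return weighted / total


def tpm_from_counts(counts, lengths):
    """Convert per-gene read counts to TPM given per-gene lengths.

    counts, lengths: dicts keyed by gene ID (one sample).
    """
    # Reads per unit length for each gene.
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    # Scale so that the values sum to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}


# Made-up example: two isoforms with abundances 3.0 and 1.0.
length_a = abundance_weighted_length([1000, 2000], [3.0, 1.0])

# Made-up example: equal counts, gene B twice as long as gene A.
tpm = tpm_from_counts({"geneA": 100, "geneB": 100},
                      {"geneA": 1000.0, "geneB": 2000.0})
```

With equal counts, the shorter gene ends up with twice the TPM of the longer one, which is the length normalization at work. As noted earlier in the thread, the list returned by tximport() already contains an "abundance" matrix with gene-level TPM, so this is mainly useful for understanding the transformation.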