Question

Unstranded new transcript from Cuffmerge leads to error while RSEM prepares the reference

8.6 years ago

Thibault D. ▴ 700

Hi,

I am looking for a solution to the following situation:

If I run unstranded RNA-Seq data through Tophat, then Cufflinks, and finally Cuffmerge, I get a new GTF file containing a set of newly identified transcripts. As the input was unstranded, those new transcripts have no orientation in the merged GTF file (i.e. a period (".") in the 7th column instead of a plus ("+") or a minus ("-")).

However, the first step in the RSEM pipeline ("rsem-prepare-reference") fails because of the period in the 7th column. It expects either a plus or a minus, nothing else.

I use cufflinks/cuffmerge before RSEM, because RSEM does not detect new isoforms in RNA-Seq data.

Many thanks in advance for your advice,

Thibault

RNA-Seq RSEM Cuffmerge • 2.1k views

score 1 · Answer 1 · 2015-09-18

8.6 years ago

Devon Ryan 104k

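You can just replace the "." with a "+" or "-". With unstranded data it won't matter which you use, in fact. Something like awk 'BEGIN{FS=OFS="\t"} {if($7==".") {$7="+"} print $0}' foo.gtf > foo.fixed.gtf should work. Setting OFS="\t" is needed: assigning to $7 makes awk rebuild the line using OFS, which defaults to a single space, so the modified lines would otherwise no longer be tab-delimited.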
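For anyone who prefers to do the same replacement in a script, here is a minimal Python sketch of the idea. The function name fix_strand, its default_strand parameter, and the demo record are made up for illustration; it assumes a tab-delimited GTF with the strand in the 7th column and leaves comment lines alone.

```python
def fix_strand(lines, default_strand="+"):
    """Yield GTF lines with a '.' strand (column 7) replaced by default_strand.

    Hypothetical helper for illustration; not part of RSEM or Cufflinks.
    """
    for line in lines:
        # Pass comment lines and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 7 (index 6) holds the strand; only touch it when it is '.'.
        if len(fields) >= 9 and fields[6] == ".":
            fields[6] = default_strand
        # Re-join with tabs so the output stays tab-delimited.
        yield "\t".join(fields) + "\n"


# Made-up record in the style of a Cuffmerge exon line with no strand.
demo = (
    'chr1\tCufflinks\texon\t100\t200\t.\t.\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n'
)
fixed = list(fix_strand([demo]))
fixed_fields = fixed[0].rstrip("\n").split("\t")
```

To convert a whole file, something like dst.writelines(fix_strand(src)) with src and dst opened on foo.gtf and foo.fixed.gtf should do. Note that only the strand column is changed; the "." in the score and frame columns is left as it is.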

Thank you for your answer, Devon

I had assumed that assigning a strand arbitrarily like this was not legitimate, and that this was why RSEM does not assign a random strand by itself. I'll consider this option next time.

ADD REPLY • link 8.6 years ago by Thibault D. ▴ 700