Unstranded new transcript from Cuffmerge leads to error while RSEM prepares the reference
8.6 years ago
Thibault D. ▴ 700

Hi,

I am looking for a solution to the following situation:

If I run unstranded RNA-Seq data through TopHat, then Cufflinks, and finally Cuffmerge, I get a new GTF file containing a set of newly identified transcripts. Because the input was unstranded, those new transcripts have no orientation in the merged GTF file, i.e. a period (".") in the 7th column instead of a plus ("+") or a minus ("-").

However, the first step of the RSEM pipeline, "rsem-prepare-reference", fails because of the period in the 7th column: it expects either a plus or a minus and nothing else.

I use Cufflinks/Cuffmerge before RSEM because RSEM does not detect new isoforms in RNA-Seq data.

Many thanks in advance for your advice,

Thibault

RNA-Seq RSEM Cuffmerge • 2.1k views
8.6 years ago

You can just replace the "." with a "+" or "-". With unstranded data it won't matter which you use. Something like awk 'BEGIN{FS=OFS="\t"} {if($7==".") {$7="+"} print $0}' foo.gtf > foo.fixed.gtf should work. OFS="\t" is needed: assigning to $7 makes awk rebuild the line with the output field separator, which defaults to a single space and would break the tab-delimited GTF.
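
For anyone who wants to try this on a toy file first, here is a minimal sketch. The three GTF records, the IDs in them, and the file names merged.gtf / merged.fixed.gtf are all made up for illustration. It sets both FS and OFS to a tab so that rewritten lines stay tab-delimited, and assigns "+" only because the choice is arbitrary for unstranded data.

```shell
# Hypothetical sample: two unstranded records and one stranded record.
tmpdir=$(mktemp -d)
printf 'chr1\tCufflinks\ttranscript\t100\t500\t.\t.\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n' >  "$tmpdir/merged.gtf"
printf 'chr1\tCufflinks\texon\t100\t500\t.\t.\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";\n'       >> "$tmpdir/merged.gtf"
printf 'chr1\tCufflinks\texon\t900\t1200\t.\t-\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002";\n'      >> "$tmpdir/merged.gtf"

# Replace a "." in column 7 (strand) with "+"; all other lines pass through unchanged.
# FS=OFS="\t" keeps the output tab-separated when awk rebuilds a modified line.
awk 'BEGIN{FS=OFS="\t"} $7=="." {$7="+"} {print}' \
    "$tmpdir/merged.gtf" > "$tmpdir/merged.fixed.gtf"

# Tally the strand column to confirm that no "." values remain.
cut -f7 "$tmpdir/merged.fixed.gtf" | sort | uniq -c
```

Running the same tally on the original file before the fix shows how many records are affected.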

Thank you for your answer, Devon

I had assumed that assigning a strand arbitrarily would be wrong, and that this was why RSEM does not pick a random strand itself. I'll consider this option next time.
