Question

How to keep transcript IDs

7.9 years ago

jolin0701-dy ▴ 100

I used the TopHat-Cufflinks pipeline for RNA-seq analysis.

The output files contain only temporary test IDs such as XLOC_000001.

I used the GRCh38 annotation file from NCBI, and the GTF file contains Ensembl IDs.

How can I keep the transcript IDs, such as the Ensembl IDs, when running Cufflinks?

Or how can I retrieve the Ensembl IDs from the cuffdiff output files?

I searched and found this useful post on SEQanswers:

http://seqanswers.com/forums/showthread.php?t=51071

I saved this script as "m.py" and ran it. It then shows:

      File "m.py", line 12
        for line in fh:
        ^
    IndentationError: expected an indented block
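For context on the error above: Python raises "expected an indented block" when the line following a statement that ends in a colon (such as `def`, `with`, `for` or `if`) is not indented. Because the message points at `for line in fh:` itself, the likely cause is that this line sits directly under such a statement and lost its leading spaces when the script was copied from the forum page. The sketch below is not the script from the linked thread; `count_lines` is a made-up helper and `io.StringIO` stands in for a real file, only to show what correct nesting looks like.

```python
import io


def count_lines(fh):
    # Everything inside "def" is indented one level (four spaces here).
    total = 0
    for line in fh:
        # Everything inside "for" is indented one level further.
        total += 1
    return total


# io.StringIO stands in for a real file opened with open("some_file").
with io.StringIO("a\nb\nc\n") as fh:
    # The body of "with" must be indented as well.
    n = count_lines(fh)

print(n)
```

Use either spaces or tabs for indentation throughout the file, never a mixture, since mixing them can trigger similar errors.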

I am a newbie to Python. Can anyone help me?

Thanks so much!

RNA-Seq • 1.3k views

score 0 · Answer 1 · 2016-06-06

7.9 years ago

andrew.j.skelton73 6.5k

The cuffdiff output should include a "nearest gene" annotation. An XLOC code can encompass more than one gene; it is a sliding window of sorts that arbitrarily* chooses sections of the genome to quantify against.

*It's probably not arbitrary; it's just not documented and I haven't looked at the code.
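If the nearest-gene column is not enough, one possible way to recover reference IDs is to read them from the GTF that cuffdiff was run against. The following is a rough sketch only. It assumes that file is a cuffmerge-style merged.gtf whose attribute column carries `gene_id` (the XLOC code) and `nearest_ref` (the closest reference transcript); check your own file, since attribute names may differ. The function names and the IDs in the example lines (`REF_TX_1` and so on) are made up for illustration.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(field):
    """Turn the ninth GTF column into a dict of attribute name -> value."""
    return dict(ATTR_RE.findall(field))


def xloc_to_reference(gtf_lines):
    """Collect, for each XLOC gene_id, the set of nearest_ref transcript IDs."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_attributes(cols[8])
        xloc = attrs.get("gene_id")
        ref = attrs.get("nearest_ref")  # assumed attribute name
        if xloc and ref:
            mapping.setdefault(xloc, set()).add(ref)
    return mapping


# Made-up records in the assumed layout; a real run would pass an open file.
example_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; nearest_ref "REF_TX_1";',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000002"; nearest_ref "REF_TX_2";',
    'chr2\tCufflinks\texon\t50\t90\t.\t-\t.\t'
    'gene_id "XLOC_000002"; transcript_id "TCONS_00000003"; nearest_ref "REF_TX_3";',
]

mapping = xloc_to_reference(example_gtf)
for xloc in sorted(mapping):
    print(xloc, ",".join(sorted(mapping[xloc])))
```

With a real file, `xloc_to_reference(open("merged.gtf"))` would work the same way, where the path is whatever GTF you passed to cuffdiff. One XLOC code can map to several reference transcripts, which is why the values are sets.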
