How to keep transcript IDs in Cufflinks/Cuffdiff output?
jolin0701-dy, 7.9 years ago

I used the TopHat-Cufflinks pipeline for RNA-seq analysis.

In the output files there are only temporary test IDs such as XLOC_000001.

I used the GRCh38 annotation file from NCBI, and its GTF file contains Ensembl IDs.

How can I keep the transcript IDs (such as the Ensembl IDs) when running Cufflinks?

Or, how can I retrieve the Ensembl IDs from the Cuffdiff output files?
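One common way to get from XLOC IDs back to reference IDs, sketched here under assumptions: if you ran cuffmerge (or cuffcompare) with your reference GTF (`-g` for cuffmerge, `-r` for cuffcompare), the merged GTF it writes tags each transcript with the XLOC `gene_id` plus attributes such as `oId` and, for transcripts overlapping a reference, `nearest_ref`. The hypothetical helper `xloc_to_refs` below collects, for each XLOC, the set of values of one such attribute. Check which attributes your own file actually contains before relying on it.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def xloc_to_refs(lines, ref_attr="nearest_ref"):
    """Map each XLOC gene_id in a cuffmerge/cuffcompare GTF to the set of
    values found in the attribute `ref_attr` (e.g. "nearest_ref" or "oId").

    `lines` is any iterable of GTF lines, such as an open file handle.
    """
    mapping = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        ref = attrs.get(ref_attr)
        if gene_id and ref:
            mapping[gene_id].add(ref)
    return dict(mapping)
```

Usage, with an assumed path: `with open("merged_asm/merged.gtf") as fh: mapping = xloc_to_refs(fh)`. Loci with no overlapping reference transcript (novel loci) may have no `nearest_ref` and will then be absent from the result.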

I searched and found this useful post on SEQanswers:

http://seqanswers.com/forums/showthread.php?t=51071

I saved this script as "m.py" and ran it, but it fails with:

    File "m.py", line 12
        for line in fh:
        ^
    IndentationError: expected an indented block

I am new to Python. Can anyone help me?

Thanks so much!
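`IndentationError: expected an indented block` means the line Python reports (here line 12, `for line in fh:`) should have been indented under a block opened on an earlier line that ends in a colon, such as a `with` or `def` line. The exact script isn't reproduced here, but a likely cause is that leading spaces were lost when the code was copied from the web page. The minimal illustration below is not the SEQanswers script: it assumes, for the sake of example, that the line before is `with open(...) as fh:`, and uses a hypothetical helper `first_error` to compile the snippet flush-left and properly indented without running it.

```python
# Hypothetical reconstruction: assumes the line before `for line in fh:`
# opens the input file with a `with` statement.
broken = (
    "with open('input.txt') as fh:\n"
    "for line in fh:\n"            # not indented -> IndentationError
    "    print(line)\n"
)
fixed = (
    "with open('input.txt') as fh:\n"
    "    for line in fh:\n"        # indented under the `with` block
    "        print(line)\n"
)


def first_error(src):
    """Compile src without running it; return (error type, line) or None."""
    try:
        compile(src, "m.py", "exec")
    except SyntaxError as err:  # IndentationError subclasses SyntaxError
        return type(err).__name__, err.lineno
    return None


print(first_error(broken))  # ('IndentationError', 2)
print(first_error(fixed))   # None
```

When re-indenting, use spaces consistently (four per level is the usual convention); in Python 3, inconsistent mixing of tabs and spaces raises `TabError`, a subclass of `IndentationError`.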

Answer, 7.9 years ago

The Cuffdiff output should include a "nearest gene" annotation. Note that an XLOC ID can encompass more than one gene: it is a sliding window of sorts that arbitrarily* chooses sections of the genome to quantify against.

*It's probably not arbitrary; it's just not documented, and I haven't looked at the code.
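To see which gene names and reference IDs each XLOC covers, you can read Cuffdiff's `gene_exp.diff` and attach IDs from a dict you build yourself (for example from the `nearest_ref` attributes in cuffmerge's merged GTF). A minimal sketch, assuming the usual tab-separated `gene_exp.diff` layout with a header row where `test_id` holds the XLOC and `gene` holds the gene name(s); `annotate_gene_exp` and `xloc_map` are hypothetical names, not part of Cufflinks.

```python
import csv


def annotate_gene_exp(lines, xloc_map):
    """Yield (test_id, gene_names, reference_ids) for each row of a Cuffdiff
    gene_exp.diff table.

    `lines` is any iterable of lines from gene_exp.diff (tab-separated, with a
    header row containing at least "test_id" and "gene").
    `xloc_map` is an assumed input: a dict {XLOC id: set of reference IDs}.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # The "gene" column may hold several comma-separated names, or "-".
        names = [g for g in row["gene"].split(",") if g and g != "-"]
        refs = sorted(xloc_map.get(row["test_id"], ()))
        yield row["test_id"], names, refs
```

Rows whose `gene_names` list has more than one entry are the XLOCs that span several genes, as described in the answer above.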
