Question

Generating GTF (exon, etc.) files for species not in UCSC genome browser

0

9.7 years ago

Anna S ▴ 510

Hello,

I need to do an RNA-seq comparison between a few species of fish that are not in the UCSC genome browser (bluefin tuna, red sea bream, and torafugu puffer fish). Normally, I'd first run TopHat and obtain the reference files it requires (a GTF annotation and a genome assembly for Bowtie) from UCSC. However, these fish species are not yet in UCSC: tuna, for example, has only recently been sequenced, and I haven't yet found an assembly for Pagrus major (red sea bream).

I just got this project today and I'm trying to think of how I can make progress. For example, I was thinking of collecting all the known information from NCBI and creating partial GTF and assembly files myself from these. Do you know of a better way to go about it?

Thanks a lot.

Anna

A new day, a totally new project, how can you not love bioinformatics?

GTF RNA-Seq Assembly • 2.5k views

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Anna S ▴ 510

0

I think this problem requires de novo transcriptome assembly. Any suggestions for the best such software for fish?

Thanks a lot.

Anna

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Anna S ▴ 510

2

9.7 years ago

Jeff Stafford ▴ 50

The first step is assembling all of the reads. Look into "Trans-ABySS." From what I hear, it's a pretty good de novo assembler.

Cufflinks can assemble transcripts without a reference annotation (it still needs reads aligned to a genome assembly, so it isn't fully de novo) and gives you a GTF for each sequencing replicate. Use cuffmerge to create a common GTF for each organism. If you want differential expression data, you feed the BAM files and merged GTF to something like Cuffdiff or htseq-count/DESeq2. Cuffdiff isn't as precise as htseq-count + DESeq2, but as far as I know it's the only widely used one of these that calculates changes in splicing (something to keep in mind when choosing an expression counting algorithm).

You'll probably want to look into getting access to a supercomputer cluster for all of these steps and parallelizing as much as possible (if you haven't done that already). Each of those steps can take days of computer time, even if you have a ton of cores to play with.
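
For anyone who hasn't looked inside one of these GTF files yet, here is a minimal Python sketch of pulling exon coordinates per transcript out of one. It assumes the standard nine tab-separated GTF columns with `gene_id` and `transcript_id` in the attributes column. The records in `example_gtf` and the function names are made up for illustration, and the attribute parser is deliberately simple (it would break on values that contain semicolons).

```python
import io
from collections import defaultdict


def parse_attributes(field):
    """Parse the 9th GTF column, e.g. 'gene_id "G1"; transcript_id "T1";'."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def exons_by_transcript(handle):
    """Return {transcript_id: sorted [(start, end), ...]} for exon records."""
    exons = defaultdict(list)
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon records
        attrs = parse_attributes(cols[8])
        exons[attrs["transcript_id"]].append((int(cols[3]), int(cols[4])))
    return {tid: sorted(coords) for tid, coords in exons.items()}


# Made-up records in the general shape of a merged GTF (hypothetical IDs).
example_gtf = (
    'scaffold_1\tCufflinks\texon\t600\t900\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2";\n'
    'scaffold_1\tCufflinks\texon\t100\t300\t.\t+\t.\t'
    'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1";\n'
    'scaffold_2\tCufflinks\texon\t1500\t2000\t.\t-\t.\t'
    'gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1";\n'
)

for transcript_id, coords in exons_by_transcript(io.StringIO(example_gtf)).items():
    print(transcript_id, coords)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` wrapper.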

written 9.7 years ago by Jeff Stafford ▴ 50

0

Thank you so much Jeff! I ended up first running Trinity because of this paper, and it has just finished successfully. Once I figure out more, I'll add another comment here in case my findings are helpful to others. Thanks again!

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Anna S ▴ 510

0

Yeah no problem! Let me know how it all works out. I personally haven't tried a de-novo workflow yet (I work in Drosophila, so everything is pretty well annotated...), so curious to see how things go for you in case I ever need to try this myself.

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Jeff Stafford ▴ 50

Ram · Accepted Answer · 2014-09-05

1

9.6 years ago

Anna S ▴ 510

One solution is to run RSEM and then edgeR after Trinity de novo assembly, as explained in this pdf. It worked great for me!
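
In case it helps others with the step between RSEM and edgeR, here is a rough Python sketch of collecting the `expected_count` column from per-sample RSEM results into one table that could then be loaded into R. It assumes RSEM's tab-separated `.genes.results` layout with `gene_id` and `expected_count` columns. The sample names, gene IDs, numbers, and function names below are all made up for illustration, and this is not necessarily how the workflow in the pdf builds its matrix.

```python
import csv
import io


def read_expected_counts(handle, id_column="gene_id"):
    """Read one RSEM results table into {feature_id: expected_count}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: float(row["expected_count"]) for row in reader}


def build_count_matrix(samples, id_column="gene_id"):
    """samples maps sample name -> open handle of an RSEM results file."""
    per_sample = {
        name: read_expected_counts(handle, id_column)
        for name, handle in samples.items()
    }
    names = list(per_sample)
    feature_ids = sorted(set().union(*per_sample.values()))
    header = ["feature_id"] + names
    # Features missing from a sample are filled with 0.0.
    rows = [
        [fid] + [per_sample[name].get(fid, 0.0) for name in names]
        for fid in feature_ids
    ]
    return header, rows


def to_tsv(header, rows):
    """Render the matrix as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


# Made-up RSEM-style tables for two hypothetical samples.
RSEM_HEADER = "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
sample_a = (
    RSEM_HEADER
    + "geneA\ttxA1\t1000\t900.00\t10.00\t5.00\t4.00\n"
    + "geneB\ttxB1\t2000\t1900.00\t0.00\t0.00\t0.00\n"
)
sample_b = (
    RSEM_HEADER
    + "geneA\ttxA1\t1000\t900.00\t7.50\t3.00\t2.50\n"
    + "geneC\ttxC1\t1500\t1400.00\t3.00\t1.00\t0.80\n"
)

header, rows = build_count_matrix(
    {"sample_a": io.StringIO(sample_a), "sample_b": io.StringIO(sample_b)}
)
print(to_tsv(header, rows))
```

For transcript-level tables, passing `id_column="transcript_id"` should work the same way, assuming the isoform results carry that column.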

updated 2.3 years ago by Ram 43k • written 9.6 years ago by Anna S ▴ 510