Question: Generating GTF (exon, etc.) files for species not in UCSC genome browser
0
6.2 years ago by Anna S (PSU)
Anna S wrote:

Hello,

I need to do an RNA-seq comparison between a few species of fish that are not in the UCSC Genome Browser (bluefin tuna, red sea bream, and torafugu pufferfish). Normally, I'd first run TopHat and obtain the reference files it requires (a GTF annotation and a genome assembly for Bowtie) from UCSC. However, these species are not yet in UCSC: tuna, for example, has only recently been sequenced, and I haven't yet found an assembly for Pagrus major (red sea bream).

I just got this project today and I'm trying to think of how I can make progress.  For example, I was thinking of collecting all the known information from NCBI and creating partial GTF and assembly files myself from these.  Do you know of a better way to go about it?
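To make the "partial GTF" idea concrete, here is a minimal Python sketch of writing exon records in the nine-column, tab-separated GTF layout (seqname, source, feature, start, end, score, strand, frame, attributes). The `Exon` record, the `manual` source label, and the example IDs are hypothetical names chosen for illustration; the coordinates would have to come from whatever annotation can be collected for each species, and the sequence names must match those in the assembly FASTA.

```python
from typing import Iterable, NamedTuple


class Exon(NamedTuple):
    seqname: str        # contig/scaffold name, must match the assembly FASTA
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    strand: str         # "+" or "-"
    gene_id: str
    transcript_id: str


def exon_to_gtf_line(exon: Exon, source: str = "manual") -> str:
    """Format one exon as a nine-column, tab-separated GTF record."""
    if exon.start < 1 or exon.end < exon.start:
        raise ValueError(f"bad coordinates: {exon.start}-{exon.end}")
    if exon.strand not in ("+", "-"):
        raise ValueError(f"bad strand: {exon.strand!r}")
    attributes = (
        f'gene_id "{exon.gene_id}"; transcript_id "{exon.transcript_id}";'
    )
    fields = [
        exon.seqname, source, "exon",
        str(exon.start), str(exon.end),
        ".",            # score: not available
        exon.strand,
        ".",            # frame: not meaningful for exon features
        attributes,
    ]
    return "\t".join(fields)


def write_gtf(exons: Iterable[Exon], path: str) -> None:
    """Write exons sorted by sequence name and position, one per line."""
    ordered = sorted(exons, key=lambda e: (e.seqname, e.start, e.end))
    with open(path, "w") as handle:
        for exon in ordered:
            handle.write(exon_to_gtf_line(exon) + "\n")
```

For example, `exon_to_gtf_line(Exon("scaffold_1", 101, 250, "+", "geneA", "geneA.t1"))` yields a single exon line that downstream tools expecting `gene_id` and `transcript_id` attributes should be able to read.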

Thanks  a lot.

Anna

A new day, a totally new project, how can you not love bioinformatics ??

rna-seq assembly gtf • 1.7k views
modified 6.1 years ago • written 6.2 years ago by Anna S

I think this problem requires de novo transcriptome assembly. Any suggestions for the best such software for fish?

Thanks a lot.

Anna

written 6.2 years ago by Anna S
1
6.1 years ago by Anna S (PSU)
Anna S wrote:

One solution is to run Trinity for de novo assembly, then RSEM for abundance estimation, and then edgeR for differential expression, as explained in the PDF below. It worked great for me!

ftp://ftp.broadinstitute.org/pub/users/bhaas/rnaseq_workshop/Trinity_workshop_activities.pdf
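Between RSEM and edgeR, the per-sample results have to be combined into one count matrix. A minimal Python sketch of that step is below. It assumes each RSEM results file is tab-delimited with the feature ID in the first column and an `expected_count` column, which is how I understand RSEM's `*.isoforms.results` and `*.genes.results` files to be laid out; the sample names and file paths are hypothetical, and the workshop PDF may do this step differently.

```python
import csv


def read_expected_counts(results_path):
    """Return {feature_id: expected_count} from one RSEM results file."""
    counts = {}
    with open(results_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        id_column = reader.fieldnames[0]  # transcript_id or gene_id
        for row in reader:
            counts[row[id_column]] = float(row["expected_count"])
    return counts


def write_count_matrix(sample_files, out_path):
    """Combine per-sample counts into one tab-separated matrix.

    sample_files maps a sample name to its RSEM results file.
    Features missing from a sample are written as 0.0.
    """
    per_sample = {
        name: read_expected_counts(path)
        for name, path in sample_files.items()
    }
    samples = list(per_sample)
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["feature_id"] + samples)
        for feature in features:
            row = [per_sample[s].get(feature, 0.0) for s in samples]
            writer.writerow([feature] + row)
    return features
```

The resulting table has one row per transcript (or gene) and one column per sample, which is the shape a count-based tool such as edgeR takes as input. Whether to round the expected counts first is left to the downstream step.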

modified 6.1 years ago • written 6.1 years ago by Anna S
2
6.2 years ago by Jeff Stafford (Canada)
Jeff Stafford wrote:

The first step is assembling the reads into transcripts. Look into Trans-ABySS. From what I hear, it's a pretty good de novo assembler.

Cufflinks can assemble transcripts without a reference annotation (it works from reads aligned to a genome) and gives you a GTF for each sequencing replicate. Use cuffmerge to create a common GTF for each organism. If you want differential expression data, feed the BAM files and the merged GTF to something like Cuffdiff or htseq-count/DESeq2. Cuffdiff isn't as precise as HTSeq + DESeq2, but as far as I know it's the only widely used tool that calculates changes in splicing (something to keep in mind when choosing an expression counting method).
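As a rough illustration of the per-replicate Cufflinks runs followed by one cuffmerge, here is a small Python sketch that only builds the command lines and does not execute anything. The `-p` (threads) and `-o` (output directory) options and the `transcripts.gtf` output name are what I recall from the Cufflinks documentation and should be checked against the installed version; the directory layout, the `assemblies.txt` list file, and the function names are hypothetical.

```python
from pathlib import Path


def cufflinks_command(bam, out_dir, threads=4):
    """Command for assembling transcripts from one aligned replicate."""
    return ["cufflinks", "-p", str(threads), "-o", str(out_dir), str(bam)]


def cuffmerge_command(gtf_list_file, out_dir, threads=4):
    """Command for merging the per-replicate assemblies into one GTF."""
    return ["cuffmerge", "-p", str(threads), "-o", str(out_dir),
            str(gtf_list_file)]


def plan_assembly(bams, work_dir, threads=4):
    """Return (commands, gtf_paths) for one organism.

    gtf_paths are the per-replicate GTFs that Cufflinks is expected to
    write; they should be listed one per line in work_dir/assemblies.txt
    before the final cuffmerge command is run.
    """
    work_dir = Path(work_dir)
    commands = []
    gtf_paths = []
    for bam in bams:
        sample_dir = work_dir / Path(bam).stem
        commands.append(cufflinks_command(bam, sample_dir, threads))
        gtf_paths.append(sample_dir / "transcripts.gtf")
    commands.append(
        cuffmerge_command(work_dir / "assemblies.txt",
                          work_dir / "merged", threads)
    )
    return commands, gtf_paths
```

Each command is returned as a list, so it can be passed directly to `subprocess.run` or written into a cluster job script. The per-replicate Cufflinks runs are independent of each other, so they can be submitted in parallel.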

You'll probably want to look into getting access to a compute cluster for all of these steps and parallelizing as much as possible (if you haven't done that already). Each of those steps can take days of computer time, even if you have a ton of cores to play with.

written 6.2 years ago by Jeff Stafford

Thank you so much Jeff!   I ended up first running Trinity because of this paper (http://life.scichina.com:8082/sciCe/EN/abstract/abstract510015.shtml), and it has just finished successfully.  Once I figure out more, I'll add another comment here in case my findings are helpful to others.  Thanks again!

written 6.2 years ago by Anna S

Yeah, no problem! Let me know how it all works out. I personally haven't tried a de novo workflow yet (I work in Drosophila, so everything's pretty well annotated...), so I'm curious to see how things go for you in case I ever need to try this myself.

written 6.2 years ago by Jeff Stafford