Question: Can I use my own transcriptome without a GTF file?
jhkim1972 (Canada) wrote:

I am new to RNA-Seq analysis. For differential gene expression analysis, I am trying to run the Bowtie/TopHat and Cufflinks pipeline with my own transcriptome as a reference. Can I run it with only the transcriptome, without a GTF file? My species is salmon, for which there is no reference genome and hence no GTF file. Is the GTF file essential for this pipeline?

Thank you in advance for any ideas and suggestions on this fundamental question.

Devon Ryan (Freiburg, Germany) wrote:

Bowtie will work fine here, but Cufflinks will not. If all you have to align against is a transcriptome, then I would recommend the following:

  1. Align against the transcriptome with HISAT, Bowtie 2, or one of the many other aligners.
  2. Use eXpress, RSEM, or one of the many similar tools to get estimated per-transcript counts. You could alternatively use Kallisto or Salmon for both this and the previous step (I'd recommend using Salmon simply because using software with the same name as the organism you're working on should get you a "+1 nicely played" from reviewers).
  3. Use these estimated counts in limma, edgeR, or a similar program (not DESeq2, which won't accept these non-integer counts) to get differentially expressed transcripts. If you know (or can at least make a good guess) which transcripts belong together as a gene, then you can sum their estimated counts and do gene-level differential expression.
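To make the last step concrete, here is a minimal Python sketch of summing estimated per-transcript counts into gene-level counts for one sample. It assumes you already have your own transcript-to-gene grouping as a plain dictionary; the function name `gene_level_counts` and all transcript/gene IDs are hypothetical. Transcripts with no assigned gene are kept as their own "gene" rather than silently dropped.

```python
from collections import defaultdict


def gene_level_counts(tx_counts, tx_to_gene):
    """Sum estimated per-transcript counts into per-gene counts.

    tx_counts:  {transcript_id: estimated count} for one sample
    tx_to_gene: {transcript_id: gene_id}, your own (hypothetical) grouping
    Transcripts missing from tx_to_gene are kept under their own ID.
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[tx_to_gene.get(tx, tx)] += count
    return dict(gene_counts)


# Hypothetical example: two isoforms of one gene plus a single-isoform gene.
tx_counts = {"geneA_iso1": 10.4, "geneA_iso2": 3.6, "geneB_iso1": 7.0}
tx_to_gene = {"geneA_iso1": "geneA", "geneA_iso2": "geneA", "geneB_iso1": "geneB"}
print(gene_level_counts(tx_counts, tx_to_gene))
```

Run this once per sample and the resulting per-gene table can go into limma or edgeR in place of the transcript-level one.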

I'm wondering why DESeq2 cannot be used for this purpose. Can you explain why? Is it because you don't have gene names?

written by Antonio R. Franco

It won't accept non-integer counts. That's the only reason.

written by Devon Ryan
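Why the counts are non-integer: tools like eXpress, RSEM and Salmon split reads that align to several transcripts across those transcripts, using an expectation-maximization (EM) style algorithm. The toy sketch that follows (the function `em_transcript_counts` is hypothetical and is not how any of these tools is actually implemented; they also model fragment lengths, effective transcript lengths and biases) assumes all transcripts have the same length and shows how shared reads end up as fractional counts.

```python
def em_transcript_counts(read_compat, transcripts, n_iter=200):
    """Toy EM estimate of per-transcript read counts.

    read_compat: list of sets; each set holds the transcripts one read aligns to.
    Simplifying assumption: all transcripts have equal effective length,
    so relative abundance is just the expected fraction of reads.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    counts = {t: 0.0 for t in transcripts}
    n_reads = len(read_compat)
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new abundances are the expected read fractions.
        theta = {t: counts[t] / n_reads for t in transcripts}
    return counts


# 3 reads unique to T1, 1 read unique to T2, 2 reads shared by both.
reads = [{"T1"}] * 3 + [{"T2"}] + [{"T1", "T2"}] * 2
print(em_transcript_counts(reads, ["T1", "T2"]))
```

Here the estimate converges to about 4.5 reads for T1 and 1.5 for T2: the total is still 6 reads, but neither count is an integer, which is exactly what DESeq2 refuses as input.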

Dear Devon Ryan, thank you for your brilliant suggestion. I will do that.

written by jjinhyoungkim

Thank you. Does RSEM accept a HISAT2 index?

written by Shahzad
cyril-cros (France) wrote:

I just read up on the salmon genome sequencing efforts: you are correct that there is no annotated genome yet, and this will be annoying.

You may need to use a de novo transcriptome (assembled from your RNA-Seq reads with Trinity, Oases, or Trans-ABySS). I also believe Cufflinks (http://cole-trapnell-lab.github.io/cufflinks/cufflinks/index.html) can work without an annotation (confirmation, anyone?). Either way, this step should yield a transcripts.fa file with the sequences of the transcripts you want to quantify.

Use Devon Ryan's advice for finding differentially expressed transcripts. You will then need to identify (annotate) your differentially expressed transcripts in order to match them to metabolic processes.

This article http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368115/ is about the rainbow trout, which is also a member of the Salmonidae. It might be of interest to you in terms of materials and methods, and as a relatively close species.

While Cufflinks can run without a reference annotation, it wouldn't produce meaningful results when run on data aligned against a transcriptome. Cufflinks (the same holds for StringTie) is only useful when you feed it alignments in genomic coordinates. With transcriptomic coordinates, you already know where all of the transcripts are: each one is an entry in the FASTA file.

written by Devon Ryan
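To illustrate the point about transcriptomic coordinates: when reads are aligned to a transcriptome FASTA, the reference name (RNAME, the third SAM column) already identifies the transcript, so a naive count is just a tally per RNAME. This sketch assumes single-end reads in SAM text format; it skips unmapped (flag 0x4), secondary (0x100) and supplementary (0x800) records, and the function name `count_primary_alignments` and the example records are hypothetical. Because each multi-mapping read is credited only to its primary alignment, this is not a substitute for eXpress, RSEM or Salmon.

```python
from collections import Counter


def count_primary_alignments(sam_lines):
    """Tally primary, mapped alignments per reference sequence (RNAME).

    With a transcriptome as the reference, RNAME is a transcript ID,
    so the result is a naive per-transcript read count.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4:
            continue  # read is unmapped
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        counts[rname] += 1
    return counts


# Hypothetical SAM records aligned against a two-transcript reference.
sam = [
    "@SQ\tSN:tx1\tLN:1500",
    "r1\t0\ttx1\t100\t42\t50M\t*\t0\t0\t*\t*",
    "r2\t16\ttx2\t20\t42\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t256\ttx1\t300\t0\t50M\t*\t0\t0\t*\t*",
]
print(count_primary_alignments(sam))
```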

Thanks for the clarification. I have one last question: once you have aligned your reads, you don't have immediate access to the transcript sequences (the multi-FASTA file required by eXpress or Salmon; +1 indeed for the pun), do you?
I don't see how you can get from the alignment step to the quantification step without a de novo assembly, but I am no expert...

written by cyril-cros

jhkim1972 mentioned aligning against a transcriptome, so I assume he/she either downloaded an assembled transcriptome or already assembled one (e.g., with Trinity). Otherwise, yeah, the first step would be assembly. I think Trinity comes with some instructions for the whole assembly->DE transcripts process.

written by Devon Ryan
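If the transcriptome was assembled with Trinity, its FASTA file is the reference for quantification, and its headers also suggest a transcript-to-gene grouping for gene-level differential expression. This sketch assumes Trinity-style IDs such as `TRINITY_DN1000_c115_g5_i1`, where the trailing `_i<N>` marks one isoform of the "gene" `TRINITY_DN1000_c115_g5`; contigs from other assemblers are named differently. The function names `read_fasta` and `trinity_gene_map` and the example records are hypothetical.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from multi-FASTA text lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def trinity_gene_map(lines):
    """Map transcript ID -> gene ID by stripping the '_i<N>' isoform suffix.

    Assumes Trinity-style contig names; the ID is the first word of the header.
    """
    mapping = {}
    for header, _seq in read_fasta(lines):
        tx_id = header.split()[0]
        mapping[tx_id] = re.sub(r"_i\d+$", "", tx_id)
    return mapping


fasta = """>TRINITY_DN10_c0_g1_i1 len=8
ACGTACGT
>TRINITY_DN10_c0_g1_i2 len=6
ACGTAC
>TRINITY_DN11_c0_g1_i1 len=4
GGCC
""".splitlines()
print(trinity_gene_map(fasta))
```

Whether grouping by this suffix gives biologically meaningful "genes" is a judgment call, especially in a species with a whole-genome duplication, so treat the map as a starting guess.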

My bad, I missed the 'own transcriptome' part. I am now unsure what the author means by 'no genome for GTF file'.

written by cyril-cros

What the OP meant to write was, "there's not yet a reference or assembled genome and, thus, also no good annotation (e.g., GTF file) yet."

written by Devon Ryan

Exactly: I have already built a de novo transcriptome with Trinity, and I want to use it as the reference for my RNA-Seq data. However, as Devon Ryan said, Cufflinks needs alignments against an assembled genome rather than a transcriptome. Studying the salmonid genome is difficult because of a whole-genome duplication (WGD). Anyway, thanks all for the useful comments.

written by jjinhyoungkim