Question: Get transcript sequence from RNA-seq
colin.kern (United States) wrote, 2.6 years ago:

I have Cufflinks output from TopHat alignments and I want to get the sequences of the transcripts. So far I've been extracting them from the reference genome, but I'm working in chicken, where the reference genome was built from the wild type, and I'm sequencing a very specialized breed, so I would really like to get the transcript sequences from my RNA-seq data itself. I've searched this site and elsewhere and found some solutions, such as generating VCF files with samtools, but they all seem geared towards getting a single sequence rather than thousands, and running them in a loop would likely be extremely slow. Is there a quicker way to get the full set of transcript sequences predicted by Cufflinks from the RNA-seq data?
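The speed problem mostly comes from calling a tool once per transcript. A batch alternative is to call variants once for the whole sample, apply them to the reference once, and then splice every transcript in a single pass; tools such as `bcftools consensus` and `gffread` (distributed with Cufflinks) work along these lines. The sketch below only illustrates that idea in plain Python under loud assumptions: the function names are hypothetical, only single-base substitutions (SNVs) are applied (indels would shift the GTF coordinates), one allele is chosen per site, and the inputs are small in-memory stand-ins for a real FASTA, VCF, and Cufflinks `transcripts.gtf`.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical in-memory stand-ins for real FASTA / VCF / GTF inputs.
Genome = Dict[str, str]                       # chromosome -> sequence
Snvs = Dict[str, Dict[int, str]]              # chromosome -> {1-based position: alt base}
Transcripts = Dict[str, Tuple[str, str, List[Tuple[int, int]]]]

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def apply_snvs(genome: Genome, snvs: Snvs) -> Genome:
    """Return a patched copy of the genome; assumes SNVs only (no indels)."""
    patched = {}
    for chrom, seq in genome.items():
        bases = list(seq)
        for pos, alt in snvs.get(chrom, {}).items():
            bases[pos - 1] = alt  # VCF/GTF positions are 1-based
        patched[chrom] = "".join(bases)
    return patched


def parse_gtf_exons(lines: Iterable[str]) -> Transcripts:
    """Group exon records by transcript_id -> (chrom, strand, [(start, end), ...])."""
    transcripts: Transcripts = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TID_RE.search(fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        entry = transcripts.setdefault(match.group(1), (chrom, strand, []))
        entry[2].append((int(fields[3]), int(fields[4])))
    return transcripts


def extract_transcripts(genome: Genome, transcripts: Transcripts) -> Dict[str, str]:
    """Splice the exons of every transcript in one pass over the in-memory genome."""
    out = {}
    for tid, (chrom, strand, exons) in transcripts.items():
        seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(exons))
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        out[tid] = seq
    return out


if __name__ == "__main__":
    genome = {"chr1": "AAAACAGTGGGGTTTT"}
    sample = apply_snvs(genome, {"chr1": {2: "T"}})
    gtf = [
        'chr1\tCufflinks\texon\t1\t4\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
        'chr1\tCufflinks\texon\t9\t12\t.\t+\t.\tgene_id "CUFF.1"; transcript_id "CUFF.1.1";',
        'chr1\tCufflinks\texon\t5\t8\t.\t-\t.\tgene_id "CUFF.2"; transcript_id "CUFF.2.1";',
    ]
    for tid, seq in extract_transcripts(sample, parse_gtf_exons(gtf)).items():
        print(f">{tid}\n{seq}")  # prints >CUFF.1.1 ATAAGGGG and >CUFF.2.1 ACTG
```

Note that this still relies on the transcript structures Cufflinks placed on reference coordinates; it only swaps in the sample's bases rather than assembling sequence de novo.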


Read this description:

https://transdecoder.github.io/

And read these papers:

RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323

MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3789545/

These papers deal specifically with splicing:

http://www.cs.colostate.edu/~asa/pdfs/spliceGrapherXT.pdf

http://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0156132

(Reply by natasha.sernova, 2.6 years ago)

I don't see how TransDecoder is useful here. It seems to require a FASTA file of the transcript sequences to begin with, or, if you give it a GTF, it extracts the sequences from the genome, which is exactly what I don't want to do.

RSEM is not suitable because I'm interested in novel transcripts, and RSEM aligns to a known transcript set rather than the whole genome (unless I'm misunderstanding).

I am not sure MITIE suits my purpose either. It says it will report a small set of optimal transcripts from a set of RNA-seq libraries; however, I'm interested in finding novel transcripts, especially long non-coding RNAs, with a focus on tissue-specific transcripts. So I think MITIE would miss many of those.

(Reply by colin.kern, 2.6 years ago)