Question: RNA-seq de novo transcript assembly from one gene
Floris Brenk (USA) wrote 2.9 years ago:

Hi all,

I was wondering whether anyone has experience with de novo transcript assembly from RNA-seq data (100 bp PE Illumina reads) for only one gene. We have about 50 RNA-seq libraries of human tissue and are at the moment interested in just one gene; we want to know all the expressed transcripts of this gene. Are there specific packages/programs for this? Or does anyone have tips or ideas?

Tags: rna-seq, next-gen, assembly
Rohit (California) wrote 2.9 years ago:

Is there a specific reason for choosing de novo assembly rather than reference alignment, given that you already know the gene?

Also, if you think de novo would be better, you can go for a reference-guided Trinity assembly using your gene of interest. If the isoforms are what interest you most, you can also do a complete de novo transcriptome assembly (no guide) and then check what the de novo transcripts for your gene look like.


Thanks for your reply. The reason is that when I look up this gene (and many others) in the FANTOM5 transcription start site database, there are many more potential transcription start sites than are annotated. This means there might be more transcripts present than are annotated in e.g. GENCODE or UCSC. For Trinity, would you recommend pooling all samples together initially for assembly, or assembling per library?

written 2.9 years ago by Floris Brenk

Whether or not to pool samples for de novo transcriptomes is a question with no obvious answer. By pooling you will get better contiguity, but the chance of mis-assemblies increases. Without pooling, lowly expressed transcripts will not be missed, but it means more time and multiple runs. If you want a quick and dirty approach, why not pool the samples to start with, provided you have enough computing power? If you hit the memory limit, you might have to normalise the data.

modified 2.9 years ago • written 2.9 years ago by Rohit
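
To make the pooled option concrete, here is a minimal Python sketch that only builds (and does not run) a Trinity command pooling several paired-end libraries into one assembly. The flags (`--seqType`, `--left`, `--right`, `--max_memory`, `--CPU`, `--output`) are the ones I recall from Trinity's documented command line, so check them against `Trinity --help` for your installed version. The helper name, file names, and resource values are all hypothetical.

```python
import shlex


def pooled_trinity_command(libraries, output_dir="trinity_pooled",
                           max_memory="50G", cpus=8):
    """Build (but do not run) one Trinity command that pools all libraries.

    `libraries` is a list of (left_fastq, right_fastq) pairs; Trinity accepts
    comma-separated file lists for --left and --right.
    """
    if not libraries:
        raise ValueError("need at least one paired-end library")
    lefts = ",".join(left for left, _ in libraries)
    rights = ",".join(right for _, right in libraries)
    return [
        "Trinity",
        "--seqType", "fq",
        "--left", lefts,
        "--right", rights,
        "--max_memory", max_memory,   # assumed value; size to your machine
        "--CPU", str(cpus),
        "--output", output_dir,
    ]


# Hypothetical file names, for illustration only.
libs = [("lib01_1.fastq", "lib01_2.fastq"), ("lib02_1.fastq", "lib02_2.fastq")]
print(shlex.join(pooled_trinity_command(libs)))
```

For the per-library alternative, the same helper can be called once per pair with a different `output_dir`, at the cost of many separate runs.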

To limit the computational load, do you think I could also just start from the already aligned BAM files, extract the reads from the gene of interest, convert them to .fastq files, and continue from there? Or would this be too biased an approach?

written 2.9 years ago by Floris Brenk

You could do that. There should be no bias, since you aligned to the whole genome. If you suspect that there are additional transcription start sites, are you going to use a "region" of interest rather than the gene of interest (or would you just use the coordinates of the longest gene model)?

written 2.9 years ago by genomax

Yes, I was indeed thinking more of a region of interest, based on some UCSC tracks and the MiTranscriptome data.

written 2.9 years ago by Floris Brenk
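
The extraction step discussed above could look roughly like the following Python sketch. It widens a gene model by a flank (to catch unannotated upstream start sites) and builds, without running them, three samtools commands: subset the BAM to the region, group mates together, and write paired FASTQ files. The function names, the flank size, and all coordinates and file names are hypothetical placeholders. `samtools view` with a region needs a coordinate-sorted, indexed BAM.

```python
import shlex


def padded_region(chrom, start, end, flank=0):
    """Return a samtools-style 1-based region string widened by `flank` bp."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{max(1, start - flank)}-{end + flank}"


def extraction_commands(bam, region, prefix):
    """Build (not run) samtools commands: subset, collate mates, write FASTQ."""
    region_bam = f"{prefix}.region.bam"
    collated_bam = f"{prefix}.collated.bam"
    return [
        # Keep only alignments overlapping the region of interest.
        ["samtools", "view", "-b", "-o", region_bam, bam, region],
        # Put mates next to each other so pairs can be written together.
        ["samtools", "collate", "-o", collated_bam, region_bam],
        # Paired reads go to _1/_2; singletons and other reads are discarded.
        ["samtools", "fastq", "-1", f"{prefix}_1.fastq",
         "-2", f"{prefix}_2.fastq", "-0", "/dev/null", "-s", "/dev/null",
         "-n", collated_bam],
    ]


# Placeholder coordinates, not a real gene model.
region = padded_region("chr1", 1_000_000, 1_050_000, flank=10_000)
for cmd in extraction_commands("sample01.bam", region, "sample01_goi"):
    print(shlex.join(cmd))
```

One caveat with this sketch: a read whose mate aligned outside the region (or not at all) ends up as a singleton and is dropped here, so a generous flank may be worth the small extra cost.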
Antonio R. Franco (Spain, Universidad de Córdoba) wrote 2.9 years ago:

I have recently run a de novo assembly of a tree genome using 2x100 paired-end data from Illumina using Trinity.

I did it to find the ortholog of a tomato gene of interest in the tree I am working on.

I ran Trinity with the program's default settings.

In the first assembly I got over 860,000 contigs in FASTA format, and after running a local BLAST search using the tomato sequence as the query, I found several candidate contigs for my gene of interest. To my surprise, one of these contigs spanned more than 3,700 bases, encoding a protein of more than 1,100 amino acids with more than 67% homology at the protein level. This makes me confident that Trinity works much better than I was expecting.

I ran Trinity on the Galaxy web service of Indiana University. They kindly provide free access to it, and I very much recommend it.

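
Screening hundreds of thousands of contigs for a few candidates, as described above, is easier with BLAST's tabular output (`-outfmt 6`). The Python sketch below filters such a table by percent identity and alignment length and ranks the surviving contigs by bit score. The thresholds and the function name are assumptions, and the three example rows are made up for illustration; they are not results from this thread.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_identity=60.0, min_length=100):
    """Return (contig, pident, length, bitscore) tuples, best bit score first."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST6_COLUMNS, row))
        pident = float(hit["pident"])
        length = int(hit["length"])
        if pident >= min_identity and length >= min_length:
            hits.append((hit["sseqid"], pident, length, float(hit["bitscore"])))
    return sorted(hits, key=lambda h: h[3], reverse=True)


# Made-up rows, for illustration only.
example = io.StringIO(
    "query1\tcontig_A\t67.5\t1100\t340\t5\t1\t1100\t12\t3311\t0.0\t1500\n"
    "query1\tcontig_B\t45.0\t200\t105\t3\t10\t209\t1\t600\t1e-10\t80\n"
    "query1\tcontig_C\t72.0\t50\t14\t0\t1\t50\t1\t150\t1e-5\t60\n"
)
for contig, pident, length, bitscore in filter_hits(example):
    print(contig, pident, length, bitscore)
```

In this toy input only `contig_A` passes both thresholds; in practice the cutoffs would need tuning to how diverged the two species are.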