Question: Help with genome-guided de novo assembly in Trinity
vrea0 wrote:

Hi all,

I'm trying to run a genome-guided de novo assembly in Trinity using FASTQ files from zebrafish RNA samples, but I have never done an RNA-Seq experiment before.

The Trinity documentation says that you must create a coordinate-sorted BAM file for the reference genome using STAR or TopHat. I'm not sure how to go about doing this. I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl. Is this the right file to download? Also, what should I be doing with this file?

Any help is greatly appreciated!!!

Sincerely, Victoria

written 3 months ago by vrea0

Why do you want to perform assembly for zebrafish? It has a well-annotated genome, so an assembly is unlikely to improve on the existing annotation.

> In the documentation for Trinity it says that you must create a coordinate-sorted BAM file for the reference genome using STAR or TopHat.

You will have to read the documentation for the tool you choose. I would consider STAR (very fast, but memory hungry) if you have about 30 GB of RAM available, or HISAT2 or GSNAP if you have less memory.

> I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl.

I wouldn't use the hard repeat-masked assembly (dna_rm). A better choice is the unmasked Danio_rerio.GRCz11.dna.primary_assembly.fa.
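
For reference, the alignment step described above could look roughly like the sketch below, assuming STAR is the chosen aligner and the reads are paired-end. It only builds the command lines (index the unmasked genome, align to a coordinate-sorted BAM, then hand that BAM to Trinity) so they can be inspected before anything is run. The index directory, read file names, output prefix, thread count, memory limit and maximum intron size are all placeholder assumptions, not recommendations; check the STAR and Trinity documentation for values that suit your data and machine.

```python
import shlex

# Unmasked primary assembly, as suggested above.
GENOME_FASTA = "Danio_rerio.GRCz11.dna.primary_assembly.fa"


def star_index_cmd(genome_fasta, index_dir, threads=8):
    """Command to build a STAR genome index (run once per genome)."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", str(index_dir),
        "--genomeFastaFiles", str(genome_fasta),
        "--runThreadN", str(threads),
    ]


def star_align_cmd(index_dir, read1, read2, out_prefix, threads=8):
    """Command to align paired-end reads and write a coordinate-sorted BAM."""
    cmd = [
        "STAR", "--genomeDir", str(index_dir),
        "--readFilesIn", str(read1), str(read2),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", str(out_prefix),
        "--runThreadN", str(threads),
    ]
    # Gzipped FASTQ files have to be decompressed on the fly.
    if str(read1).endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


def sorted_bam_path(out_prefix):
    """Name STAR gives the sorted BAM for a given output prefix."""
    return f"{out_prefix}Aligned.sortedByCoord.out.bam"


def trinity_guided_cmd(bam, max_intron=10000, max_memory="50G", cpu=8):
    """Command for a genome-guided Trinity run on a coordinate-sorted BAM.

    max_intron, max_memory and cpu are placeholder values (assumptions).
    """
    return [
        "Trinity", "--genome_guided_bam", str(bam),
        "--genome_guided_max_intron", str(max_intron),
        "--max_memory", max_memory,
        "--CPU", str(cpu),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    prefix = "sample1_"
    steps = [
        star_index_cmd(GENOME_FASTA, "star_index"),
        star_align_cmd("star_index", "sample1_R1.fastq.gz",
                       "sample1_R2.fastq.gz", prefix),
        trinity_guided_cmd(sorted_bam_path(prefix)),
    ]
    for cmd in steps:
        print(shlex.join(cmd))  # print only; run manually or via subprocess
```

Printing the commands rather than executing them keeps the sketch safe to try on a machine without STAR or Trinity installed; each list can be passed to `subprocess.run(cmd, check=True)` once the paths are real.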

written 3 months ago by h.mon29k

I have RNA samples from different treatment groups, and I would like to assemble the transcriptomes in Trinity and ultimately compare them. I was under the impression that when the genome is well annotated (which I know zebrafish is), it is better to assemble with the reference genome than to do a purely de novo assembly. But the Trinity documentation says that to do so I have to provide the read alignments to the reference genome in a coordinate-sorted BAM file. Is this not the case? Thanks so much for your help!

written 3 months ago by vrea0

Why do you want to do a de novo assembly of your RNA-seq reads when, as h.mon pointed out, zebrafish already has a well-annotated genome? De novo transcriptome assembly is done for organisms whose genome information is not available.

written 3 months ago by ashish320

You just need to map the reads to the genome and count the reads for each gene. STAR can perform both steps, or you could use HISAT2 (or GSNAP, or Subread) followed by featureCounts.

Alternatively, you can use Salmon to quantify by "mapping" the reads directly to the transcriptome, which would be very fast.
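
As a rough illustration of the two quantification routes, the sketch below builds a featureCounts command for a BAM produced by any of the aligners, and the Salmon index and quant commands for the transcriptome route. All file names (the BAM, the GTF annotation, the transcript FASTA, the output locations) and the thread count are hypothetical placeholders, and paired-end reads are assumed. Newer Subread releases may need an extra option alongside `-p` to count read pairs, so check the documentation for the version you have installed.

```python
import shlex


def featurecounts_cmd(bam, gtf, out_counts, paired=True, threads=8):
    """Command to count reads per gene from an aligned BAM and a GTF."""
    cmd = ["featureCounts", "-T", str(threads), "-a", str(gtf),
           "-o", str(out_counts)]
    if paired:
        cmd.append("-p")  # input contains paired-end reads
    cmd.append(str(bam))
    return cmd


def salmon_index_cmd(transcripts_fasta, index_dir):
    """Command to build a Salmon index from a transcript FASTA."""
    return ["salmon", "index", "-t", str(transcripts_fasta),
            "-i", str(index_dir)]


def salmon_quant_cmd(index_dir, read1, read2, out_dir, threads=8):
    """Command to quantify paired-end reads against a Salmon index."""
    return [
        "salmon", "quant", "-i", str(index_dir),
        "-l", "A",  # let Salmon infer the library type
        "-1", str(read1), "-2", str(read2),
        "-p", str(threads),
        "-o", str(out_dir),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    steps = [
        featurecounts_cmd("sample1_Aligned.sortedByCoord.out.bam",
                          "annotation.gtf", "sample1_counts.txt"),
        salmon_index_cmd("transcripts.fa", "salmon_index"),
        salmon_quant_cmd("salmon_index", "sample1_R1.fastq.gz",
                         "sample1_R2.fastq.gz", "sample1_salmon"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))  # print only; run manually or via subprocess
```

Either route yields per-sample counts or abundance estimates that can then be compared across treatment groups with a differential expression tool.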

written 3 months ago by h.mon29k