5.0 years ago
vrea
Hi all,
I'm trying to run a genome-guided de novo assembly using Trinity with FASTQ files from zebrafish RNA samples, but I have never done an RNA-Seq experiment before.
The Trinity documentation says that you must first align the reads to the reference genome with STAR or TopHat and provide the result as a coordinate-sorted BAM file. I'm not sure how to go about doing this. I have downloaded the file 'Danio_rerio.GRCz11.dna_rm.primary_assembly.fa' from Ensembl. Is this the right file to download? Also, what should I be doing with this file?
Any help is greatly appreciated!!!
Sincerely, Victoria
Why do you want to perform an assembly for zebrafish? It has a well-annotated genome, so it is unlikely you will get any improvement over the existing annotation.
You will have to read the documentation for the aligner you choose. I would consider STAR (very fast, but memory hungry) if you have about 30 GB of RAM available, or HISAT2 or GSNAP if you have less memory.
I wouldn't use the hard repeat-masked assembly. Better is Danio_rerio.GRCz11.dna.primary_assembly.fa.
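To make the STAR route concrete, here is a minimal sketch of the three commands involved: building the genome index, aligning to a coordinate-sorted BAM, and passing that BAM to Trinity. It only assembles and prints the command lines, it does not run anything. The GTF name, read file names, output prefix, read length, thread count and the 10000 max-intron value are all placeholders I made up for illustration, so check them against the STAR and Trinity documentation for your data. Note that STAR expects an uncompressed genome FASTA.

```python
import shlex


def star_index_cmd(genome_fasta, gtf, genome_dir, threads=8, read_length=100):
    """Build the STAR command that generates a genome index."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", genome_fasta,          # uncompressed FASTA
        "--sjdbGTFfile", gtf,                        # annotation improves spliced alignment
        "--sjdbOverhang", str(read_length - 1),      # read length minus 1
        "--runThreadN", str(threads),
    ]


def star_align_cmd(genome_dir, read1, read2, prefix, threads=8):
    """Build the STAR command that aligns paired reads to a sorted BAM."""
    return [
        "STAR", "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",                # reads are gzipped in this sketch
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]


def sorted_bam_path(prefix):
    """STAR names the coordinate-sorted BAM after the output prefix."""
    return prefix + "Aligned.sortedByCoord.out.bam"


def trinity_guided_cmd(bam, max_intron=10000, max_memory="30G", threads=8):
    """Build the genome-guided Trinity command (max_intron is a placeholder)."""
    return [
        "Trinity", "--genome_guided_bam", bam,
        "--genome_guided_max_intron", str(max_intron),
        "--max_memory", max_memory,
        "--CPU", str(threads),
    ]


# Hypothetical file names: only the genome FASTA name comes from this thread.
genome = "Danio_rerio.GRCz11.dna.primary_assembly.fa"
commands = [
    star_index_cmd(genome, "annotation.gtf", "star_index"),
    star_align_cmd("star_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_"),
    trinity_guided_cmd(sorted_bam_path("sample_")),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Once the printed commands look right, they can be pasted into a shell or run from Python with `subprocess.run(cmd, check=True)`.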
I have RNA samples from different treatment groups and I would like to assemble the transcriptomes in Trinity and ultimately compare them. I was under the impression that, when the genome is well annotated (which I know zebrafish is), it is better to do a genome-guided assembly than a purely de novo one. But the Trinity documentation says I have to provide the read alignments to the reference genome as a coordinate-sorted BAM file in order to do so. Is this not the case? Thanks so much for your help!
Why do you want to do a de novo assembly of your RNA-seq reads when they come from zebrafish, which already has a well-annotated genome, as h.mon pointed out? De novo transcriptome assembly is generally done for organisms that have no reference genome available.
You just need to map the reads to the genome and quantify counts for each gene - STAR can perform both steps, or you could use HISAT2 (or GSNAP, or Subread) + featureCounts.
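As a rough sketch of the two counting options mentioned above, the snippet below assembles a STAR command that maps and counts in one pass (`--quantMode GeneCounts`) and a featureCounts command for a BAM produced by another aligner. It only prints the command lines. The index directory, read files, GTF and output names are hypothetical, and the STAR index is assumed to have been built with a GTF annotation.

```python
import shlex


def star_count_cmd(genome_dir, read1, read2, prefix, threads=8):
    """STAR alignment that also writes per-gene read counts."""
    return [
        "STAR", "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",                # gzipped FASTQ assumed
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",                 # needs annotation in the index
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]


def star_counts_path(prefix):
    """STAR writes the gene counts table next to the BAM."""
    return prefix + "ReadsPerGene.out.tab"


def featurecounts_cmd(gtf, bam_files, out_table, threads=8, paired=True):
    """featureCounts over one or more BAMs from any aligner."""
    cmd = ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out_table]
    if paired:
        # -p marks the input as paired-end; recent Subread releases also
        # need --countReadPairs to count fragments rather than reads.
        cmd.append("-p")
    return cmd + list(bam_files)


# Hypothetical names for illustration only.
print(shlex.join(star_count_cmd("star_index", "ctrl_R1.fastq.gz",
                                "ctrl_R2.fastq.gz", "ctrl_")))
print(shlex.join(featurecounts_cmd("annotation.gtf",
                                   ["ctrl.bam", "treated.bam"],
                                   "gene_counts.txt")))
```

With the STAR route, each sample gets its own counts table at the path returned by `star_counts_path`, while featureCounts puts all BAMs into one table, which is usually more convenient for comparing treatment groups.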
Alternatively, you can use Salmon to quantify counts "mapping" to the transcriptome - this would be very fast.
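For the Salmon route, a minimal sketch of the two steps (index the transcriptome once, then quantify each sample) could look like the following. Again it only prints the command lines. The transcriptome FASTA name is my assumption of what the Ensembl cDNA download is called, and the sample and read file names are invented, so adjust them to your files.

```python
import shlex


def salmon_index_cmd(transcripts_fasta, index_dir):
    """Build the command that indexes a transcriptome FASTA."""
    return ["salmon", "index", "-t", transcripts_fasta, "-i", index_dir]


def salmon_quant_cmd(index_dir, read1, read2, out_dir, threads=8):
    """Build the command that quantifies one paired-end sample."""
    return [
        "salmon", "quant", "-i", index_dir,
        "-l", "A",                       # let Salmon infer the library type
        "-1", read1, "-2", read2,
        "-p", str(threads),
        "-o", out_dir,
    ]


# Assumed transcriptome file name and hypothetical sample layout.
transcripts = "Danio_rerio.GRCz11.cdna.all.fa.gz"
samples = {
    "control": ("control_R1.fastq.gz", "control_R2.fastq.gz"),
    "treated": ("treated_R1.fastq.gz", "treated_R2.fastq.gz"),
}

print(shlex.join(salmon_index_cmd(transcripts, "salmon_index")))
for name, (r1, r2) in samples.items():
    print(shlex.join(salmon_quant_cmd("salmon_index", r1, r2, "quant_" + name)))
```

Each output directory will contain a `quant.sf` table with per-transcript estimates, which can then be summarised to gene level for the comparison between treatment groups.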