Question: Please suggest an appropriate genome-guided transcriptome assembler
27 days ago by
seta1000
Sweden
seta1000 wrote:

Dear all,

I have RNA-seq data generated on an Illumina HiSeq 2000 (100 bp, paired-end) from human control and diseased samples. I'm looking for polymorphic simple sequence repeat (SSR) markers between the control and disease groups. I plan to do a genome-guided transcriptome assembly for each group and then survey the probable polymorphic markers between them. For genome-guided transcriptome assembly, I know about Cufflinks and StringTie, but I found that some people here suggested avoiding them. Could you please suggest an appropriate tool for this purpose?

Any other comments on the issue would be highly appreciated.

Thanks

rna-seq alignment marker genome • 187 views
ADD COMMENTlink modified 26 days ago by Kevin Blighe23k • written 27 days ago by seta1000

Can you elaborate on why some people advise avoiding them? Is "here" the Biostars forum, by the way?

ADD REPLYlink modified 27 days ago • written 27 days ago by lieven.sterck1.9k

Yes, here, the Biostars forum; I don't remember exactly why. However, I performed genome-guided transcriptome assembly with two programs, Cufflinks and StringTie, and obtained very different results. It seems StringTie misses a lot of genes, unlike Cufflinks.

ADD REPLYlink modified 27 days ago • written 27 days ago by seta1000
26 days ago by
Kevin Blighe23k
Republic of Ireland
Kevin Blighe23k wrote:

The one tool that is no longer recommended (even by its own developers) is TopHat / TopHat2. I have not seen anybody advise against HISAT2 for aligning reads ahead of genome-guided transcriptome assembly. Use HISAT2 / StringTie.

Kevin

ADD COMMENTlink written 26 days ago by Kevin Blighe23k

Thank you, Kevin. For alignment I used STAR, which seems to work well. However, my issue is the genome-guided assembler: as I said in my previous comment, the results of STAR/Cufflinks and STAR/StringTie are very different. StringTie produced fewer genes than Cufflinks, and I don't know why it missed so many. So I'm looking for another suitable genome-guided assembler. What about Trinity?

ADD REPLYlink modified 25 days ago • written 25 days ago by seta1000

You can of course also use Trinity, and there are other de novo transcriptome assemblers too. I would imagine that many of the differences between Cufflinks and StringTie relate to low-abundance transcripts. These tools undoubtedly use different thresholds, too.

Trinity has a good reputation, if you wanted to use that instead.
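
Trinity also has a genome-guided mode that takes a coordinate-sorted BAM of the aligned reads. Here is a minimal sketch, assuming a hypothetical BAM named cond1_rep1.sorted.bam; the intron-size, memory and CPU values are placeholders to tune for your genome and machine. The `run` helper is only an illustrative wrapper that prints each command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Genome-guided Trinity on one coordinate-sorted BAM.
# The default max intron (10000), 10G memory and 6 CPUs are assumptions.
trinity_genome_guided() {
    sorted_bam=$1
    max_intron=${2:-10000}
    run Trinity --genome_guided_bam "$sorted_bam" \
        --genome_guided_max_intron "$max_intron" \
        --max_memory 10G --CPU 6
}

trinity_genome_guided cond1_rep1.sorted.bam
```

Check the printed command first, then re-run with DRY_RUN=0 once the file names match your data.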

ADD REPLYlink written 25 days ago by Kevin Blighe23k

Dear @Kevin, hi. I have RNA-seq data from a fish (3 cond1 and 3 cond2 biological replicates) and I have done a Trinity de novo assembly on it. Now the draft genome of that species has been released. In your opinion, which pipeline and approach would be best for a genome-guided comparison? Thanks

ADD REPLYlink written 22 days ago by Farbod3.1k

HISAT2 would be okay to use. It would be interesting to use the new reference genome as a guide (in HISAT2) for the purposes of identifying the transcriptional 'landscape' of this fish species. That would make for a very good publication.

ADD REPLYlink modified 22 days ago • written 22 days ago by Kevin Blighe23k

Thank you. So, before using HISAT2, should I build a reference index for my species of interest using STAR, or not?

ADD REPLYlink written 22 days ago by Farbod3.1k

You should create HISAT2-specific indices. Please read here: https://ccb.jhu.edu/software/hisat2/manual.shtml#indexing-a-reference-genome

After you run HISAT2, you then use a program called 'StringTie' for the purposes of identifying transcripts in the aligned data. If you encounter any errors during this process, please feel free to open up new questions on Biostars.

ADD REPLYlink modified 22 days ago • written 22 days ago by Kevin Blighe23k

Could you kindly suggest a source or paper with appropriate scripts for HISAT2 usage? I mean scripts for indexing, mapping all 6 left and right read sets to the reference genome, and then downstream DEG analysis. It looks like this program has many options.

ADD REPLYlink written 22 days ago by Farbod3.1k

Sure thing! Here is the publication in Nature Methods: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.

If you cannot obtain the PDF, then I may be able to get it for you.

ADD REPLYlink modified 22 days ago • written 22 days ago by Kevin Blighe23k

Dear @Kevin, I used the command "./hisat2-build -p 6 '/home/Salmon-genome/GCF_salmon_genome.fna' ht2_base_salmon_genome" to create an indexed genome with HISAT2. (Is it a good command?)

8 *.ht2 files have been created.

I guess that before using StringTie, I should map my individual paired-end reads to the indexed reference genome using HISAT2. Is that correct?

ADD REPLYlink modified 21 days ago • written 21 days ago by Farbod3.1k

Hello Farbod, yes, that is correct. HISAT2 is a 'splice aware' alignment program, i.e., it can take RNA-seq reads and faithfully map these back to a reference genome, taking into account the fact that RNA-seq reads derive from mRNA and consist [mostly] of exonic sequence.
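
As a rough sketch of that mapping step for one paired-end sample: the index base name ht2_base_salmon_genome is taken from your hisat2-build command, while the sample and FASTQ names below are hypothetical. The --dta option asks HISAT2 to report alignments tailored for transcript assemblers such as StringTie. The `run` helper is an illustrative wrapper that only prints the command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Align one paired-end sample against a HISAT2 index.
# Arguments: index base name, sample name, R1 FASTQ, R2 FASTQ.
align_pair() {
    index=$1
    sample=$2
    r1=$3
    r2=$4
    run hisat2 -p 6 --dta -x "$index" -1 "$r1" -2 "$r2" -S "$sample.sam"
}

# Hypothetical file names for the first replicate of condition 1.
align_pair ht2_base_salmon_genome cond1_rep1 \
    cond1_rep1_R1.fastq.gz cond1_rep1_R2.fastq.gz
```

Repeat the call once per replicate so that each sample gets its own SAM file.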

ADD REPLYlink written 21 days ago by Kevin Blighe23k

Dear @Kevin, hi. It seems that each .sam file produced by the mapping procedure will be huge, yes?

ADD REPLYlink written 19 days ago by Farbod3.1k

That is likely, yes, because SAM is uncompressed data. You can compress these to BAM or CRAM (both binary) in order to save disk space. BAM is likely more appropriate, as many programs do not yet explicitly support CRAM.
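
A minimal sketch of the SAM-to-BAM conversion with samtools, assuming a hypothetical sample named cond1_rep1 and 6 threads. Sorting by coordinate at the same time is convenient because StringTie expects sorted input. The `run` helper is an illustrative wrapper that only prints each command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Convert <sample>.sam into a coordinate-sorted, indexed BAM.
sam_to_sorted_bam() {
    sample=$1
    # samtools sort reads SAM directly and writes compressed BAM.
    run samtools sort -@ 6 -o "$sample.sorted.bam" "$sample.sam"
    # The index (.bai) allows random access to the sorted BAM.
    run samtools index "$sample.sorted.bam"
}

sam_to_sorted_bam cond1_rep1
```

Once the BAM has been checked, the original SAM can be deleted to recover the disk space.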

ADD REPLYlink written 19 days ago by Kevin Blighe23k

Hi @Kevin, Now I have 6 .sam files for my 12 fastq files (3 for cond1 and 3 for cond2), and my final goal is DEG analysis.

Should I use them (the 6 SAM files) directly in StringTie, or should I merge them into one file? Or maybe use SAMtools/BCFtools before StringTie?

ADD REPLYlink written 18 days ago by Farbod3.1k

You should keep them separate. If you merge them, you cannot then obtain any useful statistics because it would be a 1 versus 1 comparison. By keeping them separate, you will have 3 versus 3, which is the bare minimum that anyone should have for differential expression analysis.
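
In practice, keeping them separate means running StringTie once per replicate, each with its own output file. A sketch under the assumption that each sample has a coordinate-sorted BAM; the six sample names are hypothetical. If a reference annotation exists for the species, it can additionally be supplied with -G. The `run` helper only prints each command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical names: 3 replicates for each of the 2 conditions.
SAMPLES="cond1_rep1 cond1_rep2 cond1_rep3 cond2_rep1 cond2_rep2 cond2_rep3"

# Assemble transcripts for every replicate independently (no merging of BAMs).
assemble_each() {
    for sample in $SAMPLES; do
        run stringtie -p 6 -l "$sample" -o "$sample.gtf" "$sample.sorted.bam"
    done
}

assemble_each
```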

ADD REPLYlink written 18 days ago by Kevin Blighe23k

You are right. Thank you very much.

So, I should now proceed to the StringTie step. Yes?

ADD REPLYlink written 18 days ago by Farbod3.1k

Yes, indeed, Sir. If you have aligned your data with HISAT2, then StringTie is the next step. StringTie will allow you to identify the expression level of your transcripts ('transcript abundances').

You should aim to read through the entire online manual: https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#run

ADD REPLYlink written 18 days ago by Kevin Blighe23k

Here is the recommended workflow for differential expression analysis: http://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de
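
As I understand that workflow, once each sample has its own assembled GTF, the per-sample GTFs are merged into one transcript set and each sample is then re-quantified against it. A hedged sketch of those two steps; the sample names, mergelist.txt, merged.gtf and the ballgown/ output layout are all assumptions for illustration. The `run` helper only prints each command unless DRY_RUN=0 is set:

```shell
#!/bin/sh
set -eu

# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical names: 3 replicates for each of the 2 conditions.
SAMPLES="cond1_rep1 cond1_rep2 cond1_rep3 cond2_rep1 cond2_rep2 cond2_rep3"

merge_and_quantify() {
    # When really running, list the per-sample GTFs, one path per line.
    if [ "${DRY_RUN:-1}" != "1" ]; then
        printf '%s.gtf\n' $SAMPLES > mergelist.txt
    fi
    # Merge all per-sample assemblies into one non-redundant transcript set.
    run stringtie --merge -o merged.gtf mergelist.txt
    # Re-estimate abundances against the merged set only (-e) and write
    # Ballgown input tables (-B) into one directory per sample.
    for sample in $SAMPLES; do
        run stringtie -e -B -p 6 -G merged.gtf \
            -o "ballgown/$sample/$sample.gtf" "$sample.sorted.bam"
    done
}

merge_and_quantify
```

The per-sample directories under ballgown/ can then be loaded for the differential expression step.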

ADD REPLYlink written 18 days ago by Kevin Blighe23k