I have around 120 GB of RNA-seq data. A draft genome is available for my organism of interest, but there are no standard annotations. I want to do differential expression analysis.
The approach I have used is:
1. Run tophat2 on all samples.
2. Run cufflinks to get a transcripts.gtf file for each sample, which is supposed to contain the assembled isoform structures.
3. Run cuffmerge on all the gtf files to create a merged.gtf file, which is a kind of master gtf file of all possible isoforms.
4. Run cuffdiff/edgeR/DESeq for differential expression analysis using the merged.gtf file.
5. Convert merged.gtf to a fasta file of transcripts using gffread (part of the Tuxedo suite), annotate all the transcripts with Annocript, and append this annotation to the cuffdiff output to learn the function of the differentially expressed transcripts.
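To make the steps concrete, here is a rough Python sketch that only builds the command lines I am describing (it does not run anything). The build_pipeline helper, the sample names and all file paths are hypothetical; it assumes paired-end reads and an existing Bowtie2 index, and it uses only the basic flags of each tool, so the real runs may need more options.

```python
import shlex


def build_pipeline(samples, genome_fa, bowtie_index, threads=8, out="out"):
    """Build (but do not run) the Tuxedo-style commands for steps 1-5.

    samples: dict mapping condition -> list of (sample_name, reads_1, reads_2).
    Returns (commands, gtf_paths); gtf_paths must be written one per line
    into the assemblies list file before cuffmerge is actually run.
    """
    cmds, gtfs, bams = [], [], {}
    for condition, replicates in samples.items():
        for name, reads_1, reads_2 in replicates:
            tophat_dir = f"{out}/tophat/{name}"
            cuff_dir = f"{out}/cufflinks/{name}"
            bam = f"{tophat_dir}/accepted_hits.bam"
            # Step 1: spliced alignment of each sample to the draft genome.
            cmds.append(["tophat2", "-p", str(threads), "-o", tophat_dir,
                         bowtie_index, reads_1, reads_2])
            # Step 2: per-sample transcript assembly (writes transcripts.gtf).
            cmds.append(["cufflinks", "-p", str(threads), "-o", cuff_dir, bam])
            gtfs.append(f"{cuff_dir}/transcripts.gtf")
            bams.setdefault(condition, []).append(bam)

    # Step 3: merge all assemblies into merged.gtf.
    merged_dir = f"{out}/merged_asm"
    merged_gtf = f"{merged_dir}/merged.gtf"
    cmds.append(["cuffmerge", "-s", genome_fa, "-p", str(threads),
                 "-o", merged_dir, f"{out}/assemblies.txt"])

    # Step 4: differential expression; replicates of a condition are comma-joined.
    cmds.append(["cuffdiff", "-p", str(threads), "-o", f"{out}/cuffdiff",
                 "-L", ",".join(bams), merged_gtf]
                + [",".join(paths) for paths in bams.values()])

    # Step 5: extract transcript sequences for annotation.
    cmds.append(["gffread", "-w", f"{out}/transcripts.fa",
                 "-g", genome_fa, merged_gtf])
    return cmds, gtfs


if __name__ == "__main__":
    # Hypothetical two-condition design with one replicate each.
    demo = {
        "control": [("c1", "c1_1.fq", "c1_2.fq")],
        "treated": [("t1", "t1_1.fq", "t1_2.fq")],
    }
    commands, gtf_paths = build_pipeline(demo, "genome.fa", "genome_idx")
    for cmd in commands:
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the file wiring (in particular that every step points at the same merged.gtf) before committing to a long run.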
I would like to know whether this approach is sound and would welcome suggestions to improve the pipeline. I have doubts about the merged.gtf file generated in step 3: can I blindly depend on it as a standard gtf file like those from Ensembl? I do not see any alternative approach, though (tools like StringTie do a similar job). I want to use merged.gtf as a standard annotation file, like those from Ensembl, to run other tools/pipelines that deal with differential splicing between conditions. Can I rely on