8.0 years ago by
The situation, as far as I understand it, is not the standard setup for DE analysis, which assumes an independently sequenced reference genome and a genome annotation with exons, introns and other regions. In your case of de novo assembly, you have a transcriptome assembly yielding a set of transcript contigs built from the reads, and you want to assess differential expression using the same reads that were used to generate the assembly. Please comment if I misunderstood your question, and provide additional information. The Bow-Top-Cuff... pipeline is mainly designed for the standard case of a reference genome, but you have the transcripts already, so you don't need cufflinks. Cufflinks doesn't need a GFF file as input either; it makes one as output and can also give you the FPKM, but that would redo the assembly, which is possibly not what you want.
If you have alignments in SAM/BAM (if not, create them by aligning the reads to the contigs), you can directly run cuffdiff using your SAM/BAM files and the GFF file you are asking for. Making such a file should be straightforward; following the spec, it looks like this:
# Fields are: <seqname> <source> <feature> <start> <end> <score> <strand> <frame>
Contig_1 AssemblySoftware transcript 1 <length of contig1> . . .
Contig_2 AssemblySoftware transcript 1 <length of contig2> . . .
Explanation: you make one entry for each contig, starting at 1 and ending at the last base of the contig. There is no score, strand or frame information, so those fields are filled with a "." placeholder.
I think the value of the feature field is not relevant, but I am not 100% sure whether a specific string is required.
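If you don't want to write these lines by hand, a small script can produce them from the contig FASTA. The following is only a minimal sketch under some assumptions: the function names `fasta_lengths` and `gff_lines` are made up for illustration, the source string `AssemblySoftware` and the feature `transcript` are the placeholders from the example above, and the fields are tab-separated. It writes just the eight fields shown above; some tools may additionally expect a ninth attributes column (for example `gene_id` and `transcript_id` in GTF), so check what your version of cuffdiff requires.

```python
import io


def fasta_lengths(handle):
    """Yield (contig_name, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use only the first whitespace-delimited token of the header as the name
            header = line[1:].split()
            name, length = (header[0] if header else "unnamed"), 0
        elif line and name is not None:
            # Sequence line: add its bases to the current contig
            length += len(line)
    if name is not None:
        yield name, length


def gff_lines(handle, source="AssemblySoftware", feature="transcript"):
    """Yield one tab-separated line per contig, spanning base 1 to the last base."""
    for name, length in fasta_lengths(handle):
        # score, strand and frame are unknown, so each is written as "."
        yield "\t".join([name, source, feature, "1", str(length), ".", ".", "."])


# Small in-memory example; for a real file, pass open("contigs.fa") instead
# (the file name is hypothetical).
example = io.StringIO(">Contig_1 some description\nACGTAC\nGT\n>Contig_2\nACG\n")
for row in gff_lines(example):
    print(row)
```

Redirect the printed lines into a file and use that as the annotation.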
Hope this helps.