One of my colleagues outsourced their data analysis, and the provider used the method below for the differential expression analysis.
First, they did de novo assemblies of controlA, controlB, treatedA and treatedB separately. The assembled transcripts from all the samples were then iteratively clustered to produce uni-transcripts with the fewest redundant sequences (they did not mention how they did the clustering). The pre-processed reads were then mapped to the uni-transcripts to carry out the expression analysis and differential expression analysis using TopHat-Cufflinks.
They used the following clustering parameters:
minimum identity for overlaps: 96%
minimum overlap length: 50 bp
maximum length of unmatched overhangs: 50 bp
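To make sure I am reading these parameters correctly, here is a minimal sketch of how I understand them: two transcripts would only be merged if their overlap passes all three thresholds. The function and argument names are my own (hypothetical), not taken from their report, and I do not know which clustering tool they actually used.

```python
def overlap_passes(identity_pct, overlap_len_bp, overhang_len_bp,
                   min_identity_pct=96.0, min_overlap_bp=50, max_overhang_bp=50):
    """Return True if an overlap between two transcripts meets the stated
    clustering thresholds (my interpretation, not the provider's code)."""
    return (identity_pct >= min_identity_pct          # minimum identity for overlaps
            and overlap_len_bp >= min_overlap_bp      # minimum overlap length
            and overhang_len_bp <= max_overhang_bp)   # maximum unmatched overhang


# Example: a 120 bp overlap at 97.5% identity with a 10 bp overhang
print(overlap_passes(97.5, 120, 10))
```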
A uni-transcript was considered differentially expressed if:
fold change >= 4 and q-value < 0.05
expression detected in only one sample condition and q-value < 0.05
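Written out as code, my reading of these criteria is roughly the following. This is only a sketch under my own assumptions: that either of the two conditions is sufficient, that the fold change counts in both directions (up or down), and that "expression detected" means a value above zero. The function name and arguments are hypothetical.

```python
def is_differentially_expressed(expr_control, expr_treated, q_value,
                                min_fold=4.0, alpha=0.05):
    """Apply the two stated criteria to one uni-transcript
    (my interpretation of their report, not their actual code)."""
    if q_value >= alpha:
        return False                  # both criteria require q-value < 0.05
    low, high = sorted((expr_control, expr_treated))
    if high <= 0:
        return False                  # not expressed in either condition
    if low <= 0:
        return True                   # expression detected in only one condition
    return high / low >= min_fold     # fold change >= 4 in either direction


# Example: 10 vs 40 FPKM with q = 0.01
print(is_differentially_expressed(10.0, 40.0, 0.01))
```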
In my experience, I usually do a de novo assembly of all the samples together (controlA, controlB, treatedA and treatedB) to get one reference transcript assembly. Then I map the reads from each sample (controlA, controlB, treatedA and treatedB) to that assembly and get FPKM values for each sample. Then I do the differential expression analysis on the mapped reads with one of the differential expression packages (edgeR, DESeq).
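For clarity, by FPKM I mean the standard definition (fragments per kilobase of transcript per million mapped fragments). A small sketch, with a hypothetical function name and made-up example numbers:

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (transcript length in bp * total mapped fragments)."""
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Example (made-up numbers): 100 fragments on a 1,000 bp transcript,
# in a sample with 1,000,000 mapped fragments in total
print(fpkm(100, 1000, 1_000_000))
```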
Do you think the method used for my colleague's outsourced analysis is correct? If it is correct, why would they want to do the clustering?