I'm attempting to use cufflinks to examine alternative splicing in my RNAseq data from drosophila. I'm new to the analysis and running into some problems, I'd really appreciate any help offered.
I aligned my data to the dm6 reference genome from UCSC, using the R/BioConductor platform, with qAlign function in QuasR. I had no problems using these bam files to count reads and analyse differential expression in other related BioC packages.
When I try to use cufflinks to analyse alternative splicing in these bam files, I read that the output files can be blank if the reference genome lacks some features, so I updated my reference genome to prepare a .gff file with the correct features using cuffcompare:
cuffcompare -s . -CG -r dmel-all-r6.05.gff dmel-all-r6.05.gff
This produces the file cuffcmp.combined.gtf, which I use with cuffdiff:
mkdir spliceresults cuffdiff -o ./spliceresults cuffcmp.combined.gtf \ sample1.bam,sample2.bam,...,sampleN.bam \ sample1B.bam,sample2B.bam,...,sampleNB.bam
This produces an output file with thousands of genes, all of them have the status 'notest' in the splicing.diff output file. I am aware that problems can be caused by mismatch of the reference and data, but am I right in thinking that I have got over that initial problem with the cuffcompare command? If so, and my files match correctly, why am I getting this results? I read something about not having enough reads in the analysis produces the 'notest' output, but I cannot see that this would be the case here. I have used these bam files for other parts of the RNA analysis and these have worked fine, and I also have 30 biological replicates within each treatment.
I'm sure I am missing something obvious, but my lack of experience with splicing analysis and cufflinks is letting me down. Thanks very much in advance for any help or advice anyone can give me.