Question: cufflinks splicing analysis drosophila - 'notest' in output
0
gravatar for fing
3.4 years ago by
fing0
United Kingdom
fing0 wrote:

Hi everyone,

I'm attempting to use cufflinks to examine alternative splicing in my RNAseq data from drosophila. I'm new to the analysis and running into some problems, I'd really appreciate any help offered.

I aligned my data to the dm6 reference genome from UCSC, using the R/BioConductor platform, with qAlign function in QuasR. I had no problems using these bam files to count reads and analyse differential expression in other related BioC packages.

When I try to use cufflinks to analyse alternative splicing in these bam files, I read that the output files can be blank if the reference genome lacks some features, so I updated my reference genome to prepare a .gff file with the correct features using cuffcompare:

cuffcompare -s . -CG -r dmel-all-r6.05.gff dmel-all-r6.05.gff 

This produces the file cuffcmp.combined.gtf, which I use with cuffdiff:

mkdir spliceresults

cuffdiff -o ./spliceresults cuffcmp.combined.gtf \

sample1.bam,sample2.bam,...,sampleN.bam \

sample1B.bam,sample2B.bam,...,sampleNB.bam

This produces an output file with thousands of genes, all of them have the status 'notest' in the splicing.diff output file. I am aware that problems can be caused by mismatch of the reference and data, but am I right in thinking that I have got over that initial problem with the cuffcompare command? If so, and my files match correctly, why am I getting this results? I read something about not having enough reads in the analysis produces the 'notest' output, but I cannot see that this would be the case here. I have used these bam files for other parts of the RNA analysis and these have worked fine, and I also have 30 biological replicates within each treatment.

I'm sure I am missing something obvious, but my lack of experience with splicing analysis and cufflinks is letting me down. Thanks very much in advance for any help or advice anyone can give me.

Fiona

cufflinks splicing drosophila • 1.3k views
ADD COMMENTlink written 3.4 years ago by fing0

Your cuffcompare command looks a bit funny to me, unless I'm missing something. It looks like you are using the reference gene annotation file (dmel-all-r6.05.gff) both as the reference (-r) and input gff list (instead of using your assembled transcriptome from each experiment).

ADD REPLYlink written 3.4 years ago by Joseph Pearson410

Hi, thanks for your reply. I guess it would make sense if there was something wrong with my reference file, and as this is the first time I used cuffcompare, it's possible my mistake is here. I was trying to follow instructions I found on a cufflinks helppage (http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/index.html) to create a .gtf file with tss_id and p_id attributes. The instruction said:

Note: If an arbitrary GTF/GFF3 file is used as input (instead of the .combined.gtf file produced by Cuffcompare), these attributes will not be present, but Cuffcompare can still be used to obtain these attributes with a command like this:

cuffcompare -s /path/to/genome_seqs.fa -CG -r annotation.gtf annotation.gtf

The resulting cuffcmp.combined.gtf file created by this command will have the tss_id and p_id attributes added to each record and this file can be used as input for cuffdiff.

So that is what I was trying to achieve with my cuffcompare command. Have I misunderstood? I'm afraid I can't see how, so if anyone could point out my mistake, I'd be really grateful.

ADD REPLYlink written 3.4 years ago by fing0
1

My mistake, I guess that would work. It was different than how I had used cuffcompare, so couldn't predict what that ouput would be.

I'm going to take a step back and ask whether you've

  1. Looked at your output annotation (gtf) from cuffcompare to make sure that it actually has full records.
  2. Checked your chromosome names (working in Drosophila, the FlyBase annotations are "X,2L,2R",etc, but UCSC will be "chrX, chr2L,chr2R", etc). I would think this would have caused issues already, but maybe double-check your fasta file names and gtf chromosome columns.
  3. Consider using the FlyBase "filtered" gff file: ftp://ftp.flybase.net/releases/FB2015_04/dmel_r6.07/gff/dmel-all-filtered-r6.07.gff.gz .This has a couple of problem annotations removed (problems for cufflinks). You could also pre-run this through the cufflinks utility gffread to remove non-necessary features (although I suspect cuffcompare will call gffread).
ADD REPLYlink written 3.4 years ago by Joseph Pearson410

Hi again, and a big thank you because you have fixed this issue for me.

Your point (3) did the job, the filtered .gff allows tests to run where the unfiltered version would not. As you suspected, cufflinks called gffread without prompt.

Best, Fiona

ADD REPLYlink written 3.4 years ago by fing0

Fantastic news, glad to help.

ADD REPLYlink written 3.4 years ago by Joseph Pearson410
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1607 users visited in the last hour