More Issues With Cufflinks, Tophat And Cummerbund: No New Transcripts Detected?
1
5
Entering edit mode
8.2 years ago
Sameet ▴ 300

I am currently dealing with a number of RNA-Seq runs. We are using MiSeq for sequencing. We have one Wild Type and a number of mutant samples. I am doing the analysis as follows:

1. Use tophat (with bowtie2) to align the reads to the genome. The genome is indexed properly, and I have used the index for some other analyses with small RNAs earlier.
2. Use cufflinks on the accepted_hits.bam file generated by tophat to generate the transcripts.gtf file for each run. When doing this I use a gtf file to mask rDNA coordinates so that they do not interfere with further analysis.
3. I then create a assembly_list.txt file that contains full path to all the transcripts.gtf files created and that I want to analyse together.
4. I use the cuffmerge with the assembly_list.txt file and provide the original accepted_hits.bam file with appropriate labels to generate a "union" of genes across all the sampled under consideration. I provide the reference gtf file to the cuffmerge, along with the genome fasta file to correct for bias.
5. I then use the cuffdiff with the merged.gtf file, and also provide the respective accepted_hits.bam files with proper label.

I then use cummeRbund to analyse the cuffdiff results.

Most strangely I do not see any new/un-annotated transcripts generated. Further more when I try annotation( genes( cuff ) ) in cummeRbund, I get a table, but the column of class_codes is populated with only NAs.

What am I doing wrong?

UPDATE: While going through the output files created, i realised that the merged.gtf file actually contains the class_codes for the transcripts identified. So clearly, it is the cummeRbund that is not reading these codes correctly. Is this a bug in cummeRbund, or is this a known issue.

UPDATE 2: I also noticed something most bizzare with this analysis. After doing all the above mentioned steps, when i try to visualize the accepted_hits.bam file on a genome browser, i see no reads mapping to the a gene that is supposed to be deleted from the sample. However, the cuffdiff output shows the amount of transcripts present nearly equal to that in WT!!! This has me completely stumped.

tophat cufflinks cummerbund rnaseq • 4.6k views
2
Entering edit mode
8.0 years ago

Its been a while but perhaps you still need help.

One thing is that you must show readCufflinks() where to find your merged.gtf. It does not find this automatically and it is NOT in your cuffdiff results. It is in your cuffmerge results folder.

Something like this is needed when in R:

cuff <- readCufflinks(gtfFile=gtf_path)


Hope that helps with the annotation stuff.

PS: you may want to automate the whole process to some degree. I have written an automation pipeline for the tuxedo protocol that I am looking for people to test. It is meant to alleviate things like this. If you feel like trying it out, please see the docs: http://xguse.github.io/blacktie/index.html.

0
Entering edit mode

I have the exact same issue as the first poster and tried to fix it with your method. It didn't seem to work. Any other suggestions?

I can also see that in my diff_out folder, the genes.fpkm_tracking file has the correct gene names associated with it.

0
Entering edit mode

That one answer is so perfect. most of the time the maesteros think that we (students of bioinformatics) know it all. We are biologists and we need the answers in precision..

This is one complete answer... "

One thing is that you must show readCufflinks() where to find your merged.gtf. It does not find this automatically and it is NOT in your cuffdiff results. It is in your cuffmerge results folder. "