I am currently dealing with a number of RNA-Seq runs. We are using MiSeq for sequencing. We have one Wild Type and a number of mutant samples. I am doing the analysis as follows:
bowtie2) to align the reads to the genome. The genome is indexed properly, and I have used the index for some other analyses with small RNAs earlier.
accepted_hits.bamfile generated by
tophatto generate the
transcripts.gtffile for each run. When doing this I use a
gtffile to mask rDNA coordinates so that they do not interfere with further analysis.
- I then create a
assembly_list.txtfile that contains full path to all the
transcripts.gtffiles created and that I want to analyse together.
- I use the
assembly_list.txtfile and provide the original
accepted_hits.bamfile with appropriate labels to generate a "union" of genes across all the sampled under consideration. I provide the reference
gtffile to the
cuffmerge, along with the genome fasta file to correct for bias.
- I then use the
merged.gtffile, and also provide the respective
accepted_hits.bamfiles with proper label.
I then use
cummeRbund to analyse the
Most strangely I do not see any new/un-annotated transcripts generated. Further more when I try
annotation( genes( cuff ) ) in
cummeRbund, I get a table, but the column of
class_codes is populated with only
What am I doing wrong?
While going through the output files created, i realised that the
merged.gtf file actually contains the
class_codes for the transcripts identified. So clearly, it is the
cummeRbund that is not reading these codes correctly. Is this a bug in
cummeRbund, or is this a known issue.
I also noticed something most bizzare with this analysis. After doing all the above mentioned steps, when i try to visualize the
accepted_hits.bam file on a genome browser, i see no reads mapping to the a gene that is supposed to be deleted from the sample. However, the
cuffdiff output shows the amount of transcripts present nearly equal to that in WT!!! This has me completely stumped.