Question

Prokaryotic RNA-seq Analysis: The Basics

12.0 years ago

behrendt.l ▴ 10

Dear all, first of all, thanks for your time. I am relatively new to the RNA-seq world and I am currently struggling with transcriptome data that I want to align to a reference genome in order to obtain data on differentially expressed genes. I am working with prokaryotic organisms that do not have well-annotated genomes available for download. Instead, I sequenced the genome myself and have a number of larger contigs, for which I have a preliminary annotation file from RAST.

Here are the basic steps as far as I understood them:

1) Get sequences

2) Align sequences against the reference genome using BWA/Bowtie (I used BWA), then convert the resulting SAM files to sorted and indexed BAM files.

3) Use the GenomicFeatures package in R to summarize the reads per gene.

Here is where I got stuck: I tried to make my own transcript database using the "makeTranscriptDB" command. Unfortunately, I do not have information about splice sites, since I work with prokaryotes, and I am not sure how to handle this (it is a required file for the command). Any good suggestions?

4) I have not gotten this far, but in theory I would need to perform differential expression testing using a package in R. Any good suggestions for prokaryotes?

Is this workflow, at least in theory, correct? Any help will be greatly appreciated! Thanks in advance. Lars
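As a rough illustration of step 3 without any splice information: prokaryotic genes are generally unspliced, so each gene can be treated as one start-to-end interval and reads can be counted by simple overlap. The sketch below is plain Python, not the GenomicFeatures API. The names `GENES`, `READS`, and `count_reads_per_gene`, and all coordinates, are hypothetical and only stand in for a RAST annotation and a BAM file.

```python
# Hypothetical gene annotation: (contig, start, end, gene_id), 1-based inclusive.
# In practice these would come from the annotation file; the values are made up.
GENES = [
    ("contig_1", 100, 400, "geneA"),
    ("contig_1", 500, 900, "geneB"),
    ("contig_2", 50, 300, "geneC"),
]

# Hypothetical aligned reads: (contig, leftmost aligned position, read length).
READS = [
    ("contig_1", 120, 50),
    ("contig_1", 380, 50),
    ("contig_1", 480, 50),
    ("contig_1", 950, 50),
    ("contig_2", 60, 50),
]


def count_reads_per_gene(genes, reads):
    """Count reads that overlap each gene by at least one base.

    Each gene is a single interval (no exons or splice junctions).
    A read overlapping two genes is counted once for each of them.
    """
    counts = {gene_id: 0 for _, _, _, gene_id in genes}
    for contig, pos, length in reads:
        read_end = pos + length - 1
        for g_contig, g_start, g_end, gene_id in genes:
            # Two closed intervals overlap if each starts before the other ends.
            if contig == g_contig and pos <= g_end and read_end >= g_start:
                counts[gene_id] += 1
    return counts


if __name__ == "__main__":
    for gene_id, n in count_reads_per_gene(GENES, READS).items():
        print(gene_id, n)
```

The nested loop is quadratic and only meant to show the logic; a real data set would need an indexed or sorted lookup, and a table of such counts per sample is the usual input for differential expression testing.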

rna-seq r • 5.8k views

ADD COMMENT • link updated 11.7 years ago by Leonor Palmeira 3.9k • written 12.0 years ago by behrendt.l ▴ 10

score 1 · Answer 1 · 2012-05-11

12.0 years ago

Leonor Palmeira 3.9k

As you are working on a prokaryotic organism, I suggest you use the Bioconductor girafe package instead of GenomicFeatures. This will allow you to visualise your reads on the genome. You can deal with your indexed and sorted BAM files with the agiFromBam() function.

If you are more familiar with R, and as you are working on a rather small organism (one chromosome) and -- if I understood correctly -- on your own sequence and annotation, you can also build your own custom script to visualize your reads. This is what I usually do on the large viruses I'm working on. You can ask for help here if needed.
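A custom visualization script of this kind usually starts from per-base read depth along the genome, which can then be plotted. The following is a minimal sketch of that calculation in Python, assuming the read start positions and lengths have already been extracted from the BAM file. The function name `coverage_depth` and the example values are hypothetical, and this is not necessarily how the scripts mentioned above work.

```python
def coverage_depth(reads, contig_length):
    """Return per-base read depth for one contig.

    reads: iterable of (start, length) with 1-based start positions.
    Uses a difference array: +1 where a read starts, -1 just after it ends,
    then a running sum gives the depth at every position.
    """
    diff = [0] * (contig_length + 2)
    for start, length in reads:
        end = min(start + length - 1, contig_length)  # clip at the contig end
        start = max(start, 1)
        if start > contig_length or end < start:
            continue  # read lies entirely outside the contig
        diff[start] += 1
        diff[end + 1] -= 1

    depth = []
    running = 0
    for pos in range(1, contig_length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


if __name__ == "__main__":
    # Three made-up reads on a 10 bp contig.
    print(coverage_depth([(1, 3), (2, 3), (9, 5)], 10))
```

The returned list has one entry per base, so it can be handed directly to any plotting library, with gene intervals from the annotation drawn underneath.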

ADD COMMENT • link 12.0 years ago by Leonor Palmeira 3.9k

Tablet (http://bioinf.scri.ac.uk/tablet/) is also a good option for visualizing aligned data (you just need an indexed BAM file).

ADD REPLY • link 12.0 years ago by Marina Manrique ★ 1.3k

Thanks Leonor, I will try the girafe package for starters. I am not really experienced in R; is it necessary to use a custom script? I might come back and ask additional questions later. Thanks!

ADD REPLY • link 12.0 years ago by behrendt.l ▴ 10

No, I only use custom scripts because I know exactly how I want my figures to look (I'm just very picky). Using the girafe package should get you where you want :-)

ADD REPLY • link 12.0 years ago by Leonor Palmeira 3.9k

score 0 · Answer 2 · 2012-05-11

Have you considered using the Tophat-Cufflinks-Cuffdiff pipeline? I guess you would get what you're looking for while avoiding steps 3 and 4.

This 'pipeline' is specifically designed for eukaryotic transcriptomes (since it's oriented towards detecting isoforms and alternative splicing events), but I think it could also work for bacterial transcriptomes, which are much less complex.

I think these tools are pretty easy to install and straightforward to use, and the manual pages are quite good.

HTH

Marina

EDIT: Tophat http://tophat.cbcb.umd.edu/ Cufflinks http://cufflinks.cbcb.umd.edu/