Ribo-seq Analysis software
4
3
7.9 years ago
Anna S ▴ 500

Do you know of any software packages available for ribo-seq analysis?

BTW, does anyone at UP (Penn State) have experience with this? A PI needs to resubmit his grant, which would be greatly strengthened if he could find someone with a track record of analyzing ribo-seq datasets. The grant comments came back saying that there are no established methods for ribo-seq analysis, that the technology is unproven, and that there are no success stories.

I did find a few papers on the subject that seem to point to success, though (e.g. http://www.pnas.org/content/109/43/17394.full, http://www.sciencemag.org/content/324/5924/218.full, http://www.biomedcentral.com/content/pdf/gb-2013-14-4-r32.pdf).

Anyhow, your insights would be greatly appreciated!

Thanks much!!!

ribo-seq • 7.4k views
0

Given that Istvan Albert is faculty at PSU, you might just want to contact him and the bioinformatics consulting center.

3
7.7 years ago
Anna S ▴ 500

Actually, I have since looked a little bit into this problem and I have found out the following, which I'm sharing in case this is interesting to you.

In at least one of the papers (I think it was even 'the seminal' paper!), the analysis method was simply to first remove the ribosomal RNA reads from the sequences and then proceed as in rna-seq. This would be really easy, right (!). Unfortunately, ribo-seq presents a lot of extra challenges. The following are excerpts from the A.M. Michel and P.V. Baranov review:

1. In rna-seq, FPKM units are obtained, which reflect normalization with respect to the length of the transcript as well as the total number of mapped reads. This normalization is only a broad approximation in ribo-seq because of the high variance in sequence- and condition-dependent pausing and stalling, including complexity at 5'UTRs. Therefore the density of ribosomes could be similar for 2 mRNAs, but one of them would not be translated as efficiently as the other if it is covered with paused ribosomes.
2. uORFs in the 5'UTR may regulate the translation of the main protein or of other uORFs in the same transcript. At a minimum, this necessitates discriminating ribosome density in the 5' leaders from that in CDS regions when quantifying RPFs for protein synthesis measurements. This, however, would not solve the issue of uORFs that overlap with the main ORF, or of non-upstream or nested ORFs (nORFs) contained within main ORFs, since in this case footprints aligning to the pORF do not necessarily indicate its translation.
3. A change in mRNA abundance due to changes in transcription or mRNA stability would result in a corresponding change in the number of ribosome footprints.
4. A limitation of ribosome profiling is that it measures only relative (not absolute) changes in gene expression, which could cause global suppression of translation to be misinterpreted as activation of translation of a few unaffected genes.

The paper suggests some ideas to deal with issues 2-4 above at least in some cases, but clearly this is a challenging problem!
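
For issue 3, one commonly used idea is to normalize footprint density by mRNA abundance from a matched RNA-seq library (often called translation efficiency, TE). Here is a minimal sketch of that calculation. It assumes a hypothetical tab-separated table with four columns: transcript ID, transcript length (nt), Ribo-seq read count and RNA-seq read count. The function name, and the use of the table's column sums as the library totals, are assumptions for illustration and are not taken from the review:

```shell
# translation_efficiency: reads a tab-separated table on stdin with columns
#   transcript_id, length_nt, rpf_reads, rna_reads   (hypothetical layout)
# and prints transcript_id, RPF FPKM, RNA FPKM and TE = RPF FPKM / RNA FPKM.
# Library totals are taken as the column sums of the table (an assumption).
translation_efficiency() {
    awk -F '\t' '
        {
            id[NR] = $1; len[NR] = $2; rpf[NR] = $3; rna[NR] = $4
            rpf_total += $3; rna_total += $4
        }
        END {
            if (rpf_total == 0 || rna_total == 0) {
                print "no reads in one of the libraries" > "/dev/stderr"
                exit 1
            }
            for (i = 1; i <= NR; i++) {
                # FPKM = reads * 1e9 / (transcript length * total reads)
                rpf_fpkm = rpf[i] * 1e9 / (len[i] * rpf_total)
                rna_fpkm = rna[i] * 1e9 / (len[i] * rna_total)
                # TE is undefined when there is no RNA-seq signal
                te = (rna_fpkm > 0) ? sprintf("%.2f", rpf_fpkm / rna_fpkm) : "NA"
                printf "%s\t%.1f\t%.1f\t%s\n", id[i], rpf_fpkm, rna_fpkm, te
            }
        }'
}

# Toy input: three made-up transcripts
printf 'geneA\t1000\t200\t100\ngeneB\t2000\t100\t100\ngeneC\t1000\t100\t0\n' |
    translation_efficiency
```

A transcript with no RNA-seq reads gets NA instead of a TE value. This only corrects for mRNA abundance; it does nothing about the pausing problem in issue 1.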

3
7.4 years ago

A little late to the party, but I've just released riboSeqR, a Bioconductor package of visualisation and analysis methods for Ribo-seq data.

1

Correction: the data in this article were produced with MNase rather than RNase I. The authors note that although MNase is tolerated by fruit fly ribosomes, it has a strong 3' A/T bias, which may introduce a small amount of positional uncertainty. So most likely the sub-codon resolution is obscured by this.

Question: how can I use only the annotated ORFs in the FASTA file? It seems that all possible ORFs are used when I call findCDS.

Note: the pipeline I used below does not guarantee that the reads in the analysis are uniquely mapped to a single genomic region. Caution should be taken here.
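
One possible way to address the unique-mapping caveat, as a sketch: bowtie 1 has an `-m` option that suppresses all alignments for a read when more than the given number of reportable alignments exist, so `-m 1` should keep only reads that align to a single location. The wrapper function, the `DRY_RUN` switch and the output file name here are made up for illustration. The remaining options are copied from the transcript alignment command further down in this post.

```shell
# align_unique: bowtie 1 alignment that drops reads with more than one
# reportable alignment (-m 1). Hypothetical wrapper for illustration.
#   $1 = bowtie index prefix, $2 = input FASTQ, $3 = output file
# With DRY_RUN=1 (the default here) the command is only printed, not run.
align_unique() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "bowtie $1 $2 -l 20 -p 12 -m 1 --suppress 1,6,7,8 > $3"
    else
        bowtie "$1" "$2" -l 20 -p 12 -m 1 --suppress 1,6,7,8 > "$3"
    fi
}

align_unique transcript SRR942881_bowtieNoRRNA.fastq SRR942881.unique.bowtie.out
```

Keep in mind that when the reference is a transcriptome, reads shared by several isoforms of the same gene are discarded too, so this can remove a large fraction of the data.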

Hi, I am trying to use riboSeqR on fly data, but what I got is not what I expected. I downloaded SRR942881.sra and preprocessed it with cutadapt and Trimmomatic. Then I mapped the reads to rRNA, tRNA, pseudogene, ncRNA, miscRNA and miRNA sequences (FlyBase, dmel 6.02) to filter those reads out. The reads that could not be mapped to rRNA and the like were then mapped to coding transcripts. Both mappings used bowtie 1.1.2. But the tri-nucleotide periodicity is not as good as the original article (Joshua Dunn) reports. Is there anything I may have done wrong in using riboSeqR?

# The FlyBase download can no longer be traced, but it is from dmel 6.02
bowtie-build ../flybaseFasta/dmel-all-transposon-r6.02.fasta,../flybaseFasta/dmel-all-tRNA-r6.02.fasta,../flybaseFasta/dmel-all-pseudogene-r6.02.fasta,../flybaseFasta/dmel-all-miscRNA-r6.02.fasta,../flybaseFasta/dmel-all-miRNA-r6.02.fasta tRNAmimiPseuTransposon
bowtie-build ../flybaseFasta/dmel-all-transcript-r6.02.fasta transcript

# Now this dataset
fastq-dump SRR942881.sra

split -d -l 10257940 SRR942881.fastq SRR942881.fastq

for i in $(seq -w 0 29); do mv SRR942881.fastq${i} SRR942881.${i}.fastq; cutadapt -a CTGTAGGC -a ACTGTAGG -o SRR942881.trim${i}.fastq SRR942881.${i}.fastq & done
# wait for the background cutadapt jobs to finish before trimming their output
wait

for i in $(seq -w 0 29); do java -jar ~/rd/software/Trimmomatic-0.33/trimmomatic-0.33.jar SE -phred33 SRR942881.trim${i}.fastq SRR942881.fine${i}.fastq ILLUMINACLIP:smallRNAv1.5.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:20 2> SRR942881.fine${i}.err ; done

cat SRR942881.fine[0-9][0-9].fastq > SRR942881.fine.fastq

bowtie ~/rd2/data/flygenome/dmel_r6.02/bowtieIndex/tRNAmimiPseuTransposon SRR942881.fine.fastq -l 20 -p 12 --un SRR942881_bowtieNoRRNA.fastq > /dev/null

# reads with at least one reported alignment: 44794396 (58.41%)
# reads that failed to align: 31890648 (41.59%)

bowtie ~/rd2/data/flygenome/dmel_r6.02/bowtieIndex/transcript SRR942881_bowtieNoRRNA.fastq -l 20 -p 12 --suppress 1,6,7,8 > SRR942881.bowtie.out

# reads with at least one reported alignment: 22202258 (69.62%)
# reads that failed to align: 9688390 (30.38%)

# Now entering R

~/rd/software/R-3.2.3/bin/R

library("riboSeqR")

fastaCDS = findCDS(fastaFile="../fly-transcript-r6.02.fa" , startCodon = c("ATG"), stopCodon = c("TAG", "TAA", "TGA"))
riboDat = readRibodata("SRR942881.bowtie.out", replicates = c("wt3"))
fCs <- frameCounting(riboDat, fastaCDS , lengths = 27:35)
fS <- readingFrame(rC = fCs, lengths = 27:35)

plotFS(fS)
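
As an independent sanity check on the periodicity (this is not how riboSeqR computes it), the frame of each read's 5' end relative to the annotated start codon can be tallied directly from the bowtie output. The rough sketch below relies on the four columns left by `--suppress 1,6,7,8`: strand, transcript, 0-based offset and sequence. It also assumes a hypothetical two-column file of transcript ID and 0-based CDS start. That file is not part of the pipeline above and would have to be derived from the annotation:

```shell
# frame_counts: tally reads by length and by frame of the 5' end relative to
# the CDS start. Hypothetical helper, not part of riboSeqR.
#   $1 = CDS start table (transcript_id <TAB> 0-based CDS start)
#   $2 = bowtie output with columns strand, transcript, 0-based offset, sequence
# Prints: read_length <TAB> frame <TAB> count
frame_counts() {
    awk -F '\t' '
        NR == FNR { cds_start[$1] = $2; next }   # first file: CDS starts
        $1 == "+" && ($2 in cds_start) {         # sense reads on annotated transcripts
            d = $3 - cds_start[$2]
            frame = ((d % 3) + 3) % 3            # keep the frame in 0..2 for reads 5-prime of the start
            n[length($4) "\t" frame]++
        }
        END { for (k in n) print k "\t" n[k] }
    ' "$1" "$2" | sort -k1,1n -k2,2n
}

# Usage (cds_starts.tsv is the assumed annotation table):
#   frame_counts cds_starts.tsv SRR942881.bowtie.out
```

If one frame clearly dominates for the main footprint lengths, the periodicity is there. If the three frames come out roughly equal, the problem is probably upstream of riboSeqR (the nuclease, trimming or mapping) rather than in the R calls.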

1
7.7 years ago
Jason ▴ 910

I would definitely talk with Dr. Edward O'Brien.

http://www.chem.psu.edu/directory/epo2

I've seen him give multiple talks on ribo-seq analysis here at PSU and he seems like he's very knowledgeable on the subject.

0

Thank you, Jason!

0
7.7 years ago
Manvendra Singh ★ 2.2k

I am analyzing Ribo-seq data. After removing the reads carrying ribosomal sequence, I mapped the rest to a transcriptome that I had obtained from a RABT (Cufflinks) assembly of RNA-seq data from the same cell lines (I think this was the right way to do it).

Well, it was quite informative for my project, but there are a few issues that I could never defend, e.g. some genes/isoforms have no expression in the RNA-seq data but considerable read density in the Ribo-seq data.

So there are ups and downs, but it's good to use Ribo-seq for translation-level information, since we know that not all transcripts necessarily make proteins.
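
To get a list of the cases described above (footprints on a transcript with no RNA-seq signal), the two count tables can be compared directly. A small sketch, assuming two hypothetical tab-separated files of transcript ID and read count; the function name and the default threshold of 10 footprint reads are arbitrary choices for illustration:

```shell
# rpf_without_rna: list transcripts that have Ribo-seq reads but no RNA-seq reads.
# Hypothetical helper; both inputs are transcript_id <TAB> read_count.
#   $1 = RNA-seq counts, $2 = Ribo-seq counts, $3 = minimum RPF count (default 10)
# Prints: transcript_id <TAB> rpf_count
rpf_without_rna() {
    awk -F '\t' -v min="${3:-10}" '
        NR == FNR { rna[$1] = $2; next }      # first file: RNA-seq counts
        # keep transcripts with enough footprints and zero (or missing) RNA-seq reads
        $2 >= min && rna[$1] + 0 == 0 { print $1 "\t" $2 }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   rpf_without_rna rna_counts.tsv ribo_counts.tsv 10
```

Transcripts that show up here may be worth checking for multi-mapped footprints, or for being absent from the RNA-seq assembly, before reading them as translated.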

HTH