Ribo and RNA seq analysis
7 weeks ago

Dear Biostars community, I am working on a project in which I have to generate features from an RNA sequence to classify whether it will be translated or not.

I thought about adding ribosome footprints (RFP) and expression levels as features. For these, I built a large Ribo-seq and RNA-seq dataset to cover as many RNA sequences as possible, and I preprocessed it. My RNA sequences are in FASTA format and I have indexed them. Here is my question:

Should I first align the Ribo-seq and RNA-seq datasets to the reference genome with its annotation (taken from GENCODE), and then align the aligned reads to my indexed RNA sequences? Or is that a waste of time, so that I can align the Ribo-seq and RNA-seq datasets directly to my indexed RNA sequences?

Thank you for your time.

RNA-seq genomics Ribo-seq • 380 views
4 weeks ago

I solved it with blastn: I first created a database from the RNA sequences and then ran BLAST with the Ribo-seq sequences as the query. The process is computationally expensive, but splitting the big dataset into smaller datasets works surprisingly well. This is the blastn command line:

blastn -query "$file" -db db/dataset -evalue 1e-05 -perc_identity 60 -max_target_seqs 10 -num_threads 14 -outfmt "6 sseqid" |

You can contact me for the full code. I would also like to know whether this is relevant to the main question. Thanks!
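
For readers who want to try the split-and-search approach described above, here is a minimal sketch rather than the full code. It assumes bash, awk, and BLAST+ are available. The helper names (`split_fasta`, `blast_chunks`, `count_hits`), the chunk size, and the file names in the usage comments are hypothetical; only the blastn options are taken from the command in this post. The hit tally at the end is one possible way to turn the `6 sseqid` output into a per-RNA count, not necessarily what the original pipeline does.

```shell
# split_fasta INPUT RECORDS_PER_CHUNK OUT_DIR
# Writes OUT_DIR/chunk_0001.fa, chunk_0002.fa, ... with at most
# RECORDS_PER_CHUNK sequences in each file (hypothetical helper).
split_fasta() {
    local input=$1 per_chunk=$2 out_dir=$3
    mkdir -p "$out_dir"
    awk -v n="$per_chunk" -v dir="$out_dir" '
        /^>/ {
            # Start a new chunk file every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                out = sprintf("%s/chunk_%04d.fa", dir, count / n + 1)
            }
            count++
        }
        out != "" { print > out }
    ' "$input"
}

# blast_chunks CHUNK_DIR DB OUT_FILE
# Runs blastn with the options from the post on every chunk and
# collects the subject IDs of all hits in OUT_FILE.
blast_chunks() {
    local chunk_dir=$1 db=$2 out_file=$3 file
    : > "$out_file"
    for file in "$chunk_dir"/chunk_*.fa; do
        blastn -query "$file" -db "$db" -evalue 1e-05 -perc_identity 60 \
            -max_target_seqs 10 -num_threads 14 -outfmt "6 sseqid" >> "$out_file"
    done
}

# count_hits HITS_FILE
# Prints "sseqid<TAB>number_of_hit_lines" for every RNA with at least one hit.
count_hits() {
    sort "$1" | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Example usage (file names are placeholders):
#   makeblastdb -in rna_sequences.fa -dbtype nucl -out db/dataset
#   split_fasta riboseq_reads.fa 100000 chunks
#   blast_chunks chunks db/dataset hits.txt
#   count_hits hits.txt > hits_per_rna.tsv
```

Note that blastn reports one line per alignment, so a read that aligns to the same RNA in more than one place is counted more than once in this tally.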


Hi Manu Ayllon,

It is difficult to advise as I am unclear on the goal of your project. I will tell you what I understand and you can correct me where needed.

The overall goal is to develop a classifier that will accurately determine whether a given RNA is translated or not, meaning translated but not necessarily encoding a stable protein product.

To develop this classifier, you want to take publicly available Ribo-Seq data and the paired RNA-Seq data to obtain a set of translated RNAs on which you could start to train your model using sequence features (?).

Are you just looking at human data? I cannot tell from your blastn command. What kinds of RNAs are you investigating? I am unclear about the role of blastn and why you wouldn't just use a reference annotation. I am happy to help, just need a bit more info!

