Question: segmentation for RNAseq dataset from fruitfly
Posted 5.7 years ago:
Hello everyone,
I am new to bioinformatics. I have several tasks to do, but I am really confused about how to do them.

Here is what is needed (as given by my professor):

Given a set of gene expression data (let's use RNA-seq data for fly to 
keep the memory and CPU requirements down), you map the reads to the genome. 
This gives, for each sample/data set, a single signal "expression value", 
i.e., the coverage f(x) as a function of the genomic coordinate x. 
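To make "coverage" concrete: after mapping, each read covers an interval of the genome, and f(x) is simply the number of reads overlapping position x. Below is a minimal sketch, assuming the alignments have already been reduced to 0-based, half-open (start, end) intervals on one chromosome; `coverage_from_reads` is a hypothetical helper name. Spliced RNA-seq reads would contribute one interval per aligned block, and in practice tools such as samtools depth or bedtools genomecov compute this directly from a BAM file.

```python
import numpy as np

def coverage_from_reads(reads, chrom_length):
    """Per-base coverage f(x) for one chromosome.

    `reads` is an iterable of (start, end) alignment intervals,
    0-based and half-open (an assumption about the input format).
    """
    diff = np.zeros(chrom_length + 1, dtype=np.int64)
    for start, end in reads:
        diff[start] += 1  # coverage goes up where a read begins
        diff[end] -= 1    # and down just past where it ends
    # Running sum of the +1/-1 events gives the depth at each position.
    return np.cumsum(diff[:-1])

# Three toy reads on a 10 bp "chromosome"
f = coverage_from_reads([(0, 4), (2, 6), (2, 3)], 10)
print(f.tolist())  # [1, 1, 3, 2, 1, 1, 0, 0, 0, 0]
```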

Now the task is to compute segmentations of this signal, i.e., find 
a set of intervals on which f is approximately constant. 
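One way to make "approximately constant" precise (an assumption; the task does not fix a criterion) is least squares: the cost of an interval is the squared error of the best constant fit, i.e. the sum of (f(x) - mean)^2 over the interval. With prefix sums of f and f^2 this cost can be evaluated in O(1) for any interval, which is what segmentation algorithms typically rely on. `SegmentCost` is an illustrative name; for count data a Poisson likelihood cost would be a natural alternative.

```python
import numpy as np

class SegmentCost:
    """Squared-error cost of fitting a constant to f[a:b] (half-open)."""

    def __init__(self, f):
        f = np.asarray(f, dtype=float)
        # Prefix sums of f and f^2 give any interval's cost in O(1).
        self.s1 = np.concatenate(([0.0], np.cumsum(f)))
        self.s2 = np.concatenate(([0.0], np.cumsum(f * f)))

    def __call__(self, a, b):
        n = b - a
        total = self.s1[b] - self.s1[a]
        total_sq = self.s2[b] - self.s2[a]
        # sum((f - mean)^2) = sum(f^2) - (sum f)^2 / n
        return total_sq - total * total / n

cost = SegmentCost([1, 1, 3, 2, 1, 1, 0, 0, 0, 0])
print(float(cost(6, 10)))  # 0.0 (the signal is exactly constant there)
print(float(cost(0, 4)))   # 2.75
```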

First do this for every data set separately. 
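For the per-data-set step, binary segmentation is one of the simplest algorithms for time-series-like data. The sketch below assumes the least-squares cost and a user-chosen `penalty` (a hypothetical parameter: the minimum reduction in squared error a new breakpoint must achieve). It is meant to be run once per data set; the right penalty depends on the scale of each signal.

```python
import numpy as np

def sse(s1, s2, a, b):
    """Squared error of the best constant fit to f[a:b], from prefix sums."""
    n = b - a
    t = s1[b] - s1[a]
    return (s2[b] - s2[a]) - t * t / n

def binary_segmentation(f, penalty, min_size=1):
    """Greedy top-down segmentation of one signal f.

    Repeatedly splits the segment whose best split reduces the squared
    error the most, stopping when no split gains more than `penalty`.
    Returns the sorted breakpoints (start positions of segments 2..k).
    """
    f = np.asarray(f, dtype=float)
    s1 = np.concatenate(([0.0], np.cumsum(f)))
    s2 = np.concatenate(([0.0], np.cumsum(f * f)))

    def best_split(a, b):
        base = sse(s1, s2, a, b)
        best_gain, best_t = 0.0, None
        # Both halves must keep at least min_size positions.
        for t in range(a + min_size, b - min_size + 1):
            gain = base - sse(s1, s2, a, t) - sse(s1, s2, t, b)
            if gain > best_gain:
                best_gain, best_t = gain, t
        return best_gain, best_t

    segments = [(0, len(f))]
    breakpoints = []
    while True:
        candidates = [(best_split(a, b), (a, b)) for a, b in segments]
        (gain, t), (a, b) = max(candidates, key=lambda c: c[0][0])
        if t is None or gain <= penalty:
            break
        segments.remove((a, b))
        segments += [(a, t), (t, b)]
        breakpoints.append(t)
    return sorted(breakpoints)

f = [0] * 5 + [10] * 5 + [3] * 5
print(binary_segmentation(f, penalty=1.0))  # [5, 10]
```

This version recomputes every candidate split on each pass, which keeps it short but is slow; on a real chromosome (millions of positions) you would at least bin the coverage into windows first, and caching the best split per segment is an easy speed-up.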

Now we have a more difficult problem. Given the f_i(x) for each data set 
i, find a single segmentation such that EACH f_i is approximately constant on each interval. 

Of course, you want segmentations that have as few intervals as possible. 
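For the joint problem, one natural criterion (an assumption, not something the task prescribes) is to add up the per-signal costs: a segment [s, t) costs the sum over i of the squared error of a constant fit to f_i on that segment, plus a fixed `penalty` per segment that keeps the number of intervals small. The sketch below minimizes this exactly by dynamic programming (the "optimal partitioning" recursion); `joint_segmentation` and `penalty` are illustrative names.

```python
import numpy as np

def joint_segmentation(signals, penalty):
    """Optimal partitioning of several signals with shared breakpoints.

    signals: array-like of shape (m, n), one row per data set f_i.
    A segment [s, t) costs the sum over i of the squared error of a
    constant fit to f_i[s:t], plus `penalty`; larger penalties give
    fewer intervals. Runs in O(m * n^2) time.
    """
    F = np.asarray(signals, dtype=float)
    if F.ndim == 1:
        F = F[None, :]  # treat a single signal as m = 1
    m, n = F.shape
    zeros = np.zeros((m, 1))
    s1 = np.hstack([zeros, np.cumsum(F, axis=1)])
    s2 = np.hstack([zeros, np.cumsum(F * F, axis=1)])

    best = np.full(n + 1, np.inf)  # best[t]: optimal cost of F[:, :t]
    best[0] = 0.0
    last = np.zeros(n + 1, dtype=int)  # start of the last segment
    for t in range(1, n + 1):
        s = np.arange(t)                  # candidate segment starts
        tot = s1[:, [t]] - s1[:, s]       # shape (m, t)
        tot_sq = s2[:, [t]] - s2[:, s]
        seg_cost = (tot_sq - tot * tot / (t - s)).sum(axis=0)
        total = best[s] + seg_cost + penalty
        last[t] = int(np.argmin(total))
        best[t] = total[last[t]]

    # Trace back the segment starts from the end of the signal.
    breakpoints = []
    t = n
    while t > 0:
        t = int(last[t])
        if t > 0:
            breakpoints.append(t)
    return sorted(breakpoints)

f1 = [0] * 5 + [10] * 10   # changes at position 5
f2 = [1] * 10 + [8] * 5    # changes at position 10
print(joint_segmentation([f1, f2], penalty=1.0))  # [5, 10]
```

Because all rows share one set of breakpoints, a deeply sequenced sample can dominate the summed cost, so normalizing the signals to comparable scales first is probably wise. The quadratic running time can be reduced with pruning schemes such as PELT.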

I would suggest doing the following:

(1) find a set of about 12 different RNAseq data sets from the fruitfly 
and map them to the genome. 

(2) re-implement the simplest segmentation algorithms for time series-like 
data and test them. 

(3) check how consistent the results are. 

(4) work out how we can combine the different signals f_i to define a single 
criterion for segmenting the signals jointly. 
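For step (3), one simple way (among many) to compare two segmentations of the same region is to ask what fraction of the breakpoints in one lie within some tolerance of a breakpoint in the other. The function name and the `tolerance` parameter are illustrative; computing this for every pair of the ~12 data sets gives a consistency matrix.

```python
import numpy as np

def breakpoint_agreement(bp_a, bp_b, tolerance):
    """Fraction of breakpoints in each list matched by one in the other.

    Two breakpoints match if they are at most `tolerance` bases apart.
    Returns (fraction of bp_a matched, fraction of bp_b matched).
    """
    a = np.asarray(sorted(bp_a))
    b = np.asarray(sorted(bp_b))

    def matched_fraction(x, y):
        if len(x) == 0:
            return 1.0  # nothing to match
        if len(y) == 0:
            return 0.0
        # Distance from each x to its nearest y, via binary search.
        idx = np.searchsorted(y, x)
        left = y[np.clip(idx - 1, 0, len(y) - 1)]
        right = y[np.clip(idx, 0, len(y) - 1)]
        nearest = np.minimum(np.abs(x - left), np.abs(x - right))
        return float(np.mean(nearest <= tolerance))

    return matched_fraction(a, b), matched_fraction(b, a)

print(breakpoint_agreement([100, 250, 900], [102, 600, 905], tolerance=5))
# both fractions are 2/3 here: 250 and 600 have no partner
```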

The point is that we want the number of segments to grow only slowly 
with i and eventually saturate; otherwise you just end up with every 
genomic position being its own interval, which is of course a useless 
segmentation.
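The saturation requirement can be checked directly. If the per-data-set segmentations were simply overlaid (cut at every breakpoint of every data set), the number of intervals could only grow as data sets are added, and breakpoints that differ by a single base would each add an interval. This hypothetical helper counts the intervals of that naive overlay for the first i data sets; a useful joint criterion should instead give a curve that flattens out.

```python
def union_interval_counts(per_dataset_breakpoints):
    """Intervals in the naive overlay of segmentations 1..i, for each i.

    Overlaying segmentations means cutting at every breakpoint of every
    data set, so the count can only grow as data sets are added.
    """
    seen = set()
    counts = []
    for bps in per_dataset_breakpoints:
        seen.update(bps)
        counts.append(len(seen) + 1)  # k breakpoints -> k + 1 intervals
    return counts

# Illustrative breakpoints: nearly identical boundaries still add intervals.
print(union_interval_counts([[5, 10], [6, 10], [5, 11, 20]]))  # [3, 4, 6]
```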


Can anyone explain what these tasks mean exactly?
1- Where can I get the RNA-seq data (GEO, SRA, FlyBase, ...)?

2- What is a "single signal expression value", and what is coverage?

3- The sequences from the databases do not contain coverage. Should I calculate the coverage? If so, where do I get the number of reads?

4- Which segmentation algorithms should be used?
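On question 3: repositories such as SRA store raw reads, not coverage, so coverage is computed after mapping. Assuming the reads have been aligned and saved as a sorted BAM file, `samtools depth -a sample.bam` prints one line per position (chromosome, 1-based position, depth), and `samtools view -c -F 0x904 sample.bam` counts primary mapped alignments, which can serve as the library size (for paired-end data this counts each mate). The sketch below loads that depth output for one chromosome and scales it per million mapped reads so different samples are comparable; the function names and the chromosome name `2L` are illustrative.

```python
import io
import numpy as np

def read_depth(handle, chrom):
    """Load one chromosome's per-base depth from `samtools depth -a` output.

    Each line is: chrom <TAB> 1-based position <TAB> depth.
    Returns a 0-based numpy array of depths for `chrom`.
    """
    positions, depths = [], []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != chrom:
            continue
        positions.append(int(fields[1]) - 1)  # convert to 0-based
        depths.append(int(fields[2]))
    if not positions:
        return np.zeros(0)
    f = np.zeros(max(positions) + 1)
    f[positions] = depths
    return f

def per_million(f, mapped_reads):
    """Scale coverage by library size so samples are comparable."""
    return f * 1e6 / mapped_reads

# Tiny stand-in for a depth file, read from a string.
text = "2L\t1\t0\n2L\t2\t3\n2L\t3\t5\n3R\t1\t7\n"
f = read_depth(io.StringIO(text), "2L")
print(f.tolist())                               # [0.0, 3.0, 5.0]
print(per_million(f, 2_000_000).tolist())       # [0.0, 1.5, 2.5]
```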




Check How To Ask Good Questions On Technical And Scientific Forums for some guidelines for posting questions on technical and scientific forums. One general recommendation is "do not post homework questions".

Comment by h.mon, 5.7 years ago.

5.6 years ago, notthebmovieactor wrote:


If you're looking to compute segmentations (or annotations) on the genome using data like RNAseq, I'd recommend using Segway:

I'm currently a programmer on the project, and it sounds like your project might be a good fit for our software. We're constantly working on it, and we're always happy to help and provide support on any technical issues.

Hope that helps!

- Eric


