Question: Creating a fastq generator: how to handle the 3' ends of transcripts
20 months ago, irritable_phd_syndrom wrote:

I am currently investigating the different splice forms in an experimental sample. To better understand how the various splice-form-finding tools work, I wrote a program that generates fake fastq data.

Here is how it works:

  1. Read in GTF file
  2. Select transcript of interest from the GTF file.
  3. Generate random numbers for the start position of the read. So if my random number is 54, my read will start at position 54.

Step 3 is where I get into trouble: I'm not sure how to handle the end of the transcript. For example, say I want 100-base reads in my fastq file and the transcript of interest is 2000 bases long. If I draw a random number between 1 and 1900, I am fine. However, if I draw a number between 1901 and 2000, say 1950, I get into trouble because I don't know what to use for the remaining 50 bases of the read.
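To make the boundary case concrete, here is a minimal Python sketch of step 3. The names `draw_read_start` and `overhang` are hypothetical, and it assumes positions are counted from 1; it only reports how many bases of a read would fall past the 3' end and is not the actual generator.

```python
import random

def draw_read_start(transcript_len, rng):
    """Draw a 1-based start position anywhere on the transcript (assumed uniform)."""
    return rng.randint(1, transcript_len)

def overhang(start, read_len, transcript_len):
    """Number of read bases that fall past the 3' end (0 if the read fits)."""
    last_base = start + read_len - 1          # 1-based position of the read's last base
    return max(0, last_base - transcript_len)

rng = random.Random(0)                        # seeded so the draw is reproducible
start = draw_read_start(2000, rng)
print(start, overhang(start, 100, 2000))
```

Exactly where the cut-off falls depends on whether positions are counted from 0 or from 1; the sketch counts from 1.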

A couple of potential solutions I thought of:

  1. Randomly add sequences to the 3' end
  2. Pretend that I read into the Illumina (or similar) adapter.
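Option 2 could look roughly like the sketch below. The function name `read_with_adapter` is hypothetical, the adapter string is an assumption (substitute whatever your library prep kit uses), and filling with `N` once the adapter runs out is an arbitrary choice rather than a claim about what the sequencer does.

```python
# Assumed adapter sequence; replace with the one from your kit.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

def read_with_adapter(transcript, start, read_len, adapter=ADAPTER, pad="N"):
    """Return a read of exactly read_len bases starting at 1-based `start`.

    Bases past the 3' end of the transcript are taken from `adapter`;
    if the adapter is also exhausted, the rest is filled with `pad`.
    """
    fragment = transcript[start - 1:start - 1 + read_len]  # slicing stops at the 3' end
    missing = read_len - len(fragment)                     # bases still needed
    filler = adapter[:missing]
    filler += pad * (missing - len(filler))                # only used if adapter is too short
    return fragment + filler

transcript = "ACGT" * 5                                    # toy 20-base transcript
print(read_with_adapter(transcript, 18, 6, adapter="TTTT"))
```

Reads that fit entirely inside the transcript come back unchanged, so the same function can be used for every start position.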

What actually happens experimentally in this situation? Is there a bias against the ends of transcripts when doing size selection in RNA-Seq?


Actually, the bias is towards the 3' end if one is doing poly-A selection. I don't think #1 is a good idea; it would not be biologically relevant. You could look into the 3' UTRs (or are you already including those?) and/or do #2.

You could also look at published datasets where the truth is known (to some extent). Someone here may be able to provide a good example.

Reply written 20 months ago by genomax