Question: Creating a FASTQ generator: how to handle the 3' ends of transcripts
15 months ago by
irritable_phd_syndrom40 wrote:

I am currently investigating the different splice forms in an experimental sample. To get a better understanding of how the various splice-form-finding tools work, I wrote a program that generates fake FASTQ data.

Here is how it works:

  1. Read in GTF file
  2. Select transcript of interest from the GTF file.
  3. Generate random numbers for the start position of the read. So if my random number is 54, my read will start at position 54.
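To make step 3 concrete, here is a minimal sketch of drawing start positions and writing a FASTQ record. The function names (`sample_read_starts`, `fastq_record`), the uniform sampling, the 0-based offsets, and the constant placeholder quality string are all assumptions for illustration, not necessarily what my actual program does.

```python
import random


def sample_read_starts(transcript_len, n_reads, seed=None):
    """Draw 0-based read start offsets uniformly along a transcript.

    Uniform sampling is an assumption; real libraries are not uniform.
    """
    rng = random.Random(seed)
    return [rng.randrange(transcript_len) for _ in range(n_reads)]


def fastq_record(name, seq, qual_char="I"):
    """Format one four-line FASTQ record with a constant placeholder quality."""
    return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"


if __name__ == "__main__":
    starts = sample_read_starts(transcript_len=2000, n_reads=3, seed=1)
    for i, start in enumerate(starts):
        print(f"read_{i} starts at offset {start}")
```

Note that nothing here stops a start position from landing within the last 100 bases of the transcript, which is exactly the problem described next.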

Step 3 is where I get into trouble. I'm not sure how to handle the end of the transcript. For example, say that I want 100-base reads in my FASTQ file, and the transcript of interest is 2000 bases long. If I draw a random start position between 1 and 1900, I am fine. However, if I draw a number between 1901 and 2000, say 1950, I get into trouble because the read runs past the end of the transcript and I don't know what to make the remaining 50 bases of the read.
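The arithmetic can be written down as a small helper. This is only a sketch: `overhang` is a hypothetical name, and it assumes 0-based start offsets (so a start of 1950 in a 2000-base transcript covers offsets 1950 to 1999).

```python
def overhang(start, read_len, transcript_len):
    """Return how many bases of the read fall past the transcript end.

    `start` is a 0-based offset into the transcript (an assumption).
    """
    return max(0, start + read_len - transcript_len)


if __name__ == "__main__":
    # A 100-base read starting at offset 1950 of a 2000-base transcript
    # has 50 bases inside the transcript and 50 bases left to fill.
    print(overhang(1950, 100, 2000))
    # A read starting at offset 1900 still fits entirely.
    print(overhang(1900, 100, 2000))
```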

A couple of potential solutions I thought of:

  1. Randomly add sequences to the 3' end
  2. Pretend that I read into the Illumina (or similar) adapter.
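Both options could be sketched roughly as follows. Everything here is an assumption for illustration: `make_read` is a hypothetical helper, the `ADAPTER` string is a placeholder that should be replaced with the adapter of whatever kit is being simulated, offsets are 0-based, and padding with `N` once the adapter runs out is a simplification rather than a model of what a sequencer actually reports.

```python
import random

# Assumed adapter sequence for illustration; substitute your kit's adapter.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"


def make_read(transcript, start, read_len, mode="adapter", rng=None):
    """Build a fixed-length read, filling any overhang past the 3' end.

    mode="adapter": fill with the adapter, then 'N' if the adapter runs out.
    mode="random":  fill with uniformly random bases (option 1).
    """
    rng = rng or random.Random()
    body = transcript[start:start + read_len]  # slicing truncates at the end
    missing = read_len - len(body)
    if missing == 0:
        return body
    if mode == "adapter":
        fill = ADAPTER[:missing].ljust(missing, "N")
    elif mode == "random":
        fill = "".join(rng.choice("ACGT") for _ in range(missing))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return body + fill


if __name__ == "__main__":
    transcript = "ACGT" * 500  # toy 2000-base transcript
    print(make_read(transcript, 1950, 100, mode="adapter"))
    print(make_read(transcript, 1950, 100, mode="random", rng=random.Random(0)))
```

Keeping the fill strategy behind a single `mode` switch makes it easy to generate both kinds of data and compare how the splice-form tools react to each.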

What happens experimentally in this situation? Is there a bias against the ends of transcripts when doing size selection in RNA-Seq?

transcript rna-seq • 438 views

Actually, the bias is towards the 3' end if one is doing poly-A selection. I don't think #1 is a good idea; it would not be biologically relevant. You could look into including the 3' UTRs (or are you already including those?) and/or go with #2.

You could also look at published datasets where the truth is known (to some extent). Someone here may be able to provide a good example.

modified 15 months ago • written 15 months ago by genomax33k

