Question: Using Dexseq To Compare Differential Exon Usage From Different Technical Replicates
gravatar for alittleboy
5.4 years ago by
alittleboy210 wrote:

Hi All:

I have a general question about testing differential exon usage using DEXSeq. Suppose I have a sample called SLR (from a cell line), and it is sequenced on 5 lanes, so I have BAM files like L1_SLR, L2_SLR, L3_SLR, L5_SLR and L7_SLR.bam. Here, the letter "L" denotes "lanes". In a typical RNA-Seq experiment, we should expect that reads from those 5 lanes have little variations.

Now, similarly, I have three technical replicates of the cell line SLR, let's call them TEC1, TEC2 and TEC3. For each technical replicate (say TEC1), it is also sequenced using 5 lanes. For example, I have L1_TEC1, L3_TEC1, L4_TEC1, L5_TEC1 and L7_TEC1. Similarly for TEC2 and TEC3.

Since TEC1, TEC2 and TEC3 are technical replicates of cell line SLR, ideally there should not be any differential exon usage detected. Our goal is to focus on these technical replicates and see if we'll obtain larger than expected amounts of differential exon usage. The tool we consider is DEXSeq. I looked at this manual here: In Table 1, I think the author "concatenated" all the lanes (BAM files) into a single one for each replicate (treat1fb, etc), so we see "total" number of reads and exon counts. In my situation, can I know how to obtain

(1) the combined BAM file for SLR? Shall I cat the 5 lanes of BAM files? (after combination, I can get SAM file using samtools, and then use GFF_file \ SLR_sorted.sam SLR.txt)

(2) the GFF file is converted from GTF file using; can I know if the GTF file is just downloaded from Ensembl website (choose Homo Sapiens if my cell line is from human)?

(3) since my ultimate objective is to compare (maybe pairwise?) those technical replicates: SLR, TEC1, TEC2, TEC3, instead of the traditional situation of comparing biological sample 1 (with several tech reps) with biological sample 2 (with several tech reps). Would there be any problem? I can see that maybe comparing SLR vs. TEC1 is impossible (also for other pairwise comparisons) as there is no "replicates" of the replicate.

Thank you so much!

exon • 3.3k views
ADD COMMENTlink modified 5.4 years ago by venkateshr89690 • written 5.4 years ago by alittleboy210
gravatar for venkateshr89
5.4 years ago by
United States
venkateshr89690 wrote:


Since one sample is run on five lanes I don't see any problem in concatenating all the BAM files. Once you have a single BAM file you can prepare your reference GFF file and run I used refseq from UCSC browser since Ensembl genome had many genes compared to UCSC (which would inturn get your DEXSEQ run fairly faster). Look for the counts in count file and try visualizing in IGV browser. From my understanding you wanted to do pairwise comparison but still I couldn't understand what you wanted to do? or what your looking for?

Good luck

ADD COMMENTlink written 5.4 years ago by venkateshr89690

Thank you for the reply! I think you're right -- the lanes need to be combined so that a single sample has a BAM file. One more question: for preparation of GFF file, if I want to use Ensembl annotation, shall I download the GTF file and then pass the uncompressed gtf file to to get the GFF file? The GFF file should apply to that species (regardless of the BAM files for particular samples), am I right? Sorry this may be a naive question... Just wanna clarify ;-)

ADD REPLYlink written 5.4 years ago by alittleboy210

You might not even want to change the format to .GFF. You can still use and retain it as .gtf and get the ECS. Yes, the ensembl .gtf file should be for respective species. Hope it helps. It might looks little tricky in the beginning but once you finish every step you will get the big picture of it.

ADD REPLYlink written 5.4 years ago by venkateshr89690
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 810 users visited in the last hour