Question: Cuffdiff Locus - What is it?
4.8 years ago by andrew.j.skelton73 (5.6k) wrote:

I've run the Tuxedo pipeline on some RNA-seq data I have, and I'm confused about what the Locus is meant to represent. As you can see in my example below, the entries all have the same locus (a region of 19,241 bases), yet everything else about them is different: Tracking ID, Transcript ID, TSS ID, etc.

I thought it might have been linked to the XLOC ID, but if there are multiple XLOC IDs per Locus then that doesn't make sense. Does anyone know how the "Locus" field is determined in the Tuxedo package?

The manual states: "Genomic coordinates for easy browsing to the object"
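To check how often several XLOC IDs share one locus in a tracking table, something like the sketch below could be used. It assumes a tab-delimited file with `gene_id` and `locus` columns, as in the Cufflinks tracking output; the rows here are invented purely for illustration and are not my real data.

```python
import csv
import io
from collections import defaultdict


def xloc_ids_by_locus(handle, id_col="gene_id", locus_col="locus"):
    """Return {locus: sorted XLOC IDs} for loci shared by more than one ID."""
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(set)
    for row in reader:
        groups[row[locus_col]].add(row[id_col])
    # Keep only the loci that are reported for two or more distinct IDs
    return {locus: sorted(ids) for locus, ids in groups.items() if len(ids) > 1}


# Hypothetical rows, made up for this example
example = io.StringIO(
    "tracking_id\tgene_id\tlocus\n"
    "TCONS_00000001\tXLOC_000001\tchr1:100-2500\n"
    "TCONS_00000002\tXLOC_000001\tchr1:100-2500\n"
    "TCONS_00000003\tXLOC_000002\tchr1:100-2500\n"
    "TCONS_00000004\tXLOC_000003\tchr1:9000-9800\n"
)

shared = xloc_ids_by_locus(example)
print(shared)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle.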


rna-seq cuffdiff • 3.7k views

Hint: Look at that region in a genome browser. Note how there are multiple overlapping genes...

written 4.8 years ago by Devon Ryan (89k)

Yes, those genes overlap, and the locus is there so that you can see that region in a genome browser; that part I get. My question is how the locus is determined. The above example shows two different gene names within the same locus, under two different XLOC codes.

When you visualise that in a genome browser you see this:

There are no overlapping transcripts between the two genes.

modified 4.8 years ago • written 4.8 years ago by andrew.j.skelton73 (5.6k)

The only real answer would be to look through the cufflinks source code, since this isn't documented anywhere. I would guess that these are merged into a single locus for processing because the annotation file you gave to cufflinks, likely combined with the modifications it made to the annotated transcripts given your alignments, produced possibly overlapping features (genes in this case) that might need to be processed as a single unit. If you used an unstranded library where WASH7P was expressed, then cufflinks might have just merged that, DDX11L1, and MIR1302-10 into a single transcript, in which case treating the whole region as a single locus would make more sense. I suspect that cufflinks pre-bins the genome according to possible cases like this and then processes them separately, often producing multiple final loci. That's a slightly educated guess, at least.

Welcome to the wonderful world of completely undocumented features :P
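To make the guess above concrete, here is a minimal sketch of that kind of pre-binning: overlapping features on a chromosome are merged into one locus, ignoring strand. This is not taken from the Cufflinks source; the function name, gene names, and coordinates are all hypothetical.

```python
from collections import defaultdict


def bin_into_loci(features):
    """Merge overlapping (name, chrom, start, end) features into loci.

    Coordinates are treated as inclusive and strand is ignored, which is
    an assumption made for this illustration only.
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in features:
        by_chrom[chrom].append((start, end, name))

    loci = []
    for chrom in sorted(by_chrom):
        current = None  # [start, end, names] of the locus being built
        for start, end, name in sorted(by_chrom[chrom]):
            if current is not None and start <= current[1]:
                # Feature overlaps the running locus: extend it
                current[1] = max(current[1], end)
                current[2].append(name)
            else:
                if current is not None:
                    loci.append((chrom, current[0], current[1], current[2]))
                current = [start, end, [name]]
        if current is not None:
            loci.append((chrom, current[0], current[1], current[2]))
    return loci


# Hypothetical annotation: geneA and geneB overlap, geneC is separate
annotated = [
    ("geneA", "chr1", 100, 500),
    ("geneB", "chr1", 400, 900),
    ("geneC", "chr1", 2000, 2500),
]
separate = bin_into_loci(annotated)

# One assembled transcript bridging geneB and geneC pulls everything together
merged = bin_into_loci(annotated + [("bridge", "chr1", 800, 2100)])

print(separate)
print(merged)
```

In the second call a single bridging transcript is enough to collapse all the genes into one locus, which would be consistent with several genes reporting the same coordinates even though the annotated transcripts themselves do not all overlap.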

modified 4.8 years ago • written 4.8 years ago by Devon Ryan (89k)

Hi, have you found the reason why multiple XLOC IDs have the same locus? I recently ran cuffnorm and the output has the same locus for multiple XLOC IDs.

written 2.2 years ago by hothriananya (50)

I never did find out why, but I suspect Devon's answer above is on the money about binning chunks. I'd honestly suggest you stay away from the tuxedo pipeline and try DESeq2's workflow, or even Kallisto+Sleuth for isoform level events.

written 2.2 years ago by andrew.j.skelton73 (5.6k)

