Question: Why does cufflinks split this transcript?
5 days ago by
corend20 wrote:

I have an RNA-seq dataset. I used TopHat and Cufflinks to align the reads to my reference genome and to assemble my transcripts, guided by a GTF of my genome.

I have a gene of interest (9 exons), and I am checking whether it is expressed in my data. When I look at it in the assembly, I find it split into 2 parts: the first 3 exons in one transcript, and the last 6 exons in another. I first thought that the expression level was too low to assemble this transcript properly. But when I look at the sashimi plot from the .bam file used for the assembly, here is what I have:

[Image: Sashimi plot of my gene]

I have enough reads supporting the junction, but Cufflinks reports 2 transcripts.

Any idea why this is happening?

EDIT: Could it be coming from my parameters?

--min-frags-per-transfrag 10 
--max-multiread-fraction 0.99
--trim-3-avgcov-thresh 5 
--overlap-radius 50
modified 4 days ago • written 5 days ago by corend20

Presumably there are too few reads supporting the junction between exons 3 and 4. Try lowering the threshold (I guess it's --min-frags-per-transfrag) and see what happens!

written 4 days ago by Macspider1.9k

That's the junction with the most reads in the whole transcript.

Do you know if these reads are uniquely mapped?

modified 3 days ago • written 3 days ago by Martombo1.8k

I just filtered out multimapped reads, and I have 67 reads supporting the junction instead of 77. It seems that this is not a multimapping problem :/
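
For anyone wanting to repeat this check, here is a minimal sketch of how the junction support could be split into uniquely mapped and multimapped reads. It assumes the alignments are available as SAM text lines (for example from `samtools view`), that the aligner writes the standard `NH:i:` tag for the number of reported hits, and that a read without an `NH` tag counts as unique. The function names and the intron coordinates are hypothetical, not part of any tool.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def spliced_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs (1-based, inclusive)
    for every N (skipped region) operation in the CIGAR string."""
    junctions = []
    ref = pos  # 1-based leftmost reference position of the alignment
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in "MDN=X":  # operations that consume the reference
            ref += length
    return junctions


def count_junction_support(sam_lines, chrom, intron_start, intron_end):
    """Count reads spanning one intron, split by the NH tag.
    Assumption: a missing NH tag is treated as a unique hit."""
    counts = {"unique": 0, "multi": 0}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != chrom:
            continue
        junctions = spliced_junctions(int(fields[3]), fields[5])
        if (intron_start, intron_end) not in junctions:
            continue
        nh = 1
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
        counts["unique" if nh == 1 else "multi"] += 1
    return counts
```

If the unique count stays close to the total, as with the 67 out of 77 reads here, multimapping is unlikely to explain the split.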

written 1 day ago by corend20

Did you check how they overlap the junction? Maybe the majority of them have only a few bases on one of the exons...
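
One way to look at this is to measure, for each spliced read, how many aligned bases it has on either side of the junction. The sketch below works from (position, CIGAR) pairs taken from the alignments; the function names are hypothetical, the cutoff of 8 bases is only an illustrative value, and only M/=/X bases are counted as anchor.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def junction_anchors(pos, cigar, intron_start, intron_end):
    """Return (left_anchor, right_anchor): aligned bases in the blocks
    directly before and after the given intron, or None if the read
    does not splice across it. Coordinates are 1-based, inclusive."""
    ref = pos
    blocks = [0]   # aligned bases per block, split at every N
    introns = []   # (start, end) of every N operation
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
            blocks.append(0)
        elif op in "M=X":
            blocks[-1] += length
        if op in "MDN=X":  # operations that consume the reference
            ref += length
    for i, intron in enumerate(introns):
        if intron == (intron_start, intron_end):
            return blocks[i], blocks[i + 1]
    return None


def short_anchor_fraction(alignments, intron_start, intron_end, min_anchor=8):
    """Fraction of junction-spanning reads whose shorter anchor is below
    min_anchor (an illustrative cutoff). None if no read spans it."""
    anchors = [junction_anchors(p, c, intron_start, intron_end)
               for p, c in alignments]
    anchors = [a for a in anchors if a is not None]
    if not anchors:
        return None
    short = sum(1 for left, right in anchors if min(left, right) < min_anchor)
    return short / len(anchors)
```

A fraction close to 1 would mean that most of the supporting reads barely reach into one of the two exons.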

written 21 hours ago by Martombo1.8k

Nope, they overlap very well (no worse than at any other junction). I changed all the parameters one by one, and I always get this transcript split in two. The only way to get it in full is to use faux-reads from a guide annotation (and I don't want to use a guide annotation).

written 19 hours ago by corend20

I'm sorry, I don't have any other ideas, as I'm not an expert. Cufflinks has long been known to be sub-optimal for transcript quantification; maybe other software built specifically for assembly, like SPAdes or Trinity, would be a better choice?

written 15 hours ago by Martombo1.8k

How is the reference in that region? Is there an N-stretch in between?
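
A quick way to check is to pull the reference sequence between exons 3 and 4 and look for runs of N. Here is a minimal sketch, assuming the region has already been extracted as a plain string (for instance with `samtools faidx`); the function name and the `region_start` offset are hypothetical.

```python
import re


def n_stretches(sequence, region_start=1, min_length=1):
    """Return (start, end) reference coordinates (1-based, inclusive)
    of every run of N/n in the sequence that is at least min_length long.
    region_start is the reference coordinate of the first base."""
    runs = []
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_length:
            runs.append((region_start + match.start(),
                         region_start + match.end() - 1))
    return runs
```

An empty list for the intron between exons 3 and 4 would rule out a gap in the reference at that position.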

written 2 hours ago by Macspider1.9k
