Question: Why does cufflinks split this transcript?
5 days ago, corend (20) wrote:

I have an RNA-seq dataset. I used TopHat and Cufflinks to align the reads to my reference genome and to assemble my transcripts, driven by a GTF of my genome.

I have a gene of interest (9 exons) and I am checking whether it is expressed in my data. When I look at it in the assembly, it is split into 2 parts: the first 3 exons are in one transcript and the last 6 exons are in another. I thought the expression level was too low to assemble this transcript properly, but when I look at the sashimi plot from the .bam file used for the assembly, here is what I have:

Sashimi Plot of my gene

I have enough reads supporting the junction, but cufflinks reports 2 transcripts.

Any idea why this is happening?

EDIT: Could it be coming from my parameters?

-u 
--min-frags-per-transfrag 10 
--max-multiread-fraction 0.99
--trim-3-avgcov-thresh 5 
--trim-3-dropoff-frac=0.1 
--overlap-radius 50

Tags: rna-seq, cufflinks, assembly
modified 4 days ago • written 5 days ago by corend (20)

Presumably there are too few reads supporting the junction between exons 3 and 4. Try lowering the threshold (I guess it's --min-frags-per-transfrag) and see what happens!

written 4 days ago by Macspider (1.9k)

That's the junction with the most reads in the whole transcript.

Do you know if these reads are uniquely mapped?

modified 3 days ago • written 3 days ago by Martombo (1.8k)

I just filtered out multimapped reads, and I have 67 reads supporting the junction instead of 77. It seems that this is not a multimapping problem :/
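
For anyone wanting to repeat this kind of check, here is a minimal sketch (not the poster's actual procedure) that counts alignments spanning a given intron in SAM text, with and without multimappers. It assumes the aligner writes the `NH:i:` tag, as TopHat does; the function names, the demo records and the coordinates are made up for illustration. It counts alignments, not fragments, so paired-end mates are counted separately.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def junctions(pos, cigar):
    """Return (intron_start, intron_end), 1-based inclusive, for every N op in the CIGAR."""
    found = []
    ref = pos  # reference position of the next aligned base
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            found.append((ref, ref + length - 1))
            ref += length
        elif op in "MD=X":  # operations that consume the reference
            ref += length
    return found

def count_junction_reads(sam_lines, chrom, intron):
    """Return (all alignments, NH:i:1 alignments) that splice exactly over `intron`."""
    total = unique = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] != chrom or intron not in junctions(int(fields[3]), fields[5]):
            continue
        total += 1
        if "NH:i:1" in fields[11:]:  # optional tags start at column 12
            unique += 1
    return total, unique

# Hypothetical records: two spliced reads (one multimapped) and one unspliced read.
demo = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t20M500N30M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t110\t3\t10M500N40M\t*\t0\t0\t*\t*\tNH:i:2",
    "r3\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*\tNH:i:1",
]
print(count_junction_reads(demo, "chr1", (120, 619)))  # (2, 1)
```

In practice the lines could come from `samtools view` output piped into the script; a small drop such as 77 to 67 would show up as two close numbers in the returned pair.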

written 1 day ago by corend (20)

Did you check how they overlap the junction? Maybe the majority of them have only a few bases on one exon...
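
A rough way to quantify this, as a sketch under assumptions: for each spliced alignment, measure how many aligned bases sit on each side of the intron and report the fraction of reads whose shorter side falls below a cutoff. The function names, the demo reads and the 8-base cutoff are illustrative choices, not values taken from Cufflinks.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def junction_overhangs(pos, cigar, intron):
    """Return (left, right) aligned bases flanking `intron`, or None if the read does not span it.

    Only the exon blocks directly adjacent to the intron are counted.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ref = pos   # reference position of the next aligned base (1-based)
    left = 0    # aligned bases in the exon block currently being read
    for i, (n, op) in enumerate(ops):
        if op == "N":
            if (ref, ref + n - 1) == intron:
                right = 0
                for m, op2 in ops[i + 1:]:
                    if op2 == "N":  # stop at the next intron
                        break
                    if op2 in "M=X":
                        right += m
                return left, right
            ref += n
            left = 0  # a new exon block starts after this intron
        elif op in "M=X":
            left += n
            ref += n
        elif op == "D":
            ref += n
    return None

def short_anchor_fraction(reads, intron, min_anchor=8):
    """Fraction of junction-spanning reads with fewer than `min_anchor` bases on one side."""
    spans = [junction_overhangs(pos, cigar, intron) for pos, cigar in reads]
    spans = [s for s in spans if s is not None]
    if not spans:
        return 0.0
    short = sum(1 for left, right in spans if min(left, right) < min_anchor)
    return short / len(spans)

# Hypothetical (POS, CIGAR) pairs: a well-anchored read, a 3-base anchor, an unspliced read.
reads = [(100, "20M500N30M"), (117, "3M500N47M"), (100, "50M")]
print(short_anchor_fraction(reads, (120, 619)))  # 0.5
```

A high fraction would suggest that most of the junction support comes from reads barely reaching into one exon, which is the situation the question above is asking about.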

written 21 hours ago by Martombo (1.8k)

Nope, they overlap very well (no worse than at any other junction). I changed all the parameters one by one, and I always get this transcript split in 2. The only way to get it in full is to use faux-reads from a guide annotation (and I don't want to use a guide annotation).

written 19 hours ago by corend (20)

I'm sorry, I don't have any other ideas, as I'm not an expert. Cufflinks has long been known to be sub-optimal for transcript quantification; maybe other software built specifically for assembly, like SPAdes or Trinity, would be a better choice?

written 15 hours ago by Martombo (1.8k)

How is the reference in that region? Is there an N-stretch in between?
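
One way to check this, sketched under assumptions: pull the sequence between exons 3 and 4 out of the reference (for example with `samtools faidx` on a region string) and scan it for runs of N. The helper name `n_runs` and the toy sequence below are hypothetical.

```python
import re

def n_runs(sequence, offset=1, min_len=1):
    """Return (start, end) coordinates, inclusive, of runs of N or n in `sequence`.

    `offset` is the reference coordinate of the first base of `sequence`,
    so the result can be compared directly with exon and intron positions.
    """
    return [
        (m.start() + offset, m.end() + offset - 1)
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]

# Toy region that starts at reference position 1000.
region = "ACGTNNNNACGTnnAC"
print(n_runs(region, offset=1000))             # [(1004, 1007), (1012, 1013)]
print(n_runs(region, offset=1000, min_len=3))  # [(1004, 1007)]
```

An empty list for the intron would rule out a gap in the reference as the explanation.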

written 2 hours ago by Macspider (1.9k)