Question

Cufflinks Skips Loci and Marks Them HIDATA

10.1 years ago

sanderrr010 ▴ 10

Hi Guys,

When I run Cufflinks with the command:

cufflinks --GTF /.../B0510_manual_reindexed_v2.gff --min-isoform-fraction 0.5 --pre-mrna-fraction 0.05 --max-intron-length 2000 --small-anchor-fraction 0.06 --min-intron-length 30 --overlap-radius 1 --3-overhang-tolerance 0 --intron-overhang-tolerance 0 --no-faux-reads -p 8 -o /.../cufflinks_out_V3/Apo12B/ /media/cinerea/BGI_RNAseq_V2/.../Apo12B/accepted_hits.bam

Cufflinks skips a large part (about 3.4 Mb) of a scaffold at the following step:

You are using Cufflinks v2.1.1, which is the most recent release.
[14:00:50] Loading reference annotation.
[14:00:50] Inspecting reads and determining fragment length distribution.
Processing Locus B0510_5C01:490546-492362 [ ] 0%

I tried raising and lowering --max-bundle-frags, but it makes no difference. In isoforms.fpkm_tracking, the transcripts at this locus are marked HIDATA, although the reads at this locus look fine.

What is wrong? Any ideas?

EDIT: I inspected the --verbose logs and found that exactly the region being skipped is treated by Cufflinks as one big bundle with about 1M reads on it. I lowered the --max-bundle-length flag, but it does not seem to have any effect at all.

EDIT2: Cufflinks filters out the large bundle after the "processing" step, so no results at all are reported for the genes in that locus. Where does Cufflinks get its bundle sizes from, and can I adjust them?

cufflinks • 3.9k views

updated 10.1 years ago by Mikael Huss 4.8k • written 10.1 years ago by sanderrr010 ▴ 10

score 0 · Answer 1 · 2014-03-05

10.1 years ago

Mikael Huss 4.8k

You have to increase the --max-bundle-frags option to a large enough number. By default, Cufflinks skips transcripts/regions that have more than 1 million reads mapped to them. I usually use something like 10^9 to be on the safe side :-)

It does not make any difference: adjusting this parameter has no influence on the result. So the number of reads at that locus is not the problem; I think the locus itself is too big. I want to know where Cufflinks gets its bundle sizes from, and whether I can change them.

10.1 years ago by sanderrr010 ▴ 10

OK. Then I think you need to increase --max-bundle-length. Have you tried that? In your question you only wrote that you had lowered it. In a run I looked at yesterday, I used 10 million for both the bundle size and bundle length flags, and it worked in that case. Unfortunately, I am not sure exactly how the bundles are defined.

10.1 years ago by Mikael Huss 4.8k
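Since the thread never pins down how bundles are defined, here is a simplified model in Python. It assumes that a bundle is formed by merging overlapping features (reference transcripts and read alignments) on a scaffold into one region; it is not Cufflinks's actual code, and the function names and the 3 Mb cut-off are hypothetical. It shows why a single very long annotated feature, like the 3.5 Mb gene found in the GFF later in the thread, pulls everything it overlaps into one oversized bundle.

```python
from typing import List, Tuple

# (start, end) on a single scaffold, 1-based and inclusive.
Interval = Tuple[int, int]


def merge_into_bundles(features: List[Interval]) -> List[Interval]:
    """Merge overlapping features into bundles (simplified model, not Cufflinks source)."""
    bundles: List[Interval] = []
    for start, end in sorted(features):
        if bundles and start <= bundles[-1][1]:
            # Overlaps the current bundle, so the bundle grows to cover it.
            bundles[-1] = (bundles[-1][0], max(bundles[-1][1], end))
        else:
            # No overlap: start a new bundle.
            bundles.append((start, end))
    return bundles


def too_long(bundles: List[Interval], max_length: int) -> List[Interval]:
    """Return the bundles whose genomic length exceeds max_length (hypothetical cut-off)."""
    return [(s, e) for s, e in bundles if e - s + 1 > max_length]


if __name__ == "__main__":
    # Hypothetical features: small genes plus one ~3.5 Mb annotated "gene"
    # that swallows a smaller gene lying inside it.
    features = [(1_000, 5_000), (490_546, 492_362), (10_000, 3_510_000), (4_000_000, 4_002_000)]
    bundles = merge_into_bundles(features)
    print(bundles)  # [(1000, 5000), (10000, 3510000), (4000000, 4002000)]
    print(too_long(bundles, max_length=3_000_000))  # [(10000, 3510000)]
```

In this model, the small gene at 490546-492362 disappears into the big bundle, which matches the symptom described in the question: every gene inside the region is lost together.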

Something really stupid was causing all of this: my GFF file contained a gene spanning 3.5 Mb, so Cufflinks treated that whole region as one bundle. So it was not a Cufflinks problem after all.

10.1 years ago by sanderrr010 ▴ 10
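To catch this kind of annotation error before running Cufflinks, one could scan the GFF for suspiciously long features. A minimal Python sketch, assuming a tab-separated GFF/GTF with the standard nine columns (seqid, source, type, start, end, ...); the function name long_features and the 1 Mb threshold in the example are arbitrary choices, not Cufflinks settings, and the example lines are made up.

```python
from typing import Iterable, List, Tuple


def long_features(gff_lines: Iterable[str], max_span: int,
                  feature_type: str = "gene") -> List[Tuple[str, int, int, int]]:
    """Return (seqid, start, end, span) for features of feature_type longer than max_span."""
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue  # not a full feature line, or a different feature type
        start, end = int(cols[3]), int(cols[4])  # GFF coordinates are 1-based, inclusive
        span = end - start + 1
        if span > max_span:
            hits.append((cols[0], start, end, span))
    return hits


if __name__ == "__main__":
    # Hypothetical annotation lines; with a real file use:
    #   with open("annotation.gff") as fh: print(long_features(fh, 1_000_000))
    example = [
        "##gff-version 3\n",
        "scaffold_1\tmanual\tgene\t100\t2000\t.\t+\t.\tID=geneA\n",
        "scaffold_1\tmanual\tgene\t500000\t4000000\t.\t+\t.\tID=geneB\n",
    ]
    print(long_features(example, max_span=1_000_000))
    # [('scaffold_1', 500000, 4000000, 3500001)]
```

Because the function accepts any iterable of lines, an open file handle works directly and the whole annotation never has to be loaded into memory.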

So I guess that would have been solved by increasing --max-bundle-length to 10 million.

10.1 years ago by Mikael Huss 4.8k
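A minimal Python sketch of the workaround discussed in this thread: adding --max-bundle-frags and --max-bundle-length, both set to 10 million as in the run mentioned earlier, to an otherwise unchanged Cufflinks command. The helper name with_bundle_limits and the file names are hypothetical placeholders, not part of Cufflinks; check the usage message of your Cufflinks version before relying on the flags.

```python
import shlex
from typing import List


def with_bundle_limits(options: List[str], bam: str,
                       max_bundle_frags: int, max_bundle_length: int) -> List[str]:
    """Build a cufflinks argv with both bundle limits raised (hypothetical helper)."""
    return ["cufflinks", *options,
            "--max-bundle-frags", str(max_bundle_frags),
            "--max-bundle-length", str(max_bundle_length),
            bam]  # the alignment file goes last, after all options


if __name__ == "__main__":
    # Placeholder paths; substitute your own annotation, output directory and BAM.
    opts = ["--GTF", "annotation.gff", "-p", "8", "-o", "cufflinks_out"]
    cmd = with_bundle_limits(opts, "accepted_hits.bam",
                             max_bundle_frags=10_000_000, max_bundle_length=10_000_000)
    print(shlex.join(cmd))
    # To run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems with paths; shlex.join is only used to print a copy-pasteable version. Note that, as the thread concludes, fixing the oversized gene in the annotation is the cleaner solution; raising the limits only works around it.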