Question: Cufflinks Skips Loci, Marks With HIDATA
sanderrr010 wrote (5.1 years ago):

Hi Guys,

When I try to run cufflinks, with the command:

cufflinks --GTF /.../B0510_manual_reindexed_v2.gff --min-isoform-fraction 0.5 --pre-mrna-fraction 0.05 --max-intron-length 2000 --small-anchor-fraction 0.06 --min-intron-length 30 --overlap-radius 1 --3-overhang-tolerance 0 --intron-overhang-tolerance 0 --no-faux-reads -p 8 -o /.../cufflinks_out_V3/Apo12B/ /media/cinerea/BGI_RNAseq_V2/.../Apo12B/accepted_hits.bam

Cufflinks just skips a huge part (roughly 3.4 Mb) of a scaffold, at the following step:

You are using Cufflinks v2.1.1, which is the most recent release.
[14:00:50] Loading reference annotation.
[14:00:50] Inspecting reads and determining fragment length distribution.
Processing Locus B0510_5C01:490546-492362 [ ] 0%

I tried to tweak the parameter --max-bundle-frags up and down, but this does not make any difference. In isoforms.fpkm_tracking the transcripts are marked with HIDATA. The reads seem fine at this locus.

What is wrong? Any ideas?

EDIT: I inspected the --verbose logs and saw that exactly the part being skipped is treated by Cufflinks as one big bundle, with 1M reads on it. I lowered the --max-bundle-length flag, but this does not seem to have any effect at all.

EDIT2: Cufflinks filters out the large bundle after the "processing" step, resulting in no output at all for the genes in that locus. Where does Cufflinks get its bundle sizes from? Can I adjust this?

Mikael Huss (Stockholm) wrote (5.1 years ago):

You have to increase the --max-bundle-frags option to a large enough number. By default, Cufflinks will skip those transcripts/regions that have >1 million reads mapped to them. I usually use something like 10^9 to be on the safe side :-)


It does not make any difference. When I adjust this parameter, it has no influence on the result. So the number of reads at that locus does not seem to be the problem; I think the locus itself is too big. I want to know where Cufflinks gets its bundle sizes from, and whether I can change this.

Reply written 5.0 years ago by sanderrr010

OK. I think you need to increase the max bundle length then. Have you tried that? In the explanation above, you only wrote that you had lowered it. In a run I looked at yesterday I used 10 million for both the size and length flags and it worked in that case. I am not too sure how the bundles are defined, unfortunately.

Reply written 5.0 years ago by Mikael Huss
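
For readers following along, the suggestion in the reply above (raising both bundle limits to 10 million) might look roughly like the sketch below. The helper name and its three arguments (annotation file, output directory, BAM file) are placeholders, not paths from this thread, and the function only prints the command so it can be checked before running. The remaining flags from the original question would be appended as needed.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) a cufflinks command with both
# bundle limits raised to 10 million, the value mentioned in the reply above.
# The three arguments are placeholders, not paths taken from the thread.
print_cufflinks_cmd() {
    gtf=$1        # reference annotation (GFF/GTF)
    out_dir=$2    # output directory
    bam=$3        # aligned reads
    printf 'cufflinks --GTF %s --max-bundle-frags 10000000 --max-bundle-length 10000000 -p 8 -o %s %s\n' \
        "$gtf" "$out_dir" "$bam"
}
```

Calling `print_cufflinks_cmd annotation.gff cufflinks_out accepted_hits.bam` prints the full command line for inspection.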

Something really stupid was the cause of all this. Inside my GFF file there was a gene 3.5 Mb in size, so Cufflinks takes it as one bundle. So it was not a Cufflinks problem after all.

Reply written 5.0 years ago by sanderrr010
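
A quick way to catch this kind of annotation problem is to scan the GFF for features with an implausibly large span. The following is a minimal sketch, assuming a standard tab-separated GFF/GTF file; the function name and the default threshold of 1,000,000 bp are illustrative choices, not anything taken from Cufflinks.

```shell
#!/bin/sh
# List annotation features whose genomic span exceeds a length threshold.
# GFF/GTF columns are tab-separated: 1=seqid, 3=feature type, 4=start, 5=end.
# The function name and the default threshold are illustrative assumptions.
find_long_features() {
    gff=$1
    max_len=${2:-1000000}
    awk -F '\t' -v OFS='\t' -v max="$max_len" '
        /^#/ { next }                  # skip comment and header lines
        NF >= 5 {
            len = $5 - $4 + 1          # GFF coordinates are 1-based and inclusive
            if (len > max) print $1, $3, $4, $5, len
        }
    ' "$gff"
}
```

Running `find_long_features annotation.gff 3000000` would print the seqid, feature type, start, end and length of every feature longer than 3 Mb, which should make a mis-annotated gene like the one described above easy to spot.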

So that would have been solved by increasing --max-bundle-length to 10 million I guess.

Reply written 5.0 years ago by Mikael Huss