Question: Cufflinks Hangs On
2
andrew.j.skelton73 • 5.0 years ago • London wrote:

When running Cufflinks, the run hangs at the same position each time. This is my command, and I have no explanation for the hang; if anyone has any suggestions, please let me know!

cufflinks -p 20 -v -M /opt/databases/genomes/Ensembl_2/Ensembl_Mask.gtf --max-bundle-frags 10000000000 --GTF-guide /opt/databases/genomes/Ensembl_2/genome.gtf -o ./Cufflinks ./Tophat/accepted_hits.bam

EDIT: By position, I mean chromosomal position during the bundle inspection process of the Cufflinks pipeline.

EDIT2: I reduced the --max-bundle-frags switch to 1000000, and after 19 hours Cufflinks still hangs on the bundle inspection process at the same chromosomal position. This dataset is big (174 million reads), but I thought the bundle inspection process was meant to be pretty quick; can anyone correct me on this? It seems that bundle inspection does not use the --max-bundle-frags switch.

EDIT3: After checking the log more thoroughly, I came across this entry:

Inspecting bundle 14:50035707-50105467 with 6191576 reads

This clearly demonstrates that the bundle inspection process of Cufflinks is not using --max-bundle-frags. I've not seen anything that would suggest this is by design.
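As a side note, the verbose log makes it easy to locate the pathological loci before a run stalls. Here is a small sketch (not part of the thread; the line format is taken from the entry quoted above) that scans a Cufflinks verbose log for bundles whose read count exceeds a threshold:

```python
import re

# Match lines like: "Inspecting bundle 14:50035707-50105467 with 6191576 reads"
BUNDLE_RE = re.compile(
    r"Inspecting bundle (?P<locus>\S+:\d+-\d+) with (?P<reads>\d+) reads"
)

def oversized_bundles(log_lines, max_reads=1_000_000):
    """Yield (locus, read_count) for bundles whose read count exceeds max_reads."""
    for line in log_lines:
        m = BUNDLE_RE.search(line)
        if m and int(m.group("reads")) > max_reads:
            yield m.group("locus"), int(m.group("reads"))

log = [
    "Inspecting bundle 14:50232514-50232850 with 2 reads",
    "Inspecting bundle 14:50035707-50105467 with 6191576 reads",
]
print(list(oversized_bundles(log)))
# → [('14:50035707-50105467', 6191576)]
```

Running this over the log of a stalled run points straight at candidate regions for a mask file or closer inspection.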

UPDATE: After speaking with a Cufflinks dev - The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.

cufflinks • 4.7k views
written 5.0 years ago by andrew.j.skelton73 • modified 4.6 years ago by chris.smowton
2

For some reason, the title of this post made me think of a bioinformatics-themed children's book.

written 5.0 years ago by Dan D
1

The Little Tophat That Could

written 5.0 years ago by Istvan Albert

The typo that led to a bioinformatics children's book? I'll take it.

written 5.0 years ago by andrew.j.skelton73

What do you mean by "same position"? What position is that?

written 5.0 years ago by Istvan Albert
3
Daniel Swan • 5.0 years ago • Aberdeen, UK wrote:

Have you tried running with a lower max-bundle-frags? The value seems awfully high. I'm assuming you're fiddling with this to prevent it skipping very deeply sequenced loci, but it might be causing a bottleneck. It would help though to know what point you think the process is stalling at. Can you post that in a comment or amend the question?

modified 5.0 years ago • written 5.0 years ago by Daniel Swan
3

This is a good point. I definitely recommend increasing max-bundle-frags above the default 500K value; however, setting it to 10B is very excessive. We typically have RNA-seq data from one lane of HiSeq 2000 or 2500 with ~400M reads. In tests with the default setting we see up to ~100 genes being marked as HIDATA without FPKM values calculated, some of which are genes of clear interest. Almost all of these are recovered by increasing max-bundle-frags to 10M with minimal increase in memory/runtime. Bumping up to 50M eliminates all HIDATA cases in our tests but also significantly increases runtime for reference-guided or de novo modes. In any case, there should be no reason to set it to 10B as in the case above.

written 5.0 years ago by Obi Griffith
3

UPDATE: After speaking with a Cufflinks dev - The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.

written 5.0 years ago by andrew.j.skelton73

Very useful information - good to know.

written 5.0 years ago by Daniel Swan
1

Sorry for the late reply....

By position, I mean the chromosomal position when Cufflinks inspects bundles. The run is set to verbose, so I could see where the bottleneck was coming from. It hangs at:

Inspecting bundle 14:50232514-50232850 with 2 reads

I suspect the next locus along has a really deep region of reads, and when I extract the reads slightly downstream of the above chromosomal position, it does indeed have a very deep set of reads. Bundle inspection should be a fairly quick process in the Cufflinks pipeline.
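One way to confirm that the locus just downstream of the hang is extremely deep is to look at per-base coverage. A minimal sketch (an illustration, not from the thread) that consumes the three-column "chrom pos depth" format emitted by tools such as `samtools depth`:

```python
# Given per-base coverage lines ("chrom<TAB>pos<TAB>depth"), report the
# deepest base. The coordinates and depths below are made-up examples
# around the region discussed above.
def deepest_position(depth_lines):
    """Return (chrom, pos, depth) for the highest-coverage base seen."""
    best = None
    for line in depth_lines:
        chrom, pos, depth = line.split()
        record = (chrom, int(pos), int(depth))
        if best is None or record[2] > best[2]:
            best = record
    return best

lines = [
    "14\t50232514\t2",
    "14\t50233000\t845123",
    "14\t50233001\t845200",
]
print(deepest_position(lines))  # → ('14', 50233001, 845200)
```

Feeding it the depth output for a window just downstream of the hanging bundle would show whether a pathologically deep pileup is the culprit.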

I think the suggestion to reduce max-bundle-frags is a good approach, so I'll give that a go and report back. I was just trying to avoid HIDATA!

modified 5.0 years ago • written 5.0 years ago by andrew.j.skelton73
1

UPDATE: I reduced the --max-bundle-frags switch to 1000000, and after 19 hours Cufflinks still hangs on the bundle inspection process at the same chromosomal position. This dataset is big (174 million reads), but I thought the bundle inspection process was meant to be pretty quick; can anyone correct me on this? It seems that bundle inspection does not use the --max-bundle-frags switch.

written 5.0 years ago by andrew.j.skelton73
2
Obi Griffith • 5.0 years ago • Washington University, St Louis, USA wrote:

I'm also not sure what you mean by "hangs at the same position each time". But if you mean that you run the above command and it silently hangs without proceeding even to the first step of a Cufflinks run, it sounds a lot like a problem I recently had: for unknown reasons I was having issues connecting to the Cufflinks server. When I ran the same command with the --no-update-check flag specified, the problem went away. Does that help?

modified 4.6 years ago • written 5.0 years ago by Obi Griffith

Thanks for your suggestion, but Cufflinks starts fine, it's on the bundle inspection process of Cufflinks that I seem to be getting the bottleneck. I've updated the post to give some more context.

written 5.0 years ago by andrew.j.skelton73
2
jjenny • 5.0 years ago wrote:

I'm having the exact same problem and would love to know if you find a workaround. I made a mask file with the suspect region, which worked for that region, but then Cufflinks just hung on a different region, so clearly this is not a very sustainable solution.
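For anyone trying the per-region masking workaround, entries for the suspect regions can be generated mechanically and appended to the file passed to `cufflinks -M`. A hypothetical helper (the gene/transcript IDs and coordinates are illustrative assumptions, not from the thread) that emits one minimal nine-field GTF exon line per region:

```python
# Emit a minimal GTF exon line covering a region to be masked. GTF requires
# nine tab-separated fields; gene_id and transcript_id attributes are
# mandatory, so synthetic ones are generated from the coordinates.
def mask_gtf_line(chrom, start, end, source="mask"):
    attrs = f'gene_id "mask_{chrom}_{start}"; transcript_id "mask_{chrom}_{start}";'
    return "\t".join(
        [chrom, source, "exon", str(start), str(end), ".", "+", ".", attrs]
    )

# Example: mask the deep bundle reported in the verbose log above.
print(mask_gtf_line("14", 50035707, 50105467))
```

As the comment above notes, though, this only papers over one region at a time, which is why it doesn't scale to a whole run.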

written 5.0 years ago by jjenny
2

Absolutely. I've talked (albeit very briefly) with one of the Cufflinks devs. His suggestion was to use the -G flag as an alternative to --GTF-guide. Using -G instead of --GTF-guide means that you can't discover any novel isoforms or genes; however, it does work in conjunction with the --max-bundle-frags flag.

This is my understanding of the problem (Cufflinks 2.2.0):

- --GTF-guide allows discovery of novel isoforms and genes, but can't be used with --max-bundle-frags to cope with extreme depth.
- -G runs against the reference GTF without novel discovery, and it works with --max-bundle-frags.
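To make the trade-off concrete, here is an illustrative sketch (my own framing, not from the devs) that builds the two alternative invocations as argument lists, reflecting the 2.2.0 behaviour described in this thread:

```python
# Build a cufflinks argument list for either mode. Paths and thread count
# are placeholders. Per the 2.2.0 behaviour reported in this thread,
# --max-bundle-frags is honoured with -G but overridden by --GTF-guide.
def cufflinks_cmd(gtf, bam, novel_discovery, max_bundle_frags=10_000_000):
    cmd = ["cufflinks", "-p", "20", "-o", "./Cufflinks"]
    if novel_discovery:
        # Novel isoform/gene discovery, but the --max-bundle-frags value
        # below is effectively ignored in this mode.
        cmd += ["--GTF-guide", gtf]
    else:
        # Reference-only quantification; --max-bundle-frags is respected.
        cmd += ["-G", gtf]
    cmd += ["--max-bundle-frags", str(max_bundle_frags), bam]
    return cmd

print(cufflinks_cmd("genome.gtf", "accepted_hits.bam", novel_discovery=False))
```

So the practical choice is: give up novel discovery (-G) and keep the large-bundle skip, or keep discovery (--GTF-guide) and accept that very deep bundles are processed in full.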

The mask file is good for a case by case loci but it's just not practical for the whole run.

I'm still waiting to hear back from the dev team on whether this is a bug or by design. I'll update the post if I hear anything.

written 5.0 years ago by andrew.j.skelton73

Hi Andrew

did you ever hear anything back from the Cufflinks developers regarding this problem?

thanks

Simon

written 5.0 years ago by simon rayner
1

I updated the post at the top...

UPDATE: After speaking with a Cufflinks dev - The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.

written 5.0 years ago by andrew.j.skelton73
1
chris.smowton • 4.6 years ago • United Kingdom wrote:

Hi,

You might want to try the patch (and patched binaries) described here: https://groups.google.com/forum/#!topic/tuxedo-tools-users/UzLCJhj3lUE

Please let me know if this resolves the situation!

Chris

written 4.6 years ago by chris.smowton

I had a look at the thread, Chris. Interesting that you found the bottleneck, great job!

So from your post, I understand the problem was in the data structure holding the open mates (a map of lists), and your solution was to replace it with a Boost unordered multimap indexed by a hash of the qname?
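The gist of that data-structure change can be sketched in Python terms (this is my own analogue, an assumption about the patch rather than its actual code): instead of scanning an ordered map of lists for an open mate, index open mates by the read name so that pairing each incoming alignment is a constant-time hash lookup.

```python
from collections import defaultdict

# Pair mates by qname using a hash map, so each incoming alignment finds
# its waiting mate in O(1) instead of scanning an ordered structure.
def pair_mates(alignments):
    """alignments: iterable of (qname, pos). Returns list of (qname, pos1, pos2)."""
    open_mates = defaultdict(list)  # qname -> positions still awaiting a mate
    pairs = []
    for qname, pos in alignments:
        if open_mates[qname]:
            pairs.append((qname, open_mates[qname].pop(), pos))
        else:
            open_mates[qname].append(pos)
    return pairs

print(pair_mates([("r1", 100), ("r2", 150), ("r1", 300)]))
# → [('r1', 100, 300)]
```

In a bundle with millions of reads, the difference between a per-read scan and a per-read hash lookup would plausibly explain an apparent hang during inspection.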

Can you explain why this caused the bottleneck in a little bit more detail? (for my own curiosity and the benefit of the original post!)

Have you tested this and seen a noticeable improvement?

Thanks for your work on this! 

 

written 4.6 years ago by andrew.j.skelton73
Powered by Biostar version 2.3.0