Cufflinks Hangs On
10.5 years ago

When running Cufflinks, the run hangs at the same position each time. My command is below. I have no explanation for this, so if anyone has any suggestions, please let me know!

cufflinks -p 20 -v -M /opt/databases/genomes/Ensembl_2/Ensembl_Mask.gtf --max-bundle-frags 10000000000 --GTF-guide /opt/databases/genomes/Ensembl_2/genome.gtf -o ./Cufflinks ./Tophat/accepted_hits.bam

EDIT: By position, I mean chromosomal position during the bundle inspection process of the Cufflinks pipeline.

EDIT2: I reduced the --max-bundle-frags switch to 1000000 and, after 19 hours, Cufflinks still hangs during the bundle inspection process at the same chromosomal position. This data is big (174 million reads), but I thought the bundle inspection process was meant to be pretty quick. Can anyone correct me on this? It seems that the bundle inspection does not use the --max-bundle-frags switch.

EDIT3: After checking the log more thoroughly, I came across this entry:

Inspecting bundle 14:50035707-50105467 with 6191576 reads

This clearly demonstrates that the bundle inspection process of Cufflinks is not using --max-bundle-frags: the bundle holds 6,191,576 reads, well above the limit of 1,000,000 I had set. I've not seen anything that would suggest this is by design.
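
For anyone wanting to check their own verbose log for the same symptom, here is a minimal sketch that scans for "Inspecting bundle" lines and reports bundles above a chosen limit. The function name `oversized_bundles` is made up for illustration, and the line format is assumed to match the log entry quoted above. Note that the log reports reads while --max-bundle-frags counts fragments, so the comparison is only approximate.

```python
import re

# Matches verbose log lines of the form quoted in this post, e.g.
#   Inspecting bundle 14:50035707-50105467 with 6191576 reads
BUNDLE_RE = re.compile(r"Inspecting bundle ([^\s:]+):(\d+)-(\d+) with (\d+) reads")


def oversized_bundles(log_lines, max_frags):
    """Return (chrom, start, end, reads) for bundles with more than max_frags reads."""
    hits = []
    for line in log_lines:
        match = BUNDLE_RE.search(line)
        if match is None:
            continue  # not a bundle inspection line
        chrom = match.group(1)
        start, end, reads = (int(match.group(i)) for i in (2, 3, 4))
        if reads > max_frags:
            hits.append((chrom, start, end, reads))
    return hits


# The two log lines quoted in this thread.
log = [
    "Inspecting bundle 14:50035707-50105467 with 6191576 reads",
    "Inspecting bundle 14:50232514-50232850 with 2 reads",
]
print(oversized_bundles(log, max_frags=1_000_000))
```

On a real run you would pass the lines of the captured log instead of the two-line list.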

UPDATE: After speaking with a Cufflinks dev - The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.

cufflinks • 7.0k views

For some reason, the title of this post made me think of a bioinformatics-themed children's book.


The Little Tophat That Could


The typo that led to a bioinformatics children's book? I'll take it.


What do you mean by "same position"? What position is that?

10.5 years ago
User 59 13k

Have you tried running with a lower max-bundle-frags? The value seems awfully high. I'm assuming you're fiddling with this to prevent it skipping very deeply sequenced loci, but it might be causing a bottleneck. It would help though to know what point you think the process is stalling at. Can you post that in a comment or amend the question?


This is a good point. I definitely recommend increasing max-bundle-frags above the default 500K value. However, setting it to 10B is very excessive. We typically have RNA-seq data from 1 lane of HiSeq 2000 or 2500 with ~400M reads. In tests with the default setting, we see up to ~100 genes being marked as HIDATA without FPKM values calculated, some of which are genes of clear interest to your studies. Almost all of these are recovered by increasing max-bundle-frags to 10M, with minimal increase in memory or runtime. Bumping it up to 50M eliminates all HIDATA cases in our tests, but also significantly increases runtime in reference-guided or de novo modes. In any case, there should be no reason to set it to 10B as in the case above.


UPDATE: After speaking with a Cufflinks dev - The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.


Very useful information - good to know.


Sorry for the late reply....

By position, I mean chromosomal position when Cufflinks inspects bundles. It is set to verbose, so I could see where the bottleneck was coming from. It hangs at....

Inspecting bundle 14:50232514-50232850 with 2 reads

So I suspect the next locus along has a really deep region of reads. When I extract the reads slightly downstream of the above chromosomal position, there is indeed a very deep set of reads. Bundle inspection should be a fairly quick process in the Cufflinks pipeline.
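
In case it helps anyone do the same check, here is a rough sketch of counting alignments per fixed-size window from SAM text records, to see where the depth piles up. The function name `window_counts`, the 10 kb window and the toy records are my own choices for illustration. It only looks at the leftmost alignment position (SAM column 4), so it is a coarse indicator rather than true per-base depth.

```python
from collections import Counter


def window_counts(sam_lines, chrom, window=10_000):
    """Count alignments per fixed-size window on one chromosome.

    sam_lines: iterable of SAM text records, header lines allowed.
    Returns a Counter keyed by the 1-based start of each window.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[2] != chrom:
            continue  # malformed record or different reference sequence
        pos = int(fields[3])  # 1-based leftmost mapping position
        window_start = (pos - 1) // window * window + 1
        counts[window_start] += 1
    return counts


# Toy records (made up) just to show the shape of the input.
toy = [
    "@HD\tVN:1.0",
    "r1\t0\t14\t50232514\t50\t50M\t*\t0\t0\t*\t*",
    "r2\t0\t14\t50240001\t50\t50M\t*\t0\t0\t*\t*",
    "r3\t0\t14\t50240500\t50\t50M\t*\t0\t0\t*\t*",
    "r4\t0\t15\t100\t50\t50M\t*\t0\t0\t*\t*",
]
print(window_counts(toy, "14").most_common(1))
```

With real data, the records could come from the text output of `samtools view` on an indexed BAM restricted to the region of interest, passed in line by line.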

I think the suggestion that max-bundle-frags needs to be reduced is a good approach, so I'll give that a go and report back. I was just trying to avoid HIDATA!


UPDATE: I reduced the --max-bundle-frags switch to 1000000 and, after 19 hours, Cufflinks still hangs during the bundle inspection process at the same chromosomal position. This data is big (174 million reads), but I thought the bundle inspection process was meant to be pretty quick. Can anyone correct me on this? It seems that the bundle inspection does not use the --max-bundle-frags switch.

10.5 years ago

I am also not sure what you mean by "hangs at the same position each time". But if you mean that you run the above command and it silently hangs without proceeding to even the first step of a Cufflinks run, then it sounds a lot like a problem I recently had. For unknown reasons I was having issues connecting to the Cufflinks server. When I ran the same command with the "--no-update-check" flag specified, the problem went away. Does that help?


Thanks for your suggestion, but Cufflinks starts fine. It's during the bundle inspection process that I seem to be getting the bottleneck. I've updated the post to give some more context.

10.5 years ago
jjenny ▴ 20

I'm having the exact same problem and would love to know if you find a workaround. I made a mask file covering the suspect region, which worked for that region, but then Cufflinks just hung on a different region, so clearly this is not a very sustainable solution.
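
For reference, a mask file of this kind can be sketched as below. This is only an illustration, not necessarily how the file above was made: the helper name `mask_gtf_lines`, the "mask" source label and the IDs are invented, and emitting one exon record per strand is an assumption on my part. The coordinates are those of the deep bundle quoted in the question.

```python
def mask_gtf_lines(chrom, start, end, name="mask1"):
    """Return two GTF exon records (one per strand) covering chrom:start-end.

    Each record has the nine tab-separated GTF columns; gene_id and
    transcript_id are placeholder values made up for this sketch.
    """
    lines = []
    for strand, tag in (("+", "fwd"), ("-", "rev")):
        ident = "{}_{}".format(name, tag)
        attributes = 'gene_id "{0}"; transcript_id "{0}";'.format(ident)
        fields = [chrom, "mask", "exon", str(start), str(end),
                  ".", strand, ".", attributes]
        lines.append("\t".join(fields))
    return lines


# Region of the 6191576-read bundle reported in the question.
for record in mask_gtf_lines("14", 50035707, 50105467):
    print(record)
```

The printed records would be saved to a file and passed to Cufflinks with -M. As noted above, this only helps one locus at a time.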


Absolutely. I've talked (albeit very briefly) with one of the Cufflinks devs. His suggestion was to use the -G flag as an alternative to --GTF-guide. Using -G instead of --GTF-guide means that you can't discover any novel isoforms or genes; however, it works in conjunction with the --max-bundle-frags flag.

This is my understanding of the problem (Cufflinks 2.2.0):

- --GTF-guide allows for the discovery of novel isoforms and genes, but can't be used with --max-bundle-frags to streamline for extreme depth.
- -G allows you to run against the reference GTF, but without novel discovery. This works with --max-bundle-frags.
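
To make the two modes concrete, here is a small sketch that assembles the argument list for either one. The helper `cufflinks_cmd` is hypothetical; the flags are the ones used in this thread, and the paths are taken from the original command. It only builds the command and does not run anything.

```python
def cufflinks_cmd(bam, gtf, out_dir, novel=False, max_bundle_frags=None, threads=1):
    """Build a Cufflinks argument list for one of the two annotation modes.

    novel=True  -> --GTF-guide (novel discovery; per this thread, in 2.2.0
                   the --max-bundle-frags limit is overridden in this mode)
    novel=False -> -G (reference annotation only; the limit is honoured)
    """
    cmd = ["cufflinks", "-p", str(threads), "-o", out_dir]
    cmd += ["--GTF-guide" if novel else "-G", gtf]
    if max_bundle_frags is not None:
        cmd += ["--max-bundle-frags", str(max_bundle_frags)]
    cmd.append(bam)
    return cmd


# Reference-only run with the 10M limit suggested earlier in the thread.
cmd = cufflinks_cmd(
    "./Tophat/accepted_hits.bam",
    "/opt/databases/genomes/Ensembl_2/genome.gtf",
    "./Cufflinks",
    max_bundle_frags=10_000_000,
    threads=20,
)
print(" ".join(cmd))
```

Returning a list rather than a string means it could be handed straight to `subprocess.run` without any shell quoting concerns.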

The mask file is good for individual loci on a case-by-case basis, but it's just not practical for the whole run.

I'm still waiting to hear back from the dev team on whether this is a bug or by design. I'll update the post if I hear anything.


Hi Andrew

Did you ever hear anything back from the Cufflinks developers regarding this problem?

Thanks

Simon


I updated the post at the top...

UPDATE: After speaking with a Cufflinks dev - The CURRENT (Cufflinks 2.2.0) behaviour is that enabling --GTF-guide WILL OVERRIDE the --max-bundle-frags parameter. This means that if you want to do novel discovery, there is no option to set a skip parameter for large bundles. I've been told that it's on the to-do list for implementation.

10.1 years ago

Hi,

You might want to try the patch (and patched binaries) described here.

Please let me know if this resolves the situation!

Chris


I had a look at the thread, Chris. Interesting that you found the bottleneck, great job!

So from your post, I understand the problem was in the data structure of the open mates (a map of lists)? Your solution was to replace this data structure with an unordered multimap from Boost, indexed with a hash of qname?

Can you explain why this caused the bottleneck in a little bit more detail? (for my own curiosity and the benefit of the original post!)

Have you tested this and seen a noticeable improvement?

Thanks for your work on this!
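
To illustrate the kind of difference I have in mind (this is a toy sketch, not the actual Cufflinks or patch code): if open mates sit in a map of lists and each incoming read scans a list for its mate, a very deep locus makes every lookup linear in the number of open mates, whereas a hash keyed on qname keeps each lookup roughly constant time. How the original map was keyed is an assumption here (reference id, purely for illustration), and a Python set stands in for the Boost unordered multimap, so duplicate keys are not modelled.

```python
from collections import defaultdict


def pair_mates_scan(reads):
    """Map of lists: each lookup scans the bucket for a matching qname."""
    open_mates = defaultdict(list)
    pairs = []
    for ref_id, qname in reads:
        bucket = open_mates[ref_id]
        for i, other in enumerate(bucket):  # linear scan, slow when bucket is huge
            if other == qname:
                bucket.pop(i)
                pairs.append(qname)
                break
        else:
            bucket.append(qname)  # no mate seen yet, leave it open
    return pairs


def pair_mates_hashed(reads):
    """Hash lookup on (ref_id, qname): average constant time per read."""
    open_mates = set()
    pairs = []
    for ref_id, qname in reads:
        key = (ref_id, qname)
        if key in open_mates:
            open_mates.remove(key)
            pairs.append(qname)
        else:
            open_mates.add(key)
    return pairs


# Toy input: two read pairs on reference 14, mates arriving out of order.
toy_reads = [(14, "a"), (14, "b"), (14, "b"), (14, "a")]
print(pair_mates_scan(toy_reads), pair_mates_hashed(toy_reads))
```

Both functions pair the same reads; only the cost per lookup differs, which would matter for a bundle with millions of reads like the one in my log.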
