DiscoSNP++ 2.2.0 Huge number of temporary files
8.6 years ago
tkitapci ▴ 60

Hi,

I noticed that the program creates a lot of temporary files named trashme_*. At one point I checked and there were 8000+ of them open. Having this many files open seemed a little strange to me. Is this intentional, or could there be a bug somewhere in the code (such as forgetting to close files)? If it is intentional, I am curious to understand why that many files are needed.

Thanks

Best Regards

T. Hamdi Kitapci

discosnp • 1.7k views
8.6 years ago
Rayan Chikhi ★ 1.5k

The trashme_* files are temporary partitions for the first step of DiscoSNP (k-mer counting). It is normal to have many of them (around a thousand); however, 8000+ seems excessive. What were your command-line parameters and the input data size?


Hi,

The problem is not the number of files but the number of files that are actually open. I did not look at the source code, but it seems odd to me that all 8000+ files are open. I am wondering whether keeping all those files open is intended, or whether the programmer forgot to close some of them after processing is completed. This is the output I get from my run:

https://docs.google.com/document/d/1jpooJySV1rKTQEronzGyprgdSISPwdhcbBJSDOflR_4/edit?usp=sharing

Total input size is 28 GB (two FASTQ files, 14 GB each).
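
For anyone who wants to reproduce the "open files" count, here is a minimal sketch of one way to measure it on Linux by reading `/proc/<pid>/fd`. The helper names `open_file_targets` and `count_matching` are made up for illustration and are not part of DiscoSNP++; the approach assumes a Linux system and permission to inspect the process.

```python
import os


def open_file_targets(pid):
    """Return the path each open file descriptor of `pid` points to (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for fd in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, fd)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def count_matching(targets, needle="trashme_"):
    """Count targets whose file name contains `needle`."""
    return sum(1 for t in targets if needle in os.path.basename(t))
```

Calling `count_matching(open_file_targets(pid))` with the pid of the running job gives the number of trashme_* files that are open at that moment, as opposed to the number that merely exist on disk.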

Thanks

Best Regards
T. Hamdi Kitapci


Thanks. This is odd: the log says there are 880 partitions. Each partition does correspond to a file that is opened at the same time as the other partition files, so there should be only 880 open files (although I'm not fully familiar with the other graph creation steps, e.g. "debloom").

Can you let me know the complete prefix of those trashme* files? It should be of the form trashme_[pid]_[informative prefix I'd like to know about].parts.[number]
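
A quick way to answer this is to group the file names by pid and prefix and count the partitions in each group. The sketch below assumes the names follow the trashme_[pid]_[prefix].parts.[number] pattern described above; the function name `group_partitions` and the example prefixes in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumed naming scheme: trashme_[pid]_[prefix].parts.[number]
PART_RE = re.compile(r"^trashme_(\d+)_(.+)\.parts\.(\d+)$")


def group_partitions(names):
    """Count partition files per (pid, prefix); names that do not match are ignored."""
    counts = Counter()
    for name in names:
        match = PART_RE.match(name)
        if match:
            pid, prefix, _number = match.groups()
            counts[(pid, prefix)] += 1
    return counts
```

For example, `group_partitions(os.listdir("."))` (after `import os`) in the working directory would show whether the 8000+ files come from a single step with many partitions or from several steps, or several runs, each with about 880.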
