Like other NGS data analysts, as a new user i am also utilizing java applications designed for NGS data analysis. However, i saw various Java jar's optimizing parameters in command lines while processing large data sets. For example, recently i have come across two example commands, i.e.:
java -Dsamjdk.buffer_size=131072 -Dsamjdk.compression_level=1 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx128m -jar /path/picard.jar SamToFastq ...
java -Dsamjdk.buffer_size=131072 -Dsamjdk.use_async_io=true -Dsamjdk.compression_level=1 -XX:+UseStringCache -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx5000m -jar /path/picard.jar MergeBamAlignment ... .
Parameters: -Dsamjdk.buffer_size -XX:GCTimeLimit -XX:GCHeapFreeLimit -Xmx128m -XX:+UseStringCache -Dsamjdk.use_async_io=true
i try to learn about them from internet but not clear about how to set these parameters (bold text in above mentioned command-line) for large data sets while piping various tools together. As i have biological background, could anyone please explain a bit in detail how to use them with one-line comprehensive definition and purpose.
Thank you very much!