I have around 50 files named in the format ERR*.log (e.g. ERR23432.log, ERR12356.log, and so on). From each file I want to extract a specific value.
Within each file, the values I need are at the end of the lines containing:
final pair1 : Total reads after merging results from multiple databases... and
final pair2 : Total reads after merging results from multiple databases...
You can see these on lines 62 and 63 of my example log file (linked on Google Drive), also shown below:
09/06/2020 09:51:45 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_1.fastq ): 12818370.0
09/06/2020 09:51:52 PM - kneaddata.utilities - INFO: READ COUNT: final pair2 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_2.fastq ): 12818370.0
Now I want a script that extracts these two values and adds them to get a single value for each file. It should then write an output file in which the first column is the name of the file without the extension (i.e. ERR45666 in the attached example) and the second column is the summed value.
Can anyone please help me out?
Here is the head of my example log file:
09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Running kneaddata v0.7.10
09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Output files will be written to: /folder/directory/Desktop/srr00823_ob/kneaddata_output
09/06/2020 09:35:12 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = False
input = /folder/directory/Desktop/srr00823_ob/ERR260136_1.fastq /folder/directory/Desktop/srr00823_ob/ERR260136_2.fastq
output_dir = /folder/directory/Desktop/srr00823_ob/kneaddata_output
reference_db = /home/deepchandaaws/kneaddata_db/hg37dec_v0.1
bypass_trim = True
output_prefix = ERR260136_1_kneaddata
threads = 8
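
For reference, the extraction described above could be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions: every "final pair1" and "final pair2" line ends with the count after the last colon (as in the two lines shown), each log contains one of each, and the output should be tab-separated. The function names total_final_reads and summarize and the output file name are hypothetical, chosen only for illustration.

```python
import re
import sys
from pathlib import Path

# Matches the "final pair1" / "final pair2" READ COUNT lines and captures
# the number that follows the last colon on the line.
FINAL_PAIR = re.compile(r"READ COUNT: final pair[12]\s*:.*:\s*([0-9.]+)\s*$")


def total_final_reads(lines):
    """Return the sum of the counts found on the final pair1/pair2 lines."""
    total = 0.0
    for line in lines:
        match = FINAL_PAIR.search(line)
        if match:
            total += float(match.group(1))
    return total


def summarize(log_dir, out_path):
    """Write one 'name<TAB>sum' row per ERR*.log file found in log_dir."""
    with open(out_path, "w") as out:
        for log in sorted(Path(log_dir).glob("ERR*.log")):
            with open(log) as handle:
                total = total_final_reads(handle)
            # log.stem is the file name without the .log extension.
            out.write(f"{log.stem}\t{total:.0f}\n")


# Assumed usage: python script.py <log_dir> <output_file>
if __name__ == "__main__" and len(sys.argv) == 3:
    summarize(sys.argv[1], sys.argv[2])
```

The sum is written without the trailing ".0"; if the decimal form is preferred, the format string can be changed. A log with no matching lines would get a sum of 0, so it may be worth checking the output for zeros.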