How to extract specific information from files within a directory?
14 months ago

I have around 50 files named in the format ERR*.log (e.g. ERR23432.log, ERR12356.log, and so on). From each file I want to extract a specific value.

Within each file, the values are at the end of the lines containing final pair1 : Total reads after merging results from multiple database... and final pair2 : Total reads after merging results from multiple databases... These are the 62nd and 63rd lines of the log file (Link to GoogleDrive log file), one of which is shown below:

09/06/2020 09:51:45 PM - kneaddata.utilities - INFO: READ COUNT: final pair1 : Total reads after merging results from multiple databases ( /folder/directory/Desktop/srr00823_ob/kneaddata_output/ERR260136_1_kneaddata_paired_1.fastq ): 12818370.0


Now I want a script that extracts these two values and adds them together to get a single value for each file. It should then write an output file in which the first column is the name of the file without the extension (i.e. ERR45666 in the attached example) and the second column is the summed value.

Here is the head of my example log file:

09/06/2020 09:35:12 PM - kneaddata.knead_data - INFO: Running kneaddata v0.7.10
09/06/2020 09:35:12 PM - kneaddata.knead_data - DEBUG: Running with the following arguments:
verbose = False
input = /folder/directory/Desktop/srr00823_ob/ERR260136_1.fastq /folder/directory/Desktop/srr00823_ob/ERR260136_2.fastq
bypass_trim = True

python grep regex • 402 views
14 months ago
zx8754 10k

grep should work, something like:

grep 'final pair' ERR*.log


Thanks, sir. But to my knowledge, that will just print all the lines containing the term "final pair".


OP wants ERR260136 12818370.0.
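Since the question is also tagged python, here is a minimal sketch of one way to get the summed value per file. It assumes the final pair2 line has the same layout as the final pair1 line shown in the question, with the count as the last field after the final colon. The function names, the regular expression, and the tab-separated output format are my own choices for illustration and are not taken from kneaddata.

```python
import re
from pathlib import Path

# Captures the number at the end of a "final pair1" / "final pair2" line.
# Assumption: the count is the last field on the line, after the final colon,
# as in the single example line given in the question.
FINAL_PAIR = re.compile(r"final pair[12]\s*:.*:\s*([0-9.]+)\s*$")


def total_final_reads(lines):
    """Sum the trailing read counts of all 'final pair1/2' lines."""
    total = 0.0
    for line in lines:
        match = FINAL_PAIR.search(line)
        if match:
            total += float(match.group(1))
    return total


def summarise(log_dir, out_path):
    """Write one 'name<TAB>total' row per ERR*.log file found in log_dir."""
    with open(out_path, "w") as out:
        for log in sorted(Path(log_dir).glob("ERR*.log")):
            with open(log) as handle:
                total = total_final_reads(handle)
            # log.stem is the file name without the .log extension
            out.write(f"{log.stem}\t{total}\n")


# Hypothetical usage (both paths are placeholders):
# summarise("/path/to/logs", "final_read_totals.tsv")
```

Each row of the output file then holds the file name without .log and the sum of the two counts. A file with no matching lines gets a total of 0.0, which may be worth checking for before trusting the table.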