Bash: updating file by duplicating lines
2.5 years ago

I have a text file that contains some metrics about sequencing data, as output by the FastQC program, as in the image:

This data represents the quality of calling each base in a group of sequenced reads. The columns are: Base No. | Mean | Median | Lower Quartile | Upper Quartile | 10th Percentile | 90th Percentile

The problem is that after base No. 9, every two bases are represented by a single line, which is inconvenient for how I plan to manipulate this data.

Therefore, I need to update this file from the bash command line so that each line representing 2 bases is split into 2 lines that are identical except for the base number. Example: a line before any edits:

16-17   36.65222632355253       39.0    36.0    40.0    30.0    41.0


After splitting:

16   36.65222632355253       39.0    36.0    40.0    30.0    41.0
17   36.65222632355253       39.0    36.0    40.0    30.0    41.0


and so on for all the lines representing 2 bases.

I believe this can be done with a for loop; however, I do not know how to write it in bash.

Also, I am not sure how to deal with values in the first column that are written in the form number-number (e.g. 16-17); it seems that I cannot use them in regular comparisons with =, > and <.

bash linux command line for loops • 477 views

Use these directions: How to add images to a Biostars post

2.5 years ago
GenoMax 107k

You should run fastqc again with the following option to get this data for each base.

 --nogroup       Disable grouping of bases for reads >50bp. All reports will
                 show data for every base in the read.


No manipulation needed for the files you have.
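
If re-running FastQC is not practical, the splitting described in the question could be sketched with awk instead of a for loop over lines. This is only a minimal sketch under stated assumptions: the columns are assumed to be tab-separated, the file names `per_base_quality.txt` and `per_base_quality.expanded.txt` and the function name `expand_ranges` are hypothetical, and the sample row for base 9 is made up for illustration.

```shell
# Hypothetical sample input (tab-separated; the first row is invented,
# the second is the example line from the question).
printf '9\t37.1\t39.0\t36.0\t40.0\t31.0\t41.0\n' > per_base_quality.txt
printf '16-17\t36.65222632355253\t39.0\t36.0\t40.0\t30.0\t41.0\n' >> per_base_quality.txt

# Expand every "first-last" value in column 1 into one line per base.
# Lines whose first column is not a range are printed unchanged.
expand_ranges() {
    awk 'BEGIN { FS = OFS = "\t" }
    $1 ~ /^[0-9]+-[0-9]+$/ {
        split($1, r, "-")              # r[1] = first base, r[2] = last base
        for (b = r[1] + 0; b <= r[2] + 0; b++) {
            $1 = b                     # only the base number changes
            print
        }
        next
    }
    { print }' "$1"
}

expand_ranges per_base_quality.txt > per_base_quality.expanded.txt
cat per_base_quality.expanded.txt
```

Splitting the first column on "-" yields two plain integers, which also sidesteps the comparison problem with values such as 16-17. Because the loop runs from the first to the last base, wider groups are expanded the same way.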