Bash: updating file by duplicating lines
2.5 years ago

I have a text file containing metrics about sequencing data, as output by the FastQC program (see the image: FastQC output, per base sequence quality).

This data represents the quality of calling each base in a group of sequenced reads. The columns are: Base No. | Mean | Median | Lower Quartile | Upper Quartile | 10th Percentile | 90th Percentile

The problem is that after base No. 9, every two bases are represented by a single line, which is inconvenient for the way I am going to manipulate this data.

Therefore, I need to update this file from the bash command line so that each line representing 2 bases is split into 2 identical lines in which only the base number differs. Example line before any edits:

16-17   36.65222632355253       39.0    36.0    40.0    30.0    41.0

After splitting:

16   36.65222632355253       39.0    36.0    40.0    30.0    41.0
17   36.65222632355253       39.0    36.0    40.0    30.0    41.0

and so on for all the lines representing 2 bases.

I believe this can be done with a for loop; however, I do not know how to write it in bash.

Also, I am not sure how to handle values in the first column that are written as number-number (e.g. 16-17). It seems that I cannot use them in regular comparisons with =, > and <.

Thank you in advance.

bash linux command line for loops • 477 views

Edit: this is what the data in the file looks like.

Use these directions: How to add images to a Biostars post

2.5 years ago
GenoMax 107k

You should run FastQC again with the following option to get this data for every base.

 --nogroup       Disable grouping of bases for reads >50bp. All reports will
                    show data for every base in the read.

No manipulation needed for the files you have.
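
If rerunning FastQC is not practical, the grouped rows could also be expanded after the fact. Below is a minimal sketch, not a tested pipeline: it assumes the first column is either a single base number or a range such as 16-17, and that every other line (headers, ungrouped bases) should pass through unchanged. The function name `expand_grouped_bases` is only an illustrative choice. The range is treated as text, split on the hyphen, and the two ends are then used as ordinary numbers, which sidesteps the problem of comparing a value like 16-17 directly.

```shell
#!/usr/bin/env bash

# Expand rows whose first column is a base range (e.g. "16-17") into one
# row per base, copying the rest of the line unchanged.
# Reads the files given as arguments, or standard input if there are none.
expand_grouped_bases() {
  awk '
    # A line starting with digits, a hyphen, then digits is a grouped row.
    match($0, /^[0-9]+-[0-9]+/) {
      range = substr($0, 1, RLENGTH)   # e.g. "16-17"
      rest  = substr($0, RLENGTH + 1)  # separators and remaining columns
      split(range, ends, "-")          # ends[1] = first base, ends[2] = last base
      for (base = ends[1] + 0; base <= ends[2] + 0; base++)
        print base rest                # same metrics, new base number
      next
    }
    # Everything else (headers, single-base rows) is printed as is.
    { print }
  ' "$@"
}
```

It could be called as `expand_grouped_bases input.txt > output.txt`, where both file names are placeholders. Because the loop runs from the first to the last base of the range, wider groups such as 10-14 would be expanded the same way.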

