Parsing columns with awk
Asked 6 months ago by pablo

Hi,

I have this file, test.txt:

gene 1:362273700-362275735
exon 1:362275166-362275246
exon 1:362274811-362275058
exon 1:362274230-362274685
gene 1:362279796-362287281
exon 1:362279796-362280179
exon 1:362280576-362280662
exon 1:362280858-362280958
exon 1:362281056-362281106

I need to get this output:

gene-1 1:362275166-362275246
gene-1 1:362274811-362275058
gene-1 1:362274230-362274685
gene-2 1:362279796-362280179
gene-2 1:362280576-362280662
gene-2 1:362280858-362280958
gene-2 1:362281056-362281106

In other words, I need to remove the "gene" lines and replace the "exon" label on each exon line with "gene-X", where X is the number of the gene that the exon belongs to, starting at 1.

I'm struggling with this. Here is what I have tried so far:

 awk '$1~/exon/ {print $0 (/^exon/ ? "-" (++c) : "")}' test.txt

exon 1:362275166-362275246-1
exon 1:362274811-362275058-2
exon 1:362274230-362274685-3
exon 1:362279796-362280179-4
exon 1:362280576-362280662-5
exon 1:362280858-362280958-6
exon 1:362281056-362281106-7

awk '$1~/exon/ {$1=$1 "-" (++count[$1])}1' test.txt

gene 1:362273700-362275735
exon-1 1:362275166-362275246
exon-2 1:362274811-362275058
exon-3 1:362274230-362274685
gene 1:362279796-362287281
exon-4 1:362279796-362280179
exon-5 1:362280576-362280662
exon-6 1:362280858-362280958
exon-7 1:362281056-362281106
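Both attempts advance the counter on exon lines (`++c` in the first, `++count[$1]` keyed on the first field in the second), so the number counts exons, 1 to 7, across the whole file. In addition, the first attempt appends "-N" to the end of the whole line ($0) instead of replacing the label, and the trailing `1` in the second is an always-true pattern, so awk prints every line, gene lines included. What the desired output needs is a counter that advances on gene lines and is only read on exon lines. The sketch below, written in Python and using illustrative function names, computes both numberings for the first column of test.txt so the difference is visible:

```python
def exon_numbering(labels):
    """What both awk attempts compute: a counter bumped on every exon line."""
    count = 0
    numbers = []
    for label in labels:
        if label == "exon":
            count += 1
            numbers.append(count)
    return numbers


def gene_numbering(labels):
    """What the desired output needs: a counter bumped on every gene line,
    and only read (not bumped) on exon lines."""
    count = 0
    numbers = []
    for label in labels:
        if label == "gene":
            count += 1
        elif label == "exon":
            numbers.append(count)
    return numbers


# First column of test.txt, in file order
labels = ["gene", "exon", "exon", "exon", "gene", "exon", "exon", "exon", "exon"]

print(exon_numbering(labels))  # [1, 2, 3, 4, 5, 6, 7]
print(gene_numbering(labels))  # [1, 1, 1, 2, 2, 2, 2]
```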

Can you try this Python script?

# Read the input file
with open("test.txt", "r") as input_file:
    lines = input_file.readlines()

gene_count = 0
output_lines = []

for line in lines:
    if line.startswith("gene"):
        # Each "gene" line starts a new block: advance the gene number
        gene_count += 1
    elif line.startswith("exon"):
        # Drop the "exon" label and keep only the coordinates (second column)
        coords = line.split()[1]
        output_lines.append(f"gene-{gene_count} {coords}\n")

# Write the modified lines to a new file
with open("output.txt", "w") as output_file:
    output_file.writelines(output_lines)
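If the input is large, or you want to use the script in a pipeline the way you would use awk, a variant that processes one line at a time and takes the input and output streams as arguments may be handier. A minimal sketch, assuming whitespace-separated columns with the coordinates in the second column; `relabel_stream` is a hypothetical name, and matching the first column exactly (rather than with `startswith`) is a design choice of this sketch:

```python
import io


def relabel_stream(src, dst):
    """Copy 'exon' lines from src to dst as 'gene-N <coordinates>',
    where N counts the 'gene' lines seen so far; 'gene' lines are dropped."""
    gene_count = 0
    for line in src:  # one line at a time, no readlines()
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if fields[0] == "gene":  # exact match on the first column
            gene_count += 1
        elif fields[0] == "exon":
            dst.write(f"gene-{gene_count} {fields[1]}\n")


# Demo on an in-memory copy of test.txt. On the real files, something like:
#   with open("test.txt") as src, open("output.txt", "w") as dst:
#       relabel_stream(src, dst)
demo_in = io.StringIO(
    "gene 1:362273700-362275735\n"
    "exon 1:362275166-362275246\n"
    "exon 1:362274811-362275058\n"
    "exon 1:362274230-362274685\n"
    "gene 1:362279796-362287281\n"
    "exon 1:362279796-362280179\n"
    "exon 1:362280576-362280662\n"
    "exon 1:362280858-362280958\n"
    "exon 1:362281056-362281106\n"
)
demo_out = io.StringIO()
relabel_stream(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the function only needs file-like objects, passing `sys.stdin` and `sys.stdout` would also work in a shell pipeline. As in the script above, an exon line that appears before any gene line would be labelled `gene-0`.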

Any chance you could use a language that is easier to write and understand, like Python?
