Formatting problem gene table conversion
6.3 years ago
lessismore ★ 1.3k

Hey all, a question about a formatting problem. I have this table and I would like to filter it to keep only the genes that have matches. If a gene has matches, they are listed on the lines below it, up to the next gene.

>gene1
match1
match2
match3
>gene2
>gene3
match1
match2
match3
>gene4
>gene5
>gene6

Desired output:

>gene1 match1
>gene1 match2
>gene1 match3
>gene3 match1
>gene3 match2
>gene3 match3
awk bash python • 1.1k views

lessismore : Please accept (green check mark) answers for this and your past questions (you can select multiple answers) to validate them.

6.3 years ago
awk '/^>/{G=$0;next;}{printf("%s\t%s\n",G,$0);}'  input.txt
6.3 years ago
ReWeeda ▴ 120

Alternatively, if you are not familiar with awk, you can use a simple Python script like the following:

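gene = None
with open('file_name_or_path_here', 'r') as file_:
    for line in file_:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines instead of failing on line[0]
        if line.startswith('>'):
            gene = line  # remember the current gene header
        elif gene is not None:
            # print() already appends a newline, so no extra '\n' is needed
            print('\t'.join([gene, line]))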
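
If it helps, here is the same idea wrapped in a small function and run on the sample from the question. This is only a sketch: the function name `pair_genes_with_matches` and the `sep` parameter are my own choices, not something from the question, and it assumes every gene header starts with `>` and every other non-empty line is a match for the most recent header.

```python
def pair_genes_with_matches(lines, sep="\t"):
    """Yield 'gene<sep>match' for each match line that follows a '>' header.

    Hypothetical helper for illustration; genes without matches yield nothing.
    """
    gene = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            gene = line  # new gene header; matches (if any) follow
        elif gene is not None:
            yield f"{gene}{sep}{line}"


# Sample input copied from the question.
sample = """>gene1
match1
match2
match3
>gene2
>gene3
match1
match2
match3
>gene4
>gene5
>gene6
""".splitlines()

# The desired output in the question uses a space, so pass sep=" ".
for row in pair_genes_with_matches(sample, sep=" "):
    print(row)
```

Passing an open file handle instead of `sample` should work the same way, since the function only iterates over lines.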