Python: search for a header and specific letters in lines, print both
2.2 years ago

Search for "Cluster" headers and for the prefixes st104 and pK in the lines beneath them (as in st104H_20170, pKH911_25081).

If the lines below a header contain both of the prefixes st104 and pK, print the header and those lines.

input.txt

->Cluster 1

0 673aa -st104P_06575

1 673aa -st104H_22488

3 673aa -pKH911_09284

4 673aa -pKP911_09288

->Cluster 2

0 690aa -st104H_20170

1 690aa -KH911_25081

2 687aa -NE95031.1

3 685aa -TIG_004920

->Cluster 3

0 685aa -st104H_27649

1 690aa -st104P_11877

2 685aa -pKP911_15300

->Cluster 4

0 685aa -st104H_27649

1 690aa -st104P_11877

output

->Cluster 1

0 673aa -st104P_06575

1 673aa -st104H_22488

3 673aa -pKH911_09284

4 673aa -pKP911_09288

->Cluster 3

0 685aa -st104H_27649

1 690aa -st104P_11877

2 685aa -pKP911_15300

Tried:

def cluster_filter(cluster_content):
if 'st104' in cluster_content and 'pK' in cluster_content:
    print(cluster_content)  

with open("input.txt") as fh:
       result = ""
       luster_content = ""
       for line in fh:
            if line.startswith("Cluster"):
                  cluster_filter(cluster_content)
                  cluster_content = line
            else:
                  cluster_content += line
      cluster_filter(cluster_content)

  print(result)

You've got an indent issue on line two, and around line 7, shouldn't luster_content = "" begin with a c?

If the first issue is just an artifact of pasting here, you should learn to use triple ticks to post code blocks so the formatting is retained. Read under 'Block code formatting' here, because that same style works on this site. It's good that you provided code, but when the formatting is lost and the code gets jumbled, it is hard for people to help you.

Also, what you show labeled as 'output' would be better labeled as 'Expected output' or 'Desired output'. When sharing code, it is always a good idea to provide the current output you see from it, or at least to mention what happens when you run it.

2.2 years ago
Wayne ★ 2.0k

Per my comment, your posted code and details were confusing. If I understood correctly, this code accomplishes your goal:

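def cluster_filter(cluster_content):
    # Print the cluster only if it contains both required substrings.
    if 'st104' in cluster_content and 'pK' in cluster_content:
        print(cluster_content)

with open("input.txt") as fh:
    cluster_content = ""
    for line in fh:
        if line.startswith("->Cluster"):
            # A new header begins: check the cluster collected so far, then start a new one.
            cluster_filter(cluster_content)
            cluster_content = line
        else:
            cluster_content += line
    # Check the final cluster, which has no header after it.
    cluster_filter(cluster_content)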

This yields the following with the provided input:

->Cluster 1

0 673aa -st104P_06575

1 673aa -st104H_22488

3 673aa -pKH911_09284

4 673aa -pKP911_09288


->Cluster 3

0 685aa -st104H_27649

1 690aa -st104P_11877

2 685aa -pKP911_15300

I think this is what you wanted, because clusters 2 and 4 don't contain pK.
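As an aside, the same filter can be sketched as functions that return the matching clusters instead of printing them, which makes the logic easier to test. This is only one possible arrangement: the names iter_clusters and filter_clusters are hypothetical, and the in-memory sample is a shortened copy of the posted input rather than the real file.

```python
import io

def iter_clusters(lines, header_prefix="->Cluster"):
    """Yield each cluster (header plus member lines) as one string."""
    block = []
    for line in lines:
        # A header line closes the previous block, if there is one.
        if line.startswith(header_prefix) and block:
            yield "".join(block)
            block = []
        block.append(line)
    if block:
        yield "".join(block)

def filter_clusters(lines, required=("st104", "pK")):
    """Return the clusters whose text contains every required substring."""
    return [c for c in iter_clusters(lines) if all(tag in c for tag in required)]

# Shortened sample based on the posted input (an assumption, not the real file).
sample = io.StringIO(
    "->Cluster 1\n"
    "0 673aa -st104P_06575\n"
    "3 673aa -pKH911_09284\n"
    "->Cluster 2\n"
    "0 690aa -st104H_20170\n"
    "1 690aa -KH911_25081\n"
)
matches = filter_clusters(sample)
print("".join(matches), end="")
```

With a real file, passing the open file handle in place of sample should work the same way, since both are iterated line by line.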

The main issue I noted was that you were using line.startswith("Cluster"), but "Cluster" isn't the string at the start of the header line; the header starts with "->Cluster". Alternatively, you could have used "Cluster" in line as the condition if you didn't want to spell out the "->" prefix.
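A quick check of the header tests discussed here, using one header line and one member line copied from the posted input (the variable names header and member are just for illustration):

```python
header = "->Cluster 1"
member = "0 673aa -st104P_06575"

print(header.startswith("Cluster"))    # False: the line starts with "->"
print(header.startswith("->Cluster"))  # True
print("Cluster" in header)             # True: substring match anywhere in the line
print("Cluster" in member)             # False: member lines are not mistaken for headers
```

The substring test is more forgiving, but it would also match any member line that happened to contain the word Cluster, so the startswith form is the stricter choice.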
