Question: python newbie - only else statement is printed
Holly (Birmingham, UK) wrote, 2.5 years ago:

Complete beginner so I'm sorry if this is obvious!

I have a file with two columns in a long list: each line is either name | +/- or IG_name | 0.

S1      +
IG_S1   0
S2      -
IG_S3   0
S3      +
S4      -
dnaA    +
IG_dnaA 0

Everything which starts with IG_ has a corresponding name. I want to add the + or - to the IG_name.

The information is gene names and strand information, IG = intergenic region. Basically I want to know which strand the intergenic region is on.

what I want:

open file
if starts with IG_*
    find the line with *
    print("IG_" and the line it found)
else 
    print(line)

what I have:

import re
import sys

with open(sys.argv[2]) as geneInfo:
    with open(sys.argv[1]) as origin:

        for line in origin:
            if line.startswith("IG_"):
                name = line.split("_")[1]
                nname = name[:-3]
                for newline in geneInfo:
                    if re.match(nname, newline):
                        print("IG_"+newline)
            else:
                print(line)

where origin is the mixed list and geneInfo has only the names not IG_names.

With this code I end up with a list containing only the else statements.

S1  +

S2  -

S3  +

S4  -

dnaA    +

My problem is that I don't know what is wrong, so I don't know what to search for!


What 2 files do you start with?

Reply written 2.5 years ago by Zaag

"where origin is the mixed list and geneInfo has only the names not IG_names"

So origin is the first example, and geneInfo has everything except the ones which start with IG.

Reply written 2.5 years ago by Holly

What others are saying is that you should show just a few lines of each input file, then show the exact command as you invoke it. These two ingredients are necessary to troubleshoot.

Reply written 2.5 years ago by Istvan Albert

Sorry, I should have made my other file more obvious! The second file looks like this:

S1  +
S2  -
S3  +
S4  -
dnaA    +
Reply written 2.5 years ago by Holly
Zaag (Amsterdam) wrote, 2.5 years ago:

You need to open the file twice and there is a double loop, so there are probably a few better ways to do this.

NIG = []  # [name, strand] pairs for every line that is not an IG_ line
with open('input.txt') as f:
    for line in f:
        line = line.strip()
        if not line.startswith('IG_'):
            name, strand = line.split()
            NIG.append([name, strand])

with open('ig_inout.txt') as f:
    for line in f:
        line = line.strip()

        if line.startswith('IG_'):
            # print the IG_ line followed by the strand of the matching gene
            [print(line,  i[1]) for i in NIG if i[0] == line.split()[0].split('_')[1] ]
        else:
            print(line)

This gives me the following output:

S1      +
IG_S1   0 +
S2      -
IG_S3   0 +
S3      +
S4      -
dnaA    +
IG_dnaA 0 +
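
A side note on the nested loop in the question: a file object can only be iterated once, so after the inner "for newline in geneInfo" loop has run for the first IG_ line, nothing is left to read for later IG_ lines. That may be one reason matches were missed; the name[:-3] slice, which depends on exactly how the columns are separated, is another candidate. A minimal sketch of the exhaustion effect, using io.StringIO as a stand-in for an opened file (the contents are made up for illustration):

```python
import io

# Stand-in for open("geneInfo.txt"); iterating io.StringIO behaves like
# iterating a text file opened for reading.
gene_info = io.StringIO("S1\t+\nS2\t-\nS3\t+\n")

# First pass reads every line.
first_pass = [line.split()[0] for line in gene_info]

# Second pass starts where the first one stopped, i.e. at end of file.
second_pass = [line.split()[0] for line in gene_info]

# Rewinding makes the lines available again.
gene_info.seek(0)
third_pass = [line.split()[0] for line in gene_info]

print(first_pass)
print(second_pass)
print(third_pass)
```

Reading the gene list into a list or dictionary once, as in the answer above, avoids having to rewind inside the loop.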

Excellent! Thank you!

Would you mind explaining the second loop?

[print(line,  i[1]) for i in NIG if i[0] == line.split()[0].split('_')[1] ]

this is confusing me!

Reply written 2.5 years ago by Holly

You can write it like this:

for i in NIG:
    if i[0] == line.split()[0].split('_')[1]:
        print(line, i[1])

For each entry without IG:

    if NAME is the same as the second part of IG_NAME

        print the line of the file and the + or the -

Reply written 2.5 years ago by Zaag
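
To see the two forms side by side, here is a small sketch that collects the matching strands into a list instead of printing them. The NIG contents and the example line are made up for illustration, and it assumes gene names contain no further underscores (otherwise split('_')[1] would cut the name short):

```python
# Hypothetical sample data in the same shape as the answer's NIG list.
NIG = [["S1", "+"], ["S2", "-"], ["S3", "+"]]
line = "IG_S3   0"

# "IG_S3   0" -> "IG_S3" -> "S3"
gene = line.split()[0].split("_")[1]

# Comprehension form: keep the strand of every entry whose name matches.
from_comprehension = [strand for name, strand in NIG if name == gene]

# Equivalent explicit loop.
from_loop = []
for name, strand in NIG:
    if name == gene:
        from_loop.append(strand)

print(gene, from_comprehension, from_loop)
```

Both forms visit every entry of NIG and keep only the matching ones; the comprehension is just the loop and the if written on one line.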

Hi Zaag, why not use a dictionary? I reused most of your code: (untested!)

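stranddict = {}  # gene name -> strand
with open('input.txt') as f:
    for line in f:
        if not line.startswith('IG_'):
            # assumes the two columns are separated by a tab
            name, strand = line.strip().split('\t')
            stranddict[name] = strand

with open('ig_inout.txt') as f:
    for line in f:
        line = line.rstrip('\n')
        if line.startswith('IG_'):
            ig_name = line.split('\t')[0]
            # look up the strand of the gene this intergenic region is named after
            print(ig_name + '\t' + stranddict[ig_name.replace('IG_', '', 1)])
        else:
            print(line)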
Reply written 2.5 years ago by WouterDeCoster
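
For anyone who wants to try the dictionary idea without creating files first, here is a minimal sketch that works on a list of lines. The function name annotate_intergenic and the "?" placeholder for a missing gene are assumptions made for this example, not part of the thread; it also assumes every non-empty line has exactly two whitespace-separated columns.

```python
def annotate_intergenic(lines):
    """Return tab-separated rows where each IG_<name> row gets the strand of <name>."""
    rows = [line.split() for line in lines if line.strip()]

    # First pass: gene name -> strand for every row that is not an IG_ row.
    strands = {name: strand for name, strand in rows if not name.startswith("IG_")}

    # Second pass: replace the 0 of each IG_ row with the looked-up strand.
    out = []
    for name, strand in rows:
        if name.startswith("IG_"):
            gene = name[len("IG_"):]
            strand = strands.get(gene, "?")  # "?" marks a gene that was not found
        out.append(f"{name}\t{strand}")
    return out


# Sample data copied from the question.
sample = ["S1\t+", "IG_S1\t0", "S2\t-", "IG_S3\t0",
          "S3\t+", "S4\t-", "dnaA\t+", "IG_dnaA\t0"]

for row in annotate_intergenic(sample):
    print(row)
```

Building the dictionary before the second pass means an IG_ row can come before its gene in the file, as IG_S3 does in the sample, and each lookup is a single dictionary access rather than a scan of the whole list.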