Question: (Closed) IndexError: list index out of range
flogin (Brazil) wrote 18 months ago:

Hello guys, I have this file:

Input Data File: GSTE1_EXON.AB.02.fas
   Variable (polymorphic) sites: 0   (Total number of mutations: 0)
Input Data File: GSTE1_EXON.AB.02.fas
 Total number of mutations, Eta: 0
 Theta (per site) from S, Theta-W: 0,0000000000
    Variance of theta (no recombination): 0,0000000
    Variance of theta (free recombination): 0,0000000
 Average number of nucleotide differences, k: 0,000
 Theta (per sequence) from S, Theta-W: 0,000
    Variance of theta (no recombination): 0,000
    Variance of theta (free recombination): 0,000
 Input Data File: GSTE1_EXON.AB.02.fas
 Number of pairwise comparisons: 0
 Number of significant pairwise comparisons by Fisher's exact test: 0
    Number of significant comparisons using the Bonferroni procedure: 0
 Number of significant pairwise comparisons by chi-square test: 0
    Number of significant comparisons using the Bonferroni procedure: 0
Fu and Li's test
 Input Data File: GSTE1_EXON.AB.02.fas
 Total number of mutations, Eta: 0
 Fu and Li's D* test statistic: 0,00000
     Statistical significance: Not significant, P > 0.10
 Fu and Li's F* test statistic: 0,00000
     Statistical significance: Not significant, P > 0.10
 Input Data File: GSTE1_EXON.AB.02.fas
 Total number of mutations, Eta: 0
 Input Data File: GSTE1_EXON.AR.01.fas
   Variable (polymorphic) sites: 0   (Total number of mutations: 0)
 Input Data File: GSTE1_EXON.AR.01.fas
 Total number of mutations, Eta: 0
 Theta (per site) from S, Theta-W: 0,0000000000
    Variance of theta (no recombination): 0,0000000
    Variance of theta (free recombination): 0,0000000
 Average number of nucleotide differences, k: 0,000
 Theta (per sequence) from S, Theta-W: 0,000
    Variance of theta (no recombination): 0,000
    Variance of theta (free recombination): 0,000
 Input Data File: GSTE1_EXON.AR.01.fas
 Number of pairwise comparisons: 0
 Number of significant pairwise comparisons by Fisher's exact test: 0
    Number of significant comparisons using the Bonferroni procedure: 0
 Number of significant pairwise comparisons by chi-square test: 0
    Number of significant comparisons using the Bonferroni procedure: 0
Fu and Li's
 Input Data File: GSTE1_EXON.AR.01.fas
 Total number of mutations, Eta: 0
 Fu and Li's D* test statistic: 0,00000
     Statistical significance: Not significant, P > 0.10
 Fu and Li's F* test statistic: 0,00000
     Statistical significance: Not significant, P > 0.10
 Input Data File: GSTE1_EXON.AR.01.fas
 Total number of mutations, Eta: 0
 Input Data File: GSTE1_EXON.CA.02.fas
   Variable (polymorphic) sites: 0   (Total number of mutations: 0)
 Input Data File: GSTE1_EXON.CA.02.fas
 Total number of mutations, Eta: 0
 Theta (per site) from S, Theta-W: 0,0000000000
    Variance of theta (no recombination): 0,0000000
    Variance of theta (free recombination): 0,0000000
 Average number of nucleotide differences, k: 0,000
 Theta (per sequence) from S, Theta-W: 0,000
    Variance of theta (no recombination): 0,000
    Variance of theta (free recombination): 0,000
 Input Data File: GSTE1_EXON.CA.02.fas
 Number of pairwise comparisons: 0
 Number of significant pairwise comparisons by Fisher's exact test: 0
    Number of significant comparisons using the Bonferroni procedure: 0
 Number of significant pairwise comparisons by chi-square test: 0
    Number of significant comparisons using the Bonferroni procedure: 0
 Input Data File: GSTE1_EXON.CA.02.fas
 Total number of mutations, Eta: 0
 Fu and Li's D* test statistic: 0,00000
     Statistical significance: Not significant, P > 0.10
 Fu and Li's F* test statistic: 0,00000
     Statistical significance: Not significant, P > 0.10
 Input Data File: GSTE1_EXON.CA.02.fas
 Total number of mutations, Eta: 0

And I wrote this script:

# -*- coding: utf-8 -*-
import pandas as pd # to convert the list of dictionaries to a data frame
file = open("GSTE1.txt.add","r")
# creating a list of terms that I want
keys_order = ["Input Data File","Average number of nucleotide differences, k","Total number of mutations, Eta","Theta (per sequence) from S, Theta-W","Theta (per site) from S, Theta-W","Variance of theta (no recombination)","Variance of theta (free recombination)","Theta (per sequence) from S, Theta-W","Fu and Li's D* test statistic, FLD*","Fu and Li's F* test statistic, FLF*","Fu and Li's D* test statistic","Fu and Li's F* test statistic","Number of pairwise comparisons","Number of significant pairwise comparisons by Fisher's exact test","Number of significant pairwise comparisons by chi-square test","Number of significant comparisons using the Bonferroni procedure"]
dic = {} # creating a dictionary to use in the first list
dictio = {} # creating a dictionary to use in the second list
big_list = [] # creating a list to hold the dictionaries from the first pass
list_dictio = [] # creating a list to hold the dictionaries from the second pass
aux = "" 

for line in file:
    if line.strip().split(":")[0] == "Input Data File": 
        atrib = line.strip().split(":")[1]
    if atrib == aux:
        dic[line.strip().split(":")[0]] = line.strip().split(":")[1].strip()
    else: 
        big_list.append(dic)
        aux = atrib
        dic = {} 

for fasta in big_list: 
    for i in keys_order:
        if i in fasta:
            dictio[i]=fasta[i]
        else:
            dictio[i] ='-'
    list_dictio.append(dictio) 
    dictio = dictio.fromkeys(dictio,0) 
table = pd.DataFrame.from_records(list_dictio) 
export_csv = table.to_csv(r'/home/user/Dropbox/jupyter/output.csv', index = None, header=True)

The output:

Traceback (most recent call last):
  File "DNAsp2.py", line 16, in <module>
    dic[line.strip().split(":")[0]] = line.strip().split(":")[1].strip()
IndexError: list index out of range

What doesn't make sense to me is that I ran this script on other files and everything ran fine.

Can anyone help me?

dictionary python • 494 views
Written 18 months ago by flogin

Just debug....

    if atrib == aux:
        print(line)
        dic[line.strip().split(":")[0]] = line.strip().split(":")[1].strip()

Print the line and check how many ":" characters it contains.
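To make that check concrete, here is a small sketch using two lines copied from the file in the question. It only illustrates how split(":") behaves; the variable names are made up for the example.

```python
# Two lines copied from the input file in the question.
with_colon = " Total number of mutations, Eta: 0"
without_colon = "Fu and Li's test"

for line in (with_colon, without_colon):
    parts = line.strip().split(":")
    # split(":") always returns count(":") + 1 items,
    # so parts[1] only exists when the line has at least one ":".
    print(repr(line), line.count(":"), len(parts))
```

A line with no ":" yields a one-item list, so asking for index 1 raises IndexError.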

Written 18 months ago by gb

Many lines have more than one ":". The problem is that I ran the same script on eight other files with the same structure, and everything ran OK.

Written 18 months ago by flogin

Yes, so that means the structure of one file is slightly different. To find where the difference is, print each line just before the statement that fails. The error is raised by this line:

dic[line.strip().split(":")[0]] = line.strip().split(":")[1].strip()

It fails because index 1 is out of range: split(":") returns a single item when the line contains no ":" character, so there is no second element to read. So add print(line) here:

    if atrib == aux:
        print(line)
        dic[line.strip().split(":")[0]] = line.strip().split(":")[1].strip()

If you run the script you will see a lot of output; these are the lines of the input file. When the error appears, check the last printed line. You can also compare it with the second-to-last printed line.
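If the printed output is too long to scan by eye, the same check can be automated. The following is only a sketch: the helper name lines_without_colon is hypothetical, and the sample list stands in for the real file.

```python
def lines_without_colon(lines):
    """Return (line_number, text) for non-blank lines that contain no ':'."""
    offenders = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if text and ":" not in text:
            offenders.append((number, text))
    return offenders


# A few lines copied from the file in the question.
sample = [
    " Number of significant pairwise comparisons by chi-square test: 0\n",
    "    Number of significant comparisons using the Bonferroni procedure: 0\n",
    "Fu and Li's test\n",
    " Input Data File: GSTE1_EXON.AB.02.fas\n",
]
print(lines_without_colon(sample))
```

An open file handle can be passed instead of the sample list, since iterating over a file also yields one line at a time.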

Written 18 months ago by gb

The output is:

Variable (polymorphic) sites: 0   (Total number of mutations: 0)

Input Data File: GSTE1_EXON.AB.02.fas

 Total number of mutations, Eta: 0

 Theta (per site) from S, Theta-W: 0,0000000000

    Variance of theta (no recombination): 0,0000000

    Variance of theta (free recombination): 0,0000000

 Average number of nucleotide differences, k: 0,000

 Theta (per sequence) from S, Theta-W: 0,000

    Variance of theta (no recombination): 0,000

    Variance of theta (free recombination): 0,000

 Input Data File: GSTE1_EXON.AB.02.fas

 Number of pairwise comparisons: 0

 Number of significant pairwise comparisons by Fisher's exact test: 0

    Number of significant comparisons using the Bonferroni procedure: 0

 Number of significant pairwise comparisons by chi-square test: 0

    Number of significant comparisons using the Bonferroni procedure: 0

Fu and Li's test

Traceback (most recent call last):
  File "DNAsp2.py", line 17, in <module>
    dic[line.strip().split(":")[0]] = line.strip().split(":")[1].strip()
IndexError: list index out of range
Written 18 months ago by flogin

Hello flogin!

We believe that this post does not fit the main topic of this site.

gb helped, and the script now runs well.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

Written 18 months ago by flogin

gb wrote 18 months ago:

See the last printed line, "Fu and Li's test"? This line does not contain the ":" character. Open the file and look for this line. You can either remove it manually or add an if/else or a try/except to your code.
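As a rough sketch of the if/else option, the parsing loop could skip lines without a ":" instead of indexing into the split result. This is not the original script: parse_blocks is a hypothetical name, and it uses str.partition, which splits on the first ":" only, so a value that itself contains ":" is kept whole rather than truncated as with split(":")[1].

```python
def parse_blocks(lines):
    """Group 'key: value' lines into one dict per input data file."""
    blocks = []
    current = None
    current_name = None
    for raw in lines:
        # partition never raises: sep is "" when the line has no ":".
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # e.g. "Fu and Li's test" is a heading, not a key/value pair
        key, value = key.strip(), value.strip()
        if key == "Input Data File" and value != current_name:
            # A new file name starts a new block.
            current = {key: value}
            current_name = value
            blocks.append(current)
        elif current is not None:
            current[key] = value
    return blocks


# Lines copied from the file in the question, shortened for the example.
sample = [
    "Input Data File: GSTE1_EXON.AB.02.fas",
    " Total number of mutations, Eta: 0",
    "Fu and Li's test",
    " Input Data File: GSTE1_EXON.AB.02.fas",
    " Fu and Li's D* test statistic: 0,00000",
    " Input Data File: GSTE1_EXON.AR.01.fas",
    " Average number of nucleotide differences, k: 0,000",
]
blocks = parse_blocks(sample)
print(len(blocks), blocks[0]["Fu and Li's D* test statistic"])
```

Each block is appended as soon as it is created, so the last file in the input is kept too. As in the script in the question, a key that repeats within one file (such as the two "Variance of theta" entries) keeps only its last value. The resulting list of dictionaries can be passed to pd.DataFrame.from_records as in the original script.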

Written 18 months ago by gb

Thanks gb, after I removed those lines the script runs well.

Thanks for the support.

Written 18 months ago by flogin
The thread is closed. No new answers may be added.
