Question: get error in kmer counting in python 3
0
gravatar for anasjamshed1994
12 weeks ago by
anasjamshed199460 wrote:

I want to write a program that

  1. calculates the number of all kmers of a given length across all DNA sequences
  2. displays just the ones that occur more than a given a number of times.

I have tried this script:

import os
import sys
import shutil

# convert command line arguments to variables
kmer_size = int(sys.argv[1])
print(kmer_size)
count_cutoff = int(sys.argv[2])

# define the function to split dna
def split_dna(dna, kmer_size):
    kmers = []
    for start in range(0,len(dna)-kmer_size-1,1):
        kmer = dna[start:start+kmer_size]
        kmers.append(kmer)
    return kmers

# create an empty dictionary to hold the counts
kmer_counts = {}

# process each file with the right name
for file_name in os.listdir("."):
    if file_name.endswith(".fastq"):
        dna_file = open(file_name)

        # process each DNA sequence in a file
        for line in dna_file:
            dna = line.rstrip("\n")

            # increase the count for each k-mer that we find
            for kmer in split_dna(dna, kmer_size):
                current_count = kmer_counts.get(kmer, 0)
                new_count = current_count + 1
                kmer_counts[kmer] = new_count

# print k-mers whose counts are above the cutoff
for kmer, count in kmer_counts.items():
    if count > count_cutoff:
        print(kmer + " : " + str(count))

But it gives an error:

ValueError                                Traceback (most recent call last)
<ipython-input-42-02b791e42fca> in <module>()
      4 
      5 # convert command line arguments to variables
----> 6 kmer_size = int(sys.argv[1])
      7 print(kmer_size)
      8 count_cutoff = int(sys.argv[2])

ValueError: invalid literal for int() with base 10: '-f'

I have been trying from last 3 months I don't know he can I execute it? I can't change the type of any variable

Kindly help me

dna kmer python fasta • 262 views
ADD COMMENTlink modified 12 weeks ago by Ram32k • written 12 weeks ago by anasjamshed199460

It looks like you're using ipython notebook - correct me if that's not the case.

How are you executing the script?

ADD REPLYlink written 12 weeks ago by Ram32k

yes I am using python notebook

ADD REPLYlink written 12 weeks ago by anasjamshed199460

Save the script as a .py file and try executing it from the command line. It looks like you're not passing in command line arguments properly through the ipython interface.

ADD REPLYlink written 11 weeks ago by Ram32k

how can I run through command line after creating .py file

ADD REPLYlink written 11 weeks ago by anasjamshed199460

Please google "run python file on command line"

ADD REPLYlink written 11 weeks ago by Ram32k

What arguments are you passing to this script? As the error notes, -f is not an integer (which would be your kmer size).

ADD REPLYlink written 11 weeks ago by Alex Reynolds31k

yes it's not integer I know but I also tried to do by taking string but it also gimme error

ADD REPLYlink written 11 weeks ago by anasjamshed199460

There are a number of threads on Biostars about enumerating kmers. You should be able to adapt one of those fairly trivially:

How to count and compare k-mer count vectors (and print the top ten highest contributions)?

Finding 16 mer not present in GRCh38

A: Randomly sample motifs from reference sequence

See also the linked gist

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Joe19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1092 users visited in the last hour
_