Python script for ATGC count
4.9 years ago
3335098459

Hi, I am new to Python, so I apologize in advance if this question seems trivial.

I have a Python script to count the A, T, G and C bases in a sequence:

def readGenome(filename):
    genome = ''
    with open(filename, 'r') as f:
        for line in f:
            # ignore header line with genome information
            if not line[0] == '>':
                genome += line.rstrip()
    return genome

genome = readGenome(filename)
genome[:100]
# Count the number of occurrences of each base
counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
for base in genome:
    counts[base] += 1
print(counts)

import collections
print(collections.Counter(genome))

I cannot figure out the problem with this code. When I run it from the command prompt like this:

python count_ATGC.py gene.fa

it gives me this error:

Traceback (most recent call last):
  File "count_ATGC.py", line 9, in <module>
    genome = readGenome()
TypeError: readGenome() missing 1 required positional argument: 'filename'

Could somebody help me with this error?

Thanks


Yes: you are not passing the path of the FASTA file to the function readGenome(). Call it like this:

genome = readGenome(filename="/path/to-the-genome-fasta-file.fasta")

filename is a required positional argument that expects a string: the path to the genome FASTA file on your computer. The TypeError is raised because readGenome() is called without any value for filename.
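A small sketch that reproduces the error and then the fix. It reuses the question's readGenome() (with line.startswith('>') in place of line[0] == '>'), and the temporary FASTA file is written only for the demo; it is not part of the original script:

```python
import os
import tempfile

def readGenome(filename):
    genome = ''
    with open(filename, 'r') as f:
        for line in f:
            # skip FASTA header lines, which start with '>'
            if not line.startswith('>'):
                genome += line.rstrip()
    return genome

# Calling the function without its argument reproduces the question's error
try:
    readGenome()
except TypeError as err:
    error_message = str(err)
print(error_message)
# readGenome() missing 1 required positional argument: 'filename'

# Passing a path fixes it; a tiny FASTA file is written here for the demo
with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as tmp:
    tmp.write('>demo\nACGT\nGGCC\n')
    fasta_path = tmp.name
genome = readGenome(filename=fasta_path)
os.remove(fasta_path)
print(genome)  # ACGTGGCC
```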

António


3335098459: If you expect your program to accept arguments from the command line, you need to include code that parses them.
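As a sketch of such parsing code, the standard-library argparse module can declare the FASTA path as a required positional argument. The helper name build_parser, the argument name fasta and the help text are illustrative choices, not taken from the original script:

```python
import argparse

def build_parser():
    # One required positional argument: the path to the FASTA file
    parser = argparse.ArgumentParser(
        prog='count_ATGC.py',
        description='Count A, C, G and T in a FASTA file.')
    parser.add_argument('fasta', help='path to a FASTA file, e.g. gene.fa')
    return parser

parser = build_parser()

# In the real script call parser.parse_args() with no list so it reads
# sys.argv[1:]; an explicit list is passed here only for demonstration.
args = parser.parse_args(['gene.fa'])
print(args.fasta)  # gene.fa

# When the argument is missing, argparse prints a usage message and exits
# with status 2, instead of failing later inside readGenome().
try:
    parser.parse_args([])
except SystemExit as exc:
    exit_code = exc.code
print(exit_code)  # 2
```

The script would then call readGenome(args.fasta). Compared with indexing sys.argv directly, argparse also gives a --help message for free.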

4.9 years ago

Hi,

Try the following script, run the same way you did:

import sys
import collections

filename = sys.argv[1]

def readGenome(filename):
    genome = ''
    with open(filename, 'r') as f:
        for line in f:
            # ignore header lines with genome information (they start with '>')
            if not line.startswith('>'):
                genome += line.rstrip()
    return genome

genome = readGenome(filename)
print(genome[:100])  # show the first 100 bases
# Count the number of occurrences of each base
counts = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
for base in genome.upper():
    # skip N and other non-ACGT characters instead of raising a KeyError
    if base in counts:
        counts[base] += 1
print(counts)

print(collections.Counter(genome))

The line filename = sys.argv[1] takes the first argument after the script name and assigns it to the filename variable. sys.argv[0] is the script itself, so for python count_ATGC.py gene.fa, sys.argv[1] is gene.fa.
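If the script is run without a file name, sys.argv[1] raises an IndexError. A minimal guard could look like the sketch below; get_fasta_path() is a hypothetical helper, not part of the script above:

```python
import sys

def get_fasta_path(argv):
    # argv[0] is the script name; argv[1] is the first user-supplied argument
    if len(argv) < 2:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit('usage: python count_ATGC.py <file.fasta>')
    return argv[1]

# Simulates the argument list of: python count_ATGC.py gene.fa
print(get_fasta_path(['count_ATGC.py', 'gene.fa']))  # gene.fa
```

In the script itself this would be used as filename = get_fasta_path(sys.argv).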

I hope this helps,

António
