Finding how many times a nucleotide appears at the same position
27 days ago
ran • 0

Hello, I'm new to the world of Python and I'm trying to solve a problem in which I am given a few DNA sequences, for example: sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

I want to know how many times the nucleotide "A" occurs in the first position [0] across all of those sequences (the answer should be 1 in this case). I'm trying to use a for loop but don't really know how to move forward. I'd appreciate any help. Thank you!

Beginer Nucleotide Python DNA • 300 views

Hi. It is not a rule, but it is good practice to post an attempted solution first so that other members can correct it or suggest another answer. You said you are a Python beginner, so don't worry about being judged: you can post any attempt, no matter how badly it went. ;)

#!/usr/bin/env python3

sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

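def count_first_base(sequences=sequences, base="A"):
    # 1 where the first character matches `base`, else 0, for each sequence
    count = [int(i[0] == base) for i in sequences]
    # Return the per-sequence flags together with their total
    return [count, sum(count)]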

print(*count_first_base(), sep="\n")


[0, 0, 0, 1, 0, 0]

1
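Since the question mentions a for loop, the same count can also be sketched with an explicit loop. This is only an illustration: the function name `count_base_at` and its `pos` parameter are hypothetical and not part of the answer above, and skipping sequences that are too short to have position `pos` is an assumption.

```python
def count_base_at(seqs, base="A", pos=0):
    """Count the sequences whose character at index `pos` equals `base`."""
    total = 0
    for seq in seqs:
        # Assumption: a sequence too short to have this position is skipped.
        if pos < len(seq) and seq[pos] == base:
            total += 1
    return total


sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC",
             "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

print(count_base_at(sequences))          # 1
print(count_base_at(sequences, pos=1))   # 4
```

Passing a different `pos` or `base` reuses the same loop for any column or nucleotide.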

26 days ago
Dunois ▴ 740

Here's a little Python function you can work from:

def count_quer_at_pos_in_seq(seqs, quer = "A"):

    #Initialize a list of zeroes, and make it
    #as long as the longest input sequence.
    #This will be used to store the counts of the
    #query character at each position (along the
    #length of the sequences).
    out = [0]*max([len(seq) for seq in seqs])

    #For each sequence in the list of sequences:
    for seq in seqs:

        #For each position in the current sequence:
        for pos in range(len(seq)):

            #Check if the character at the current position
            #is identical to the query character supplied by
            #the user.
            if seq[pos] == quer:
                #If it is, increment the count in the list "out"
                #by one.
                out[pos] += 1

    #Return out to the calling environment.
    return out

#----

#Test run.
sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC", "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]
print(count_quer_at_pos_in_seq(sequences, quer = 'A'))

#[1, 4, 1, 0, 0, 3, 4, 1, 1, 3, 0, 2, 0]
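
If counts for every nucleotide are needed at once, one possible variation is to walk the sequences column by column and tally each column with the standard library's `collections.Counter`. This is a sketch and not the function above: the name `count_bases_per_position` is hypothetical, and padding shorter sequences via `itertools.zip_longest` and then ignoring the padding is an assumption about how unequal lengths should be handled.

```python
from collections import Counter
from itertools import zip_longest


def count_bases_per_position(seqs):
    """Return one Counter per position, tallying every character seen there."""
    # zip_longest yields one tuple per column; shorter sequences are padded
    # with None, which is dropped before counting.
    return [
        Counter(ch for ch in column if ch is not None)
        for column in zip_longest(*seqs)
    ]


sequences = ["GAGGTAAACTCTG", "TCCGTAAGTTTTC", "CAGGTTGGAACTC",
             "ACAGTCAGTTCAC", "TAGGTCATTACAG", "TAGGTACTGATGC"]

profile = count_bases_per_position(sequences)
print(profile[0]["A"])                 # 1
print([col["A"] for col in profile])   # [1, 4, 1, 0, 0, 3, 4, 1, 1, 3, 0, 2, 0]
```

A `Counter` returns 0 for a character it never saw, so looking up a nucleotide that is absent from a column needs no special handling.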