I'm trying to use Counter() and most_common() to count the occurrences of amino acid sequences in two lists. Let's call them upper and lower:
from collections import Counter

counterup = Counter(upperseqs)
counterlow = Counter(lowerseqs)
countermc_up = counterup.most_common(500)   # top 500 from the upper list
countermc_low = counterlow.most_common()    # every item from the lower list
print len(countermc_up)
print len(countermc_low)
for k, v in countermc_up:
    for x, y in countermc_low:
        if x == k:
            print >> fh1, k, '\t', v, '\t', y
        elif x != k:
            print >> fh1, k, '\t', v, '\t', "0.00"
        else:
            print "No Matches found!! Try again!"
So I want the top 500 sequences from my "upper" list, and for each of them I want to compare its count with the count of the same sequence in the second "lower" list, if it is present there. The second list contains approximately 36K items with counts.
When I run the code without the elif and else branches, I get what I want: all of the matches contained in the second list are printed to a file handle (fh1) that I opened previously, in tab-delimited format: sequence, count for upper, count for lower.
CARYLGYNSNWYPFDYW 589778 427779
CARDYRGYSGYNDAFNIW 294911 29343
CARKIGYSSGSEDYW 187806 90299
CARHLGYNNSWYPFDYW 82820 88700
CARHLGYNSAWYPFDYW 55642 45723
CARHLGYNDSWYPFDYW 44338 30974
CAKDFRGYTGYNDAFDIW 34638 9703
CARHLGYNSDWYPFDYW 23476 15692
CARHLGYNSVWYPFDYW 16223 12220
CARHLGYNSNWYPFDYW 15673 17198
......
CARYLNSWPY 89 0.00
However, there is one sequence in the upper top-500 list that is not in the lower list, and I need to find out which one. I will also use this with other second lists of varying size, where I know that fewer of the items from the first list will be found. What I want the code to do is put "0.00" in the third column if a sequence does not exist in the second list.
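To make that first goal concrete, here is a rough sketch of the check I have in mind, assuming a plain membership test against the second Counter is enough. The function name missing_from_lower and the sample data are made up for illustration; they are not part of my real script.

```python
from collections import Counter

def missing_from_lower(counter_up, counter_low, top_n=500):
    """Return the top_n most common keys of counter_up that counter_low lacks."""
    return [seq for seq, _count in counter_up.most_common(top_n)
            if seq not in counter_low]

# Made-up sample data, for illustration only
upper = Counter({"AAA": 5, "BBB": 3, "CCC": 1})
lower = Counter({"AAA": 2, "CCC": 7})
print(missing_from_lower(upper, lower))  # ['BBB']
```

The result keeps the most_common() order of the first Counter, which is the order I need to preserve.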
When I run my loop with the elif and else branches, the first row comes out perfectly:
Ex: CARYLGYNSNWYPFDYW 589778 427779
However, the code keeps using the first sequence until it has gone through all items in the second list, so I get:
CARYLGYNSNWYPFDYW 589778 0.00
CARYLGYNSNWYPFDYW 589778 0.00
CARYLGYNSNWYPFDYW 589778 0.00
CARYLGYNSNWYPFDYW 589778 0.00
CARYLGYNSNWYPFDYW 589778 0.00
CARYLGYNSNWYPFDYW 589778 0.00
for thousands of rows. Sifting through this file, I found that it does print the lower count at the point where the item is found in the second list. Once a sequence has found its match, I need the code to move on to the next sequence in list one and look for that one in list two, since I know the item won't appear again. I also need to keep the sorted order of the lists produced by most_common().
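For reference, this is a minimal sketch of the output I'm after, assuming one direct lookup in the second Counter can stand in for the inner loop. The function name merged_rows and the counts in the sample data are hypothetical; only the sequences are taken from my file.

```python
from collections import Counter

def merged_rows(counter_up, counter_low, top_n=500):
    """Yield (sequence, upper_count, lower_count) in most_common() order.

    One lookup in counter_low per sequence means exactly one row per
    sequence; a sequence absent from counter_low gets "0.00".
    """
    for seq, up_count in counter_up.most_common(top_n):
        low_count = counter_low[seq] if seq in counter_low else "0.00"
        yield seq, up_count, low_count

# Made-up counts, for illustration only
upper = Counter({"CARYLGYNSNWYPFDYW": 6, "CARKIGYSSGSEDYW": 4, "CARYLNSWPY": 1})
lower = Counter({"CARYLGYNSNWYPFDYW": 3, "CARKIGYSSGSEDYW": 2})

# One tab-delimited line per top sequence
lines = ["%s\t%s\t%s" % row for row in merged_rows(upper, lower)]
print("\n".join(lines))
```

In my real script each line would presumably go to the already opened file handle, e.g. fh1.write(line + "\n"), instead of being printed.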
All help is appreciated.