I have columns of population genetic data. Here's an example:
    line  chromosome  fst_score
    1     1           0.3
    2     1           0.7
    3     1           0.3
    4     1           0.15
    5     1           0.4
    6     2           0.6
    7     2           0.94
    8     2           0.17
    9     2           0.19
I want to calculate the average of the values in column 3 (the Fst score), not for the whole column but in windows (bins) of a given size. To start with, I'd just like to know how to calculate the average for every 10 rows of column 3. I know that to calculate the average for the whole column I can do something like this:
    # fileObj is an already-open file handle; its first line is the header
    next(fileObj)
    values = []
    for line in fileObj:
        lineList = line.strip().split()
        values.append(float(lineList[2]))  # column 3: fst_score
    average = sum(values) / len(values)
However, I am not sure how to set up a reliable counter that would do this for every 10 rows and output the result along with the value in the first column (so that I know which lines each average comes from).
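To make the goal concrete, here is a rough sketch of the kind of counter I have in mind, not a tested solution. The function name `average_in_chunks` and its `chunk_size` parameter are made up for illustration, and it assumes whitespace-separated columns with a header row whose first field is `line`. It collects scores until the chunk is full, then reports the first and last values from column 1 together with the mean.

```python
def average_in_chunks(lines, chunk_size=10):
    """Yield (first_id, last_id, mean) for each run of chunk_size data rows."""
    ids, scores = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with "line").
        if not fields or fields[0] == "line":
            continue
        ids.append(fields[0])             # column 1: line identifier
        scores.append(float(fields[2]))   # column 3: fst_score
        if len(scores) == chunk_size:
            yield ids[0], ids[-1], sum(scores) / len(scores)
            ids, scores = [], []
    if scores:
        # Leftover rows that do not fill a whole chunk still get an average.
        yield ids[0], ids[-1], sum(scores) / len(scores)


# The example rows from above; chunk_size=3 only because the sample is short.
sample = """line chromosome fst_score
1 1 0.3
2 1 0.7
3 1 0.3
4 1 0.15
5 1 0.4
6 2 0.6
7 2 0.94
8 2 0.17
9 2 0.19""".splitlines()

for first_id, last_id, mean in average_in_chunks(sample, chunk_size=3):
    print(first_id, last_id, round(mean, 3))
```

With a real file, the open file handle could be passed in place of `sample`, since iterating over a file also yields one line at a time.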
This is for chromosome data, so in the real file the values in column 2 are chromosome positions, and I will use these to define the rows that need to be averaged together as 1 Mb windows. That is trickier, as I will need to detect when the distance between rows has reached 1 Mb as I iterate through the file. For now I would just like to solve the first challenge: calculating an average for every 10 lines in a file.
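For the later 1 Mb step, the logic I imagine looks something like the sketch below. It is only an outline under assumptions: the function name `average_by_distance` is hypothetical, the rows are assumed to be already parsed into `(chromosome, position, fst)` tuples sorted by chromosome and then position, and the positions in the example are invented purely to exercise the code. A window starts at the first row it contains and closes once a row lies 1 Mb or more past that start, or when the chromosome changes.

```python
def average_by_distance(rows, window_size=1_000_000):
    """Yield (chrom, start_pos, end_pos, mean_fst) for each window.

    rows: iterable of (chrom, position, fst) tuples, sorted by chrom
    and then by position (an assumption about the real file).
    """
    chrom = start = end = None
    scores = []
    for row_chrom, pos, fst in rows:
        # Close the current window on a new chromosome, or once this row
        # is window_size or more beyond the window's first position.
        if scores and (row_chrom != chrom or pos - start >= window_size):
            yield chrom, start, end, sum(scores) / len(scores)
            scores = []
        if not scores:
            chrom, start = row_chrom, pos  # this row opens a new window
        scores.append(fst)
        end = pos
    if scores:
        yield chrom, start, end, sum(scores) / len(scores)


# Invented positions, for illustration only.
rows = [
    ("1", 100_000, 0.3),
    ("1", 600_000, 0.7),
    ("1", 1_200_000, 0.3),
    ("1", 1_900_000, 0.15),
    ("2", 50_000, 0.6),
    ("2", 900_000, 0.94),
]

for window in average_by_distance(rows):
    print(window)
```

An alternative would be fixed windows aligned to multiples of 1 Mb (for example, grouping on `position // 1_000_000`), which makes windows comparable across files but does not start at the first observed position.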
Please let me know if I can format the question better or if there is already an answer on this forum (I have looked).
Any help is appreciated. I'm still getting to grips with programming!