How to loop through a file to get a value for the coverage in each line of the samtools pileup file?
5.5 years ago
M.O.L.S ▴ 100

Hi,

I have a question.

I am writing a program in Python, using a pileup file generated by samtools. I have loaded the file into Python and I want to loop through it to find the value in the 4th column of each line.

I have managed to do this for only one line so far using the code below...

How can I change the code so that it loops through the entire file and finds the values in the fourth column that are less than or equal to 10?

If the number in the fourth column is equal to or less than 10, then I want to print out this line.

This is my code so far:

# A different way to open the mpileup file.         
f = open("/Users/m.o.l.s/outputFile.mpileup","rt")

line_1 = f.readline()
print(line_1)

# Split up the strings based on tabs
individuals = line_1.split("\t")
print(individuals)

# The coverage value is at index 3 (the 4th column)
The_Coverage = individuals[3]
print(The_Coverage)

# If this number is less than or equal to 10, the row needs to be displayed.
# Convert to int first: comparing strings ("9" <= "10") compares them
# character by character, not numerically.
if int(The_Coverage) <= 10:
    print(The_Coverage)
else:
    print("The coverage is not less than or equal to 10")

... so this code works, but only for one line, and I have about 200 lines in the file.

Should I turn the file into a pandas data frame? I would appreciate any help possible to further this along. Best.

sequencing

Hello M.O.L.S,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

5.5 years ago
Joe 21k

You only get one line because that's all readline() does (you may have been looking for readlines(), though iterating over the file handle directly is usually better).

To iterate a file you want to do something like:

with open("myfile.pileup", "r") as handle:
    for line in handle:
        # int() is needed: split() gives strings, and "9" <= 10 raises a TypeError
        if int(line.split("\t")[3]) <= 10:
            print(line, end="")

I agree with your suggestion of using a pandas DataFrame instead, though (especially if the file is large)*

*assuming you have enough memory
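For the pandas route, here's a minimal sketch. The column names are my own labels (a standard single-sample mpileup has six columns: chrom, pos, ref base, coverage, read bases, base qualities), and I'm using an in-memory sample in place of your real file, which you'd read with pd.read_csv("outputFile.mpileup", ...) instead:

```python
import csv
import io
import pandas as pd

# A tiny stand-in for the real mpileup file
# (in practice: pd.read_csv("outputFile.mpileup", sep="\t", ...)).
sample = io.StringIO(
    "chr1\t100\tA\t12\t............\tIIIIIIIIIIII\n"
    "chr1\t101\tC\t8\t........\tIIIIIIII\n"
    "chr1\t102\tG\t10\t..........\tIIIIIIIIII\n"
)

# header=None because pileup files have no header row;
# QUOTE_NONE because base/quality strings can contain quote characters.
df = pd.read_csv(sample, sep="\t", header=None, quoting=csv.QUOTE_NONE,
                 names=["chrom", "pos", "ref", "coverage", "bases", "quals"])

# Keep only the rows where coverage is <= 10
low_cov = df[df["coverage"] <= 10]
print(low_cov)
```

This selects the two rows with coverage 8 and 10; pandas parses the coverage column as integers for you, so no int() conversion is needed.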

