Question: How to loop through a file to get a value for the coverage in each line of the samtools pileup file?
M.O.L.S wrote (13 months ago):

Hi,

I have a question.

I am writing a program in Python. I am using a pileup file generated by samtools. I have loaded the file into Python, and I want to loop through it to get the value in the 4th column of each line.

I have managed to do this for only one line so far using the code below...

How can I adapt the code to loop through the entire file and find the values in the fourth column that are less than or equal to 10?

If the number in the fourth column is equal to or less than 10, then I want to print out this line.

This is my code so far:

# A different way to open the mpileup file.
f = open("/Users/m.o.l.s/outputFile.mpileup", "rt")

# Read only the first line
line_1 = f.readline()
print(line_1)

# Split up the strings based on tabs
individuals = line_1.split("\t")
print(individuals)

# The value we are interested in (the coverage) is at index 3, i.e. the 4th column.
# split() returns strings, so convert it to an int before comparing numerically.
The_Coverage = int(individuals[3])
print(The_Coverage)

# If this number is less than or equal to 10, the row needs to be displayed
if The_Coverage <= 10:
    print(line_1)
else:
    print("The Coverage is not less than or equal to 10")

f.close()

...so this code works, but only for one line, and I have about 200 lines in the file.

Should I turn the file into a pandas DataFrame? I would appreciate any help moving this forward. Best.
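For reference: with a single input file and default options, samtools mpileup writes six tab-separated columns per line (chromosome, 1-based position, reference base, number of reads covering the site, read bases, base qualities), so the coverage sits at index 3 after splitting. Every field that `split()` returns is a string, and strings compare character by character, so `"9" <= "10"` is `False`. The sketch below converts the numeric columns to `int` first; `PileupRecord` and `parse_pileup_line` are hypothetical helper names, not part of samtools or any library, and the sample line is made up.

```python
from typing import NamedTuple


class PileupRecord(NamedTuple):
    """One line of single-sample samtools mpileup output (hypothetical helper type)."""
    chrom: str
    pos: int
    ref_base: str
    depth: int
    read_bases: str
    base_quals: str


def parse_pileup_line(line: str) -> PileupRecord:
    """Split a tab-separated pileup line and convert the numeric columns to int."""
    fields = line.rstrip("\n").split("\t")
    return PileupRecord(
        chrom=fields[0],
        pos=int(fields[1]),
        ref_base=fields[2],
        depth=int(fields[3]),  # 4th column: number of reads covering this position
        read_bases=fields[4],
        base_quals=fields[5],
    )


# Strings compare lexicographically: '9' sorts after '1', so this is False
print("9" <= "10")  # False

record = parse_pileup_line("chr1\t100\tA\t9\t.........\tIIIIIIIII\n")
print(record.depth <= 10)  # True, because depth is now an int
```

If mpileup is run on several BAM files at once, the depth, bases and qualities columns repeat per sample, so this six-field layout would need adjusting.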

(Tag: sequencing. Written 13 months ago by M.O.L.S; modified 13 months ago by finswimmer.)

Hello M.O.L.S,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

(Reply written 13 months ago by finswimmer.)
Joe (United Kingdom) wrote (13 months ago):

You only get one line because that's all readline() does (you may have been looking for readlines(), which returns a list of all the lines).
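A minimal illustration of the difference, using `io.StringIO` as a stand-in for an open pileup file (the three pileup lines are invented): `readline()` returns one string, while `readlines()` returns a list of every line not yet read.

```python
import io

# Made-up pileup text; io.StringIO stands in for a real open file handle
PILEUP_TEXT = (
    "chr1\t100\tA\t12\t............\tIIIIIIIIIIII\n"
    "chr1\t101\tC\t8\t........\tIIIIIIII\n"
    "chr1\t102\tG\t3\t...\tIII\n"
)

handle = io.StringIO(PILEUP_TEXT)
first = handle.readline()   # a single string: just the first line
rest = handle.readlines()   # a list holding every line not yet read

print(repr(first))
print(len(rest))  # 2

# readlines() on a fresh handle returns all lines, so every depth can be collected
all_lines = io.StringIO(PILEUP_TEXT).readlines()
depths = [int(line.split("\t")[3]) for line in all_lines]
print(depths)  # [12, 8, 3]
```

For a file of about 200 lines, readlines() is fine; iterating over the handle directly avoids building the list at all.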

To iterate over a file, you want something like:

with open("myfile.pileup", "r") as handle:
    for line in handle:
        # Column 4 (index 3) is the depth; split() returns strings, so convert to int
        if int(line.split("\t")[3]) <= 10:
            print(line, end="")
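If the filter is going to be reused, it could be wrapped in a small generator. This is a sketch: `low_coverage_lines`, `print_low_coverage` and the `max_depth` parameter are hypothetical names, and the default cutoff of 10 is taken from the question. Because it accepts any iterable of lines, an open file handle can be passed straight in and the file is still read one line at a time.

```python
from typing import Iterable, Iterator


def low_coverage_lines(lines: Iterable[str], max_depth: int = 10) -> Iterator[str]:
    """Yield each pileup line whose read depth (4th column) is <= max_depth."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        if int(fields[3]) <= max_depth:
            yield line


def print_low_coverage(path: str, max_depth: int = 10) -> None:
    """Print the low-coverage lines of the pileup file at `path`."""
    with open(path, "r") as handle:
        for line in low_coverage_lines(handle, max_depth):
            print(line, end="")  # lines already end with "\n"
```

Usage would be something like `print_low_coverage("/Users/m.o.l.s/outputFile.mpileup")`.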

I agree with your idea of using a pandas DataFrame instead, though (especially if the file is large)*

*assuming you have enough memory
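A pandas version might look like the sketch below, assuming single-sample mpileup output with the default six columns. The column names are my own labels (mpileup writes no header line), and `read_pileup` is a hypothetical helper. `quoting=csv.QUOTE_NONE` matters because `"` is a valid base-quality character that pandas would otherwise treat as a quote, and `keep_default_na=False` stops strings such as "NA" from being turned into missing values.

```python
import csv
import io

import pandas as pd

# Labels for the six default columns of single-sample mpileup output (my own names)
PILEUP_COLUMNS = ["chrom", "pos", "ref_base", "depth", "read_bases", "base_quals"]


def read_pileup(source) -> pd.DataFrame:
    """Load a pileup file (path or file-like object) into a DataFrame."""
    return pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=PILEUP_COLUMNS,
        quoting=csv.QUOTE_NONE,   # '"' is a quality character here, not a quote
        keep_default_na=False,    # keep strings such as "NA" as-is
        dtype={"chrom": str, "ref_base": str, "read_bases": str, "base_quals": str},
    )


# Made-up example data standing in for a real file
example = io.StringIO(
    "chr1\t100\tA\t12\t............\tIIIIIIIIIIII\n"
    "chr1\t101\tC\t8\t........\tII\"IIIII\n"
)
df = read_pileup(example)
low = df[df["depth"] <= 10]   # boolean mask keeps rows with coverage <= 10
print(low)
```

On the memory caveat: `pd.read_csv` also accepts a `chunksize` argument, which returns an iterator of smaller DataFrames that can each be filtered the same way.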

Powered by Biostar version 2.3.0