Histogram of Paired End Fragment Lengths
1
0
Entering edit mode
4 months ago
ss3943 • 0

I have various BAM files for which I want to plot the distribution of fragment lengths. I know that deepTools has the bamPEFragmentSize tool, but this only samples the reads from a binned genome. We have relatively small files--maybe only 100k reads that map to the viral genome of interest--so the binning and sampling causes a loss of resolution.

I wanted to just take the 9th column from the SAM file, then turn that into a Numpy array and create a histogram from there. However, I'm getting a histogram that doesn't make sense to me. I would like fragment length on the x axis and frequency of that fragment length on the y axis.

After cutting the 9th column from the SAM file, I loaded this file into Python and converted it into a Numpy array:

samFile="ninth-column.txt"
#read file line by line, then remove /n
with open(samFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]

#convert string list to integer list
for i in range(0, len(content)):
    content[i] = int(content[i])
content[0:10]
>>> [68, 0, 0, 147, 347, 177, 347, 246, 245, -68]

#convert integer list to numpy array
arr = np.array(content)
abArr = np.absolute(arr)
print(abArr[0:10])
>>>[ 68   0   0 147 347 177 347 246 245  68]

#make histogram
readLengths = np.histogram(abArr)
readLengths
>>>(array([39010090,      111,       68,       59,       41,       15,
              10,       16,        7,        4]),
 array([0.00000000e+00, 2.27951284e+07, 4.55902568e+07, 6.83853852e+07,
        9.11805136e+07, 1.13975642e+08, 1.36770770e+08, 1.59565899e+08,
        1.82361027e+08, 2.05156156e+08, 2.27951284e+08]))

plt.hist(readLengths)

plt.show()

enter image description here

numpy python histogram • 188 views
ADD COMMENT
0
Entering edit mode
4 months ago
liorglic ▴ 440

Almost there! I think this will give you what you want:

plt.hist(abArr)
plt.show()

Also note the 0 values - these probably arise from unmapped reads (or mates), so you might want to filter your reads before running your code.

ADD COMMENT

Login before adding your answer.

Traffic: 2827 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6