Plotting .tab/.bam file
18 months ago

Hey all, I am trying to find a way to create a scatter plot and a histogram using matplotlib for an alignment I generated. I aligned my reads to a bacterial genome, then sorted and indexed the file and computed the depth using:

    samtools view -b s_oneidensis_alignemnt_sensitive.sam > alignment.bam
    samtools sort alignment.bam > alignment.sorted.bam
    samtools index alignment.sorted.bam
    samtools depth -a alignment.sorted.bam > pileup.tab


Now I'd like to generate a scatter plot with x-axis = position in genome and y-axis = depth of coverage, and then a histogram with x-axis = depth of coverage and y-axis = read count. I'm still new to Python and trying to figure out a method. Should I use the .tab file or the .bam file? Any help or nudges in the right direction would be greatly appreciated. Thanks!

python matplotlib .tab .bam • 1.3k views

A similar topic has been discussed in "How to plot coverage and depth statistics of a bam file". A tab file (.tab) is just a text file whose columns are tab-separated. Read the pileup.tab file using pandas and plot it using pyplot.
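
A minimal sketch of that approach, assuming the file has the usual three headerless columns written by samtools depth (reference name, 1-based position, depth). The column names, the load_depth helper, the output file name, and the toy values are all made up for illustration; for the real data, pass the path "pileup.tab" to load_depth instead of the toy buffer. Note that a histogram built from this file counts genome positions per depth value, not reads.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for pileup.tab (made-up values, hypothetical reference name).
toy_pileup = io.StringIO(
    "chr_toy\t1\t3\n"
    "chr_toy\t2\t5\n"
    "chr_toy\t3\t5\n"
    "chr_toy\t4\t0\n"
)


def load_depth(source):
    """Read samtools depth output: reference name, position, depth (no header)."""
    return pd.read_csv(source, sep="\t", header=None, names=["ref", "pos", "depth"])


depth = load_depth(toy_pileup)  # for real data: load_depth("pileup.tab")

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: position in genome vs. depth of coverage.
ax_scatter.scatter(depth["pos"], depth["depth"], s=2)
ax_scatter.set_xlabel("Position in genome")
ax_scatter.set_ylabel("Depth of coverage")

# Histogram: how many positions have each depth.
ax_hist.hist(depth["depth"], bins=50)
ax_hist.set_xlabel("Depth of coverage")
ax_hist.set_ylabel("Number of positions")

fig.tight_layout()
fig.savefig("coverage.png")  # hypothetical output name
```

Working from the .tab file is likely the simpler route here, since the depth has already been computed per position and no BAM parsing is needed.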


So the issue I'm having with this is extracting the alignment .tab file's columns into lists. My current code ends up indexing into the first string in a row (the NCBI accession ID for the genome) and picking out characters such as A and E, instead of reading the columns:

    %matplotlib inline
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    x = []
    y = []

    alignment = pd.DataFrame(data=table)
    for column in alignment:
        x.append(int(column[1]))
        y.append(int(column[2]))

    plt.plot(x, y, 'ro')
    plt.xlabel('Position in Genome')
    plt.ylabel('Depth of Coverage')
    plt.show()
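
One likely cause: iterating over a DataFrame directly yields its column labels, not its rows, so column[1] and column[2] index into a label rather than into the data. A rough sketch of one way around it is below, assuming the file is parsed into named columns first. The names ref, pos, and depth and the toy values are made up for illustration, and the real file would be read with the path to pileup.tab in place of the toy buffer.

```python
import io

import pandas as pd

# Toy rows in samtools depth layout (made-up values, hypothetical reference name).
toy = io.StringIO("chr_toy\t1\t3\nchr_toy\t2\t5\nchr_toy\t3\t4\n")
alignment = pd.read_csv(toy, sep="\t", header=None, names=["ref", "pos", "depth"])

# Iterating over a DataFrame yields column labels, not rows.
labels = [column for column in alignment]

# Select whole columns instead of looping row by row.
x = alignment["pos"].tolist()
y = alignment["depth"].tolist()

# If a row loop is really wanted, itertuples gives one record per line.
pairs = [(row.pos, row.depth) for row in alignment.itertuples(index=False)]
```

With x and y built this way, the plt.plot(x, y, 'ro') call and the axis labels from the code above should work unchanged.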