Sliding window plot using Python
3.1 years ago
A_Lh ▴ 30

I want to plot the number of positions in a sliding window of 1000 and a step of 20 for each sample (A-D).

Interpretation:

  • 1: position exists;
  • NA: position does not exist.

I have tried a dozen tools in bash, R and other languages, but I am looking for a Python solution.
Any advice would be welcome.

#This is an example of my data:
window = 1000
step = 20

# Example of dataframe
POSITION        A       B       C       D
1250            1       1       1       1 
1750            NA      1       NA      1
1786            1       NA      1       1
1812            1       1       1       1
1855            1       1       1       1
1896            1       NA      1       NA
2635            NA      1       1       1
1689            1       1       NA      NA
3250            1       1       1       1
3655            1       NA      1       1
3589            NA      1       1       1
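For anyone who wants to reproduce this in Python, here is a minimal sketch of loading the table above into pandas. It assumes the data is whitespace-delimited text with `NA` marking a missing position; the variable names (`raw`, `per_sample`) are only illustrative.

```python
import io
import pandas as pd

# The example table from the post, pasted as whitespace-delimited text.
raw = """POSITION A B C D
1250 1 1 1 1
1750 NA 1 NA 1
1786 1 NA 1 1
1812 1 1 1 1
1855 1 1 1 1
1896 1 NA 1 NA
2635 NA 1 1 1
1689 1 1 NA NA
3250 1 1 1 1
3655 1 NA 1 1
3589 NA 1 1 1
"""

# "NA" is one of the strings read_csv treats as missing by default.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The rows are not in positional order, so sort before any windowing.
df = df.sort_values("POSITION").reset_index(drop=True)

# Number of existing (non-NA) positions per sample over the whole table.
per_sample = df[["A", "B", "C", "D"]].notna().sum()
```

Sorting first matters because positions 1689 and 3589 appear out of order in the example.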

I am looking for something like this:

[Figure: polymorphism density plot using a window size of 1,000,000 with an increment (step) of 100,000]

Any help will be appreciated!

python SNP PLOT • 2.4k views

Previous question: How to plot SNPs distribution on each chromosome?

You just need to adjust the BEDOPS bedmap commands from that answer to count SNPs over sliding windows, and then feed the result as input to the script provided there.

3.1 years ago
A_Lh ▴ 30

This is a Python solution:

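import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Example data from the question (NA -> np.nan)
df = pd.DataFrame({'POSITION': [1250, 1750, 1786, 1812, 1855, 1896, 2635, 1689, 3250, 3655, 3589],
 'A': [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan],
 'B': [1.0, 1.0, np.nan, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan, 1.0],
 'C': [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0],
 'D': [1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, np.nan, 1.0, 1.0, 1.0]})

# Note: an integer window counts rows (consecutive positions), not base pairs
window = 5
step = 2

# Sort by position, count non-NaN values in each rolling window of rows,
# then keep every `step`-th row. min_periods=1 keeps the leading partial windows.
counts = (df.sort_values('POSITION').set_index('POSITION')
            .rolling(window, min_periods=1).count()
            .reset_index().iloc[::step, :])
# Long format: one row per (position, sample) pair
counts = counts.melt(id_vars='POSITION', value_vars=['A','B','C','D'], value_name='polym', var_name='chromop')
sns.lineplot(data=counts, x='POSITION', y='polym', hue='chromop')
plt.show()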
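Note that `rolling(window)` with an integer window counts rows, not genomic distance. If the window of 1000 and step of 20 from the question are meant in position units, something like the sketch below may be closer to what is wanted. The function name `sliding_window_counts`, the half-open window convention `[start, start + window)`, and starting the first window at the smallest position are all assumptions of mine, not something stated in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "POSITION": [1250, 1750, 1786, 1812, 1855, 1896, 2635, 1689, 3250, 3655, 3589],
    "A": [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan],
    "B": [1.0, 1.0, np.nan, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan, 1.0],
    "C": [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0],
    "D": [1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, np.nan, 1.0, 1.0, 1.0],
})


def sliding_window_counts(df, window=1000, step=20, pos_col="POSITION"):
    """Count non-missing positions per sample in windows [start, start + window)."""
    df = df.sort_values(pos_col)
    pos = df[pos_col].to_numpy()
    samples = [c for c in df.columns if c != pos_col]

    # Cumulative count of present positions, with a leading row of zeros so
    # that csum[hi] - csum[lo] is the count over sorted rows lo..hi-1.
    present = df[samples].notna().to_numpy().astype(int)
    zeros = np.zeros((1, len(samples)), dtype=int)
    csum = np.vstack([zeros, present.cumsum(axis=0)])

    # Window starts in position units (assumption: begin at the first position).
    starts = np.arange(pos.min(), pos.max() + 1, step)
    lo = np.searchsorted(pos, starts, side="left")
    hi = np.searchsorted(pos, starts + window, side="left")

    out = pd.DataFrame(csum[hi] - csum[lo], columns=samples)
    out.insert(0, "window_start", starts)
    return out


def plot_counts(out):
    """Line plot of the per-window counts, one line per sample."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    ax = out.plot(x="window_start", y=[c for c in out.columns if c != "window_start"])
    ax.set_xlabel("window start (position)")
    ax.set_ylabel("positions in window")
    plt.show()


out = sliding_window_counts(df, window=1000, step=20)
```

Calling `plot_counts(out)` draws one line per sample. The cumulative-sum plus `searchsorted` approach avoids looping over every window, which should help on chromosome-sized data.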
3.1 years ago
4galaxy77 2.9k

In R instead of python, but it will do the job:

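library(zoo)
library(tidyverse)
# dat: data frame with POSITION in the first column and samples A-D after it
# Count existing positions in a window (NA values are ignored)
count_positions = function(x) sum(x, na.rm=TRUE)
# width and by are in rows; set them to your window and step
as.data.frame(rollapply(dat[,-1], FUN=count_positions, width=2, by=2)) %>% 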
    mutate(index = 1:n()) %>% 
    pivot_longer(-index) %>% 
    ggplot(aes(x=index, y=value, colour=name)) + 
    geom_line()
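Since the question asks for Python, here is a rough pandas counterpart of the R snippet above, as a sketch only. It assumes the same row-based windows (`width` and `by` count rows, not positions) and that only complete windows are kept; I have not checked it against `rollapply` output beyond that assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "POSITION": [1250, 1750, 1786, 1812, 1855, 1896, 2635, 1689, 3250, 3655, 3589],
    "A": [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan],
    "B": [1.0, 1.0, np.nan, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0, np.nan, 1.0],
    "C": [1.0, np.nan, 1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, 1.0, 1.0],
    "D": [1.0, 1.0, 1.0, 1.0, 1.0, np.nan, 1.0, np.nan, 1.0, 1.0, 1.0],
})

width, by = 2, 2  # both measured in rows, as in the R call

# 1 where a position exists, 0 where it is NA, so a sum is a count.
present = df.drop(columns="POSITION").notna().astype(int)

# Rolling sum over `width` rows; the first complete window ends at row
# width - 1, and every `by`-th window after that is kept.
rolled = present.rolling(width).sum().iloc[width - 1::by]

# Long format for plotting: one row per (window index, sample).
long = (rolled.reset_index(drop=True)
              .rename_axis("index").reset_index()
              .melt(id_vars="index", var_name="name", value_name="value"))
```

`long` can then be passed to a plotting call such as `sns.lineplot(data=long, x="index", y="value", hue="name")`.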