Question

Creating bed files for consecutive genomic windows out from polymorphisms

0

Entering edit mode

6.3 years ago

spiral01 ▴ 110

I have a set of polymorphisms and their position on each chromosome in a text file. I also have vcf files of variants (1000 genomes). I want to identify the number of variants within 1kb windows of each polymorphism.

To do this I think I need to create bed files for each appropriate window.

So for example, if I have polymorphisms at the following positions:

The first bed file would show start and end positions:

Then the second bed file would show:

Because I'd like to know the number in same size windows I'd then subtract the number of variants found from the first bed file from the number in the second.

And so on and so forth for the length of the chromosome..

I know how to count the variants, I just don't know how to automate the creation of such a vast number of bed files. Does anyone have any suggestions?

SNP • 1.5k views

ADD COMMENT • link 6.3 years ago by spiral01 ▴ 110

score 0 · Answer 1 · 2017-12-22

I have ended up using a simple python pandas function. df is the table with the polymorphism position. window is the size of the window to use around each polymorphism. chromSize is the length of the chromosome in question, used to ensure the end co-ordinates do not exceed the length.

def makeBedFiles(df, window, chromSize, fileSuffix):
    #Add 'START' and 'END' columns that are windows of bed file
    df['START'] = np.where(df['POS'] - window >= 0, df['POS'] - window, 0)
    df['END'] = np.where(df['POS'] + window <= chromSize, df['POS'] + window, chromSize)
    #Keep only chromosome, start and end columns
    df = df[['CHROM', 'START', 'END']]
    fileName = fileSuffix + str(window) + '.txt'
    #Write df to file
    df.to_csv(fileName, header=None, index=None, sep='\t', mode='a')

Having done this I then loop through a range from 0 to the chromosome size (249250621) divided by the window size (500) - chromosome is Chr1 from hg19 1000genomes:

for i in range(0, 498502):
    makeBedFiles(table, window = (500*i), chromSize = 249250621, 
                 fileSuffix = ('str(i)))

I am sure there are better, cleaner and quicker ways to do this but so far this is all I have come up with.