Question

Software that finds regions matching characteristics such as number of genes in segment

2

9.8 years ago

vlaufer ▴ 290

I have defined 44 linkage blocks as "interesting" through a GWAS - NGS cohort.

I have run certain tests on those, but I wish to understand how typical or atypical those results are by comparing the 44 loci to 44 similarly sized loci that are "matched" according to various criteria:

1. Matched based on the number of genes and total gene content

2. Matched based on the number of regulatory elements (e.g. Conserved TFBS)

Is there software that does this? If not, how do people do it?

Thank you.

GWAS WGS Software Statistics • 1.7k views

Updated 2.5 years ago by Ram 43k • written 9.8 years ago by vlaufer ▴ 290

Ram · Accepted Answer · 2014-06-30

2

9.8 years ago

Alex Reynolds 35k

Assuming your datasets are in sorted BED format, you could use bedmap with the --count operand to count the genes and regulatory elements that map onto each element of a reference set (docs). Other statistical operations are available, both score-based and non-score-based. You could use these to calculate statistics for treatment and control sets, observed and background sets, etc. Alternatively, if you don't already have a background set, you could, say, map over windows to find regions that have the characteristics you want.
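As a rough sketch of the counting step: with the loci and gene annotations in sorted BED files (the names loci.bed and genes.bed here are placeholders), the call would look something like bedmap --echo --count --delim '\t' loci.bed genes.bed. The function below is a plain-awk stand-in, not bedmap's implementation, that reports the same kind of per-region overlap count. It compares every region against every feature, so it is only meant for checking small inputs by hand.

```shell
# count_overlaps REGIONS.bed FEATURES.bed
# Prints each region line followed by the number of features on the same
# chromosome that overlap it by at least one base. BED coordinates are
# 0-based and half-open, so [a,b) and [c,d) overlap when a < d and b > c.
# The function name and file names are hypothetical, for illustration only.
count_overlaps() {
    awk 'BEGIN { FS = OFS = "\t" }
         # First file on the command line (features): remember every interval.
         FILENAME == ARGV[1] {
             n++
             fchr[n] = $1; fstart[n] = $2 + 0; fend[n] = $3 + 0
             next
         }
         # Second file (regions): count the stored features that overlap.
         {
             c = 0
             for (i = 1; i <= n; i++)
                 if (fchr[i] == $1 && fstart[i] < $3 + 0 && fend[i] > $2 + 0)
                     c++
             print $0, c
         }' "$2" "$1"
}
```

Running count_overlaps loci.bed genes.bed, and again with a file of regulatory elements in place of genes.bed, gives the two counts per locus that the matching criteria in the question are based on.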

Updated 4.5 years ago by Ram 43k • written 9.8 years ago by Alex Reynolds 35k

0

Alex - thank you very much for this helpful feedback. The data are not currently in BED format, although I suppose I could put them in that format. Actually, the format I would use would likely be a VCF file.

However, I suppose what I was asking for is a tool that would do it without any input at all. A simple example: if I specify a given region that is 1 Mb in length, it might contain, say, 6 genes. The tool I am seeking would simply find another 1 Mb region with 6 genes in it.

Updated 2.5 years ago by Ram 43k • written 9.8 years ago by vlaufer ▴ 290

0

You could construct one by building sliding 1 Mb windows across your genome and piping them as reference elements into bedmap.

As an example, this awk script makes disjoint 1 Mb windows from 1 Mb to 11 Mb over chr1 and looks within each window to count the number of genes. If the number of genes is equal to 6, then it prints the result:

$ awk 'BEGIN { for(i=1;i<=10;i++) { print "chr1\t"(i*1000000)"\t"((i+1)*1000000) } }' \
    | bedmap --echo --count --delim '\t' - genes.bed \
    | awk '($4==6)' -

This is a very rudimentary and incomplete example — not least because the windows are disjoint and do not span the chromosome — but hopefully it demonstrates the principle. You could modify this approach to make a sliding window that moves over each chromosome, for instance, testing the window for your condition-of-interest.
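One way to extend this to overlapping windows over whole chromosomes, again only as a sketch: assume a tab-delimited file of chromosome names and lengths in bp (called chrom.sizes here, a hypothetical name). The function below emits fixed-size windows every step bases and drops the partial window at the end of each chromosome.

```shell
# make_windows CHROM_SIZES WINDOW STEP
# Emits BED windows of length WINDOW, starting every STEP bases, for each
# chromosome listed in CHROM_SIZES (two tab-separated columns: name, length).
# Only full-length windows are printed. The function name and the input file
# are assumptions made for this illustration.
make_windows() {
    awk -v win="$2" -v step="$3" 'BEGIN { FS = "\t" }
        {
            # Slide along the chromosome while a full window still fits.
            for (s = 0; s + win <= $2; s += step)
                printf "%s\t%d\t%d\n", $1, s, s + win
        }' "$1"
}

# Usage (assumes bedmap is installed and genes.bed is sorted):
#   make_windows chrom.sizes 1000000 100000 \
#       | bedmap --echo --count --delim '\t' - genes.bed \
#       | awk '($4==6)'
```

Windows come out in the order the chromosomes appear in chrom.sizes, so if that order differs from the sort order of genes.bed, the windows may need to go through sort-bed before bedmap. A smaller step gives finer resolution at the cost of more, heavily overlapping candidate windows.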

Written 9.8 years ago by Alex Reynolds 35k