Question: How many elements have a value greater than x and less than y?
0
6.4 years ago by
Raghav100
Raghav100 wrote:

Dear All,

I need help retrieving information from the table below:

    SNP position    start       end
    98151697        98151197    98152197
    98135534        98135034    98136034
    98144183        98143683    98144683
    98144239        98143739    98144739
    98144334        98143834    98144834
    98151980        98151480    98152480
    98160201        98159701    98160701
    98160407        98159907    98160907
    98143556        98143056    98144056
    98122867        98122367    98123367
    98124264        98123764    98124764
    98128885        98128385    98129385
    98129604        98129104    98130104
    98129805        98129305    98130305

For each row, how many SNP positions (column one) lie between that row's start and end positions? Finally, I would like to know which start and end pair covers the highest number of SNP positions from column one.

Thank you

modified 6.4 years ago by Alex Reynolds31k • written 6.4 years ago by Raghav100
1

What have you tried?

I was about to ask him the same thing. Show us what you have tried so far and we may be able to help modify the code.

You need to be clearer in your question. Also, try giving an example: a real one, with a small dataset and the output you would expect.

```
snp position    start    end    output
2               1        6      3
4               1        3      1
5               7        10     2
8               10       20     0
9               1000     20000  0
```

In the above example, three values from column 1 (2, 4 and 5) fall between 1 and 6 (the first values of the 2nd and 3rd columns, respectively).

The second row's start and end values (1 and 3) contain only one SNP position from column 1 (the value 2).
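For reference, the expected `output` column in the small example can be reproduced with a brute-force two-pass awk sketch like the one below. This is only an illustration, not code from anyone in this thread: the temporary file and the variable names `example` and `counts` are made up, and the comparison is strict (`start < pos < end`), following the wording of the question title. Swap in `>=` and `<=` if the boundaries should count.

```shell
# Write the small example table (tab-delimited, one header line) to a temp file.
example=$(mktemp)
printf 'snp\tstart\tend\n2\t1\t6\n4\t1\t3\n5\t7\t10\n8\t10\t20\n9\t1000\t20000\n' > "$example"

# The file is read twice: pass 1 (NR == FNR) stores every SNP position,
# pass 2 counts, for each row, the stored positions with start < pos < end.
counts=$(awk -F'\t' '
  FNR == 1 { next }                      # skip the header line in both passes
  NR == FNR { snp[++n] = $1 + 0; next }  # pass 1: remember SNP positions
  {
    c = 0
    for (i = 1; i <= n; i++)
      if (snp[i] > $2 + 0 && snp[i] < $3 + 0) c++
    print $1 "\t" $2 "\t" $3 "\t" c      # pass 2: row plus its count
  }' "$example" "$example")

printf '%s\n' "$counts"
rm -f "$example"
```

This compares every row against every SNP, which is fine for a table of this size but scales quadratically with the number of rows.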

1
6.4 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

BEDOPS can solve this problem for you (specifically, the sort-bed and bedmap tools, along with standard core UNIX utilities):

1. Turn the start and end positions of your regions-of-interest from your tab-delimited table file `table.txt` into a sorted BED file called `regions.bed`:

```
# NR > 2 skips the first two lines of table.txt; use NR > 1 if the file has a single header line
$ awk 'NR > 2 { print $0; }' table.txt \
| cut -f2,3 - \
| awk '{ print "chrN\t"$1"\t"$2; }' - \
| sort-bed - > regions.bed
```
2. Turn the SNP positions into a second, sorted BED file called `SNPs.bed`:

```
# NR > 2 skips the first two lines of table.txt; use NR > 1 if the file has a single header line
$ awk 'NR > 2 { print $0; }' table.txt \
| cut -f1 - \
| awk '{ print "chrN\t"$1"\t"($1 + 1); }' - \
| sort-bed - > SNPs.bed
```
3. Count the SNPs that fall within your regions with `bedmap --count`:

```
$ bedmap --echo --count regions.bed SNPs.bed > answer.bed
```

Your answer is in the file `answer.bed`, where each region is printed alongside the number of SNPs that fall within that region's range. By default, `bedmap` separates the region from its count with a `|` character.
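The question also asks which start and end pair covers the most SNPs. One way to get that from `answer.bed` is to sort on the count and keep the top line. The sketch below assumes the default `|` delimiter between region and count. It uses a stand-in file with made-up values rather than real `bedmap` output, and the variable names `answer` and `top` are only for illustration.

```shell
# Stand-in for answer.bed in the "region|count" layout (values are made up).
answer=$(mktemp)
printf 'chrN\t1\t3|1\nchrN\t1\t6|3\nchrN\t7\t10|2\n' > "$answer"

# Split each line on "|", sort numerically (descending) on the count field,
# and keep only the first line, i.e. the region with the highest count.
top=$(sort -t'|' -k2,2nr "$answer" | head -n 1)

printf '%s\n' "$top"
rm -f "$answer"
```

With the real file, the same pipeline would be `sort -t'|' -k2,2nr answer.bed | head -n 1`. If several regions tie for the highest count, only one of them is printed, so raise the `head` line count or inspect the sorted output to see the rest.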