How many elements have a value greater than x and less than y?
7.4 years ago
Raghav ▴ 100

Dear All,

I need help retrieving information from the following table:

SNP position    start        end
98151697        98151197     98152197
98135534        98135034     98136034
98144183        98143683     98144683
98144239        98143739     98144739
98144334        98143834     98144834
98151980        98151480     98152480
98160201        98159701     98160701
98160407        98159907     98160907
98143556        98143056     98144056
98122867        98122367     98123367
98124264        98123764     98124764
98128885        98128385     98129385
98129604        98129104     98130104
98129805        98129305     98130305


For each row, I would like to count how many SNP positions (column 1) lie between that row's start and end positions, and likewise for every subsequent start/end pair. Finally, I would like to know which start/end pair covers the highest number of SNP positions from column 1.

Thank you


What have you tried?


I was about to ask him the same thing. Show us what you have tried so far and we may be able to help modify the code.


You need to be clearer about your question. Also, try giving an example: a real one with a small dataset, along with the output you would expect.

snp position    start    end    output
2               1        6      3
4               1        3      1
5               7        10     2
8               10       20     0
9               1000     20000  0


In the above example, three values from column 1 (2, 4, and 5) fall between 1 and 6 (the first values of the 2nd and 3rd columns, respectively).

The second start/end pair (1 and 3) contains only one SNP position from column 1 (the value 2).
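
For reference, the expected output column of this toy example can be reproduced with a brute-force awk sketch. It assumes a whitespace-delimited table with one header line, and it treats both bounds as exclusive (greater than start, less than end), as in the thread title. The function name count_snps is only illustrative, and the nested loop compares every region against every SNP, so this is suited to small tables only.

```shell
#!/bin/sh
# Read a table (SNP, start, end; one header line) on stdin and print
# "start end count" for each row, where count is the number of SNP
# positions strictly between that row's start and end.
count_snps() {
    awk 'NR > 1 && NF >= 3 { k++; snp[k] = $1 + 0; lo[k] = $2 + 0; hi[k] = $3 + 0 }
         END {
             for (i = 1; i <= k; i++) {
                 n = 0
                 for (j = 1; j <= k; j++)
                     if (snp[j] > lo[i] && snp[j] < hi[i]) n++
                 print lo[i], hi[i], n
             }
         }'
}

# The toy table from this comment, fed in on stdin.
printf 'snp\tstart\tend\n2\t1\t6\n4\t1\t3\n5\t7\t10\n8\t10\t20\n9\t1000\t20000\n' | count_snps
```

The last field of each printed line should match the output column above.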

7.4 years ago

BEDOPS can solve this problem for you (specifically, the sort-bed and bedmap tools, along with standard core UNIX utilities):

1. Turn the start and end positions of your regions-of-interest from your tab-delimited table file table.txt into a sorted BED file called regions.bed (the first awk call skips the single header line):

$ awk 'NR > 1 { print $0; }' table.txt \
| cut -f2,3 - \
| awk '{ print "chrN\t"$1"\t"$2; }' - \
| sort-bed - > regions.bed

2. Turn the SNP positions into a second, sorted BED file called SNPs.bed, where each SNP becomes a one-base interval (again skipping the single header line):

$ awk 'NR > 1 { print $0; }' table.txt \
| cut -f1 - \
| awk '{ print "chrN\t"$1"\t"($1 + 1); }' - \
| sort-bed - > SNPs.bed

3. Count the SNPs that fall within your regions with bedmap --count:

$ bedmap --echo --count regions.bed SNPs.bed > answer.bed


Your answer is in the file answer.bed, where each region is printed, alongside the number of SNPs that fall across that region's range.
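
The second part of the question, which region covers the most SNPs, can then be read off answer.bed. The sketch below assumes bedmap's default output layout, in which the echoed region and its count are separated by a "|" character. The function name top_region is hypothetical, and the sample lines are hand-written in that assumed format from the toy example in the comments, not real bedmap output. Ties are resolved in favor of the first region encountered.

```shell
#!/bin/sh
# Read "region|count" lines on stdin and print the line with the
# largest count. Splitting on "|" makes the count the last field.
top_region() {
    awk -F'|' 'NR == 1 || $NF + 0 > best { best = $NF + 0; line = $0 }
               END { if (NR) print line }'
}

# Hand-written sample in the assumed answer.bed format (toy example).
printf 'chrN\t1\t3|1\nchrN\t1\t6|3\nchrN\t7\t10|2\nchrN\t10\t20|0\nchrN\t1000\t20000|0\n' | top_region
```

With a real file, the equivalent call would be top_region < answer.bed.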