Select random lines from a BED file while preserving the per-chromosome distribution of the input file
7.9 years ago
Naresh D J ▴ 110

Hi,

How can I randomly select lines from a BED file? More specifically, I want to create a smaller BED file of genomic regions (ChIP-seq peaks) from a larger one while maintaining the relative proportion of lines from each chromosome. For example, if my input file has 1000 lines, I want to select 100 lines at random while keeping the chromosome proportions roughly the same.

This question seems to have been asked before, but I did not find the right solution there (How To Randomly Sample A Subset Of Lines From A Bed File).

Can you suggest some tools, or a solution using awk or a shell script?

Thank you, Naresh D J

ChIP-Seq bedtools • 2.0k views

In what way does the previous post not answer your question?

The answers in the previous post select a fixed number of lines from each chromosome and do not maintain the relative proportions.

As I read it, the first answer there does what I understand you want: it draws, say, 100 random lines from your BED file while preserving the proportion of each chromosome in those 100 lines.
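
For reference, here is a minimal sketch of what "100 lines with the same chromosome proportions" could look like in plain awk. It is not the code from the linked post. The function name `stratified_sample`, the default seed, and the largest-remainder rounding of the per-chromosome quotas are all assumptions made for illustration. It reads the file twice, so it needs a real file rather than a pipe, and it skips `#` comment lines and blank lines.

```shell
#!/bin/sh
# Hypothetical helper: stratified_sample FILE N [SEED]
# Prints N data lines, with per-chromosome quotas proportional to the input.
stratified_sample() {
    bed=$1; n=$2; seed=${3:-42}   # the default seed is an arbitrary choice
    awk -v n="$n" -v seed="$seed" '
        BEGIN { srand(seed) }

        # Turn per-chromosome counts into quotas that sum to n
        # (largest-remainder rounding, exact integer arithmetic).
        function allocate(   c, best, left) {
            if (total == 0) return
            if (n > total) n = total
            left = n
            for (c in count) {
                rem[c]  = (n * count[c]) % total
                need[c] = (n * count[c] - rem[c]) / total
                left   -= need[c]
            }
            while (left > 0) {          # hand out the lines lost to rounding
                best = ""
                for (c in rem)
                    if (best == "" || rem[c] > rem[best]) best = c
                need[best]++; delete rem[best]; left--
            }
        }

        # Pass 1: count data lines per chromosome (column 1).
        NR == FNR { if ($0 !~ /^#/ && NF) { count[$1]++; total++ }; next }

        # Pass 2: selection sampling within each chromosome.
        {
            if (!ready) { allocate(); ready = 1 }
            if ($0 ~ /^#/ || !NF) next
            # Keep the line with probability (still needed) / (still unseen).
            if (rand() * count[$1] < need[$1]) { print; need[$1]-- }
            count[$1]--
        }
    ' "$bed" "$bed"
}
```

Usage would be something like `stratified_sample peaks.bed 100 > subset.bed`, where `peaks.bed` is a placeholder file name. The output keeps the original line order, and asking for more lines than the file contains simply returns every data line.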

7.9 years ago

awk 'BEGIN { srand() } ($0 ~ /^#/ || rand() < 0.1)' input.bed ?
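
Since the `rand()` filter keeps each line independently with probability 0.1, the chromosome shares of the output match the input only approximately, and the match gets looser for small files. One way to see how close a given subset is would be to tabulate the share of lines per chromosome for both files. The helper name `chrom_fractions` below is made up for this sketch, and it assumes the chromosome is in column 1 and that `#` lines are comments.

```shell
#!/bin/sh
# Hypothetical helper: chrom_fractions FILE
# Prints each chromosome and its share of the data lines, sorted by name.
chrom_fractions() {
    awk '!/^#/ && NF { n[$1]++; total++ }
         END { for (c in n) printf "%s\t%.3f\n", c, n[c] / total }' "$1" |
    LC_ALL=C sort
}
```

Running `chrom_fractions input.bed` and `chrom_fractions subset.bed` (both file names are placeholders) and comparing the two tables side by side shows how far the sample drifted from the original proportions.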

How can I specify the desired number of lines?
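
One possible way to turn the probability filter into a fixed line count is to read the file twice: count the data lines first, then keep each line with probability (lines still needed) / (lines still unseen). The sketch below does that. The name `sample_n_lines` and the default seed are assumptions, and unlike the one-liner above it drops `#` comment lines instead of passing them through. The result is a uniform sample of exactly N lines, so chromosome proportions are preserved on average but not exactly.

```shell
#!/bin/sh
# Hypothetical helper: sample_n_lines FILE N [SEED]
# Prints exactly N data lines (or all of them if the file has fewer).
sample_n_lines() {
    bed=$1; n=$2; seed=${3:-1}    # the default seed is an arbitrary choice
    awk -v n="$n" -v seed="$seed" '
        BEGIN { srand(seed) }
        # Pass 1: count data lines, ignoring comments and blank lines.
        NR == FNR { if ($0 !~ /^#/ && NF) left++; next }
        # Pass 2: skip the same lines that were not counted.
        /^#/ || !NF { next }
        {
            # Keep with probability n / left, then update both counters.
            if (rand() * left < n) { print; n-- }
            left--
        }
    ' "$bed" "$bed"
}
```

For example, `sample_n_lines input.bed 100 > subset.bed` would write 100 lines in their original order. Changing the third argument gives a different but reproducible sample.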

In the end, I wrote a couple of lines in R, which seems to be the best solution.
