Find co-occurrences of factors across multiple files
3.4 years ago
tirichl ▴ 20

Hey,

I have several hundred files that look like this:

a.file
#genomic positions
1, 3, 4, 9, 10
#factors
a, b, d, g

Each file holds multiple genomic positions (numbers) and factors (characters). I want to investigate whether there are genomic positions that frequently co-occur with factors across all files, but I have no idea how to approach this. Can someone point me in the right direction? Is there a tool or a library that might help? Thank you!

ChIP-Seq sequencing gene • 497 views
3.4 years ago

bedtools intersect accepts multiple files to intersect against. Have you tried it? With the -c flag it reports, for each entry in the -a file, the number of overlapping entries found in the -b files.

bedtools intersect -c -a your_file -b factor1 factor2 factorN

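Then you could print the lines with more than X occurrences with awk. The count from -c is appended as the last column, so $NF picks it up however many columns your_file has. Replace X with a number and pipe the bedtools output into:

awk 'BEGIN{FS="\t";OFS="\t"} $NF > X'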
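
If the files are still in the layout shown in the question rather than BED, the counting can also be sketched directly in Python. This is only a rough sketch under assumptions: each file has a "#genomic positions" header followed by comma-separated integers and a "#factors" header followed by comma-separated labels. The function names and the "data" directory in the usage part are hypothetical.

```python
from collections import Counter
from itertools import product
from pathlib import Path


def parse_file(path):
    """Return (positions, factors) from one file in the question's layout.

    Assumes '#'-prefixed header lines, each followed by comma-separated values.
    Lines before the first header (e.g. a file name) are ignored.
    """
    sections = {}
    current = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip().lower()
            sections.setdefault(current, set())
        elif current is not None:
            sections[current].update(
                item.strip() for item in line.split(",") if item.strip()
            )
    positions = {int(p) for p in sections.get("genomic positions", set())}
    factors = sections.get("factors", set())
    return positions, factors


def count_cooccurrences(paths):
    """Count, for each (position, factor) pair, how many files contain both."""
    pair_counts = Counter()
    for path in paths:
        positions, factors = parse_file(path)
        # Sets guarantee each pair is counted at most once per file.
        pair_counts.update(product(positions, factors))
    return pair_counts


if __name__ == "__main__":
    # "data" and the "*.file" pattern are placeholders for your own files.
    counts = count_cooccurrences(Path("data").glob("*.file"))
    for (position, factor), n_files in counts.most_common(10):
        print(position, factor, n_files, sep="\t")
```

Raw counts favor positions and factors that are common overall, so it may be worth comparing each pair's count against how often the position and the factor appear on their own.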

Hope it helps!
