Find co-occurrences of factors across multiple files
9 weeks ago
tirichl20 wrote:


I have several hundred files that look like this:

#genomic positions
1, 3, 4, 9, 10
a, b, d, g

Each file holds multiple genomic positions (numbers) and factors (characters). I want to find out whether certain genomic positions frequently co-occur with certain factors across all files, but I have no idea how to approach this. Can someone point me in the right direction? Is there a tool or a library that might help? Thank you!

9 weeks ago
jordi.planells330 wrote:

bedtools intersect accepts multiple files after -b. Have you tried it? With the -c flag it reports, for each entry in the -a file, the number of overlapping entries found in the -b files.

bedtools intersect -c -a your_file -b factor1 factor2 factorN

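Then you could use awk to print the lines with more than X occurrences. Replace X with your threshold; $4 assumes a three-column BED file as input, so that the count is appended as the fourth column.

awk 'BEGIN{FS="\t";OFS="\t"}{if($4 > X) print $0}'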
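
If converting the files to BED is not practical, the counting can also be done directly on the layout shown in the question. Below is a minimal Python sketch, not a tested pipeline. It assumes each file has one comma-separated line of positions followed by one comma-separated line of factors, with lines starting with # treated as comments. The function names parse_file and count_cooccurrences are made up for illustration.

```python
from collections import Counter
from itertools import product
from pathlib import Path


def parse_file(path):
    """Return (positions, factors) for one file.

    Assumed layout: the first non-comment line lists positions,
    the second non-comment line lists factors, both comma-separated.
    """
    lines = [ln.strip() for ln in Path(path).read_text().splitlines()]
    rows = [ln for ln in lines if ln and not ln.startswith("#")]
    positions = {int(tok) for tok in rows[0].split(",") if tok.strip()}
    factors = {tok.strip() for tok in rows[1].split(",") if tok.strip()}
    return positions, factors


def count_cooccurrences(paths):
    """Count, for every (position, factor) pair, how many files contain both."""
    pair_counts = Counter()
    for path in paths:
        positions, factors = parse_file(path)
        # Sets guarantee each pair is counted at most once per file.
        pair_counts.update(product(positions, factors))
    return pair_counts


if __name__ == "__main__":
    import sys

    counts = count_cooccurrences(sys.argv[1:])
    # Print the 20 most frequent pairs as: position, factor, number of files.
    for (pos, factor), n in counts.most_common(20):
        print(pos, factor, n, sep="\t")
```

Saved under a name of your choice (say cooccur.py, hypothetical), it would be run as python cooccur.py file1 file2 fileN. Keep in mind that raw counts only tell you how often a pair appears together. A position or factor that is present in almost every file will rank high with everything, so you may want to compare each count against how often the position and the factor occur on their own.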

Hope it helps!

