common coordinates from 10 files
4 months ago
K • 0

Hi,

I have 10 files and would like to find the coordinates that are common to all of them, for use in a synteny plot. Is there a script or program I can use for this?

I know the comm and diff commands for comparing two files.

Thank you

coordinates common • 342 views

Couldn't you use a bash for loop with comm -12, comparing each file against the previous comm -12 result, to get what is common to all the files? (comm -12 prints only the lines present in both inputs; comm -3 would suppress exactly those lines.) Any element present in every file will always survive the comparison. Assuming all the files are sorted the way comm expects, that is with plain sort rather than sort -n (you can add a sort step), something like:

#!/bin/bash
FILES="file1
file2
file3"

# create a first file
cp file1 common.txt

for f in $FILES; do
    echo "processing: $f"
    # comm -12 prints only the lines present in both inputs; write to a temporary
    # file first, because redirecting into common.txt while reading it would empty it
    comm -12 "$f" common.txt > common.tmp && mv common.tmp common.txt
done

Otherwise I would imagine a similar strategy with bedtools intersect might work.

I am looking for something like a Venn diagram analysis, where I can see the coordinates unique to each scaffold file and those common to each subset of files. I saw some online tools, but they are limited to 3 input files, and I want to know the intersections among 10 scaffolds.

If your files are TSV files and one of them can serve as the reference, you can try:

$ tsv-join -f filter.tsv -k 2,3 data1.tsv data2.tsv data3.tsv


tsv-join is part of tsv-utils, which you will need to download; 2,3 are the key columns shared among the files. filter.tsv is the reference file and the remaining files are the ones to be joined. Always post representative example data if you want forum members to understand your query better.
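If the goal is only the set of coordinates present in every file, a counting approach may be simpler than chaining pairwise comparisons. The sketch below is an assumption-laden illustration rather than a tested pipeline: the file names and the toy coordinates are made up for the demo, and it assumes one coordinate per line with no spaces inside a coordinate.

```bash
#!/bin/bash
# Hypothetical demo inputs (toy values, not real data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 10-20 30-40 50-60 > file1
printf '%s\n' 10-20 50-60 70-80 > file2
printf '%s\n' 5-9 10-20 30-40   > file3

files=(file1 file2 file3)
n=${#files[@]}

# De-duplicate each file so a repeated line counts once per file, then count
# how many files each line occurs in and keep the lines found in all n files.
for f in "${files[@]}"; do
    sort -u "$f"
done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }' > common_all.txt

cat common_all.txt
```

Because every file is passed through sort inside the loop, the inputs do not need to be pre-sorted, and adding more files only means extending the files array.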


sorted files 1-10

file 1:
53021-53613
437126-437761
838835-839317
1228237-1233121
1782778-1782914

file 2:
23181-23640
53021-53613
70544-71129
985194-988644
1017828-1018850

file 3:
41052-42706
44618-46770
53136-55912
55909-59236
70402-71600
70544-71129
1228237-1233121

common between:
file 1 > file 2: 53021-53613
file 1 > file 3: 1228237-1233121
file 2 > file 3: 70544-71129
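For the Venn-style breakdown asked about above, one possible approach is to record, for every coordinate, the list of files that contain it. The following is a minimal sketch using the three sample files from this post; the output names membership.tsv and shared.tsv are made up for the example, and it assumes exact string matches (overlapping but non-identical intervals are treated as different coordinates).

```bash
#!/bin/bash
# Recreate the three sample files from the post in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 53021-53613 437126-437761 838835-839317 1228237-1233121 1782778-1782914 > file1
printf '%s\n' 23181-23640 53021-53613 70544-71129 985194-988644 1017828-1018850 > file2
printf '%s\n' 41052-42706 44618-46770 53136-55912 55909-59236 70402-71600 70544-71129 1228237-1233121 > file3

# Build "coordinate <TAB> comma-separated list of files containing it".
# Files are listed in command-line order, so each subset has one canonical label.
awk '
    !seen[$0, FILENAME]++ {    # count each coordinate only once per file
        members[$0] = (members[$0] == "" ? FILENAME : members[$0] "," FILENAME)
    }
    END {
        for (c in members) print c "\t" members[c]
    }
' file1 file2 file3 | sort -t- -k1,1n > membership.tsv

# Coordinates found in more than one file have a comma in the second column.
awk -F'\t' '$2 ~ /,/' membership.tsv > shared.tsv

cat shared.tsv
```

On the sample data, shared.tsv should list the same three pairs given above (53021-53613 in file1,file2; 70544-71129 in file2,file3; 1228237-1233121 in file1,file3). With 10 files, pass all ten names to awk; running cut -f2 membership.tsv | sort | uniq -c then gives the number of coordinates in each region of the Venn diagram, including those unique to a single file.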