Mapping names of sequences from one file to another file
4.6 years ago

I have two files, file A and file B, and I need to map the names in file B onto the entries in file A.
File A is

bl_bl/Mir7_O-E
bowl_O-E
btd_Ss-Bg
CG9571_O-E
cnc_+5_construct
eve_stripe1
X 6014154 6015890
X 6023769 6025039
X 6022460 6023762
X 6018273 6020650


File B looks like this

X 5987411 5987911 Unspecified_STARR-S2-5230
X 5997666 5998166 Unspecified_STARR-S2-4940
X 6000535 6001035 Unspecified_STARR-OSC-2712
X 6002496 6002996 Unspecified_STARR-S2-4953
X 6027445 6027945 Unspecified_STARR-S2-1989
X 6069973 6072234 Unspecified_VT57592
X 6074286 6074786 Unspecified_STARR-OSC-3266
X 6075128 6075628 Unspecified_STARR-S2-2715
X 6108152 6108652 Unspecified_STARR-OSC-4388
X 6132403 6132903 Unspecified_STARR-OSC-2588
X 6132527 6133027 Unspecified_STARR-S2-1212

How can I map file A to file B to find the common regions? In particular, how should I handle the first few entries of file A, whose genomic coordinates are not given?

genome • 720 views

I'm going to go out on a limb and say this is impossible to do in a robust way. The two files seem to have nothing in common. Perhaps you can give more details as to the nature of their contents, but I'm still doubtful there is a solution. Maybe if you can describe in more detail what you want to do and why, someone will have an alternative.


File A has two columns in the first few lines and three columns in the remaining lines. When I tried to sort it, bedtools threw an error:
bedtools sort -i a.bed
It looks as though you have less than 3 columns at line: 1. Are you sure your files are tab-delimited?

Also, can you give some insight into the bedmap function? I have not used it before.


You're going to need to do some work to fix your inputs. As Brian noted, you haven't provided enough information for anyone to really do this for you. Read up on the UCSC BED format specification. Also, if you click on the link in my answer, you'll see a link to the documentation page that describes bedmap and sort-bed.
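As a starting point for fixing the inputs, here is a minimal Python sketch that lists the lines a BED-aware tool is likely to reject. It assumes a minimal BED record is three or more tab-separated fields with integer start and end; the function name find_malformed_bed_lines and the sample lines are made up for illustration, and the real BED specification has more rules than this checks.

```python
def find_malformed_bed_lines(lines):
    """Return (line_number, text) pairs for lines that are not minimal BED records."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        fields = text.split("\t")  # BED tools expect tabs, not spaces
        ok = (
            len(fields) >= 3
            and fields[1].isdigit()
            and fields[2].isdigit()
            and int(fields[1]) <= int(fields[2])
        )
        if not ok:
            bad.append((number, text))
    return bad


# Illustrative lines modelled on file A (assumed layout, not the real file).
file_a = [
    "bl_bl/Mir7_O-E\n",       # name only, no coordinates
    "eve_stripe1\n",          # name only, no coordinates
    "X\t6014154\t6015890\n",  # valid: tab-delimited chrom, start, end
    "X 6023769 6025039\n",    # space-delimited, so it parses as one field
]

for number, text in find_malformed_bed_lines(file_a):
    print(f"line {number}: {text!r}")
```

Any line this reports needs coordinates looked up or delimiters fixed before sorting.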

4.6 years ago

Via BEDOPS:

1. Get coordinates of elements in set A; generate a sort-bed-sorted BED file called A.bed
2. Generate a sort-bed-sorted BED file called B.bed, e.g. sort-bed B.unsorted.bed > B.bed
3. Run bedmap --echo --echo-map-id-uniq A.bed B.bed > answer.bed and explore the file answer.bed

You'll probably need to figure out step 1 yourself. There's probably not enough information in your question for others to help with it.
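To give a feel for what step 3 reports, here is a rough Python sketch of the same idea: for each interval in A, collect the unique IDs of the B intervals that overlap it. This is not bedmap itself, just a brute-force approximation assuming half-open BED coordinates and an overlap of at least one base. The helper names parse_bed and map_ids are made up, and the "hypothetical_enhancer" line is invented to show a hit, since none of the B lines shown in the question overlap the A coordinates shown.

```python
def parse_bed(lines):
    """Parse whitespace-separated BED-like lines into (chrom, start, end, name) tuples."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip name-only lines that carry no coordinates
        name = fields[3] if len(fields) > 3 else None
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def map_ids(a_records, b_records):
    """For each A interval, return the sorted unique names of overlapping B intervals."""
    results = []
    for chrom, start, end, _ in a_records:
        ids = sorted({
            b_name
            for b_chrom, b_start, b_end, b_name in b_records
            if b_name is not None
            and b_chrom == chrom
            and b_start < end   # half-open intervals overlap when each
            and start < b_end   # one starts before the other ends
        })
        results.append(((chrom, start, end), ids))
    return results


a = parse_bed(["X 6014154 6015890", "X 6023769 6025039"])
b = parse_bed([
    "X 6002496 6002996 Unspecified_STARR-S2-4953",
    "X 6027445 6027945 Unspecified_STARR-S2-1989",
    "X 6015000 6015500 hypothetical_enhancer",  # invented line, not from file B
])

for region, ids in map_ids(a, b):
    print(region, ids)
```

This compares every pair of intervals, so it is only suitable for small files; bedmap on sorted inputs is the better choice for real data.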

4.6 years ago
zjhzwang ▴ 180

I have no experience working with that type of file, but I think you can use a Perl script to figure it out.
You can create a hash table such as {"X 5987411 5987911" => "Unspecified_STARR-S2-5230"} from file B, then look up each line of file A in it.
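Here is a minimal sketch of the hash-table idea, written in Python with a dict instead of a Perl hash. The function names build_name_table and annotate are made up for illustration. Note that this only finds exact coordinate matches, and none of the file A coordinates shown in the question match a file B line exactly, so the first A line below is borrowed from file B just to demonstrate a hit.

```python
def build_name_table(b_lines):
    """Map (chrom, start, end) from each file B line to its name column."""
    table = {}
    for line in b_lines:
        fields = line.split()
        if len(fields) >= 4:
            table[(fields[0], fields[1], fields[2])] = fields[3]
    return table


def annotate(a_lines, table):
    """Pair each file A line with the matching B name, or None if there is no exact match."""
    out = []
    for line in a_lines:
        fields = line.split()
        # Lines without three coordinate fields cannot be looked up.
        name = table.get(tuple(fields[:3])) if len(fields) >= 3 else None
        out.append((line.strip(), name))
    return out


file_b = [
    "X 5987411 5987911 Unspecified_STARR-S2-5230",
    "X 5997666 5998166 Unspecified_STARR-S2-4940",
]
file_a = [
    "X 5987411 5987911",  # borrowed from file B to show an exact match
    "X 6014154 6015890",  # real file A coordinates, no exact match in B
    "eve_stripe1",        # name only, cannot be matched by coordinates
]

table = build_name_table(file_b)
for line, name in annotate(file_a, table):
    print(line, "->", name)
```

If the regions only partly overlap rather than match exactly, an interval-overlap tool is needed instead of a plain lookup.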