I have a file (gene_exon.bed) that I am trying to map a BED file of baits against. That is, I want to use the bait BED file to find whether each bait overlaps an entry in gene_exon.bed and, if it does, output the result.
bedmap --echo --echo-map-id-uniq epilepsy70_medex_edit.bed output.bed > answer.bed
bedmap --echo --echo-map epilepsy70_medex_edit.bed output.bed > gene_exon.bed
I have tried both commands above but cannot produce the desired result. Is it possible? Thank you :)
gene_exon.bed
chr1 11868 12227 DDX11L1 1
chr1 12009 12057 DDX11L1 1
chr1 12178 12227 DDX11L1 2
chr1 12612 12697 DDX11L1 3
chr1 12612 12721 DDX11L1 2
chr1 12974 13052 DDX11L1 4
chr1 13220 13374 DDX11L1 5
chr1 13220 14409 DDX11L1 3
chr1 13452 13670 DDX11L1 6
epilepsy70_medex_edit.bed (baits)
chr1 40539722 40539865
chr1 40542489 40542609
chr1 40544221 40544341
chr1 40546054 40546174
chr1 40555071 40555194
chr1 40556976 40557096
chr1 40557706 40557854
chr1 40558059 40558189
chr1 40562776 40562920
chr1 43392701 43392922
Desired output
chr1 40562776 40562920 gene exon
chr1 43392701 43392922 gene exon
If you need to validate BED files, you can use:
bedops --ec --everything
This will tell you if a BED file is missing information, has malformed fields, or is not sorted correctly (i.e., per BEDOPS sort-bed).
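To make "sorted correctly" concrete, here is a minimal Python sketch of an ordering check. It assumes the sort-bed order is lexicographic by chromosome name, then numeric by start, then numeric by end. The helper names `bed_key` and `first_unsorted` are made up for illustration. This only looks at ordering and is not a substitute for the bedops error-checking run above.

```python
def bed_key(line):
    """Sort key for one BED line: (chromosome, start, end)."""
    chrom, start, end = line.split()[:3]
    return (chrom, int(start), int(end))


def first_unsorted(lines):
    """Return the 1-based number of the first out-of-order line, or None."""
    prev = None
    for n, line in enumerate(lines, start=1):
        key = bed_key(line)
        if prev is not None and key < prev:
            return n
        prev = key
    return None


# First three bait intervals from the question.
baits = [
    "chr1 40539722 40539865",
    "chr1 40542489 40542609",
    "chr1 40544221 40544341",
]
print(first_unsorted(baits))
```

If the function returns a line number, running the file through sort-bed before calling bedmap is the usual fix.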
If you need results in a different format, you can still use bedmap --echo-map-id, but first pre-process your inputs so that the ID field contains both the gene name and the exon number. When mapping with bedmap, the ID field will then carry both values, and so the output will contain that mapped result.
Otherwise, it's not immediately clear to me exactly what you are trying to do with your inputs. In that case, if you want to post a snippet of both input files (epilepsy70_medex_edit.bed and output.bed), as well as a snippet of the expected output file answer.bed, that might help me write you an awk statement that does the necessary pre-processing.
I don't think either BEDOPS or bedtools will do what you want without some extra work, but please feel free to post more info and I'll try to help show you how to do things with BEDOPS.
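As a rough illustration of that pre-processing step, the sketch below folds the gene name (column 4) and exon number (column 5) of gene_exon.bed into a single ID. It is written in Python rather than awk, and the function name `merge_gene_exon` and the ':' separator are assumptions chosen for the example, not anything BEDOPS requires.

```python
def merge_gene_exon(line, sep=":"):
    """Turn 'chr start end gene exon' into a 4-column line with ID 'gene:exon'."""
    fields = line.split()
    if len(fields) < 5:
        # Nothing to merge; pass the line through tab-delimited.
        return "\t".join(fields)
    chrom, start, end, gene, exon = fields[:5]
    return "\t".join([chrom, start, end, f"{gene}{sep}{exon}"])


# First three lines of gene_exon.bed from the question.
sample = [
    "chr1 11868 12227 DDX11L1 1",
    "chr1 12009 12057 DDX11L1 1",
    "chr1 12178 12227 DDX11L1 2",
]
merged = [merge_gene_exon(line) for line in sample]
for row in merged:
    print(row)
```

The merged file can then be sorted and used as the map file, so that the ID echo options of bedmap report the gene and exon together.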
The below command:
creates an answer.bed that is close (that is, it includes the gene name but not the exon):
answer.bed
Basically, I am just trying to get the exon number included. I do have a file (gene_exon.bed) that contains that info.
epilepsy70_medex_edit.bed (baits)
Desired answer.bed
Thank you very much :). I will also try to concatenate all the values in the ID field.
Can the below command be used to identify the gene and exon of a given overlap? I have a file with the gene:exon concatenated in column 4, but the results (as of now) do not seem to be using this file. Thank you :).
test.bed
epilepsy70_medex_edit.bed (baits)
Desired output
Output (as of now)
I think you are missing a hyphen in --skip-unmapped, but --echo-map-id-uniq is working correctly.
The ID field includes the exon number in this case, so an ID like TDRD10:10 is treated (correctly) as a distinct string from TDRD10:2.
If you want a tab between range and IDs, use --delim '\t' with bedmap. This replaces the pipe character ('|') with a tab.
Otherwise, you could collapse the IDs by prefix by post-processing the current output with Perl or Python, etc.
For instance:
Untested, but this should perhaps spit out something more readable, like:
Note that you would want to use --skip-unmapped to prepare input for this script. Otherwise, you'll likely get one or more lines with an empty fourth field where there are no mapped elements.
I named the script gene_exon.pl and placed the file bedmap created (answer.bed) in a directory named file.
I did a cd to that directory and then:
Thank you :)
Try:
Like I said, this is mainly to show what could be done generally to collapse IDs by prefix. You may need to tweak this further depending on your input.
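For anyone who prefers Python, collapsing IDs by prefix might be sketched as follows. This is not the Perl script from this thread. It assumes each line is tab-delimited with the mapped IDs in the last column, that IDs are separated by ';' (the bedmap default for multiple mapped elements), and that each ID looks like gene:exon. The function name `collapse_ids`, the output layout, and the sample coordinates are made up for illustration.

```python
def collapse_ids(line, col_delim="\t", id_delim=";", sep=":"):
    """Group 'gene:exon' IDs in the last column by gene prefix."""
    *region, ids = line.rstrip("\n").split(col_delim)
    groups = {}  # gene -> list of exon labels, in first-seen order
    for item in ids.split(id_delim):
        if not item:
            continue  # unmapped line: empty ID column
        gene, _, exon = item.partition(sep)
        exons = groups.setdefault(gene, [])
        if exon:
            exons.append(exon)
    collapsed = id_delim.join(
        f"{gene}{sep}{','.join(exons)}" if exons else gene
        for gene, exons in groups.items()
    )
    return col_delim.join(region + [collapsed])


# Hypothetical coordinates; the IDs are the two mentioned earlier in the thread.
example = "chr1\t100\t200\tTDRD10:10;TDRD10:2"
print(collapse_ids(example))
```

If the bedmap output still uses the default pipe between the range and the IDs, passing col_delim="|" should handle that case, though the range columns are then kept as one field.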