I am trying to compare, bed files of multiple human exome (hg38) capture kits using Bedtools Intersect to get differences in terms of total number of genes, omim genes, cds, exons and total number of bases.
For this I need to get an updated reference bed file of each parameter for comparison.
Please let me know how can I get these files.
bedmap --echo --echo-map --count exome.bed annotations.bed > answer.bed
annotations.bed file can be GENCODE genes, OMIM genes, etc. which have CDSs, exons, etc.
For example, GENCODE gene counts can be obtained via
wget -qO- https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_44/gencode.v44.annotation.gff3.gz \
| gunzip --stdout - \
| awk '$3 == "gene"' - \
| convert2bed -i gff - \
bedmap --echo --echo-map-id-uniq --count exome.bed genes.bed > answer.bed
The last column of
answer.bed will have the number of genes that overlap an entry in
You can repeat this for other sets and modify
awk statements etc. to get specific subsets of annotations.
OMIM annotations may need an intermediate map step to go from
MIM numbers to ENSG to
bedmap back to BED entries.
Traffic: 2174 users visited in the last hour