Question: Annotating gene ID and genomic features from bed files
0
gravatar for nornorbinks
7 months ago by
nornorbinks0 wrote:

Hello, I have a bed file from a CLIP-seq experiment that has the columns: chr, start, stop, -log10(p-value), log2fold enrichment and strand (The bed file was generated with bedtools intersect of three bed files). I wish to annotate the intervals with gene name and genomic features (3'UTR, exon, intron, etc). I have found several ways to just annotate the gene name but I would really like to include the genomic features if possible. Any recommendations? Thanks!

clip genome • 191 views
ADD COMMENTlink modified 7 months ago by Alex Reynolds30k • written 7 months ago by nornorbinks0
2
gravatar for Alex Reynolds
7 months ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Using BEDOPS convert2bed, you can make a file of named features:

$ wget -qO- ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz \
    | gunzip --stdout - \
    | convert2bed -i gff - \
    | awk -vFS="\t" -vOFS="\t" '{ print $1, $2, $3, $4"|"$8 }' - \
    > named-features.bed

Adjust the GENCODE download URL as needed for your reference genome and desired release version.

Then map your sorted BED file to the named features (assuming your BED file is called clip-seq.bed):

$ bedmap --echo --echo-map-id --delim '\t' <(sort-bed clip-seq.bed) named-features.bed > answer.bed

The name and feature category will be in the last column of answer.bed.

If you need to translate Ensembl IDs to HGNC names, one option is to modify the awk statement above to parse out the HGNC name from the optional attributes in the GFF file. As they may not always be present, another option is the use of the mygene service. An example Python script is available here that demonstrates retrieval for mm10 records: A: Get gene names from ensembl ID or gene region

ADD COMMENTlink modified 7 months ago • written 7 months ago by Alex Reynolds30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 678 users visited in the last hour