Question

Is a genome position in an exon, intron or intergenic region?


9.1 years ago

fuhsuyuan • 0

Hi,

I have a list of genome positions, e.g.:

scaffold 1 [tab] position 1 [tab] reads1
scaffold 1 [tab] position 2 [tab] reads2
.
.
.
scaffold 2 [tab] positionN [tab] readsN

And my annotation file is formatted like this:

scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation1
scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation2
scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation3
.
.
scaffold N [tab] exon_start [tab] exon_end [tab] annotationN

How can I programmatically produce an output file like this?

output.txt

scaffold 1 [tab] position1 [tab] reads1 [tab] intron [tab] annotation
scaffold 1 [tab] position2 [tab] reads2 [tab] exon [tab] annotation
.
.
scaffold N [tab] positionN [tab] readsN [tab] intergenic_region [tab] neighbor gene annotation

Thanks.



Have a look at bedtools or BEDOPS. You may have to reformat your tables a little (into BED format), but otherwise both of them work wonders.
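A minimal sketch of that reformatting with awk and sort, assuming the positions and exon coordinates in the question are 1-based and tab-separated (the question does not say which). BED is 0-based and half-open, so a single position p becomes the interval p-1 to p, and an exon from s to e (inclusive) becomes s-1 to e. The file names positions.txt, exons.txt, regions.bed and exons.bed are hypothetical, and the printf lines only create demo input; point the awk commands at your own files instead.

```shell
#!/bin/sh
# Demo input only (hypothetical); replace with your own tab-separated files.
# positions.txt: scaffold, 1-based position, read count
# exons.txt:     scaffold, 1-based exon start, 1-based inclusive exon end, annotation
printf 'scaffold1\t150\t12\nscaffold1\t40\t7\nscaffold2\t900\t3\n' > positions.txt
printf 'scaffold1\t101\t200\tgeneA\nscaffold1\t301\t400\tgeneA\n' > exons.txt

TAB=$(printf '\t')

# Each position becomes a one-base BED interval [pos-1, pos); the read count is kept as column 4.
# The sort is intended to match sort-bed ordering (scaffold lexicographically, then start
# and end numerically); splitting on tabs keeps scaffold names containing spaces intact.
awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $2, $3 }' positions.txt \
    | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n -k3,3n > regions.bed

# Exons: 1-based inclusive [start, end] becomes 0-based half-open [start-1, end).
awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 - 1, $3, $4 }' exons.txt \
    | LC_ALL=C sort -t "$TAB" -k1,1 -k2,2n -k3,3n > exons.bed
```

If BEDOPS is installed, piping through sort-bed instead of sort is the safer choice, since that is the ordering its tools expect.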

Comment by Ashutosh Pandey, 9.1 years ago


Thank you. I will try it.

Reply by fuhsuyuan, 9.1 years ago

score 0 · Answer 1 · 2015-04-13

One way is to get standard annotations into BED format, e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | convert2bed -i gff - \
    > annotations.bed

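Then search those annotations with your regions of interest in BED format (say, in a file called regions.bed). bedops --element-of reports elements of the first file that overlap the second, so list the annotations first to get annotation records back:

$ bedops --element-of 1 annotations.bed regions.bed > answer.bed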

You can then use grep to filter answer.bed for subsets of interest (exons, etc.):

$ grep -w exon answer.bed > answer_exon.bed

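You could also use awk, which tests the feature-type column (column 8) exactly instead of matching the word anywhere on the line:

$ awk -F'\t' '$8 == "exon"' answer.bed > answer_exon.bed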

Your filter just has to match the feature-type name in column 8 (e.g., gene, transcript, exon or CDS).
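To see which feature-type names are available to filter on, you can tally column 8. The sketch below does this on a small hypothetical demo.bed written in the 10-column layout convert2bed uses for GFF3 (chrom, start, end, ID, score, strand, source, type, phase, attributes); on real data, run the cut line against annotations.bed or answer.bed instead.

```shell
#!/bin/sh
# Demo input only (hypothetical): three records in convert2bed's GFF3 column layout.
printf 'chr1\t100\t900\tgene1\t.\t+\tHAVANA\tgene\t.\tID=gene1\n'   > demo.bed
printf 'chr1\t100\t300\texon1\t.\t+\tHAVANA\texon\t.\tID=exon1\n'  >> demo.bed
printf 'chr1\t600\t900\texon2\t.\t+\tHAVANA\texon\t.\tID=exon2\n'  >> demo.bed

# Count the distinct feature types in column 8 (cut splits on tabs by default).
cut -f8 demo.bed | sort | uniq -c

# Exact match on column 8, reading the file as tab-delimited.
awk -F'\t' '$8 == "exon"' demo.bed > demo_exon.bed
```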

If you want to use your own annotations and regions, just convert them to BED and sort them (e.g., with sort-bed).
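For small inputs, the exon/intron/intergenic labelling asked for in the question can also be sketched in plain awk, without BEDOPS. This is a naive illustration, not the bedops approach above. It assumes regions.bed (scaffold, start, end, read count) and exons.bed (scaffold, start, end, gene name), both 0-based BED, with column 4 of exons.bed naming the gene each exon belongs to (a hypothetical layout). A gene's extent is taken to be the span from its lowest exon start to its highest exon end, so a position inside that span but outside every exon is called intron; intergenic positions get the nearest gene on the same scaffold, or "." if there is none. It compares every position with every exon, so it is only reasonable for small files.

```shell
#!/bin/sh
# Demo input only (hypothetical), 0-based half-open BED:
# regions.bed: scaffold, start, end, read count
# exons.bed:   scaffold, start, end, gene name (one line per exon)
printf 'scaffold1\t39\t40\t7\nscaffold1\t149\t150\t12\nscaffold1\t249\t250\t5\nscaffold2\t899\t900\t3\n' > regions.bed
printf 'scaffold1\t100\t200\tgeneA\nscaffold1\t300\t400\tgeneA\n' > exons.bed

awk 'BEGIN { FS = OFS = "\t" }
NR == FNR {                                   # first file: exons
    n++; ec[n] = $1; es[n] = $2 + 0; ee[n] = $3 + 0; eg[n] = $4
    key = $1 SUBSEP $4                        # one gene per (scaffold, name)
    if (!(key in gid)) { m++; gid[key] = m; gc[m] = $1; gn[m] = $4; gs[m] = $2 + 0; ge[m] = $3 + 0 }
    j = gid[key]                              # widen the gene span to cover this exon
    if ($2 + 0 < gs[j]) gs[j] = $2 + 0
    if ($3 + 0 > ge[j]) ge[j] = $3 + 0
    next
}
{                                             # second file: positions
    s = $2 + 0; e = $3 + 0
    cls = "intergenic_region"; ann = "."; best = -1
    for (i = 1; i <= n; i++)                  # overlaps an exon?
        if (ec[i] == $1 && s < ee[i] && e > es[i]) { cls = "exon"; ann = eg[i]; break }
    if (cls != "exon")
        for (j = 1; j <= m; j++)              # inside a gene span but no exon?
            if (gc[j] == $1 && s < ge[j] && e > gs[j]) { cls = "intron"; ann = gn[j]; break }
    if (cls == "intergenic_region")
        for (j = 1; j <= m; j++) {            # nearest gene span on the same scaffold
            if (gc[j] != $1) continue
            d = (e <= gs[j]) ? gs[j] - e : s - ge[j]
            if (best < 0 || d < best) { best = d; ann = gn[j] }
        }
    print $1, e, $4, cls, ann                 # e is the original 1-based position
}' exons.bed regions.bed > output.txt
```

Each line of output.txt follows the layout in the question: scaffold, position, reads, region class, gene. For genome-scale files, bedmap or bedtools closest will scale far better than this nested loop.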