Is a genome position in an exon, intron or intergenic region?
1
0
Entering edit mode
9.1 years ago
fuhsuyuan • 0

Hi,

I have a list of genome position, e.g.


scaffold 1 [tab] position 1 [tab] reads1
scaffold 1 [tab] position 2 [tab] reads2
.
.
.
scaffold 2 [tab] positionN [tab] readsN

And my annotation file could be like


scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation1
scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation2
scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation3
.
.
scaffold N [tab] exon_start [tab] exon_end [tab] annotationN

How do I make a output file programmatically like


output.txt

scaffold 1 [tab] position1 [tab] reads1 [tab] intron [tab] annotation
scaffold 1 [tab] position2 [tab] reads2 [tab] exon [tab] annotation
.
.
scaffold N [tab] positionN [tab] readsN [tab] intergenic_region [tab] neighbor gene annotation

Thanks.

genome • 2.4k views
ADD COMMENT
0
Entering edit mode

Go through bedtools or bedops. You may have to reformat your tables a little otherwise both of them work wonders.

ADD REPLY
0
Entering edit mode

Thank you. I will try it.

ADD REPLY
0
Entering edit mode
9.0 years ago

One way is to get standard annotations into BED format, e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | convert2bed -i gff - \
    > annotations.bed

Then do a search against those annotations with your regions of interest in BED format (say, in a file called regions.bed):

$ bedops --element-of 1 regions.bed annotations.bed > answer.bed

You can then use grep to filter the answer.bed for subsets of interest (exon, etc.):

$ grep -w exon answer.bed > answer_exon.bed

You could also use awk:

$ awk '$8=="exon"' answer.bed > answer_exon.bed

Your search just has to match the biotype category name.

If you want to use your own annotations and regions, just get them into sorted BED format.

ADD COMMENT

Login before adding your answer.

Traffic: 1951 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6