Question: Is a genome position in an exon, intron or intergenic region?
0
gravatar for fuhsuyuan
5.9 years ago by
fuhsuyuan0
fuhsuyuan0 wrote:

Hi,

I have a list of genome position, e.g.


scaffold 1 [tab] position 1 [tab] reads1

scaffold 1 [tab] position 2 [tab] reads2

.

.

.

scaffold 2 [tab] positionN [tab] readsN


 

And my annotation file could be like


scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation1

scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation2

scaffold 1 [tab] exon_start [tab] exon_end [tab] annotation3

.

.

scaffold N [tab] exon_start [tab] exon_end [tab] annotationN


 

How do I make a output file programmatically like 


output.txt

scaffold 1 [tab] position1 [tab] reads1 [tab] intron [tab] annotation

scaffold 1 [tab] position2 [tab] reads2 [tab] exon [tab] annotation

.

.

scaffold N [tab] positionN [tab] readsN [tab] intergenic_region [tab] neighbor gene annotation


 

Thanks.

genome • 1.8k views
ADD COMMENTlink modified 5.8 years ago by Alex Reynolds31k • written 5.9 years ago by fuhsuyuan0

Go through bedtools (http://bedtools.readthedocs.org/en/latest/) or bedops (http://bedops.readthedocs.org/en/latest/). You may have to reformat your tables a little otherwise both of them work wonders. 

ADD REPLYlink modified 5.9 years ago • written 5.9 years ago by Ashutosh Pandey12k

Thank you. I will try it.

ADD REPLYlink written 5.9 years ago by fuhsuyuan0
0
gravatar for Alex Reynolds
5.8 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

One way is to get standard annotations into BED format, e.g.:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_21/gencode.v21.annotation.gff3.gz \
    | gunzip --stdout - \
    | convert2bed -i gff - \
    > annotations.bed

Then do a search against those annotations with your regions of interest in BED format (say, in a file called regions.bed):

$ bedops --element-of 1 regions.bed annotations.bed > answer.bed

You can then use grep to filter the answer.bed for subsets of interest (exon, etc.):

$ grep -w exon answer.bed > answer_exon.bed

You could also use awk:

$ awk '$8=="exon"' answer.bed > answer_exon.bed

Your search just has to match the biotype category name.

If you want to use your own annotations and regions, just get them into sorted BED format.

ADD COMMENTlink modified 5.8 years ago • written 5.8 years ago by Alex Reynolds31k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2026 users visited in the last hour
_