Question

Exon parsing from bed file

1

6.3 years ago

1769mkc ★ 1.2k

I have a BED file of exon coordinates and want to extract all the exons that belong to a given gene. Say I have a gene on chr1 that spans chr1 11868 12227; I want to pull out all the exons that fall between 11868 and 12227.

This is a small subset of the file:

cat exon.bed | head -10
chr1    11868   12227   +   exon
chr1    11871   12227   +   exon
chr1    11873   12227   +   exon
chr1    12009   12057   +   exon
chr1    12178   12227   +   exon
chr1    12594   12721   +   exon
chr1    12612   12697   +   exon
chr1    12612   12721   +   exon
chr1    12612   12721   +   exon
chr1    12974   13052   +   exon

How do I parse these out? I mostly use R and a bit of shell scripting, but I'm not sure whether R is the right tool here; maybe a few lines of Perl or shell script could solve my problem.

Any help or suggestions would be highly appreciated.

rna-seq • 2.1k views

ADD COMMENT • link updated 6.3 years ago by Alex Reynolds 35k • written 6.3 years ago by 1769mkc ★ 1.2k

2

How about just using awk?

awk '($1=="chr1"  && int($2)>=11868 && int($3)<=12227 && $5=="exon")' input.bed

If you need a faster solution, query your file using tabix.
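
The same containment test can be run against many genes at once by reading a gene list first. This is only a sketch: the file genes.bed, its four-column layout (chrom, start, end, name), the names geneA and geneB, and the output file exons_by_gene.bed are all assumptions made for illustration, and the exon lines are copied from the question.

```shell
# Hypothetical genes.bed: chrom, start, end, gene name (tab-separated).
printf 'chr1\t11868\t12227\tgeneA\nchr1\t12594\t12721\tgeneB\n' > genes.bed

# A few exon lines copied from the question.
printf 'chr1\t11868\t12227\t+\texon\nchr1\t12009\t12057\t+\texon\nchr1\t12612\t12697\t+\texon\nchr1\t12974\t13052\t+\texon\n' > exon.bed

# First file (NR == FNR): remember every gene interval.
# Second file: print each exon that lies entirely inside a gene on the
# same chromosome, with the gene name appended as a sixth column.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR {
         gchrom[NR] = $1; gstart[NR] = $2 + 0; gend[NR] = $3 + 0; gname[NR] = $4
         n = NR
         next
     }
     $5 == "exon" {
         for (i = 1; i <= n; i++)
             if ($1 == gchrom[i] && $2 + 0 >= gstart[i] && $3 + 0 <= gend[i])
                 print $0, gname[i]
     }' genes.bed exon.bed > exons_by_gene.bed
```

Every exon is compared against every gene, so this is fine for small files but slow for genome-wide annotations; an interval tool is the better choice there.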

ADD REPLY • link 6.3 years ago by Pierre Lindenbaum 161k

1

@Pierre thank you very much for the quick solution; it at least gives me a starting point. How would I do what you suggested for all genes, each with its own coordinates? Some genes have a single exon and some have multiple exons. I hope I am making my problem clear.

ADD REPLY • link 6.3 years ago by 1769mkc ★ 1.2k

0

You might also want to look at the TxDb packages in Bioconductor.

ADD REPLY • link 6.3 years ago by Sean Davis 26k

1

Take a look at the rtracklayer Bioconductor package and its import function. Then, after importing the BED file, look at the Bioconductor GenomicRanges %over% operator. These are big hammers for a small problem, but if you use R and are doing genomics, GenomicRanges can quickly become your best friend.

ADD REPLY • link 6.3 years ago by Sean Davis 26k

0

Okay, that sounds really cool. Yes, I mostly use R for all my genomics work. I will try the library and let you know.

ADD REPLY • link 6.3 years ago by 1769mkc ★ 1.2k

1

Hello krushnach80!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/3239/parse-out-exon-coordinates-from-bed-file-for-each-gene

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 6.3 years ago by Pierre Lindenbaum 161k

0

@Pierre I regret that. I had posted an earlier question but didn't get any response, so I posted in both communities. I will keep in mind not to repeat it.

ADD REPLY • link 6.3 years ago by 1769mkc ★ 1.2k

2

Oh you didn't get a response after 2 hours on a Sunday, that is indeed unreasonably long. Quite a lazy community indeed, next thing you know we'll have a personal life to take care of.

ADD REPLY • link 6.3 years ago by WouterDeCoster 47k

0

@WouterDeCoster I'm sorry for that. I was talking about this question, which I asked earlier; it is related to this one but was not very specific:

Parse out exon for divergent primer design

ADD REPLY • link 6.3 years ago by 1769mkc ★ 1.2k

score 4 · Accepted Answer · 2018-01-07

Via BEDOPS bedops -e and Unix I/O streams:

$ echo -e "chr1\t11868\t12227" | bedops -e 1 exon.bed - > answer.bed

Or, if you have your genes in a BED file called genes.bed:

$ bedops -e 1 exon.bed genes.bed > answer.bed

If you have your genes in some other format, like GFF or GTF, you can use gff2bed or gtf2bed, e.g.:

$ bedops -e 1 exon.bed <(gff2bed < genes.gff) > answer.bed

Or:

$ bedops -e 1 exon.bed <(gtf2bed < genes.gtf) > answer.bed

The file answer.bed will contain exons that overlap a gene annotation by at least one base.
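
Overlap and full containment are different criteria for BED intervals, and the choice changes which exons are reported. The sketch below shows the two tests in plain awk (it is not BEDOPS itself) on the gene interval from the question. The file names exon_demo.bed, overlapping.bed and contained.bed are assumptions, and the exon at 12200 to 12300 is a made-up record that straddles the gene end.

```shell
# Two exons from the question, one made-up exon that straddles the gene end
# (12200-12300), and one exon from the question that lies outside the gene.
printf 'chr1\t11868\t12227\t+\texon\nchr1\t12178\t12227\t+\texon\nchr1\t12200\t12300\t+\texon\nchr1\t12594\t12721\t+\texon\n' > exon_demo.bed

# Overlap by at least one base, using half-open BED coordinates:
# the exon starts before the gene ends and ends after the gene starts.
awk -v c=chr1 -v s=11868 -v e=12227 '$1 == c && $2 < e && $3 > s' exon_demo.bed > overlapping.bed

# Full containment: the exon starts and ends inside the gene interval.
awk -v c=chr1 -v s=11868 -v e=12227 '$1 == c && $2 >= s && $3 <= e' exon_demo.bed > contained.bed
```

Here overlapping.bed keeps the straddling exon while contained.bed drops it; the exon at 12594 to 12721 appears in neither file.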