Question: Exon parsing from bed file
1
gravatar for krushnach80
3.1 years ago by
krushnach80890
krushnach80890 wrote:

This is a small subset of my BED file of exon coordinates. I want to extract all the exons that belong to a given gene. For example, for a gene on chr1 that spans 11868 to 12227, I want to pull out all the exons that fall between 11868 and 12227.

Here is the subset:

cat exon.bed | head -10
chr1    11868   12227   +   exon
chr1    11871   12227   +   exon
chr1    11873   12227   +   exon
chr1    12009   12057   +   exon
chr1    12178   12227   +   exon
chr1    12594   12721   +   exon
chr1    12612   12697   +   exon
chr1    12612   12721   +   exon
chr1    12612   12721   +   exon
chr1    12974   13052   +   exon

How do I parse these out? I mostly use R and a bit of shell scripting, but I'm not sure whether R is suitable here; maybe a few lines of Perl or shell script would solve my problem.

Any help or suggestions would be highly appreciated.

rna-seq • 1.1k views
modified 3.1 years ago by Alex Reynolds • written 3.1 years ago by krushnach80

How about just using awk?

awk '($1=="chr1"  && int($2)>=11868 && int($3)<=12227 && $5=="exon")' input.bed

If you need a faster solution, compress the sorted file with bgzip, index it, and query it with tabix.

written 3.1 years ago by Pierre Lindenbaum
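
To make the awk filter above concrete, here is a minimal sketch run against the ten sample lines from the question. It assumes the file is tab-separated with the columns shown there; the variable names gchrom, gstart and gend and the output file gene_exons.bed are illustrative choices, not part of the original command.

```shell
# Recreate the ten sample lines from the question as a tab-separated BED file.
# printf reuses the format string for every pair of start/end arguments.
printf 'chr1\t%s\t%s\t+\texon\n' \
  11868 12227 11871 12227 11873 12227 12009 12057 12178 12227 \
  12594 12721 12612 12697 12612 12721 12612 12721 12974 13052 > exon.bed

# Keep exons that lie entirely inside the gene interval.
# The interval is passed in as awk variables (hypothetical names) instead of
# being hard-coded; "+ 0" forces a numeric comparison of the coordinates.
awk -F'\t' -v gchrom=chr1 -v gstart=11868 -v gend=12227 \
  '$1 == gchrom && $2 + 0 >= gstart + 0 && $3 + 0 <= gend + 0 && $5 == "exon"' \
  exon.bed > gene_exons.bed

cat gene_exons.bed
```

On the sample data this should keep the first five lines, since every other exon starts after 12227.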

@Pierre thank you very much for the quick solution; it gives me a starting point. With the approach you suggested, what if I have to do this for all the genes, each with its own coordinates? How would I do that, given that some genes have one exon and others have several? I hope I am making my problem clear.

written 3.1 years ago by krushnach80
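
One way to extend the awk approach to many genes is a two-file pass: load the gene intervals first, then test every exon against them. The sketch below is only an illustration under stated assumptions: genes.bed is a hypothetical four-column file (chromosome, start, end, name), the names geneA and geneB and the second interval are made up, and only the first interval comes from the thread.

```shell
# Hypothetical gene table: chrom, start, end, name.
# Only chr1:11868-12227 appears in the thread; geneB is invented for the demo.
printf 'chr1\t11868\t12227\tgeneA\nchr1\t12594\t13052\tgeneB\n' > genes.bed

# The ten sample exon lines from the question.
printf 'chr1\t%s\t%s\t+\texon\n' \
  11868 12227 11871 12227 11873 12227 12009 12057 12178 12227 \
  12594 12721 12612 12697 12612 12721 12612 12721 12974 13052 > exon.bed

# First file (NR == FNR): remember every gene interval.
# Second file: print each exon once per gene that fully contains it,
# prefixed with the gene name so exons can be grouped afterwards.
awk -F'\t' -v OFS='\t' '
  NR == FNR { chrom[NR] = $1; gstart[NR] = $2; gend[NR] = $3; name[NR] = $4; n = NR; next }
  $5 == "exon" {
    for (i = 1; i <= n; i++)
      if ($1 == chrom[i] && $2 + 0 >= gstart[i] + 0 && $3 + 0 <= gend[i] + 0)
        print name[i], $0
  }
' genes.bed exon.bed > exons_by_gene.bed

# Number of exons found per gene.
cut -f1 exons_by_gene.bed | sort | uniq -c
```

This compares every exon with every gene, so it is fine for small files but slow for a whole-genome annotation, where an interval-aware tool is the better fit.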

You might also want to look at the TxDb packages in Bioconductor.

written 3.1 years ago by Sean Davis

Take a look at the rtracklayer Bioconductor package and its import function. Then, after importing the BED file, look at the %over% operator from the Bioconductor GenomicRanges package. These are big hammers for a small problem, but if you use R and are doing genomics, GenomicRanges can quickly become your best friend.

written 3.1 years ago by Sean Davis

Okay, that sounds really cool. Yes, I mostly use R for all my genomics work. I will try the library and let you know.

written 3.1 years ago by krushnach80

Hello krushnach80!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/3239/parse-out-exon-coordinates-from-bed-file-for-each-gene

This is typically not recommended as it runs the risk of annoying people in both communities.

written 3.1 years ago by Pierre Lindenbaum

@Pierre I regret the cross-post. I had posted here earlier, but as I didn't get any response I posted in both communities. I will keep in mind not to repeat it.

written 3.1 years ago by krushnach80

Oh you didn't get a response after 2 hours on a Sunday, that is indeed unreasonably long. Quite a lazy community indeed, next thing you know we'll have a personal life to take care of.

written 3.1 years ago by WouterDeCoster

@WouterDeCoster I'm sorry for that. I was talking about this related question, which I asked earlier and which was not very specific:

Parse out exon for divergent primer design

written 3.1 years ago by krushnach80
Alex Reynolds wrote:

Via BEDOPS bedops -e and Unix I/O streams:

$ echo -e "chr1\t11868\t12227" | bedops -e 1 exon.bed - > answer.bed

Or, if you have your genes in a BED file called genes.bed:

$ bedops -e 1 exon.bed genes.bed > answer.bed

If you have your genes in some other format, like GFF or GTF, you can use gff2bed or gtf2bed, e.g.:

$ bedops -e 1 exon.bed <(gff2bed < genes.gff) > answer.bed

Or:

$ bedops -e 1 exon.bed <(gtf2bed < genes.gtf) > answer.bed

The file answer.bed will contain exons that overlap a gene annotation by at least one base.

modified 3.1 years ago • written 3.1 years ago by Alex Reynolds
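
The awk filter earlier in the thread tests whether an exon lies entirely inside an interval, whereas overlap-based set operations ask whether an exon shares at least one base with it. The sketch below shows both tests in plain awk so the difference is visible on the sample data. It assumes half-open BED coordinates, and the query interval chr1:12000-12100 is hypothetical, chosen only because the two tests give different answers for it.

```shell
# The ten sample exon lines from the question.
printf 'chr1\t%s\t%s\t+\texon\n' \
  11868 12227 11871 12227 11873 12227 12009 12057 12178 12227 \
  12594 12721 12612 12697 12612 12721 12612 12721 12974 13052 > exon.bed

# Hypothetical query interval (not from the thread).
q_chrom=chr1; q_start=12000; q_end=12100

# Full containment: the exon starts and ends inside the query interval.
awk -F'\t' -v c="$q_chrom" -v s="$q_start" -v e="$q_end" \
  '$1 == c && $2 + 0 >= s + 0 && $3 + 0 <= e + 0' exon.bed > contained.bed

# Overlap by at least one base, for half-open intervals:
# the exon starts before the query ends and ends after the query starts.
awk -F'\t' -v c="$q_chrom" -v s="$q_start" -v e="$q_end" \
  '$1 == c && $2 + 0 < e + 0 && $3 + 0 > s + 0' exon.bed > overlapping.bed

wc -l contained.bed overlapping.bed
```

On the sample data this leaves one exon in contained.bed and four in overlapping.bed, because three long exons span the query interval without fitting inside it.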