Question: How To Fetch Exon Sequence From Genomic Coordinates
12 days ago by
BehMah20 wrote:

Hi Everyone :)

I have a list of genomic coordinates and want to get only the exon sequences for them.

I can get the whole sequence (exon+intron) by Bedrolls getfasta but I want JUST exon sequences.

Thank you :)

rna-seq gene • 90 views
modified 12 days ago by ATpoint4.4k • written 12 days ago by BehMah20

If you're using R, you could use the biomaRt package.

written 12 days ago by caggtaagtat230

Thanks caggtaagtat! There is no assembly/annotation for rat (rnor 4) in biomart. Only rnor 6 is available :(

written 12 days ago by BehMah20
12 days ago by
ATpoint4.4k wrote:

Bedrolls sounds like some new kind of sushi roll :-D It is bedtools. Anyway, what you can do is intersect your genomic coordinates with a GFF/GTF file that contains exon coordinates. GFF files are available from GENCODE, NCBI etc., depending on the organism you are working on. First, isolate the exons from the GFF and convert them to BED (GFF starts are 1-based, BED starts are 0-based, hence $4-1):

awk 'BEGIN{FS=OFS="\t"} $1 !~ /^#/ && $3 == "exon" {print $1, $4-1, $5}' in.gff3 | sort -k1,1 -k2,2n > exon.bed
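As a quick sanity check of the coordinate conversion, here is a minimal sketch on a made-up GFF3. The file names toy.gff3 and toy_exon.bed and the gene itself are hypothetical, only there to show what the awk filter keeps and how the start is shifted:

```shell
# Toy GFF3 (hypothetical gene on chr1): one header line, one gene, two exons.
printf '##gff-version 3\n' > toy.gff3
printf 'chr1\ttoy\tgene\t101\t500\t.\t+\t.\tID=g1\n' >> toy.gff3
printf 'chr1\ttoy\texon\t301\t500\t.\t+\t.\tParent=g1\n' >> toy.gff3
printf 'chr1\ttoy\texon\t101\t200\t.\t+\t.\tParent=g1\n' >> toy.gff3

# Skip comment lines, keep exon rows only, convert the 1-based GFF start
# to a 0-based BED start ($4-1), then sort by chromosome and start.
awk 'BEGIN{FS=OFS="\t"} $1 !~ /^#/ && $3 == "exon" {print $1, $4-1, $5}' toy.gff3 \
  | sort -k1,1 -k2,2n > toy_exon.bed

cat toy_exon.bed
```

The gene row and the header are dropped, and the two exons come out as chr1 100 200 and chr1 300 500 (tab-separated).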

Then intersect this exon file with your coordinates:

bedtools intersect -a your_file.bed -b exon.bed > intersection.bed

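If you want the entire exon (even if part of the exon does not overlap with your_file.bed), add the option -wb to the command. It appends the original exon.bed entry to each output line, so the full exon coordinates end up in the last columns.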
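To illustrate what -wb gives you, here is a small sketch with toy files (toy_regions.bed, toy_exons.bed and toy_full_exons.bed are hypothetical names). It assumes your coordinate file is a plain 3-column BED, so the appended exon entry sits in columns 4-6; with more columns in your file the cut range would need adjusting:

```shell
# Toy inputs (hypothetical): one query interval and two exons it partly overlaps.
printf 'chr1\t150\t350\n' > toy_regions.bed
printf 'chr1\t100\t200\nchr1\t300\t500\n' > toy_exons.bed

if command -v bedtools >/dev/null 2>&1; then
  # Without -wb the output would be the clipped pieces (150-200 and 300-350).
  # With -wb the original exon entry is appended, so columns 4-6 hold the full exon.
  bedtools intersect -a toy_regions.bed -b toy_exons.bed -wb \
    | cut -f4-6 | sort -k1,1 -k2,2n -k3,3n -u > toy_full_exons.bed
  cat toy_full_exons.bed
else
  echo "bedtools not found, skipping" >&2
fi
```

The sort -u step removes duplicates in case the same exon overlaps more than one of your intervals.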

Then proceed with getfasta.
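The getfasta step could look roughly like this. It is only a sketch on a toy genome (toy_genome.fa, toy_intersection.bed and toy_exons.fa are hypothetical names); with real data you would pass your genome FASTA to -fi and intersection.bed to -bed. Note that the chromosome names in the BED file have to match the FASTA headers:

```shell
# Toy genome and one toy exon interval (hypothetical data).
printf '>chr1\nACGTACGTAC\n' > toy_genome.fa
printf 'chr1\t2\t6\n' > toy_intersection.bed

if command -v bedtools >/dev/null 2>&1; then
  # -fi: genome FASTA, -bed: intervals to extract, -fo: output FASTA.
  # BED intervals are 0-based and half-open, so 2-6 covers bases 3 to 6 (GTAC).
  bedtools getfasta -fi toy_genome.fa -bed toy_intersection.bed -fo toy_exons.fa
  cat toy_exons.fa
else
  echo "bedtools not found, skipping" >&2
fi
```

getfasta writes one FASTA record per BED line, so an interval that contains several exons gives several records.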

modified 12 days ago • written 12 days ago by ATpoint4.4k

Thank you ATpoint. Sorry about "Bedrolls", that was a typo ;) I will give it a go. Just wondering if there is a way to get a single exon sequence (multiple exons joined), so that I get only one sequence per interval? Each interval contains multiple exons, so I would like something like the example below. THANKS FOR YOUR REPLY :)

                   chr1    110743176       110749172                   gaatctgggtgagcaaatgcttcctgtgaccaacagggtatagtagaagtgatgctatgtgacttccaaggctagattaggaaaggccgtgccacttccacctggtgttctagggatactcattctagaggcagccagctgccatgtaagacagccaaccaccctgagactgccatgctagggaggcgatatgtttgcagatgcttaggttgacagcttcagctgagcttccagccaacagccagtgtcaactgccagccacatgaacacagcatactgaacgtttagcccagctgagcttcagatgtttgcagcccgctgacatctgattgtagctgcataagagaccctaagcaagaactgttcaactgagccctt
written 12 days ago by BehMah20

Ignore my last comment. I sorted it out, thank you ATpoint :)

modified 10 days ago • written 11 days ago by BehMah20