Question: How To Fetch Exon Sequence From Genomic Coordinates
BehMah wrote (6 months ago):

Hi Everyone :)

I have a list of genomic coordinates and want to get only the exon sequences within them.

I can get the whole sequence (exons + introns) with Bedrolls getfasta, but I want just the exon sequences.

Thank you :)


If you're using R, you could use the biomaRt package.

(reply by caggtaagtat, 6 months ago)

Thanks caggtaagtat! BioMart has no assembly/annotation for rat rnor 4; only rnor 6 is available :(

(reply by BehMah, 6 months ago)
ATpoint (Germany) wrote (6 months ago):

Bedrolls sounds like some new kind of sushi roll :-D It is bedtools. Anyway, what you can do is intersect your genomic coordinates with a GFF/GTF file that contains exon coordinates. Depending on the organism you are working with, GFF files are available from GENCODE, NCBI, etc. First, isolate the exons from the GFF (GFF is 1-based and inclusive while BED is 0-based and half-open, hence the start minus 1):

awk 'BEGIN {FS = OFS = "\t"} /^#/ {next} $3 == "exon" {print $1, $4 - 1, $5}' in.gff3 | sort -k1,1 -k2,2n > exon.bed
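A GFF lists an exon once per transcript, so exon.bed usually contains many duplicate and overlapping exons, and intersecting with it returns the same stretch of sequence several times. One option, assuming you are fine with alternative exons being collapsed into their union, is to merge the sorted exons first. merge_exons is just a hypothetical wrapper name around bedtools merge:

```shell
#!/usr/bin/env bash
# merge_exons (hypothetical helper): collapse duplicate/overlapping exons.
# $1 = coordinate-sorted BED of exons (e.g. exon.bed from the awk step).
# bedtools merge requires sorted input and writes non-overlapping BED3 to stdout.
merge_exons() {
  bedtools merge -i "$1"
}
```

Usage: merge_exons exon.bed > exon.merged.bed, then pass exon.merged.bed to bedtools intersect instead of exon.bed.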

Then intersect this exon file with your coordinates:

bedtools intersect -a your_file.bed -b exon.bed > intersection.bed

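Note that -wb does not give you the entire exon: it keeps the overlapping part and appends the original exon record as extra columns. If you want entire exons (even where part of an exon lies outside your_file.bed), swap the inputs and add -u, i.e. bedtools intersect -a exon.bed -b your_file.bed -u, which reports each overlapping exon once with its full coordinates.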

Then run getfasta on the resulting BED file.
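A minimal getfasta call, assuming genome.fa (a hypothetical name) is the FASTA of the same assembly as your coordinates (rnor 4 here) and intersection.bed comes from the intersect step. Because the exon BED built above has no strand column, the sequences come out on the + strand; getfasta's -s option only has an effect when the BED carries a strand in column 6.

```shell
#!/usr/bin/env bash
# extract_fasta (hypothetical helper): FASTA of every interval in a BED file.
# $1 = genome FASTA, $2 = BED file; output goes to stdout.
extract_fasta() {
  bedtools getfasta -fi "$1" -bed "$2"
}
```

Usage: extract_fasta genome.fa intersection.bed > exons.fa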


Thank you ATpoint, and sorry for the Bedtools typo ;) I will give it a go. Just wondering: is there a way to get a single exon sequence (multiple exons joined together) so that I get only one sequence per interval? Each interval contains multiple exons, so I'd like something like the example below. THANKS FOR YOUR REPLY :)

                   chr1    110743176       110749172                   gaatctgggtgagcaaatgcttcctgtgaccaacagggtatagtagaagtgatgctatgtgacttccaaggctagattaggaaaggccgtgccacttccacctggtgttctagggatactcattctagaggcagccagctgccatgtaagacagccaaccaccctgagactgccatgctagggaggcgatatgtttgcagatgcttaggttgacagcttcagctgagcttccagccaacagccagtgtcaactgccagccacatgaacacagcatactgaacgtttagcccagctgagcttcagatgtttgcagcccgctgacatctgattgtagctgcataagagaccctaagcaagaactgttcaactgagccctt
(reply by BehMah, 6 months ago)

Ignore my last comment, I sorted it out. Thank you ATpoint :)

(reply by BehMah, 6 months ago)
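BehMah reports solving the follow-up (one joined sequence per interval) but doesn't say how. One possible way, sketched under assumptions: your_file.bed is BED3, the exons have been merged into non-overlapping intervals (exon.merged.bed), and label_pieces and join_pieces are hypothetical helper names. Each clipped exon piece is labelled with its parent interval, extracted with getfasta -name -tab, and pieces sharing a label are concatenated in genomic order. This joins the union of all exons in the interval rather than one specific transcript, and for minus-strand genes the result would still need to be reverse-complemented.

```shell
#!/usr/bin/env bash
# label_pieces (hypothetical): $1 = merged exons (BED3), $2 = your intervals (BED3).
# With -a exons -b intervals -wb, columns 1-3 are the overlapping part of each
# exon and columns 4-6 are the interval it falls in; the interval becomes the name.
label_pieces() {
  bedtools intersect -a "$1" -b "$2" -wb \
    | awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4 ":" $5 "-" $6 }' \
    | sort -k1,1 -k2,2n
}

# join_pieces (hypothetical): reads `bedtools getfasta -name -tab` output on stdin
# and concatenates sequences sharing a name, keeping first-seen order.
# Newer bedtools versions append "::chr:start-end" to the name; that suffix is stripped.
join_pieces() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         id = $1
         sub(/::.*/, "", id)
         if (!(id in seq)) order[++n] = id
         seq[id] = seq[id] $2
       }
       END { for (i = 1; i <= n; i++) print order[i], seq[order[i]] }'
}
```

Usage:
label_pieces exon.merged.bed your_file.bed > pieces.bed
bedtools getfasta -fi genome.fa -bed pieces.bed -name -tab | join_pieces > joined.tsv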