Question: How To Fetch Exon Sequence From Genomic Coordinates
10 months ago by
BehMah30
BehMah30 wrote:

Hi Everyone :)

I have a list of genomic coordinates and want to get only the exon sequences for them.

I can get the whole sequence (exon + intron) with Bedrolls getfasta, but I want JUST the exon sequences.

Thank you :)

rna-seq gene • 434 views
modified 10 months ago by ATpoint15k • written 10 months ago by BehMah30

If you're using R, you could use the biomaRt package.

written 10 months ago by caggtaagtat500

Thanks caggtaagtat! There is no assembly/annotation for rat (rnor 4) in biomart. Only rnor 6 is available :(

written 10 months ago by BehMah30
10 months ago by
ATpoint15k
Germany
ATpoint15k wrote:

Bedrolls sounds like some new kind of sushi roll :-D The tool is called bedtools. Anyway, what you can do is intersect your genomic coordinates with a GFF/GTF file that contains exon coordinates. Depending on the organism you are working on, GFF files are available from GENCODE, NCBI, etc. To do this, first isolate the exons from the GFF and convert them to BED:

awk 'BEGIN {FS = OFS = "\t"} /^#/ {next} $3 == "exon" {print $1, $4-1, $5}' in.gff3 | sort -k1,1 -k2,2n > exon.bed
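
To make the coordinate conversion concrete, here is a small sketch on a toy GFF3. The file names toy.gff3 and toy_exon.bed and all records in them are made up for illustration. GFF is 1-based and inclusive while BED is 0-based and half-open, which is why the start is reduced by one and the end is kept as-is. The awk call follows the same idea as the command above, with the tab separator set in a BEGIN block and comment lines skipped.

```shell
# Toy GFF3 (hypothetical records): one gene with two exons, listed out of order.
printf '##gff-version 3\n' > toy.gff3
printf 'chr1\ttoy\tgene\t101\t500\t.\t+\t.\tID=g1\n' >> toy.gff3
printf 'chr1\ttoy\texon\t401\t500\t.\t+\t.\tParent=g1\n' >> toy.gff3
printf 'chr1\ttoy\texon\t101\t200\t.\t+\t.\tParent=g1\n' >> toy.gff3

# Keep exon rows only; the 1-based GFF start becomes a 0-based BED start.
awk 'BEGIN {FS = OFS = "\t"} /^#/ {next} $3 == "exon" {print $1, $4 - 1, $5}' toy.gff3 \
  | sort -k1,1 -k2,2n > toy_exon.bed

# Show the resulting three-column BED file (chromosome, start, end).
cat toy_exon.bed
```

The gene row and the comment line are dropped, and the two exons come out sorted by chromosome and start position.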

Then intersect this exon file with your coordinates:

bedtools intersect -a your_file.bed -b exon.bed > intersection.bed

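If you want the entire exon (even if part of the exon does not overlap with your_file.bed), add the -wb option to the command. It appends the original exon.bed entry to each output line, so the full exon coordinates are in the appended columns.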
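
As a rough sketch of what -wb changes, the toy example below intersects one interval with two exons that each stick out past it. All file names and coordinates are invented for illustration, and the bedtools call is skipped if bedtools is not installed. It assumes your_file.bed has exactly three columns, so that the appended exon record sits in columns 4 to 6.

```shell
# Toy inputs (hypothetical): one query interval and two partially overlapping exons.
printf 'chr1\t150\t450\n' > toy_query.bed
printf 'chr1\t100\t200\nchr1\t400\t500\n' > toy_exon.bed

if command -v bedtools >/dev/null 2>&1; then
  # Columns 1-3 hold only the overlapping part; columns 4-6 hold the original exon from -b.
  bedtools intersect -a toy_query.bed -b toy_exon.bed -wb > toy_wb.bed
  # Keep the full exon coordinates and drop exact duplicate lines.
  cut -f4-6 toy_wb.bed | sort -k1,1 -k2,2n | uniq > toy_full_exons.bed
  cat toy_full_exons.bed
else
  echo "bedtools not found; skipping the intersect step" >&2
fi
```

Without -wb the output would contain only the clipped pieces (150-200 and 400-450); with it, the complete exons can be recovered from the appended columns.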

Then proceed with getfasta.
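
A minimal sketch of the getfasta step, assuming the genome is available as a FASTA file. The names toy_genome.fa, toy_intersection.bed and toy_exons.fa are placeholders with made-up content; in practice the FASTA must be the same assembly the coordinates refer to. The call is skipped if bedtools is not installed.

```shell
# Toy genome and BED interval (hypothetical content, for illustration only).
printf '>chr1\nACGTACGTAC\n' > toy_genome.fa
printf 'chr1\t2\t6\n' > toy_intersection.bed

if command -v bedtools >/dev/null 2>&1; then
  # Extract the sequence for each BED interval (0-based start, end not included).
  bedtools getfasta -fi toy_genome.fa -bed toy_intersection.bed -fo toy_exons.fa
  cat toy_exons.fa
else
  echo "bedtools not found; skipping getfasta" >&2
fi
```

Each BED line yields one FASTA record, so an interval that covers several exons produces several records at this stage.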

modified 10 months ago • written 10 months ago by ATpoint15k

Thank you ATpoint. Sorry for "Bedrolls", that was a typo ;) I will give it a go. Just wondering: is there a way to get a single exon sequence per interval (multiple exons joined together), so that I get only one sequence for each interval? Each interval contains multiple exons, so I would like something like the example below. THANKS FOR YOUR REPLY :)

                   chr1    110743176       110749172                   gaatctgggtgagcaaatgcttcctgtgaccaacagggtatagtagaagtgatgctatgtgacttccaaggctagattaggaaaggccgtgccacttccacctggtgttctagggatactcattctagaggcagccagctgccatgtaagacagccaaccaccctgagactgccatgctagggaggcgatatgtttgcagatgcttaggttgacagcttcagctgagcttccagccaacagccagtgtcaactgccagccacatgaacacagcatactgaacgtttagcccagctgagcttcagatgtttgcagcccgctgacatctgattgtagctgcataagagaccctaagcaagaactgttcaactgagccctt
written 10 months ago by BehMah30

Ignore my last comment. I sorted it out thank you ATpoint :)

modified 10 months ago • written 10 months ago by BehMah30