How To Fetch Exon Sequence From Genomic Coordinates
5.9 years ago
BehMah ▴ 50

Hi Everyone :)

I have a list of genomic coordinates and want to get only exon sequence for them.

I can get the whole sequence (exon+intron) by Bedrolls getfasta but I want JUST exon sequences.

Thank you :)

RNA-Seq gene • 2.3k views

If you're using R, you could use the biomaRt package.


Thanks caggtaagtat! There is no assembly/annotation for rat (rnor 4) in biomart. Only rnor 6 is available :(


Hi, could you tell me how you solved the last problem about the joining of multiple fasta exons? Thank you!


Please explain what you mean.

5.9 years ago
ATpoint 81k

Bedrolls sounds like some new kind of sushi roll :-D It is bedtools. Anyway, what you can do is intersect your genomic coordinates with a GFF/GTF file that contains exonic coordinates. GFF files, depending on the organism you are working on, are available from GENCODE, NCBI etc. For this, first isolate the exons from the GFF and write them out as a sorted BED file:

awk -F'\t' 'BEGIN {OFS="\t"} $1 !~ /^#/ && $3 == "exon" {print $1, $4-1, $5}' in.gff3 | sort -k1,1 -k2,2n > exon.bed
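To make the coordinate conversion concrete, here is a small sketch of this exon-extraction step on a made-up three-feature GFF3. The file names `toy.gff3` and `toy_exon.bed` and the feature coordinates are invented for illustration. GFF is 1-based and inclusive while BED is 0-based and half-open, which is why the start is written as `$4-1` and the end is left alone.

```shell
# Build a tiny GFF3 (hypothetical data): one header line, one gene, two exons.
printf '##gff-version 3\n' > toy.gff3
printf 'chr1\tsrc\tgene\t101\t500\t.\t+\t.\tID=g1\n' >> toy.gff3
printf 'chr1\tsrc\texon\t301\t400\t.\t+\t.\tID=e2\n' >> toy.gff3
printf 'chr1\tsrc\texon\t101\t200\t.\t+\t.\tID=e1\n' >> toy.gff3

# Skip comment lines, keep only exon features, and convert
# 1-based inclusive GFF coordinates to 0-based half-open BED.
awk -F'\t' 'BEGIN {OFS="\t"} $1 !~ /^#/ && $3 == "exon" {print $1, $4-1, $5}' toy.gff3 \
  | sort -k1,1 -k2,2n > toy_exon.bed

cat toy_exon.bed
```

The gene line and the `##` header are dropped, and the two exons come out sorted by chromosome and start, which is the order bedtools expects for sorted input.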

Then intersect this exon file with your coordinates:

bedtools intersect -a your_file.bed -b exon.bed > intersection.bed

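If you want the entire exon (even if part of the exon does not overlap with your_file.bed), add option -wb to the command. This appends the full exon.bed record to each output line, so the whole-exon coordinates end up in the trailing columns and have to be moved into columns 1-3 before running getfasta.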
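As a rough sketch of what using `-wb` involves: with that option each output line carries the overlapping part in the leading columns and the original exon.bed record after it, so the whole-exon coordinates have to be pulled out of the trailing columns. The example below assumes exon.bed has exactly three columns and uses a hand-written stand-in for the `-wb` output; the file names `toy_wb.bed` and `whole_exons.bed` are hypothetical.

```shell
# Stand-in for `bedtools intersect -a your_file.bed -b exon.bed -wb` output
# (hypothetical data): columns 1-3 = overlapping part, columns 4-6 = full exon.
printf 'chr1\t150\t200\tchr1\t100\t200\n'  > toy_wb.bed
printf 'chr1\t300\t350\tchr1\t300\t400\n' >> toy_wb.bed
printf 'chr1\t360\t380\tchr1\t300\t400\n' >> toy_wb.bed

# Keep the last three columns (the full exon record, assuming a 3-column exon.bed)
# and drop duplicates that arise when several intervals hit the same exon.
awk -F'\t' 'BEGIN {OFS="\t"} {print $(NF-2), $(NF-1), $NF}' toy_wb.bed \
  | sort -k1,1 -k2,2n -k3,3n -u > whole_exons.bed

cat whole_exons.bed
```

The resulting file can then be passed to getfasta, for example `bedtools getfasta -fi genome.fa -bed whole_exons.bed -fo exons.fa`, where `genome.fa` is a placeholder for your reference FASTA.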

Then proceed with getfasta.


Thank you ATpoint. Sorry about "Bedrolls", that was a typo ;) I will give it a go. Just wondering if there is a way to get a single exon sequence (multiple exons joined) so that I get only one sequence per interval, because each interval contains multiple exons? Something like below. THANKS FOR YOUR REPLY :)

                   chr1    110743176       110749172                   gaatctgggtgagcaaatgcttcctgtgaccaacagggtatagtagaagtgatgctatgtgacttccaaggctagattaggaaaggccgtgccacttccacctggtgttctagggatactcattctagaggcagccagctgccatgtaagacagccaaccaccctgagactgccatgctagggaggcgatatgtttgcagatgcttaggttgacagcttcagctgagcttccagccaacagccagtgtcaactgccagccacatgaacacagcatactgaacgtttagcccagctgagcttcagatgtttgcagcccgctgacatctgattgtagctgcataagagaccctaagcaagaactgttcaactgagccctt
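One possible way to get one joined sequence per interval, sketched under some assumptions: the exon sequences have already been extracted into a two-column, tab-separated table (interval name, exon sequence), for instance from `bedtools getfasta -tab`; the rows for each interval are already in genomic order; and everything is on the plus strand (minus-strand exons would need reverse-complementing and reversed order). The file names `pieces.tsv` and `joined.fa` and the sequences are invented for illustration.

```shell
# Hypothetical input: interval name <TAB> sequence of one exon piece.
printf 'interval1\tACGT\n'  > pieces.tsv
printf 'interval1\tGGCC\n' >> pieces.tsv
printf 'interval2\tTTAA\n' >> pieces.tsv

# Concatenate the pieces of each interval in input order and
# write one FASTA record per interval.
awk -F'\t' '
  !($1 in seq) { order[++n] = $1 }      # remember first-seen order of interval names
  { seq[$1] = seq[$1] $2 }              # append this exon piece to its interval
  END { for (i = 1; i <= n; i++) printf ">%s\n%s\n", order[i], seq[order[i]] }
' pieces.tsv > joined.fa

cat joined.fa
```

How the interval name ends up in the first column depends on your input BED and on how your bedtools version formats names, so check that column before joining.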

Ignore my last comment. I sorted it out, thank you ATpoint :)

4.2 years ago
cmdcolin ★ 3.8k

gffread has options for extracting sequences given a GFF and a reference sequence: http://ccb.jhu.edu/software/stringtie/gff.shtml
