Searching for introns within 5'UTRs
Entering edit mode
2.7 years ago
ryan.borotra ▴ 10

Hi everybody!

I am looking for the best way to find the sequences of introns that are located within 5'UTRs of a genome.

I have:

  • Annotation file from MAKER (maker_annotation.gff3)
  • Assembled genome (scaffold_genome_assembly.fasta)

So far, I've used GenomeTools to add intron features to my annotation file:

gt gff3 -addintrons maker_annotation.gff3 > intron_annotation.gff3

I created separate .gff3 files for the 5'UTRs and introns using grep:

cat intron_annotation_file.gff3 | grep five_prime_UTR > five_prime_UTR.gff3
cat intron_annotation_file.gff3 | grep intron > introns.gff3

With these files, I used bedtools to get a .gff3 of the intersection between introns and 5'UTRs:

bedtools intersect -wa -a introns.gff3 -b five_prime_UTR.gff3 > introns_five_prime.gff3

And used this last output to get the sequence from the genome:

bedtools getfasta -name -fo introns_five_seq.fasta -fi scaffold_genome_assembly.fasta -bed introns_five_prime.gff3

However, the sequences I get in introns_five_seq.fasta don't all start with GT and end in AG. Is there anything wrong with this process? I'm new to bioinformatics, any feedback is appreciated!

gff3 intron 5'UTR fasta • 514 views
Entering edit mode

May be you want to remove the -wa option in bedtools intersect if you only want the intron fragments that are located within 5'UTRs as you have stated above otherwise you get the entire intron sequence.


Login before adding your answer.

Traffic: 1223 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6