Extract EXON fasta from a GFF3 annotation and reference genome
4.0 years ago
marcelolaia ▴ 10

Hi, after a long search online I found this awesome script (https://git.io/JfIjg). It works nicely. However, I need to extract all exon sequences from a genome, given a GFF3 annotation and a FASTA file. A sample GFF3 file is available at https://is.gd/gGMtUo. From that file I need to extract sequences like these:

>Eucgr.A00001.1.v2.0.exon.1
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.2
ACTGTGACA......
>Eucgr.A00001.1.v2.0.exon.3
ACTGTGACA......
(...)
>Eucgr.A00001.1.v2.0.exon.12
ACTGTGACA......
(...)

Could you help me? Thank you so much!

genome FASTA Exon gff2fasta GFF • 1.1k views
4.0 years ago
Juke34 8.6k

Here is a solution:

conda create -n agat
conda activate agat
conda install -c bioconda agat
agat_sp_extract_sequences.pl --gff file.gff -f file.fa -t exon --split

Thank you so much! It worked out of the box! I had a little trouble installing it on my Debian box, so I am leaving here the steps I followed to install AGAT on Debian testing. These steps worked for me on Linux 5.5.0-2-amd64 #1 SMP Debian 5.5.17-1 (2020-04-15) x86_64 GNU/Linux:

Install all dependencies with APT

apt update && apt upgrade
apt install libbio-perl-perl libclone-perl libgraph-perl liblwp-useragent-determined-perl libstatistics-r-perl libjson-perl libcarp-clan-perl libsort-naturally-perl libfile-share-perl libfile-sharedir-perl libfile-sharedir-install-perl

Clone AGAT

git clone https://github.com/NBISweden/AGAT.git # Clone AGAT
cd AGAT                                         # move into AGAT folder
perl Makefile.PL                                # Check all the dependencies. If it complains about a missing dependency, install it using apt. All Perl dependencies are in the Debian repositories
make                                            # Compile
make test                                       # Test
sudo make install                               # Install

Run agat_sp_extract_sequences.pl. To extract exons, for example, you could run:

agat_sp_extract_sequences.pl -f reference_sequence.fa --gff annotation_fasta.gff3 -t exon --split -o output.exons.fa

Thank you!
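
For anyone curious what exon extraction involves, here is a minimal Python sketch. It is not how AGAT does it internally, only an illustration that assumes a well-formed, tab-separated GFF3 and a genome small enough to hold in memory. The function names `read_fasta` and `extract_exons` are made up for this example. Each exon is named from its `ID` attribute if it has one, otherwise from `Parent` plus a running counter, which mimics headers like the ones in the question.

```python
from collections import defaultdict

# Translation table for complementing DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence_id: sequence} from an iterable of FASTA lines."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence ID.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_exons(gff_lines, genome):
    """Yield (header, sequence) for every exon row in the GFF3 lines."""
    counters = defaultdict(int)  # exons seen so far per parent transcript
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent", "unknown")
        counters[parent] += 1
        header = attrs.get("ID") or f"{parent}.exon.{counters[parent]}"
        # GFF3 coordinates are 1-based and inclusive.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        yield header, seq


# Tiny made-up example: one contig, one exon on each strand.
genome = read_fasta([">Chr01 demo", "AACCGGTTAC", "GTACGT"])
gff = [
    "##gff-version 3",
    "Chr01\tdemo\texon\t3\t6\t.\t+\t.\tID=tx1.exon.1;Parent=tx1",
    "Chr01\tdemo\texon\t7\t10\t.\t-\t.\tParent=tx1",
]
for header, seq in extract_exons(gff, genome):
    print(f">{header}\n{seq}")
```

With real files you would pass open file handles instead of the lists, e.g. `read_fasta(open("file.fa"))`. Note that the counter follows file order, so minus-strand exons are numbered as they appear in the GFF3, not necessarily in transcript order.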
