Hi all,
I have to extract intronic sequences from a GFF file (species: buffalo) to create a training data set for CPAT. I am trying to do it with bedtools. Can anyone please tell me whether the following procedure is correct?
(The UCSC Table Browser does not have sequences for buffalo, and perl exttract_seq_from_gff3.pl -d genome.fa - gene_intron.gff3 > output_intron.fa
from the link also did not work.)
awk -F'\t' '$3 == "gene"' my_gff > gene_gff
awk -F'\t' '$3 == "exon"' my_gff > exon_gff
subtractBed -a gene_gff -b exon_gff >intron_gff
fastaFromBed -fi genome.fa -bed intron_gff -fo intron.fa
Is there any way to check whether the regions in the resulting intron GFF/FASTA file are actually introns?
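One sanity check I can think of is to derive introns independently, as the gaps between consecutive exons of the same transcript, and compare them with the bedtools result. Below is a rough sketch of that idea, not something verified on my data. It assumes that exon lines carry a single Parent=<transcript_id> attribute in column 9 (this may not hold for every GFF3 file), and the function name introns_from_exons and the toy coordinates are made up for illustration. Output coordinates are 1-based and inclusive, as in GFF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print introns as gaps between consecutive exons
# of the same transcript. Assumes exon lines have Parent=<id> in column 9.
# Output columns: seqid, intron_start, intron_end, strand, transcript_id
introns_from_exons() {
    awk -F'\t' -v OFS='\t' '$3 == "exon" && $9 ~ /Parent=/ {
            parent = $9
            sub(/.*Parent=/, "", parent)   # drop everything up to Parent=
            sub(/;.*/, "", parent)         # drop any following attributes
            print $1, parent, $4, $5, $7
        }' "$1" |
    # group by sequence and transcript, then order exons by start
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3n |
    awk -F'\t' -v OFS='\t' '{
            if ($1 == seq && $2 == tx) {
                # a gap between the previous exon end and this exon start is an intron
                if ($3 > prev_end + 1) print $1, prev_end + 1, $3 - 1, $5, $2
                if ($4 > prev_end) prev_end = $4
            } else {
                seq = $1; tx = $2; prev_end = $4
            }
        }'
}

# Toy example (made-up coordinates): one transcript with three exons.
toy_gff="$(mktemp)"
printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1\n'  > "$toy_gff"
printf 'chr1\ttoy\texon\t300\t400\t.\t+\t.\tID=e2;Parent=tx1\n' >> "$toy_gff"
printf 'chr1\ttoy\texon\t500\t600\t.\t+\t.\tID=e3;Parent=tx1\n' >> "$toy_gff"
introns_from_exons "$toy_gff"
# prints:
# chr1	201	299	+	tx1
# chr1	401	499	+	tx1
rm -f "$toy_gff"
```

With this, each transcript with N exons should give at most N - 1 introns. The result may legitimately differ from the subtractBed output when a gene has several isoforms, because subtracting all exons from the gene keeps only regions that are not covered by an exon in any transcript.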
Please give me suggestions. Thanks in advance!