how to extract intergenic regions from BRAKER output?
0
0
Entering edit mode
2.0 years ago
mthm ▴ 50

I have used softmaksed genome with the RNAseq bam file to generate gene annotation. the gff3 file contains: CDS, intron, exon, start-codon, stop-codon, mRNA, and gene.

xyz.00052 AUGUSTUS        gene    142183  143365  0.43    -       .       ID=g1;
xyz.00052 AUGUSTUS        mRNA    142183  143365  0.43    -       .       ID=g1.t1;Parent=g1;
xyz.00052 AUGUSTUS        stop_codon      142183  142185  .       -       0       ID=g1.t1.stop1;Parent=g1.t1;
xyz.00052 AUGUSTUS        CDS     142186  142347  1       -       0       ID=g1.t1.CDS1;Parent=g1.t1;
xyz.00052 AUGUSTUS        exon    142186  142347  .       -       .       ID=g1.t1.exon1;Parent=g1.t1;
xyz.00052 AUGUSTUS        intron  142348  142408  1       -       .       ID=g1.t1.intron1;Parent=g1.t1;

I assumed that "gene" covers all the other parts (am not interested in mRNA and CDS is basically the same as exon), so I extracted a bed file containing only gene coordinates.

awk '$4=="gene"' xyz.braker.bed > xyz.braker.gene.bed
xyz.00001 9196    11118   gene    -
xyz.00001 56048   61743   gene    -
xyz.00001 62586   67965   gene    -
xyz.00001 77364   79170   gene    +
xyz.00001 92556   102646  gene    +

then I created a genome file from the "unmasked" genome assembly

samtools faidx  --mark-strand sign xyz.fa
awk -v OFS='\t' {'print $1,$2,$3'} xyz.fa.fai >  xyz.genomefile
xyz.00001 25751438        17
xyz.00002 24213853        25751473
xyz.00003 21310674        49965344
xyz.00004 18560873        71276036
xyz.00005 14821237        89836927

finally I extracted the intergenic regions

bedtools complement -i xyz.braker.gene.bed -g xyz.genomefile > xyz.intergenic.bed

question 1. is this way of extracting intergenic regions correct?

question 2. how can I merge the "xyz.intergenic.bed" (only three columns) to xyz.braker.gene.bed (with four columns including the strand information column)?

intergenic braker annotation • 466 views
ADD COMMENT
0
Entering edit mode

intergenic regions do not have strand informations

ADD REPLY
0
Entering edit mode

oh that's right! I updated my post

ADD REPLY

Login before adding your answer.

Traffic: 1329 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6