I need to predict pseudogenes from the assembled genome of a catfish. To do this, I need to predict the genes in the genome and build a parent protein set to search for similarity in the intergenic regions. There is a possibility that a processed pseudogene will be predicted as a gene during this step. Which software can I use for gene prediction that keeps pseudogenes out of the results? Thanks in advance.
How complete/polished is your catfish genome assembly?
Also: do you have some good quality RNA-Seq?
With a draft genome, it is hard to tell whether a gene X that lacks, say, its first or last exon(s) is the product of a faulty duplication of a genomic region (a pseudogene) or whether those exons are simply missing from the assembly. The same goes for a frameshift or stop codon, which could come from a sequencing error or from a real inactivating mutation in a paralogue.
You may get processed pseudogenes where introns will be missing.
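To make the frameshift and stop codon point concrete, here is a minimal Python sketch that flags a candidate coding sequence whose length is not a multiple of three or that has an in-frame stop before its last codon. The function name `coding_disruptions` and the returned keys are made up for illustration. It assumes the standard genetic code and a sequence that starts in frame. It cannot tell a sequencing error from a real inactivating mutation, so on a draft assembly treat each hit as something to check by hand.

```python
# Stop codons of the standard genetic code (assumption: nuclear genes).
STOPS = {"TAA", "TAG", "TGA"}

def coding_disruptions(cds):
    """Report simple signs of a disrupted reading frame in a candidate CDS.

    Hypothetical helper for illustration only; `cds` is assumed to start in frame.
    """
    cds = cds.upper()
    # Split into complete codons; a trailing partial codon is ignored here.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # Offsets (0-based, in nucleotides) of stop codons before the last complete codon.
    internal_stops = [i * 3 for i, c in enumerate(codons[:-1]) if c in STOPS]
    return {
        "length_not_multiple_of_3": len(cds) % 3 != 0,
        "internal_stop_offsets": internal_stops,
        "has_terminal_stop": bool(codons) and codons[-1] in STOPS,
    }

if __name__ == "__main__":
    print(coding_disruptions("ATGAAATAGGGGTAA"))
```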
I don't know whether there are tools designed to predict pseudogenes, but if I understand your post correctly, you could predict all the ORFs (using Artemis or ORF finder) and then align the identified ORFs against each other (using BLAST or something similar) to see how similar they are and pick out some of the pseudogenes.
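As a rough illustration of the ORF prediction step, here is a minimal six-frame scan in plain Python. It is not how Artemis or ORF finder work internally, just a sketch. The names `find_orfs` and `min_len` are invented for this example. It assumes the standard stop codons and ATG as the only start codon, and it knows nothing about introns, so on a eukaryotic genome it will mostly pick up single-exon ORFs.

```python
# Stop codons of the standard genetic code (assumption).
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_len=300):
    """Yield (strand, start, end, orf_seq) for ATG...stop ORFs of at least min_len nt.

    Hypothetical helper for illustration. Coordinates are 0-based and
    end-exclusive, relative to the strand being scanned (for "-" they
    refer to the reverse-complemented sequence).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon
                    if end - start >= min_len:
                        yield strand, start, end, s[start:end]
                    start = None  # look for the next ORF in this frame

if __name__ == "__main__":
    for orf in find_orfs("CCATGAAATTTGGGTAACC", min_len=9):
        print(orf)
```

The ORF sequences could then be written to a FASTA file and compared all against all with BLAST, as suggested above.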
I hope this is helpful.