15 months ago
sunnykevin97
Hi,
Sequencing data: Illumina NextSeq 500, paired-end (PE) reads, 150 bp read length.
I have masked fish genomes, which I annotated with BRAKER2 using vertebrate protein sequences as hints. For one of the fish genomes, I got ~85,590 amino acid sequences corresponding to ~85,590 coding sequences (CDS).
My questions:
How do I find the
1. Full-length protein-coding sequences?
2. Partial protein-coding sequences?
And also,
3. What are the criteria for filtering out fragmented genome assemblies?
4. What is the minimum contig size cutoff for excluding small contigs from a genome assembly?
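To make questions 1 and 2 concrete, here is a minimal sketch of one common heuristic, not BRAKER2's own definition: a CDS is treated as full length if it starts with ATG, ends with a standard stop codon (TAA, TAG, TGA), and its length is a multiple of 3. Everything else is called partial. The function names, the category labels, and the assumption that the CDS FASTA includes the stop codon are all mine and purely illustrative.

```python
from collections import Counter

# Standard nuclear stop codons (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines):
    """Yield (sequence_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            seq_id = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def classify_cds(seq):
    """Label a CDS by the presence of a start codon, a stop codon, and frame."""
    seq = seq.upper()
    has_start = seq.startswith("ATG")
    has_stop = seq[-3:] in STOP_CODONS
    in_frame = len(seq) % 3 == 0
    if has_start and has_stop:
        # Both ends present; a length that is not a multiple of 3 is suspicious.
        return "complete" if in_frame else "frame_error"
    if has_start:
        return "3prime_partial"   # start codon present, stop codon missing
    if has_stop:
        return "5prime_partial"   # stop codon present, start codon missing
    return "internal_partial"     # neither end present


def summarize(lines):
    """Count how many CDS records fall into each category."""
    return Counter(classify_cds(seq) for _, seq in read_fasta(lines))


if __name__ == "__main__":
    # Toy records for illustration only; replace with lines from a real CDS FASTA.
    demo = [">g1.t1", "ATGAAATAA", ">g2.t1", "ATGAAA", ">g3.t1", "AAATAA"]
    for label, count in sorted(summarize(demo).items()):
        print(label, count)
```

On a real file, something like `summarize(open("cds.fa"))` would give the counts, where `cds.fa` is a hypothetical file name. Whether the stop codon is included in the CDS output depends on how the sequences were exported, so that should be checked before trusting the labels.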
Suggestions appreciated.