Question: How To Find Members Of A Gene Family From The Reads Of An Unassembled Genome
Hi, I have to find some members of a gene family in a genome in SRA format, (

I converted the SRA file to FASTQ (fastq-dump) and then to FASTA. I decided not to assemble it because the genome is very difficult, and an assembly might merge sequences from two or more family members into one contig (this family is somewhat conserved).

Then I compared this genome (in FASTA) against my set of genes (all from the same family) using BLAST.

For gene1 I got N1 hits, for gene2 N2 hits, ... and for gene17 N17 hits. Here is the problem: when I compared the hits, most of them are shared between genes, so I can't tell my genes apart.
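
To put a number on how many hits are shared, something like the following could be used. It is only a rough sketch and assumes the BLAST results are in tabular form (tab-separated, as produced by `-outfmt 6`) with the read ID in the first column and the gene ID in the second; if the search was run the other way around, the two columns need to be swapped. The function names are made up for illustration.

```python
from collections import defaultdict


def genes_per_read(blast_lines):
    """Map each read ID to the set of gene IDs it hit.

    Assumes tab-separated BLAST output with the read in column 1
    and the gene in column 2 (an assumption, adjust as needed).
    """
    hits = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, gene_id = fields[0], fields[1]
        hits[read_id].add(gene_id)
    return dict(hits)


def split_unique_shared(hits):
    """Separate reads that hit exactly one gene from reads hitting several."""
    unique = {read: next(iter(genes)) for read, genes in hits.items() if len(genes) == 1}
    shared = {read: sorted(genes) for read, genes in hits.items() if len(genes) > 1}
    return unique, shared
```

Reads in `unique` point to a single family member, while reads in `shared` are the ambiguous ones.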

What do you recommend?


These problems are difficult. You can try some workarounds and see how they work out:

  1. Start by creating your best assembly.
  2. Then map the reads back to this assembly.
  3. Find large groups of reads that partially align over regions that match your gene families.
  4. Then take this subset of reads and reassemble it independently.
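
Step 3 could be sketched roughly as below. This is a minimal illustration and not a tested pipeline: it assumes the reads were mapped to the assembly and the alignments are available as SAM text, and that you already have a list of contig regions that match the gene family (for example from aligning your genes against the assembly). The function names and the `regions` layout are hypothetical.

```python
import re

# CIGAR operations as defined in the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume positions on the reference.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) an alignment covers on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos, pos + length - 1


def reads_in_regions(sam_lines, regions):
    """Collect names of reads whose alignment overlaps a region of interest.

    regions: dict mapping contig name -> list of (start, end) tuples,
    1-based inclusive (a hypothetical layout chosen for this sketch).
    """
    selected = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete alignment record
        name, flag, contig = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # read is unmapped
        start, end = reference_span(pos, cigar)
        for region_start, region_end in regions.get(contig, []):
            if start <= region_end and end >= region_start:
                selected.add(name)
                break
    return selected
```

The returned read names are the subset for step 4: pull those reads out of the original FASTQ and assemble them on their own.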
