3.4 years ago by
Walnut Creek, USA
Typically, I'd suggest BLASTing the contigs against nt or a similar database and removing any whose best hits are clearly not plastid (for example nuclear, mitochondrial, or bacterial sequence). RefSeq also has a plastid dataset; you could align your contigs to that as well.
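If it helps, here is a rough Python sketch of that BLAST triage. It assumes you ran BLAST with a custom tabular format, `-outfmt "6 qseqid sseqid pident length evalue bitscore stitle"`, and it treats a contig as suspicious when the title of its best hit (by bit score) contains neither "chloroplast" nor "plastid". The keyword list and the function names are my own placeholders, not part of any tool, and a keyword match on titles is a crude heuristic, so check the flagged contigs by eye before discarding them.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed heuristic: words expected in the title of a genuine plastid hit.
PLASTID_KEYWORDS = ("chloroplast", "plastid")


def best_hits(rows: Iterable[str]) -> Dict[str, Tuple[float, str]]:
    """Return {contig: (bitscore, hit title)} for the top-scoring hit per contig.

    Expects tab-separated rows with the columns
    qseqid, sseqid, pident, length, evalue, bitscore, stitle.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        contig, bitscore, title = fields[0], float(fields[5]), fields[6]
        if contig not in best or bitscore > best[contig][0]:
            best[contig] = (bitscore, title)
    return best


def suspicious_contigs(rows: Iterable[str]) -> List[str]:
    """List contigs whose best hit does not look like a plastid sequence."""
    flagged = []
    for contig, (_, title) in best_hits(rows).items():
        lowered = title.lower()
        if not any(keyword in lowered for keyword in PLASTID_KEYWORDS):
            flagged.append(contig)
    return sorted(flagged)
```

You would call it with an open file handle, e.g. `suspicious_contigs(open("hits.tsv"))`, where `hits.tsv` is a hypothetical name for your BLAST output. Contigs with no hit at all do not appear in the output, so handle those separately.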
Furthermore, the chloroplast contigs should all have similar coverage. Once you know the chloroplast coverage, you can usually just throw away contigs whose coverage is very different. For example, if the chloroplast averages 500x and some contigs come in at 100x, those are probably something else (such as the plant's nuclear genome). You can determine coverage by mapping the reads to the assembly (e.g. with BBMap: bbmap.sh in=reads.fq ref=assembly.fa covstats=covstats.txt) or, in most cases, by reading the coverage value that the assembler puts in the contig names, though that's less accurate.
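As a rough sketch of the coverage filter, the Python below reads a BBMap covstats file and splits contigs into those near the expected chloroplast coverage and those far from it. It assumes the file is tab-separated with a header line starting with `#ID` and containing `Avg_fold` and `Length` columns; check your own covstats.txt, since I'm going from memory on the column names. The function names and the 50% tolerance are placeholders I made up, so tune the threshold to your data.

```python
from typing import Dict, Iterable, List, Tuple


def read_covstats(lines: Iterable[str]) -> Dict[str, Tuple[float, int]]:
    """Parse covstats lines into {contig: (average fold coverage, length)}.

    Assumes a tab-separated table whose header line starts with '#'
    and names at least the columns ID, Avg_fold and Length.
    """
    header = None
    coverage: Dict[str, Tuple[float, int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#"):
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None or len(fields) < len(header):
            continue  # skip blank or truncated lines
        row = dict(zip(header, fields))
        coverage[row["ID"]] = (float(row["Avg_fold"]), int(row["Length"]))
    return coverage


def split_by_coverage(
    coverage: Dict[str, Tuple[float, int]],
    expected: float,
    tolerance: float = 0.5,  # assumed: keep contigs within +/-50% of expected
) -> Tuple[List[str], List[str]]:
    """Return (keep, drop) contig lists based on distance from expected coverage."""
    low, high = expected * (1 - tolerance), expected * (1 + tolerance)
    keep, drop = [], []
    for contig, (avg_fold, _length) in coverage.items():
        (keep if low <= avg_fold <= high else drop).append(contig)
    return sorted(keep), sorted(drop)
```

For the example above you would run something like `keep, drop = split_by_coverage(read_covstats(open("covstats.txt")), expected=500)`; a 100x contig falls outside the 250x to 750x window and lands in `drop`.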