18 months ago
ve9
Is there a tool that can tell me, from long WGS FASTQ data, which species I have before I start an assembly?
PS: it's not a metagenome.
Do you have an idea of what genomes to expect? Would there be one or more? Depending on that, you will need to select or create a database to search against.
If this is a single genome, then simply blasting a few reads at NCBI should also give you an idea of what you have.
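To pull out a few reads for pasting into the NCBI BLAST web form, something like the sketch below could work. It assumes a standard four-line-per-record FASTQ file (optionally gzipped); the function name `sample_fastq_to_fasta` and its parameters are made up for illustration and are not part of any existing tool.

```python
import gzip
import random


def sample_fastq_to_fasta(fastq_path, n_reads=5, seed=0):
    """Pick n_reads random records from a FASTQ file and return them as FASTA text.

    Assumes four lines per record (header, sequence, '+', quality).
    Uses reservoir sampling, so the file is read once and never held in memory.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    rng = random.Random(seed)
    sample = []
    index = 0
    with opener(fastq_path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line, ignored
            handle.readline()  # quality line, ignored
            record = (header.strip()[1:].split()[0], sequence)
            if index < n_reads:
                sample.append(record)
            else:
                # Replace an earlier pick with decreasing probability
                j = rng.randint(0, index)
                if j < n_reads:
                    sample[j] = record
            index += 1
    return "".join(f">{name}\n{seq}\n" for name, seq in sample)
```

Calling `print(sample_fastq_to_fasta("reads.fastq.gz"))` (the file name is a placeholder) would print FASTA records that can be pasted into a blastn search.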
If your target species is included in a WGS database, you could use one of the metagenomic tools like Kraken2 and compare all your reads against a database of known sequences. Then you would also have a kind of confidence metric given the proportion of reads that map to the correct species. Another benefit would be identification and quantification of any potential contamination.
If your target species is not in a WGS database but sister taxa are, then you could still use the genus level mapping information.
I tend to use Kraken2 for QC of DNA and RNA reads and it is a really useful step.
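The proportion-of-reads check described above can be read off the report that Kraken2 writes when run with `--report`. Below is a rough sketch of how one might summarise it, assuming the standard six-column, tab-separated report (percent of reads in clade, reads in clade, reads assigned directly, rank code, NCBI taxid, indented name). The function name `summarize_kraken_report` is hypothetical, and the sketch does not handle the extra columns that appear if minimizer data is reported.

```python
def summarize_kraken_report(report_text, rank="S", top=3):
    """Return the top taxa at one rank as (name, percent, clade_reads) tuples.

    rank is a Kraken2 rank code, e.g. "S" for species or "G" for genus.
    Assumes the standard six-column tab-separated report layout.
    """
    rows = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        if fields[3].strip() != rank:
            continue
        percent = float(fields[0])
        clade_reads = int(fields[1])
        name = fields[5].strip()  # names are indented by taxonomic depth
        rows.append((name, percent, clade_reads))
    # Most abundant taxa first
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]
```

With `rank="S"`, one dominant species and only small fractions elsewhere would suggest a clean single-genome sample, while a notable second species may point to contamination. If the target species is not in the database, calling it with `rank="G"` gives the genus-level view instead.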