Hi,
I'm building a database of RefSeq genome sequences from selected bacterial species, which will be used for Nanopore sequencing of environmental samples.
To reduce the chance of false positives, I screened the genomes against the UniVec database to locate any potential contamination and got substantial hits to several vectors. I am pretty new to bioinformatics, so I wanted to ask whether anyone has ideas on how to mask or remove the contamination from the genome sequences?
/Helena
Are you only creating a database of main chromosomes from bacterial species? Normally the genomes may also include plasmids.
To start, I'll create a database containing the chromosomes, and afterwards I'll create one for plasmids :) I have already separated the plasmid sequences from the chromosomes.
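For the masking itself, here is a minimal sketch of one possible approach, not a definitive pipeline. It assumes the UniVec search was run with the genome as the query and saved in BLAST tabular format (`-outfmt 6` with the default columns, so `qstart` and `qend` are columns 7 and 8), and that the hits really are contamination rather than genuine bacterial sequence. The function names `parse_blast_hits` and `mask_sequence` are made up for illustration.

```python
def parse_blast_hits(lines):
    """Collect 1-based (start, end) hit intervals per query ID.

    Assumes default BLAST outfmt 6 columns, where qstart and qend
    are the 7th and 8th tab-separated fields.
    """
    hits = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        qstart, qend = int(fields[6]), int(fields[7])
        # Order the coordinates so start <= end in every interval.
        start, end = min(qstart, qend), max(qstart, qend)
        hits.setdefault(query_id, []).append((start, end))
    return hits


def mask_sequence(seq, intervals, mask_char="N"):
    """Replace every base inside the given 1-based inclusive intervals."""
    chars = list(seq)
    for start, end in intervals:
        # Convert to 0-based indices and stay within the sequence.
        for i in range(max(start - 1, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    # Toy example: one hypothetical hit covering positions 3-6 of "chr1".
    example_hits = ["chr1\tvec1\t100.0\t4\t0\t0\t3\t6\t1\t4\t1e-5\t50"]
    intervals = parse_blast_hits(example_hits)
    print(mask_sequence("ACGTACGTAC", intervals["chr1"]))
```

Masking with N keeps the coordinates of the rest of the genome unchanged, which is usually easier to work with than cutting the regions out. It is worth checking the hits by eye first, since you would not want to mask real genomic sequence.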