Hello Biostar! I hope you're doing well. I have a question, please. I'm trying to build a database for a bacterial genus from all the published sequences, so that I can calculate the coverage of my reads against it, using bowtie2 for the mapping. To do that, I merged all the genome sequences I downloaded from NCBI into one FASTA library (74 files merged into one FASTA file). The problem is that this library contains a lot of duplicated sequences, and that affects the coverage considerably. Is there any way to eliminate the duplication in my library file, or to merge the sequences without introducing duplication? Alternatively, is there another way to calculate the coverage of my reads against reference sequences?
Hello Bioinfo!
This topic has been addressed multiple times on the site. Please see posts here: https://www.biostars.org/local/search/page/?q=fasta+remove+duplicates
For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.
If you disagree, please tell us why in a reply below; we'll be happy to talk about it.
Cheers!
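For readers landing here from that search, the "fasta remove duplicates" approach usually comes down to keeping one record per distinct sequence. Below is a minimal sketch in plain Python, not taken from any of the linked posts; the function names `read_fasta` and `dedupe_fasta` are made up for illustration, and it assumes that "duplicate" means an identical sequence string (case-insensitive), regardless of header.

```python
import hashlib
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dedupe_fasta(in_handle, out_handle, width=60):
    """Write the first record of each distinct sequence; return (kept, dropped).

    Assumption: two records are duplicates only if their sequences are
    identical after upper-casing. Headers are ignored for the comparison.
    """
    seen = set()
    kept = dropped = 0
    for header, seq in read_fasta(in_handle):
        # Store a digest instead of the full genome to keep memory use low.
        digest = hashlib.sha1(seq.upper().encode()).digest()
        if digest in seen:
            dropped += 1
            continue
        seen.add(digest)
        kept += 1
        out_handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out_handle.write(seq[i:i + width] + "\n")
    return kept, dropped


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    demo = io.StringIO(">a\nACGT\nAC\n>b\nacgtac\n>c\nTTTT\n")
    result = io.StringIO()
    print(dedupe_fasta(demo, result))
    print(result.getvalue(), end="")
```

Note that this only removes records that are exact copies of one another. Genomes that are merely very similar remain in the library, so reads can still map equally well to several references.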
The original poster actually means something else entirely: they are calling the similar regions shared by their reference genomes "duplicated" regions.