Hi everyone,
I am wondering whether anyone has a good way of filtering and possibly renaming genomic scaffolds before submission to NCBI. I am submitting quite a few genomes and have received the contamination report (a text file listing the contaminated scaffolds), but I am not sure how people generally go about removing them. Asking around has given me several different answers, so I would appreciate any community feedback or a streamlined way of doing this.
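For illustration, here is a minimal sketch of the filter-and-rename step in plain Python (no Biopython). It assumes you already have the flagged scaffold names as a set. The helper names `read_fasta` and `filter_and_rename`, the `scaffold_N` naming scheme and the 60-column line width are all hypothetical choices, not an NCBI requirement. It writes the kept scaffolds under sequential new names and returns an old-to-new mapping so the renaming stays traceable.

```python
import io
from typing import Dict, Iterator, Set, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word of the header."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_and_rename(fasta_in: TextIO, fasta_out: TextIO, drop: Set[str],
                      prefix: str = "scaffold", width: int = 60) -> Dict[str, str]:
    """Write every record not in `drop` under a new sequential name.

    `prefix` and `width` are hypothetical defaults. Returns old_name -> new_name.
    """
    mapping: Dict[str, str] = {}
    for name, seq in read_fasta(fasta_in):
        if name in drop:
            continue  # flagged as a contaminant: leave it out entirely
        new_name = f"{prefix}_{len(mapping) + 1}"
        mapping[name] = new_name
        fasta_out.write(f">{new_name}\n")
        for i in range(0, len(seq), width):
            fasta_out.write(seq[i:i + width] + "\n")
    return mapping


demo = io.StringIO(">s1 len=8\nACGTACGT\n>s2\nTTTT\n>s3\nGG\nCC\n")
out = io.StringIO()
m = filter_and_rename(demo, out, drop={"s2"})
print(m)  # {'s1': 'scaffold_1', 's3': 'scaffold_2'}
```

Keeping the returned mapping (for example, written out as a two-column table) makes it possible to relate the submitted names back to the original assembler output later.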
I know that contaminant filtering is often done before assembly, but for assemblers like 10x Genomics' Supernova, the recommendation is not to trim or filter the reads before running the pipeline.
Thanks in advance for any advice or opinions. This site has been amazingly helpful for me over the past couple of years.
How did you decide they were contaminants? Based on the NCBI/EBI contamination screen report?
Yes, you are exactly right! It is just the contamination report text file from NCBI.
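Here is a minimal sketch of pulling the scaffold names out of such a report. It assumes a layout that should be checked against your actual file: tab-separated rows grouped under headings such as `Exclude:` and `Trim:`, each followed by a column-header line starting with "Sequence name", with trim spans written as 1-based `start..end` ranges separated by commas. The function name `parse_contamination_report` is hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# ASSUMPTION: trim spans look like "1..120,5000..5040"
SPAN_RE = re.compile(r"(\d+)\.\.(\d+)")


def parse_contamination_report(text: str) -> Tuple[Set[str], Dict[str, List[Tuple[int, int]]]]:
    """Return (names to exclude, {name: [(start, end), ...] to trim}).

    ASSUMPTION: tab-separated rows grouped under 'Exclude:' / 'Trim:' headings;
    rows in any other section (e.g. 'Duplicated:') or before the first heading are ignored.
    """
    exclude: Set[str] = set()
    trim: Dict[str, List[Tuple[int, int]]] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "\t" not in line:
            section = line[:-1].strip().lower()  # e.g. 'exclude', 'trim'
            continue
        if line.lower().startswith("sequence name"):
            continue  # column-header row
        fields = line.split("\t")
        name = fields[0].strip()
        if section == "exclude":
            exclude.add(name)
        elif section == "trim" and len(fields) >= 3:
            spans = [(int(a), int(b)) for a, b in SPAN_RE.findall(fields[2])]
            trim.setdefault(name, []).extend(spans)
    return exclude, trim


report = (
    "Exclude:\n"
    "Sequence name, length, apparent source\n"
    "scaffold_7\t1500\tvector\n"
    "\n"
    "Trim:\n"
    "Sequence name, length, span(s), apparent source\n"
    "scaffold_2\t90000\t1..120,5000..5040\tadaptor:NGB00972.1\n"
)
excl, trims = parse_contamination_report(report)
print(excl)   # {'scaffold_7'}
print(trims)  # {'scaffold_2': [(1, 120), (5000, 5040)]}
```

The exclude set can be used directly to drop whole scaffolds when rewriting the FASTA. The trim spans need a separate decision (cutting the span or masking it with Ns), which is worth checking against NCBI's own submission guidance rather than automating blindly.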