I'm doing an RNA-seq experiment. According to Kraken, 12% of my reads belong to other organisms, and I need to remove them before continuing. I'm working with an organism that has never been used in this field before, so I don't have a reference genome and need to do everything from scratch. Kraken lets me detect the contaminating reads but not delete them. Do you know of any other tool?
"12% contamination" is too general to go on. The first step is to check for adapters and, if they are present, remove them with FASTX-Toolkit, Trim Galore, Trimmomatic, or similar; the list is long. Next, check the average read quality. Read ends often have poor quality (on Illumina data this is usually the 3' end, although the first few 5' bases can also be poor), so trim a few bases there. You can also filter out reads whose quality is low over their whole length. All of these initial steps can be done with any of the read-processing tools mentioned above.
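To make the trimming and filtering idea concrete, here is a minimal Python sketch. It is not how FASTX-Toolkit, Trim Galore, or Trimmomatic work internally; the function names, the Phred+33 encoding, and the thresholds (`min_q=20`, `min_mean_q=20`, `min_len=30`) are all assumptions chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_ends(seq, qual, min_q=20):
    """Clip bases scoring below min_q from both ends of a read."""
    scores = phred_scores(qual)
    start, end = 0, len(scores)
    # Walk inwards from the 5' end while the bases are poor.
    while start < end and scores[start] < min_q:
        start += 1
    # Walk inwards from the 3' end while the bases are poor.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], qual[start:end]


def passes_filter(qual, min_mean_q=20, min_len=30):
    """Keep a read only if it is long enough and its mean quality is high enough."""
    if not qual or len(qual) < min_len:
        return False
    scores = phred_scores(qual)
    return sum(scores) / len(scores) >= min_mean_q
```

For real data you would still use one of the dedicated tools, which also handle adapters and paired reads; the sketch only shows what "trim the ends, then drop bad reads" means in practice.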
Coming to your next concern: you don't have an assembled genome for mapping. You can use a de novo approach, which does not require a reference genome to reconstruct the transcriptome and is typically used when the genome is unknown, incomplete, or substantially altered compared to the reference. For better-quality RNA-seq analysis, though, you will want a reference. You can build your own reference genome (initially of low quality, but good enough for RNA-seq) from WGS reads or any other high-throughput sequencing data that covers the whole genome.
Thanks for your answer. I have already trimmed them. The problem is that, of the remaining high-quality reads, 12% belong to bacteria and fungi rather than to my organism, and I would like to delete those reads. We are using wild mussels, and they are full of bacteria and fungi.
Should I just continue the process with the contamination?
The best way to remove such contamination (reads from other microorganisms) is to map your FASTQ reads to a reference mussel genome; the mapped reads in the SAM/BAM file are your reads of interest.
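As a rough illustration of "keep only the mapped reads", the Python sketch below collects the IDs of mapped reads from SAM text. It relies on the SAM flag bit 0x4 (segment unmapped); the function name `mapped_read_ids` is made up for this example, and in practice a tool such as samtools would do this directly on the BAM file.

```python
def mapped_read_ids(sam_lines):
    """Return the IDs of reads that have at least one mapped alignment."""
    ids = set()
    for line in sam_lines:
        # Header lines start with '@' and carry no alignments.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        flag = int(fields[1])
        # Bit 0x4 is set when the read is unmapped.
        if not flag & 0x4:
            ids.add(fields[0])
    return ids
```

The resulting set of IDs can then be used to pull the matching records out of the original FASTQ file.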
If you don't have a mussel genome, you can BLAST all the reads from your FASTQ file against the NCBI nr database, then remove the reads that hit any bacteria or fungi (or specific taxon names). The remaining reads (which do not hit bacteria or fungi but may hit other organisms) can be used for further analyses.
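The removal step could look something like the Python sketch below. It assumes a hypothetical two-column, tab-separated hits table (read ID, taxon name) that you would build yourself from your BLAST or Kraken results; that table layout, the function names, and the taxon names in the usage note are illustrative assumptions, not the output format of any tool.

```python
def contaminant_ids(hit_lines, bad_taxa):
    """Collect read IDs whose taxon label contains any unwanted name."""
    bad = [name.lower() for name in bad_taxa]
    ids = set()
    for line in hit_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        read_id, label = fields[0], fields[1].lower()
        if any(name in label for name in bad):
            ids.add(read_id)
    return ids


def filter_fastq(lines, drop_ids):
    """Yield the lines of every 4-line FASTQ record whose ID is not in drop_ids."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            # Header looks like "@read_id optional description".
            parts = record[0][1:].split()
            read_id = parts[0] if parts else ""
            if read_id not in drop_ids:
                yield from record
            record = []
```

With open file handles, usage would be along the lines of `out.writelines(filter_fastq(fastq_in, contaminant_ids(hits_in, ["Escherichia", "Saccharomyces"])))`. For paired-end data, both mates of a flagged pair should be dropped together, which this sketch does not handle.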
I would not recommend continuing the analysis with the contamination.
Thanks again. I'm using BBSplit to get rid of the exogenous reads. Let's see if I manage to make it work.