I have Illumina FASTQ files from RNA-seq, ATAC-seq and WES experiments on PDX samples. I want to filter the contaminating mouse reads out of the human reads in these datasets. I have used Xenome before but wanted to try bbsplit. Xenome and bbsplit are attractive because they work directly on the FASTQ files, so there is no need to align to mouse and human separately and then compare or filter the resulting BAMs with tools like ngs-disambiguate, XenofilteR, etc.
I successfully built a bbsplit index from the human and mouse genomes:
bbsplit.sh -Xmx40g build=1 path=/home/ryan/Reference/bbsplit_mm10_hg38 ref_Mouse=/home/ryan/Reference/Mus_musculus/Ensembl/STAR_reference/Mus_musculus.GRCm38.dna.primary_assembly.fa ref_Human=/home/ryan/Reference/Homo_sapiens/Ensembl/gencode_GRCH38/GRCh38.primary_assembly.genome.fa
I then ran bbsplit as such:
bbsplit.sh -Xmx40g path=/home/ryan/Reference/bbsplit_mm10_hg38/ build=1 in=/home/ryan/NGS_Data/JCA108_S9_L004_R1_001.fastq.gz in2=/home/ryan/NGS_Data/JCA108_S9_L004_R2_001.fastq.gz refstats=/home/ryan/NGS_Data//test/JCA108_stats.txt basename=/home/ryan/NGS_Data/test/JCA108_%_#.fq.gz
I am running this on a Linux system with 48 GB of RAM and 8 threads, and the process is taking a long time (over 24 hours so far). Do I need a lot more RAM to use it? The output file is growing, but very slowly!
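To tell whether this is a memory problem or just slow throughput, one option would be to time bbsplit on a small subset of read pairs first. Below is a minimal sketch using only standard tools. The function name `subset_fastq` and the file names in the comments are hypothetical placeholders, and it assumes standard 4-line FASTQ records with R1 and R2 in the same order.

```shell
# Hypothetical helper: keep the first N reads of a gzipped FASTQ.
# Assumes 4 lines per read (standard, non-wrapped FASTQ).
subset_fastq() {
    in_fq="$1"     # input .fastq.gz
    out_fq="$2"    # output .fastq.gz
    n_reads="$3"   # number of reads to keep
    gzip -dc "$in_fq" | head -n "$((n_reads * 4))" | gzip -c > "$out_fq"
}

# Example usage (placeholder file names). Taking the same number of reads
# from R1 and R2 keeps the pairs in sync:
#   subset_fastq sample_R1.fastq.gz subset_R1.fastq.gz 100000
#   subset_fastq sample_R2.fastq.gz subset_R2.fastq.gz 100000
```

Running the same bbsplit command on the two subset files under `time` would give a rough reads-per-hour figure to extrapolate to the full dataset.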