Hello,
Please pardon a potentially naive question. I have 30 newly sequenced WGS data files for a bird species, all in FASTQ format. Using bowtie2, I have mapped them to a reference genome from NCBI, which is the only full genome sequence available for a member of the same family. All works fine up to this point, but I would like to include that reference genome in my downstream phylogenetic analyses as an outgroup, since it is the only same-family species available and therefore an ideal outgroup.
Perhaps naively, I tried to map the reference genome to 'itself' using bowtie2. First, I am not sure whether this is a correct approach. Second, it keeps failing with out-of-memory errors on my cluster. Today I realised this might be because bowtie2 is not meant to deal with this kind of input. The .fna reference file contains scaffold-level FASTA sequences, some of them hundreds of kb long, and I imagine this is what gets the process killed with OOM.
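In case it helps to confirm the scaffold lengths, here is a minimal Python sketch that tallies the length of every sequence in a FASTA file. It assumes an uncompressed, plain-text file; the function name fasta_lengths is purely illustrative and not part of bowtie2 or any other tool.

```python
def fasta_lengths(lines):
    """Return {sequence_name: length_in_bases} for FASTA text.

    `lines` is any iterable of text lines, e.g. an open file handle.
    The name is the first whitespace-delimited token after '>'.
    (Hypothetical helper, written only for this check.)
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a placeholder if the header has no name.
            name = parts[0] if parts else f"unnamed_{len(lengths)}"
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so accumulate per record.
            lengths[name] += len(line)
    return lengths
```

Calling it on an open file handle, e.g. `with open("ref_genome/GCA_013397245.1_ASM1339724v1_genomic.fna") as fh: lengths = fasta_lengths(fh)`, and then looking at `max(lengths.values())` would show how long the largest scaffold actually is.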
Is there a way to solve this problem and include the reference genome in my study? Any help will be appreciated.
For extra info, this is the bowtie2 command I used. The index ref_genome/Carolina_wren has been built and is ready for mapping. The resources needed for the --threads setting have been correctly requested in the cluster settings.
bowtie2 -x ref_genome/Carolina_wren -f -U ref_genome/GCA_013397245.1_ASM1339724v1_genomic.fna -S CW.sam --threads 96
Thanks, Michal