Question

Mapping .fna reference genome to itself in order to include it as an outgroup

23 months ago

Michal • 0

Hello,

Please pardon a potentially naive question. I have 30 newly sequenced WGS data files for a bird species. These are all in FASTQ format. Using bowtie2, I have mapped them to a reference genome from NCBI, which is the only full genome sequence available for a member of the same family. Everything works fine up to this point, but to have an outgroup for downstream phylogenetic analyses I would like to include that reference genome in my analyses, as it is a perfect outgroup and the only same-family species available.

Perhaps naively, I tried to map it to 'itself' using bowtie2. First, I am not sure whether this is a correct approach. Second, it keeps failing with out-of-memory errors on my cluster. Today I realised this might be because bowtie2 is not meant to deal with this kind of data. The .fna reference file contains scaffold-level FASTA records, so some of the sequences are hundreds of kb long. I imagine this is what kills the process with OOM.

Is there a way to solve this problem and include the reference genome in my study? Any help will be appreciated.

For extra info, this is the bowtie2 command. The ref_genome/Carolina_wren index has been built and is correctly prepared for mapping. All resources for the --threads option have been correctly specified in the cluster settings.

bowtie2 -x ref_genome/Carolina_wren -f -U ref_genome/GCA_013397245.1_ASM1339724v1_genomic.fna -S CW.sam --threads 96
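To make the sequence-length concern concrete: one workaround sometimes suggested for this kind of input is to cut each scaffold into short, overlapping pseudo-reads before alignment, so the aligner only ever sees read-sized sequences. The following is a minimal sketch of that idea, not a bowtie2 feature and not a tested pipeline. The function names (`read_fasta`, `windows`, `shred`) are hypothetical, and the window size of 150 bp with a 75 bp step is an arbitrary assumption rather than a recommended setting.

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the ID, i.e. the header text up to the first whitespace.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def windows(seq: str, size: int = 150, step: int = 75) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, fragment) for overlapping windows over seq.

    The last fragment may be shorter than `size`. The defaults are
    illustrative assumptions only.
    """
    if not seq:
        return
    start = 0
    while True:
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break
        start += step


def shred(in_handle: TextIO, out_handle: TextIO, size: int = 150, step: int = 75) -> int:
    """Write pseudo-reads as FASTA and return how many were written.

    Each pseudo-read is named <record_id>_<start>_<end> using 1-based,
    inclusive coordinates on the original scaffold.
    """
    count = 0
    for name, seq in read_fasta(in_handle):
        for start, frag in windows(seq, size, step):
            out_handle.write(f">{name}_{start + 1}_{start + len(frag)}\n{frag}\n")
            count += 1
    return count
```

Under these assumptions, `shred` would be called with the .fna file opened for reading and a new FASTA file (say, a hypothetical `pseudo_reads.fa`) opened for writing, and that output would then be passed to bowtie2 with `-f -U` in place of the whole-genome .fna. Whether the resulting alignments are suitable for the downstream phylogenetic analysis is a separate question that this sketch does not address.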

Thanks, Michal

fasta bowtie2 mapping