Mapping to mtDNA and then align the unmapped
1
0
Entering edit mode
9 months ago

Hello all,

I have aligned my samples against the mitochondrion genome of the species I work with. My idea was that after this I would keep the unmapped ones (which would be the nuclear reads), and then align these against the reference genome.

However, and I did not think of this, the result of said process are bam files, and I have no idea how or if I can use bam files for a re-alignment.

Here is the code I have used for the mapping against the mtDNA:

#!/bin/bash

REFERENCE=../refs/mtDNA/mitochondrion-genome.fa
REFNAME=mitochondrion-genome
FORWARD=_1.fq.gz
REVERSE=_2.fq.gz

for i in ../trim/1stBatch-fastp/*_1.fq.gz
do
  SAMPLE=$(echo ${i} | sed "s/_1\.fq\.gz//" | rev | cut -d'/' -f-1 | rev)
  echo ${SAMPLE}_1.fq.gz ${SAMPLE}_2.fq.gz
  ## Define reference base name
  REFBASENAME="${REFERENCE%.*}"
  sbatch --account ACCOUNT --mem=64g -c 8 -o bowtie2-mtDNA.out -e bowtie2-mtDNA.err \
  --wrap="bowtie2 -q --phred33 --very-sensitive -p 8 -I 0 -X 1500 -x ${REFBASENAME} \
  -1 ../trim/1stBatch-fastp/${SAMPLE}${FORWARD} -2 ../trim/1stBatch-fastp/${SAMPLE}${REVERSE} -S ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.sam'
  samtools view -bS ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.sam' > ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.bam'
  rm -f ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.sam'
  samtools view -h -q 20 ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'.bam' | samtools view -buS - | samtools sort -o ../bam/1stBatch/mtDNA/${SAMPLE}'_bt2_'${REFNAME}'_minq20_sorted.bam'"
done

I was thinking that, as alternative, I could append the mtDNA to the nuclear genome and then use samtools to remove the reads aligned to the mtDNA. What do you think is a best strategy?

Thank you!

mtDNA Alignment Samtools Bowtie2 • 660 views
ADD COMMENT
3
Entering edit mode
9 months ago
ATpoint 82k

It would be better to align against the entire genome and then subset for what you need. For example, only mapping against mt will false-positively align nuclear homologs of mt genes with similar sequence mitochondrial genes to the mt genome.

As for the question, use samtools fastq to convert bam to fastq. If the bam is sorted then do samtools collate first since some aligners expect randomly-ordered fastq reads.

ADD COMMENT
0
Entering edit mode

Great answer, thank you so much.

ADD REPLY

Login before adding your answer.

Traffic: 2612 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6