Question

Basecount calculation after alignment to multi-fasta file

0

Entering edit mode

5 weeks ago

aj • 0

Hi,

I aligned (viral) paired end reads using the commands below:

bwa index {input.reference} bwa mem -t 2 {input.reference} {input.R1} {input.R2} | samtools sort -o tmp_aln.bam
#exclude supplementary and not primary alignment ({input.reference} has multiple sequences)
samtools view -h -F 2304 -o {output.bam} tmp_aln.bam
samtools index {output.bam}

NB: In order to account for sequence diversity {input.reference} contains many different sequences of the same (viral) Subtype. It is a long story, but basically direct alignment to the consensus of these sequences is something we would like to avoid.

Now, I would like to visualise the base counts relative to the consensus sequence (I believe that is the most forward way to visualise the alignment ?). But when using samtools depth I am obviously getting the basecounts per individual sequence present in my multi-fata file.

Question: Is there an well-established bioinformatics tool to map between the positions of the individual sequences and the consensus sequence of my multi-fasta file? Are there other approaches that, I could consider (end goal is visualising the summarised coverage)?

Thanks and best!

multi-fasta sequence bwa consensus samtool alignment • 335 views

ADD COMMENT • link 5 weeks ago by aj • 0