Question: One read and One alignment
13 months ago, cxr5298 wrote:

I am working with ONT MinION metagenome data, aligning it against a FASTA file in which each entry is a different species. I am using a series of aligners (BWA, Bowtie2, SNAP, and Minimap2) to see which one yields the best results. However, for each aligner I am getting more alignments to a given species than there are reads in the input file.

For example, a FASTQ containing 100,000 reads of Mouse and 50,000 reads of Wood Mouse Herpes Virus, aligned against my database, returns 300,000 alignments for Mouse and 120,000 alignments for Wood Mouse Herpes Virus.

I understand that this is partly because some of these aligners report secondary and chimeric alignments, but is there a known aligner or aligner configuration that reports only a single best alignment for each read in my input file?


Have you looked at the alignments to see whether the ones produced by bwa and bowtie2 seem reasonable? Look at the CIGAR strings and the lengths of the alignments.

minimap2 is the only bona fide aligner for long reads here (not sure about SNAP), so its results are the ones you should compare the others against.

written 13 months ago by genomax

As far as I can tell, the alignments appear valid; there are just more of them than there should be. This happens across the board, whether I'm using minimap2, bowtie2, SNAP, or BWA. I'm not too familiar with CIGAR strings, but from what I can tell the alignments all appear to be good. If I wanted to double-check the CIGAR strings, how would I do that?

written 13 months ago by cxr5298
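
One way to double-check CIGAR strings is to total up their operations and see how much of each read is actually aligned versus clipped. The following is a minimal sketch in plain Python, not a replacement for a proper SAM library; the function name `summarize_cigar` and the returned keys are made up for illustration. It assumes the CIGAR string is taken from column 6 of a SAM record and follows the operation codes defined in the SAM specification.

```python
import re

# One CIGAR element is a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")

def summarize_cigar(cigar):
    """Summarize a CIGAR string: reference span, aligned and clipped read bases."""
    totals = {}
    for length, op in CIGAR_ELEMENT.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    # M, D, N, = and X consume reference bases.
    ref_span = sum(totals.get(op, 0) for op in "MDN=X")
    # M, I, = and X consume read bases that are part of the alignment.
    aligned_query = sum(totals.get(op, 0) for op in "MI=X")
    # S (soft clip) and H (hard clip) are read bases left out of the alignment.
    clipped = totals.get("S", 0) + totals.get("H", 0)
    read_length = aligned_query + clipped
    return {
        "ref_span": ref_span,
        "aligned_query": aligned_query,
        "clipped": clipped,
        "clipped_fraction": clipped / read_length if read_length else 0.0,
    }

if __name__ == "__main__":
    print(summarize_cigar("5S90M2I3D5S"))
```

A long read whose alignment is mostly clipping (a high `clipped_fraction`) or whose `ref_span` is far shorter than the read is worth a closer look, which is a common pattern when short-read aligners are given long reads.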

How would you treat the reads/alignments that are truly duplicated in your reference?

written 13 months ago by lieven.sterck

If a read aligns twice to the same species, I'd consider it a single 'hit' for that species. If a read aligns to multiple species, I'd need to investigate further to determine its origin. I'm more concerned with identifying the origin of each read than with the specifics of its alignment, as the metagenome I'm working with will consist of only a handful of potential organisms.

written 13 months ago by cxr5298
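
The counting rule described above (one hit per read per species, with multi-species reads set aside) can be sketched as follows. This is an illustrative sketch only: the function name `species_hits` is hypothetical, and it assumes uncompressed SAM text in which the reference name in column 3 identifies the species, as in a FASTA with one entry per species.

```python
from collections import Counter, defaultdict

def species_hits(sam_lines):
    """Count one hit per read per species; report reads hitting several species."""
    species_by_read = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        read_name, flag, ref_name = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or ref_name == "*":
            continue  # unmapped record
        # A set collapses repeated alignments of a read to the same species.
        species_by_read[read_name].add(ref_name)

    hits = Counter()
    ambiguous = {}
    for read_name, species in species_by_read.items():
        if len(species) == 1:
            hits[next(iter(species))] += 1
        else:
            ambiguous[read_name] = sorted(species)
    return hits, ambiguous

if __name__ == "__main__":
    demo = [
        "@HD\tVN:1.6",
        "r1\t0\tmouse\t100\t60\t50M\t*\t0\t0\t*\t*",
        "r1\t256\tmouse\t900\t0\t50M\t*\t0\t0\t*\t*",
        "r2\t0\tmouse\t200\t60\t50M\t*\t0\t0\t*\t*",
        "r2\t2048\tvirus\t10\t60\t20M\t*\t0\t0\t*\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    hits, ambiguous = species_hits(demo)
    print(dict(hits), ambiguous)
```

Because the tally is per read rather than per alignment record, the per-species totals cannot exceed the number of input reads, regardless of how many secondary or supplementary records the aligner emits.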
13 months ago, swbarnes2 (United States) wrote:

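You can filter your BAM with samtools view -F 256 to drop the secondary alignments. Supplementary (chimeric) alignments carry a separate flag, 2048, so use -F 2304 (256 + 2048) to exclude both. That will yield a one read-one line relationship.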
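
The -F option excludes any record whose FLAG has at least one of the given bits set, so the filter amounts to a bitmask test on column 2 of each record. Below is a rough pure-Python sketch of that test on uncompressed SAM text; the function name `primary_only` is made up for illustration, and the bit values are those defined in the SAM specification (0x100 secondary, 0x800 supplementary).

```python
SECONDARY = 0x100       # 256: secondary alignment
SUPPLEMENTARY = 0x800   # 2048: supplementary (chimeric) alignment

def primary_only(sam_lines, exclude=SECONDARY | SUPPLEMENTARY):
    """Yield header lines and records whose FLAG has none of the `exclude` bits."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines untouched
            continue
        flag = int(line.split("\t", 2)[1])
        if (flag & exclude) == 0:
            yield line

if __name__ == "__main__":
    records = [
        "r1\t0\tmouse\t100\t60\t50M\t*\t0\t0\t*\t*",
        "r1\t256\tmouse\t900\t0\t50M\t*\t0\t0\t*\t*",
        "r1\t2048\tmouse\t500\t60\t20M\t*\t0\t0\t*\t*",
        "r2\t16\tvirus\t10\t60\t50M\t*\t0\t0\t*\t*",
    ]
    for kept in primary_only(records):
        print(kept)
```

With the default mask (2304) each read keeps a single line. Passing `exclude=SECONDARY` mimics -F 256 alone, which still lets supplementary records through. Unmapped reads (flag bit 0x4) also occupy one line each, so they would need to be excluded separately if only aligned reads should be counted.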
