Question: One read and One alignment
14 days ago by
cxr5298 (0) wrote:

I am working with an ONT MinION metagenome data set, aligning it against a FASTA file in which each entry is a different species. I am using a series of aligners (BWA, Bowtie2, SNAP, and Minimap2) to see which one yields the best results. However, for each aligner I am getting more alignments to a given species than there are reads in the input file.

For example, a FASTQ containing 100,000 reads of Mouse and 50,000 reads of Wood Mouse Herpes Virus, aligned against my database, will return 300,000 alignments for Mouse and 120,000 alignments for Wood Mouse Herpes Virus.

I understand that this is partly because some of these aligners report secondary and chimeric alignments, but I was wondering whether there is a known aligner or aligner configuration that would report only a single best alignment for each read in my input file?

modified 13 days ago by swbarnes2 (5.5k) • written 14 days ago by cxr5298 (0)

Have you looked at the alignments to see whether those produced by bwa and bowtie2 seem reasonable/logical? Look at the CIGAR strings and the lengths of the alignments.

minimap2 is the only bona fide aligner for long reads here (not sure about SNAP), so its results should be the baseline you compare the others against.

modified 14 days ago • written 14 days ago by genomax (67k)

As far as I can tell the alignments appear valid; there are just more of them than there should be. This happens across the board, whether I'm using minimap2, bowtie2, snap, or BWA. I'm not too familiar with CIGAR strings, but from what I can tell the alignments all appear to be good. If I wanted to double-check the CIGAR strings, how would I do that?

written 13 days ago by cxr5298 (0)
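One way to double-check CIGAR strings is to total the operations in each one and compare the aligned length with the clipped length. The sketch below is only an illustration of that idea, not something posted in the thread: the function name `summarize_cigar` and the example CIGAR string are made up, and the operation letters follow the SAM specification. A long read whose CIGAR is mostly soft or hard clipping (S or H) with a short matched stretch is a hint that the alignment is partial or chimeric.

```python
import re

# One CIGAR operation is a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def summarize_cigar(cigar):
    """Total the bases per CIGAR operation and derive aligned/clipped lengths.

    Hypothetical helper for illustration; not part of any aligner or samtools.
    """
    totals = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    # M, I, = and X consume read bases inside the aligned region.
    aligned_read = sum(totals.get(op, 0) for op in "MI=X")
    # M, D, N, = and X consume reference bases.
    ref_span = sum(totals.get(op, 0) for op in "MDN=X")
    # S (soft clip) and H (hard clip) are read bases left out of the alignment.
    clipped = totals.get("S", 0) + totals.get("H", 0)
    return {"aligned_read": aligned_read, "ref_span": ref_span, "clipped": clipped}


# Made-up CIGAR: 500 clipped bases, an aligned stretch with one insertion
# and one deletion, then 300 more clipped bases.
summary = summarize_cigar("500S1200M10I300M5D200M300S")
print(summary)
```

In a real run the CIGAR string is column 6 of each SAM record, so the same function could be applied to every line printed by samtools view.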

How would you treat the reads/alignments that truly occur twice in your reference?

modified 13 days ago • written 13 days ago by lieven.sterck (4.8k)

In the case where a read aligns twice to the same species, I'd consider it a single 'hit' for the species in question. In the case where a read aligns to multiple species, it would be something I'd need to investigate further to determine its origin. I'm more concerned with identifying the origin of each read than with the specifics of its alignment to the species in question, as the metagenome I'm working with will consist of only a handful of potential organisms.

written 13 days ago by cxr5298 (0)
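The counting rule described above (several alignments to one species count as a single hit, and reads that align to more than one species are set aside for a closer look) can be sketched directly on SAM text. This is a minimal illustration under stated assumptions: the reference name in column 3 is taken to be the species, as in the one-entry-per-species FASTA described in the question, and `assign_reads`, `record`, and all read and species names are hypothetical.

```python
from collections import defaultdict

UNMAPPED = 0x4  # SAM FLAG bit: the read has no alignment


def assign_reads(sam_lines):
    """Collapse all alignments of a read into one species call.

    Returns (hits, ambiguous): hits counts reads that align to exactly one
    species; ambiguous maps read names to the sorted species they align to.
    Assumes column 3 (RNAME) identifies the species.
    """
    species_by_read = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, ref = fields[0], int(fields[1]), fields[2]
        if flag & UNMAPPED or ref == "*":
            continue
        species_by_read[name].add(ref)

    hits = defaultdict(int)
    ambiguous = {}
    for name, species in species_by_read.items():
        if len(species) == 1:
            hits[next(iter(species))] += 1
        else:
            ambiguous[name] = sorted(species)
    return dict(hits), ambiguous


def record(name, flag, ref):
    """Build a minimal 11-column SAM record with made-up values for the demo."""
    pos = "0" if ref == "*" else "1"
    return "\t".join([name, str(flag), ref, pos, "60", "*", "*", "0", "0", "*", "*"])


example = [
    "@HD\tVN:1.6",
    record("read1", 0, "mouse"),
    record("read1", 2048, "mouse"),       # supplementary, same species
    record("read2", 0, "mouse"),
    record("read2", 256, "herpesvirus"),  # secondary, different species
    record("read3", 16, "herpesvirus"),
    record("read4", 4, "*"),              # unmapped
]
hits, ambiguous = assign_reads(example)
print(hits)
print(ambiguous)
```

Here read1 counts once for mouse despite having two alignments, read3 counts once for the virus, and read2 is reported as ambiguous rather than being counted for either species.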
13 days ago by
United States
swbarnes2 (5.5k) wrote:

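You can filter your BAM with samtools view -F 256 to drop the secondary alignments. Aligners such as BWA-MEM and minimap2 also report chimeric alignments as supplementary records (flag 2048), so use samtools view -F 2304 (256 + 2048) to drop those as well. That will yield a one read-one line relationship.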
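The -F option of samtools view is a bitwise test on the FLAG column (column 2): a record is dropped if any of the requested bits is set. The sketch below mimics that test in plain Python on SAM text so the logic is visible. It is an illustration only: the function name `primary_only` and the sample records are made up, and the flag values come from the SAM specification. Note that unmapped reads (flag 4) still pass this filter, each as a single line.

```python
SECONDARY = 0x100      # 256: secondary alignment
SUPPLEMENTARY = 0x800  # 2048: supplementary (chimeric) alignment


def primary_only(sam_lines, exclude=SECONDARY | SUPPLEMENTARY):
    """Yield header lines plus records with none of the `exclude` bits set.

    Mirrors the bit test behind `samtools view -F <exclude>`; hypothetical
    helper, not part of samtools.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        flag = int(line.split("\t")[1])
        if (flag & exclude) == 0:
            yield line


def sam_record(name, flag):
    """Build a minimal 11-column SAM record with made-up values for the demo."""
    return "\t".join([name, str(flag), "mouse", "1", "60", "*", "*", "0", "0", "*", "*"])


sample = [
    sam_record("read1", 0),     # primary, forward strand
    sam_record("read1", 256),   # secondary
    sam_record("read1", 2048),  # supplementary
    sam_record("read2", 16),    # primary, reverse strand
    sam_record("read2", 272),   # secondary, reverse strand (256 + 16)
]
kept = list(primary_only(sample))
print(len(sample), "records in,", len(kept), "records out")
```

With exclude set to 256 only, the supplementary record for read1 would survive, so that read would still appear on two lines.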
