DNA Sequencers - Required Matching Errors For 150 bp Reads
12.0 years ago
Arpssss ▴ 40

I am running some experiments with Bowtie and WHAM, short-read aligners that align short DNA sequences (reads) to a reference such as the human genome, much as BLAST does. The WHAM documentation states that for 75 bp reads it is biologically required to allow matches with up to two errors, and that the allowed number of errors should increase with read length. I now want to try both tools on 150 bp reads. Can anybody tell me how many mismatches should be allowed for 150 bp reads, or where I can find information about this?

dna blast bowtie • 2.3k views

To be precise, I am asking about the "maximum tolerated mismatch errors" for various read lengths, especially 150 bp.


There is no definitive answer. These are all heuristic rules, and the right one depends on your purpose. Some are based on alignment score, others on the number of mismatches. The BWA paper gives one such rule; you may want to read it.
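As a rough illustration of a length-dependent rule of this kind, here is a minimal Python sketch. It assumes sequencing errors follow a Poisson distribution with a uniform per-base error rate, and it picks the smallest mismatch budget such that fewer than a chosen fraction of reads would exceed it. The 2% error rate and 4% threshold are my understanding of the BWA defaults, not something stated in this thread, and the function name `max_mismatches` is made up for the example.

```python
import math

def max_mismatches(read_len, err_rate=0.02, miss_frac=0.04):
    """Smallest k such that P(more than k errors in a read) < miss_frac.

    Assumes errors per read are Poisson-distributed with mean
    read_len * err_rate (an assumption, not a property of any aligner).
    """
    lam = read_len * err_rate
    term = math.exp(-lam)      # P(exactly 0 errors)
    cumulative = term          # P(at most k errors), starting at k = 0
    k = 0
    while 1.0 - cumulative >= miss_frac:
        k += 1
        term *= lam / k        # P(exactly k errors) from P(exactly k-1)
        cumulative += term
    return k

for length in (37, 75, 150):
    print(length, max_mismatches(length))
```

Under these assumptions the budget grows more slowly than the read length, so doubling the read length does not double the number of mismatches to allow.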

12.0 years ago
Vikas Bansal ★ 2.4k

I think it really depends on your analysis and what you want. For example, if you are aligning reads from a divergent genome, you should allow more mismatches (maybe 8 or 9), whereas if you want your reads to match perfectly you should allow none. As for the 2 errors you mentioned, I think they are allowed because we assume a read may contain a sequencing error or a common variant relative to the reference genome. For 75 bp I would go with 2-3 mismatches, and for 150 bp with 4-5, but please note that this depends on what I am doing and what I want.
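For comparison, the simplest way to carry a budget from one read length to another is plain proportional scaling. The sketch below assumes the 2-per-75 bp figure from the question as the default baseline; the helper name `scaled_mismatches` is hypothetical, and rounding up is my own choice. It is only a starting point, not a community standard.

```python
import math

def scaled_mismatches(read_len, ref_len=75, ref_mismatches=2):
    """Scale a mismatch budget linearly with read length, rounding up.

    ref_len and ref_mismatches are the baseline to scale from
    (assumed here: 2 mismatches for 75 bp reads).
    """
    return math.ceil(ref_mismatches * read_len / ref_len)

print(scaled_mismatches(150))                     # baseline of 2 per 75 bp
print(scaled_mismatches(150, ref_mismatches=3))   # baseline of 3 per 75 bp
```

Scaling 2-3 mismatches at 75 bp this way gives 4-6 at 150 bp, which is in the same range as the 4-5 suggested here.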

I just read your comment. I think the maximum number of mismatches Bowtie allows is 3; please correct me if I am wrong. I also just read in the WHAM manual that it supports up to 5 errors.

P.S. Just a note: the aligners you mentioned also work with genomes other than human.


Thanks Vikas. Yes, they work on genomes other than human, and your comment about Bowtie is also true. What I am actually trying to find out is what the bioinformatics community says about the required number of matching errors (is there any document on this?). In both of those manuals, and for other tools (http://en.wikipedia.org/wiki/List_of_sequence_alignment_software), I found a lot of variation in the errors allowed: some allow two, some none, some three, and so on. I have both databases (genome and reads), and I have to use them for varied work such as "finding a disease genome" and "transcriptome analysis of protein-coding and non-protein-coding regions". So I am trying to work out which setting (that is, the maximum number of errors required) I should run these tools with.


Yes, different aligners allow different maximum numbers of mismatches. What do you mean by "disease genome"? Are you interested only in SNVs, or in copy number changes, inversions, translocations, etc.?
