bowtie1 -m --best --strata options
10.1 years ago
Varun Gupta ★ 1.3k

Hi Everyone,

I want to use bowtie 1 with the following parameters:

command 1 : -a -m 5 --best

I went through the manual, and it explains the case where these options are used together:

command 2 : -a -m 5 --best --strata

I want to know how the output of command 1 differs from that of command 2. The manual explains the output of -a -m 5 --best --strata. What would the output be with -a -m 5 --best (no --strata)?

I ran the two commands with identical values, except that one included --strata and the other did not. This was the outcome:

-a --best --strata -m 10

# reads processed: 7919843
# reads with at least one reported alignment: 163241 (2.06%)
# reads that failed to align: 7733095 (97.64%)
# reads with alignments suppressed due to -m: 23507 (0.30%)
Reported 323300 alignments to 1 output stream(s)

-a --best -m 10 (no --strata)

# reads processed: 7919843
# reads with at least one reported alignment: 152085 (1.92%)
# reads that failed to align: 7733095 (97.64%)
# reads with alignments suppressed due to -m: 34663 (0.44%)
Reported 402672 alignments to 1 output stream(s)
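
A quick arithmetic check on the two summaries above, as a minimal Python sketch. The counts are copied from the bowtie output; the dictionary and variable names are only illustrative. It confirms that the three read categories add up to the reads processed in each run, and that the reads additionally suppressed without --strata are exactly the reads that gain a reported alignment with --strata.

```python
# Counts copied from the two bowtie summaries above (names are illustrative).
total_reads = 7919843
with_strata = {"aligned": 163241, "failed": 7733095,
               "suppressed": 23507, "alignments": 323300}
without_strata = {"aligned": 152085, "failed": 7733095,
                  "suppressed": 34663, "alignments": 402672}

for name, run in (("with --strata", with_strata),
                  ("without --strata", without_strata)):
    # Aligned + failed + suppressed should equal the reads processed.
    assert run["aligned"] + run["failed"] + run["suppressed"] == total_reads
    per_read = run["alignments"] / run["aligned"]
    print(name, "- alignments per reported read:", round(per_read, 2))

# Reads that move between "suppressed" and "reported" when --strata is toggled.
extra_suppressed = without_strata["suppressed"] - with_strata["suppressed"]
extra_aligned = with_strata["aligned"] - without_strata["aligned"]
print(extra_suppressed, extra_aligned)  # 11156 11156
```

The number of reads that failed to align is identical in both runs, so --strata only changes how reads with at least one valid alignment are handled.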

Hope to hear soon.

Thanks in advance

Regards
Varun


With such a poor alignment rate, my guess is that you have untrimmed adapter sequences, mixed-up paired-end sequence files, or sequence contaminants, or that you are searching against the wrong species. FastQC may help narrow this down, as would a BLAST search against GenBank for a handful of sequences.

10.1 years ago

The alignment rate of your reads is really low. I don't know what the problem is here, but I can explain those parameters in general.

The -a option tells bowtie to report all valid alignments. If other parameters are specified, -a outputs all the alignments that also satisfy those parameters. For example, if -v 1 is also given, bowtie outputs all the alignments that have at most 1 mismatch.

The -m 10 option suppresses all alignments for any read that has more than 10 reportable alignments in the genome. -m 1 reports only reads that have a unique reportable hit in the genome.

The --best option tells bowtie to report alignments from best to worst, that is, in increasing order of mismatches. Because -a is also specified, all the valid alignments are still output; they are just sorted so that alignments with fewer mismatches come before those with more.

--strata reports only the alignments in the best stratum for a read, which can be more than one alignment when -a is given. It needs to be used with --best. Where --best sorts the alignments by mismatches, --strata performs an additional filtering step and keeps only the alignments in the top stratum.

Both commands output reads that align to the genome with no more than 10 reportable hits. The difference is that with --strata only the alignments in the best stratum are reportable, so -m counts only those and only those are output, while without --strata all valid alignments count toward the -m limit and are output in increasing order of mismatches.
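
The interaction of these options can be illustrated with a toy model in Python. This is a sketch based on my reading of the bowtie 1 manual, not bowtie's actual code; the function name `reported_alignments` and the example read are hypothetical, and each alignment is reduced to its stratum number (lower is better).

```python
def reported_alignments(strata, m=None, best_strata=False):
    """Toy model of bowtie 1 reporting with -a --best, optionally -m and --strata.

    strata: one stratum number per valid alignment of a single read.
    Returns the strata of the alignments that would be reported.
    """
    hits = sorted(strata)  # --best: best stratum first
    if best_strata and hits:
        # --strata: only alignments in the best stratum remain reportable.
        hits = [s for s in hits if s == hits[0]]
    if m is not None and len(hits) > m:
        # -m: more reportable alignments than the limit, suppress them all.
        return []
    return hits


# Hypothetical read with 13 valid alignments spread over three strata.
read = [0, 0] + [1] * 5 + [2] * 6
print(reported_alignments(read, m=10))                    # []
print(reported_alignments(read, m=10, best_strata=True))  # [0, 0]
```

In this model the same read is suppressed without --strata (13 reportable hits exceed -m 10) but reported with --strata (only 2 hits in the best stratum), which is consistent with fewer suppressed reads in the --strata run.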


Hi Ashutosh,

Nice explanation. I extracted the reads that were suppressed due to the -m option in both runs. As you can see from the bowtie output for the two commands, the one without --strata has more suppressed reads than the one with --strata. So I took a read that was suppressed in the run without --strata. When I searched for that read in the SAM file, I got this:

DRR000559.2180 HWI-EAS420:7:1:44:1274 length=36    4    *    0    0    *    *    0    0    TGAGGTAGTAGATTGTATAGTATCTCGTATGCCCGC    B=ACCACCABBBBCA:?@##################    XM:i:10

When I searched for the same read in the SAM file from the run with --strata, I got this output:

DRR000559.2180    16    chrX    53584193    255    36M    *    0    0    GCGGGCATACGAGATACTATACAATCTACTACCTCA    ##################@?:ACBBBBACCACCA=B    XA:i:0    MD:Z:1G1T0A0T0G0A0C1C0T0A1A21    NM:i:11
DRR000559.2180    0    chr9    96938635    255    36M    *    0    0    TGAGGTAGTAGATTGTATAGTATCTCGTATGCCCGC    B=ACCACCABBBBCA:?@##################    XA:i:0    MD:Z:21T0G0T0G0G1G0T0A1T0G0A0T0T0    NM:i:13
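
The stratum and mismatch tags of the two records above can be pulled out with a few lines of Python. This is a minimal sketch that assumes whitespace-separated fields and the 11 mandatory SAM columns followed by optional TAG:TYPE:VALUE fields; the function name `parse_sam_record` is hypothetical. It would not work unchanged on the unaligned record further up, whose read name contains spaces.

```python
def parse_sam_record(line):
    """Split one SAM record into a few mandatory fields plus its optional tags."""
    fields = line.split()
    tags = {}
    for item in fields[11:]:  # optional fields follow the 11 mandatory columns
        tag, typ, value = item.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return {"name": fields[0], "flag": int(fields[1]),
            "ref": fields[2], "pos": int(fields[3]), "tags": tags}


records = [
    "DRR000559.2180\t16\tchrX\t53584193\t255\t36M\t*\t0\t0\t"
    "GCGGGCATACGAGATACTATACAATCTACTACCTCA\t##################@?:ACBBBBACCACCA=B\t"
    "XA:i:0\tMD:Z:1G1T0A0T0G0A0C1C0T0A1A21\tNM:i:11",
    "DRR000559.2180\t0\tchr9\t96938635\t255\t36M\t*\t0\t0\t"
    "TGAGGTAGTAGATTGTATAGTATCTCGTATGCCCGC\tB=ACCACCABBBBCA:?@##################\t"
    "XA:i:0\tMD:Z:21T0G0T0G0G1G0T0A1T0G0A0T0T0\tNM:i:13",
]

parsed = [parse_sam_record(r) for r in records]
for rec in parsed:
    print(rec["ref"], "XA =", rec["tags"]["XA"], "NM =", rec["tags"]["NM"])
```

Both records carry XA:i:0 while NM:i is 11 and 13, so the stratum number here is not the same thing as the total edit distance of the alignment.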

This read would also have hit more than 10 places, but it was retained because only the best alignment stratum, XA:i:0 (two alignments for this read), was counted against -m. The other hits presumably fall in strata with more mismatches, so they were ignored and the best stratum was retained.

Do you think that is what is happening here?

Varun
