I have 40 fasta files with a read length of ~150. A specific part of these 150 bp long reads should contain ~21 bp specific sequences. I have created a local reference.fa file by these 21 bp sequences and want to perform alignment using bowtie1/bowtie2 to see how many reads do have these specific 21 bp sequences with 0 mismatches
.
I tried using bowtie 1 by these options but the output was zero alignment: bowtie -v 0 ref -a --best --strata -f *.fasta > bowtie_output
.
Based on some forums it seems that bowtie can not handle longer reads against a reference with short reads. Could somebody help me to solve this problem?
I would like to go with zero mismatches (-v 0)
and -a --best --strata
options. If I have to use bowtie2 which options of it can translate the -v 0 -a --best --strata
options in bowtie1?
Thank you.
You should use
bbduk.sh
from BBMap suite in filtermode for this purpose, if that is all you want to do.Thank you for your help. Is there a way to perform this by using
bowtie1/bowtie2
? I have 64 reference reads with the length of21 bp
and 40 fasta files - In the case ofbbduk.sh
how can I adjust your suggested command-line and get output for all of my fasta files separately? And the last question of me is that is there an option to increase the number of mismatches inbbduk.sh
?Are you interested in knowing how many reads contain each reference you have? If so you can run each sequence pattern on the command line replacing the
literal=
sequence each time. If you don't actually need to separate the reads then you can eliminateoutm=
directive and you will get just the statistics.If you don't care about individual reference matches, you can create a multi-fasta file with all 64 reads in it and then use the following command line to get all sequences that match one or more of these references).
Yes, that is exactly what. I want to know - [how many reads contain each reference I have]. My reference is a fasta file and I ran this command
for i in `ls -1 *.fasta`; do /Users/bbmap/bbduk.sh in=$i ref=edt_AB_seq.fa hdist=0 outm=$i\sequences_matching.fa; done
but I have no results. But for example, when I cat all my reads (40 fasta files) and Igrep
for one of the reference reads I do e.g. 3 hits - but withbbduk.sh
all of the output files are empty.Here are head of reads: ```
and here is the ref head:
I am not sure what you are doing above in the loop.
Since you need to know
you need to take one pattern e.g.
CCAGGTAGTAGTACGTCTGTT
and then run it against your fasta files to get the reads that match this pattern.Can you explain exactly why aligning the short sequences to the long ones won't work for you?
U can try bowtie1 with the below setting, I tried it with 11bp sequence and it returned something. Caveat here is that the "reads" must be trimmed and not longer than your "reference" if reads are longer than it will not match those.
Come my next question, anyone knows of a way to match those reads that are slightly longer than the reference, for example trimming not proper and a added base is inside my read. I am trying to get bowtie2 in local mode to work, but is failing.