Question

How to map longer reads against a local reference using Bowtie

0

Entering edit mode

3.5 years ago

Apex92 ▴ 280

I have 40 fasta files with a read length of ~150. A specific part of these 150 bp long reads should contain ~21 bp specific sequences. I have created a local reference.fa file by these 21 bp sequences and want to perform alignment using bowtie1/bowtie2 to see how many reads do have these specific 21 bp sequences with 0 mismatches.

I tried using bowtie 1 by these options but the output was zero alignment: bowtie -v 0 ref -a --best --strata -f *.fasta > bowtie_output .

Based on some forums it seems that bowtie can not handle longer reads against a reference with short reads. Could somebody help me to solve this problem?

I would like to go with zero mismatches (-v 0) and -a --best --strata options. If I have to use bowtie2 which options of it can translate the -v 0 -a --best --strata options in bowtie1?

Thank you.

alignment bowtie sequence mismatch RNA-Seq • 2.1k views

ADD COMMENT • link updated 2.9 years ago by LI_zhihui • 0 • written 3.5 years ago by Apex92 ▴ 280

1

Entering edit mode

want to perform alignment using bowtie1/bowtie2 to see how many reads do have these specific 21 bp sequences with 0 mismatches.

You should use bbduk.sh from BBMap suite in filtermode for this purpose, if that is all you want to do.

bbduk.sh in=your.fq literal=21_bp_sequence_1,21_bp_sequence2 etc hdist=0 outm=sequences_matching.fq

ADD REPLY • link 3.5 years ago by GenoMax 142k

0

Entering edit mode

Thank you for your help. Is there a way to perform this by using bowtie1/bowtie2? I have 64 reference reads with the length of 21 bp and 40 fasta files - In the case of bbduk.sh how can I adjust your suggested command-line and get output for all of my fasta files separately? And the last question of me is that is there an option to increase the number of mismatches in bbduk.sh?

ADD REPLY • link 3.5 years ago by Apex92 ▴ 280

1

Entering edit mode

Are you interested in knowing how many reads contain each reference you have? If so you can run each sequence pattern on the command line replacing the literal= sequence each time. If you don't actually need to separate the reads then you can eliminate outm= directive and you will get just the statistics.

If you don't care about individual reference matches, you can create a multi-fasta file with all 64 reads in it and then use the following command line to get all sequences that match one or more of these references).

bbduk.sh in=your.fq ref=multi_fasta_ref.fa hdist=0 outm=sequences_matching.fq

ADD REPLY • link 3.5 years ago by GenoMax 142k

0

Entering edit mode

Yes, that is exactly what. I want to know - [how many reads contain each reference I have]. My reference is a fasta file and I ran this command for i in `ls -1 *.fasta`; do /Users/bbmap/bbduk.sh in=$i ref=edt_AB_seq.fa hdist=0 outm=$i\sequences_matching.fa; done but I have no results. But for example, when I cat all my reads (40 fasta files) and I grep for one of the reference reads I do e.g. 3 hits - but with bbduk.sh all of the output files are empty.

ADD REPLY • link 3.5 years ago by Apex92 ▴ 280

0

Entering edit mode

Here are head of reads: ```

MN00153:75:000H37WNG:1:11102:26354:1789 GCATACACAGTAGCAATTTAGTTTTTCTTCAGTTGCTTTATGACTCTGAAACCTAATTAAATGTAACATATTTTAACCAAAATTCATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATCTCGTATGCCGTCTTCTGCT MN00153:75:000H37WNG:1:11102:14072:2278 TGACTTCTGAAGGTAATTGGTTGCATCAGATCTTTGTTTTTCAGACCCAGTGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATTACTCGATATCGTATGCCGTCTTCTGCTTGAAAAAAAAAATAGGGGGGGGGGGGGGGGGGGG ```

and here is the ref head:

>PLAU_fw
CCAGGTAGTAGTACGTCTGTT
>PU_fw_rc
AACAGACGTACTACTACCTGG
>PU_rv
CGTGCTGCGAGAGTATTATCT
>PU_rv_rc
AGATAATACTCTCGCAGCACG
>AX_fw
CGGCCTCCTCCAATTAAAGAA

ADD REPLY • link 3.5 years ago by Apex92 ▴ 280

0

Entering edit mode

I am not sure what you are doing above in the loop.

Since you need to know

how many reads contain each reference I have

you need to take one pattern e.g. CCAGGTAGTAGTACGTCTGTT and then run it against your fasta files to get the reads that match this pattern.

bbduk.sh in=your.fa literal=CCAGGTAGTAGTACGTCTGTT k=10 outm=PLAU_fw.fa hdist=0

ADD REPLY • link 3.5 years ago by GenoMax 142k

0

Entering edit mode

Can you explain exactly why aligning the short sequences to the long ones won't work for you?

ADD REPLY • link 3.5 years ago by swbarnes2 14k

0

Entering edit mode

U can try bowtie1 with the below setting, I tried it with 11bp sequence and it returned something. Caveat here is that the "reads" must be trimmed and not longer than your "reference" if reads are longer than it will not match those.

Come my next question, anyone knows of a way to match those reads that are slightly longer than the reference, for example trimming not proper and a added base is inside my read. I am trying to get bowtie2 in local mode to work, but is failing.

bowtie --verbose --best -a \
-n 0 -l 10 \
fastq/reference/sample3k \
fastq/test_trimmed11bp_1.fastq.gz \
result/test_trimmed11bp_1

ADD REPLY • link 2.9 years ago by LI_zhihui • 0