Question

unpaired alignment in bowtie2

1

4.1 years ago

howenwy2 • 0

Hi

I have about 284 gene sequences from bacteria_A and a whole-genome sequence of bacteria_B. The two strains are highly homologous and differ mainly in gene arrangement.

I want to find the positions of these 284 sequences in the bacteria_B genome.

Therefore, I used bowtie2 to align the 284 gene sequences to the complete genome of bacteria_B:

./bowtie2-build Bacteria_B_referece.txt index_B
./bowtie2  -f -x index_B  -U 284_gene_sequence.txt > output.txt

However, I got this result:

 284 (100.00%) were unpaired; of these:
    0 (0.00%) aligned 0 times
    271 (95.42%) aligned exactly 1 time
    13 (4.58%) aligned >1 times

I also tried this command, but it didn't help:

./bowtie2  --local -f -x index_B  -U 284_gene_sequence.txt > output.txt

The gene sequences are about 1000 to 3000 bp long, and the complete genome of bacteria_B is 3.5 Mbp.

Does anybody know how to fix this?

Ho-Wen Yang
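
A note on the summary above: bowtie2 labels reads supplied with -U as "unpaired", so that line describes the input type, and the counts say that 271 of the 284 sequences aligned exactly once. The positions are already in output.txt, which is in SAM format. Below is a minimal Python sketch for pulling them out, assuming standard SAM columns (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, ...). The function name `sam_positions` and the example record are hypothetical, not taken from the thread.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")

def sam_positions(sam_lines):
    """Yield (query, reference, start, end, strand) for each aligned SAM record.

    start and end are 1-based, inclusive coordinates on the reference.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4:  # flag bit 0x4: the query did not align
            continue
        # Reference span = sum of the lengths of reference-consuming operations.
        ref_len = sum(
            int(n)
            for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
            if op in REF_CONSUMING
        )
        strand = "-" if flag & 0x10 else "+"  # flag bit 0x10: reverse strand
        yield qname, rname, pos, pos + ref_len - 1, strand


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    example = ["geneA\t16\tChromosome\t100\t42\t10M2D5M3S\t*\t0\t0\tACGT\t*"]
    for hit in sam_positions(example):
        print(*hit, sep="\t")
```

To run it on the real file, pass an open file handle, e.g. `sam_positions(open("output.txt"))`. Sequences that bowtie2 reports as aligning more than once still appear with a single primary record by default, so those 13 genes would need a closer look.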

genome alignment sequence gene • 1.0k views

score 1 · Answer 1 · 2020-03-19

1

4.1 years ago

GenoMax 141k

If you have full gene sequences, then use blast or blat to find their locations in the whole genome sequence. bowtie2 is not the appropriate program to use.

If neither of these genomes is in assembled form, then you would need to assemble the bacteria_B genome first and then align fastq reads from bacteria_A using bowtie2 or any other NGS aligner.

0

Hi

Thank you for your reply. I used blast to find the locations:

./blastn -db Bacteria_B.fa -query "284_sequence(Bacteria_A).txt" -num_descriptions 1 -num_alignments 1 -out test.txt

And I get 284 result sets like this:

Chromosome corrected+111-379 Length=3869144

 Score = 854 bits (462),  Expect = 0.0
 Identities = 462/462 (100%), Gaps = 0/462 (0%)
 Strand=Plus/Minus

Query  1        ATGGCAGAAAATATGCAGCCAGACAGCCTCGATCGCGGCATTCTCGTTGCCCTGATGGAT  60
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2570132  ATGGCAGAAAATATGCAGCCAGACAGCCTCGATCGCGGCATTCTCGTTGCCCTGATGGAT  2570073

Query  61       AATGCCCGTACCGCCTATGCCGAGCTGGCCAAGCAGTTCAACGTCAGTCCGGGCACCATC  120
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2570072  AATGCCCGTACCGCCTATGCCGAGCTGGCCAAGCAGTTCAACGTCAGTCCGGGCACCATC  2570013

Query  121      CACGTCCGCGTGGAAAAGATGAAGCAGGCGGGCATCATCAAGGGAACGAGGGTCGAAATA  180
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2570012  CACGTCCGCGTGGAAAAGATGAAGCAGGCGGGCATCATCAAGGGAACGAGGGTCGAAATA  2569953

Query  181      GACCCAAAACAGCTTGGCTACGACGTGTGCTGCTTTATCGGCATCATCCTGAAGAGTGCC  240
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2569952  GACCCAAAACAGCTTGGCTACGACGTGTGCTGCTTTATCGGCATCATCCTGAAGAGTGCC  2569893

Query  241      AGGGACTATCCTGCTGCGGTGGCGAAACTGGAGCAGCTTGAAGAAGTGGTGGAAGCCTGG  300
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2569892  AGGGACTATCCTGCTGCGGTGGCGAAACTGGAGCAGCTTGAAGAAGTGGTGGAAGCCTGG  2569833

Query  301      TACACCACCGGACATTACAGCATCTTTATTAAAGTGATGTGCCGTTCGATCGACGCCCTG  360
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2569832  TACACCACCGGACATTACAGCATCTTTATTAAAGTGATGTGCCGTTCGATCGACGCCCTG  2569773

Query  361      CAACAGGTACTGATTAACAAGATCCAGACCATCGATGAGATCCAGTCAACTGAAACCCTG  420
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  2569772  CAACAGGTACTGATTAACAAGATCCAGACCATCGATGAGATCCAGTCAACTGAAACCCTG  2569713

Query  421      ATCTCGTTGCAAAACCCGATCATGCGTACGATTATCCCATGA  462
                ||||||||||||||||||||||||||||||||||||||||||
Sbjct  2569712  ATCTCGTTGCAAAACCCGATCATGCGTACGATTATCCCATGA  2569671

How can I retrieve the position numbers in the Bacteria_B genome (e.g. start (2570132) and end (2569671))? I have 284 genes, so I can't copy and paste all the numbers by hand.
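
One possible approach, sketched here under assumptions: rerun blastn with tabular output (`-outfmt 6`, and `-max_target_seqs` in place of the two `-num_...` options, which as far as I know only apply to the pairwise report), then read the subject start and end columns. The default tabular columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The function name `best_hits` and the query ID `gene1` below are hypothetical; the coordinates in the example line are the ones from the alignment above.

```python
def best_hits(tabular_lines):
    """Return {query_id: (subject_id, sstart, send, strand)} from BLAST tabular output.

    Keeps only the hit with the highest bit score for each query.
    Assumes the default 12 columns of -outfmt 6.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):  # skip blanks and comment lines
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        sstart, send, bitscore = int(f[8]), int(f[9]), float(f[11])
        if qseqid not in best or bitscore > best[qseqid][0]:
            # On a minus-strand hit BLAST reports sstart > send.
            strand = "plus" if sstart <= send else "minus"
            best[qseqid] = (bitscore, sseqid, sstart, send, strand)
    # Drop the bit score from the returned values.
    return {q: v[1:] for q, v in best.items()}


if __name__ == "__main__":
    example = [
        "gene1\tChromosome\t100.000\t462\t0\t0\t1\t462\t2570132\t2569671\t0.0\t854"
    ]
    for gene, (subject, start, end, strand) in best_hits(example).items():
        print(gene, subject, start, end, strand, sep="\t")
```

With a real results file, something like `best_hits(open("test.txt"))` would give one start/end pair per gene, which can then be written out as a table.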

ADD REPLY • link 4.1 years ago by howenwy2 • 0

0

Here are different ways to extract the sequences you need from genome_B:
how to quickly extract sequence from genome positions
Sequence extract from multifasta using blastall coordinates
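
For illustration only (this is not taken from the linked threads): once the start and end coordinates are known, the extraction itself is a slice of the genome string, with a reverse complement when the hit is on the minus strand (start greater than end, as in the BLAST output above). The sketch below assumes the genome is already loaded into a plain Python string of A/C/G/T/N characters; `extract_region` is a hypothetical helper name.

```python
# Translation table for complementing plain DNA letters (no IUPAC ambiguity codes except N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def extract_region(genome, start, end):
    """Return the region between start and end (1-based, inclusive).

    If start > end, the hit is treated as minus-strand and the
    reverse complement of the region is returned.
    """
    lo, hi = min(start, end), max(start, end)
    region = genome[lo - 1:hi]  # convert 1-based inclusive to a Python slice
    if start > end:
        region = region.translate(COMPLEMENT)[::-1]
    return region


if __name__ == "__main__":
    toy_genome = "ATGCCGTA"  # toy sequence, not real data
    print(extract_region(toy_genome, 2, 5))  # plus-strand region
    print(extract_region(toy_genome, 5, 2))  # same region, minus strand
```

For real work, established tools such as samtools faidx or bedtools getfasta do the same job on indexed FASTA files and are likely a safer choice than a hand-rolled script.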