bowtie2 messes up headers with sequences in output
6.4 years ago
chefarov ▴ 170

I have a query.fasta file that looks like:

>seq0 hello_world
GAACCTAAGTACGCG
...
>seq83 hello_world
CACGCGGCTAGTACG
...
>seq1170 hello_world
CGTACTAGCCGCGTG
...
>seq4420 hello_world
CGCGTACTTAGGTTC
...

Every sequence in this file is unique. However, when I use bowtie2 to map these reads to a RefSeq genome:

bowtie2 -x GCF_ref -p 12 --end-to-end -f -U query.fasta -S result.sam

I get:

seq0       16     chromosA    456940    3    15M    *    0    0    CGCGTACTTAGGTTC    IIIIIIIIIIIIIII    AS:i:-6    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:7G7    YT:Z:UU
seq83      16    chromosB    869078    42   15M    *    0    0    CGTACTAGCCGCGTG    IIIIIIIIIIIIIII    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:15    YT:Z:UU
seq1170    0    chromosB    869078    42   15M    *    0    0    CGTACTAGCCGCGTG    IIIIIIIIIIIIIII    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:15    YT:Z:UU
seq4420    0    chromosA    456940    3    15M    *    0    0    CGCGTACTTAGGTTC    IIIIIIIIIIIIIII    AS:i:-6    XN:i:0    XM:i:1    XO:i:0    XG:i:0    NM:i:1    MD:Z:7G7    YT:Z:UU

How is it even possible that seq1170's sequence (CGTACTAGCCGCGTG) also appears in seq83's row? The same happens for the seq4420/seq0 pair, and for every other mapped sequence: each one has a partner with an identical sequence in the SAM output.

Any ideas?

6.4 years ago
michael.ante ★ 3.8k

Hi chefarov,

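If you have a look at the second column, you'll see the read flags. A flag of 16 means that the read maps to the reverse strand. For such reads the SAM record shows the sequence as it reads on the reference's forward strand, i.e. the reverse complement of what is in your FASTA file. If you reverse-complement your query sequences, you'll see that seq0's reverse complement is identical to seq4420's sequence, and seq83's reverse complement is identical to seq1170's.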
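To check this yourself, here is a minimal Python sketch that undoes the reverse-complementing for records with the reverse-strand bit (0x10, i.e. 16) set. The helper names `reverse_complement` and `original_read` are made up for this illustration and are not part of bowtie2 or any library; the records are copied from the SAM lines in the question.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

REVERSE_STRAND = 0x10  # SAM flag bit 16: read aligned to the reverse strand


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def original_read(flag: int, sam_seq: str) -> str:
    """Recover the read as written in the FASTA file from a SAM record.

    Hypothetical helper: if the reverse-strand bit is set, the SEQ field
    holds the reverse complement of the input read, so flip it back.
    """
    if flag & REVERSE_STRAND:
        return reverse_complement(sam_seq)
    return sam_seq


# (read name, FLAG, SEQ) taken from the SAM output in the question.
records = [
    ("seq0", 16, "CGCGTACTTAGGTTC"),
    ("seq83", 16, "CGTACTAGCCGCGTG"),
    ("seq1170", 0, "CGTACTAGCCGCGTG"),
    ("seq4420", 0, "CGCGTACTTAGGTTC"),
]

for name, flag, seq in records:
    print(name, original_read(flag, seq))
```

Each printed sequence should match the corresponding entry in query.fasta, which shows that the headers and sequences are not mixed up; the two reads in each pair are simply reverse complements of one another.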

Cheers,

Michael
