Bowtie1 sam output not including query sequence
1
0
Entering edit mode
2.3 years ago

Hi, I've been running bowtie1 where I've been querying a list of 20-mers in a single fasta file against the drosophila genome as my index. I've successfully been able to generate a sam file, however after converting my sam file to .bed the bed file does not contain the associated query sequence with each hit.

I used samtools to convert the .sam file to bam and bedtools to convert bam to bed. When running samtools view, each line was outputting continuously :

W::sam_parse1] empty query name
[W::sam_parse1] empty query name
[W::sam_parse1] empty query name
[W::sam_parse1] empty query name
[W::sam_parse1] empty query name
[W::sam_parse1] empty query name
[W::sam_parse1] empty query name

Does this mean bowtie 1 is failing to output the query in the sam file, since I don't see it when I search the sam file. If so, how would I be able to include my query in the output of bowtie1

Here's the code I used in each program:

bowtie test kmer_CR43193.fasta luck.sam -v 2 --threads 6 -f --sam
sudo samtools view -S -b luck.sam > luck.bam
bedtools bamtobed -i luck.bam > luck.bed
bedtools samtools bowtie • 1.5k views
ADD COMMENT
0
Entering edit mode

Can you show us the output of

$ cat luck.sam | grep -v -e "^@SQ" -e "^@PG" -e "^@HD" | head -4
ADD REPLY
0
Entering edit mode

the FBgn genes are part of my genome index, but not sure what's on the left most column.

    0       FBgn0262822_loc=X:20864813..20865731;   4       255     20M     *       0       0       GCTGGGGTAAAATGGTAGGA                                IIIIIIIIIIIIIIIIIIII     XA:i:0  MD:Z:20 NM:i:0
    16      FBgn0085387_loc=X:complement(20761071..20927050);       62216   255     20M     *       0       0      TCCTACCATTTTACCCCAGC                 IIIIIIIIIIIIIIIIIIII     XA:i:0  MD:Z:20 NM:i:0
    0       FBgn0261963_intergenic_FBgn0266825_loc=2L:5467611..5486904;     17815   255     20M     *       0      0                                    GCTGGGGTAAAATGGTAGGA     IIIIIIIIIIIIIIIIIIII    XA:i:2  MD:Z:5T5C8      NM:i:2
    0       FBgn0046225_intergenic_FBgn0267141_loc=3R:11467373..11486787;   1687    255     20M     *       0      0                                    GCTGGGGTAAAATGGTAGGA     IIIIIIIIIIIIIIIIIIII    XA:i:2  MD:Z:7G8T3      NM:i:2
ADD REPLY
0
Entering edit mode

Indeed there are no read names. I think what happened is that you needed to let bowtie know that your input was in fasta format but you did not do that correctly. Try this (-f needs to be before the fasta file name) :

bowtie test -f kmer_CR43193.fasta luck.sam -v 2 --threads 6 --sam

Then show us the command output from my command again. It should be correct this time.

Let us make sure that your fasta file does have headers. Show us the output of

grep "^>" kmer_CR43193.fasta | head -3
ADD REPLY
0
Entering edit mode

Its still outputting the same .sam file.

    0       FBgn0262822_loc=X:20864813..20865731;   4       255     20M     *       0       0       GCTGGGGTAAAATGGTAGGA                                IIIIIIIIIIIIIIIIIIII     XA:i:0  MD:Z:20 NM:i:0
    16      FBgn0085387_loc=X:complement(20761071..20927050);       62216   255     20M     *       0       0      TCCTACCATTTTACCCCAGC                 IIIIIIIIIIIIIIIIIIII     XA:i:0  MD:Z:20 NM:i:0
    0       FBgn0261963_intergenic_FBgn0266825_loc=2L:5467611..5486904;     17815   255     20M     *       0      0                                    GCTGGGGTAAAATGGTAGGA     IIIIIIIIIIIIIIIIIIII    XA:i:2  MD:Z:5T5C8      NM:i:2
    0       FBgn0046225_intergenic_FBgn0267141_loc=3R:11467373..11486787;   1687    255     20M     *       0      0                                    GCTGGGGTAAAATGGTAGGA     IIIIIIIIIIIIIIIIIIII    XA:i:2  MD:Z:7G8T3      NM:i:2

my fasta has labels for all the kmers,

> CR_43193_1
> CR_43193_2
> CR_43193_3
ADD REPLY
1
Entering edit mode
2.3 years ago
GenoMax 141k

I think you have a space in your fasta headers between > and actual name. So the read name in your case is set to space and what you are seeing above is column 2 in SAM file.

Removing those spaces and with some dummy phiX data I see no problems. So remove those spaces between > and CR.

$ more test1.fa
>CR_43193_1
CTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGT
>CR_43193_2
GCGGTAGGTTTTCTGCTTAGGAGTTTAATCATGTTTCAGACTTTTAT
>CR_43193_3
GCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACACGCAGGAC

$ bowtie phiX_genome -f test1.fa --sam test.sam

$ cat test.sam | grep -v -e "^@SQ" -e "^@PG" -e "^@HD"

CR_43193_1      0       phix    1186    255     45M     *       0       0       CTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGT   IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIII     XA:i:0  MD:Z:45 NM:i:0  XM:i:1
CR_43193_2      0       phix    2365    255     47M     *       0       0       GCGGTAGGTTTTCTGCTTAGGAGTTTAATCATGTTTCAGACTTTTAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIIII   XA:i:0  MD:Z:47 NM:i:0  XM:i:1

CR_43193_3      0       phix    4877    255     46M     *       0       0       GCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACACGCAGGAC  IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
IIIIIIIIIIII    XA:i:0  MD:Z:46 NM:i:0  XM:i:1
ADD COMMENT
0
Entering edit mode

Yes, this fixed the issue. Thanks so much for your time and fast responses!

ADD REPLY

Login before adding your answer.

Traffic: 1927 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6