Question

bowtie2 giving 0% alignment rate across all samples

0

3.5 years ago

v.berriosfarias ▴ 140

Hello, I'm using a for loop to align short paired-end reads (101 bp) from 70 samples to an index of contigs with bowtie2. The issue is that I always get a 0% alignment rate and 100% discordant matches, which seems strange given that these contigs were built from these reads. I thought it could be related to my index, for example that the FASTA format was not correct, but when I look at the SAM file, the reference sequences that bowtie2 is aligning to (my 15 contigs) do appear there.

here are the stats of my reference sequences:

format:  FASTA
type:    DNA
num_seq: 15
sum_len: 90449
min_len: 2628
avg_len: 6029.9
max_len: 12178

and here is the command that I use:

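for i in *_1.fastq.gz
do
    # Strip the mate suffix to get the sample name
    base=$(basename "$i" _1.fastq.gz)
    # Align the pair and convert the SAM stream to BAM
    bowtie2 -p 8 -x /mnt/c/path/NRPS_contigs -1 "${base}_1.fastq.gz" -2 "${base}_2.fastq.gz" | samtools view -b -o "${base}.bam" -
done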

and the output for all 70 samples reports a 0% alignment rate, as I said before.
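Before suspecting the index, it may help to confirm that the loop picks up the intended file pairs. The following is only a sketch: `list_pairs` is a hypothetical helper (not part of bowtie2 or samtools), and it assumes the `_1.fastq.gz` / `_2.fastq.gz` naming used in the command above.

```shell
#!/usr/bin/env bash
# list_pairs [DIR]: hypothetical helper. For each *_1.fastq.gz in DIR
# (default: current directory), print "<sample> ok" when the matching
# *_2.fastq.gz exists, otherwise "<sample> missing_mate".
list_pairs() {
    local dir=${1:-.} r1 base
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        base=$(basename "$r1" _1.fastq.gz)  # sample name without mate suffix
        if [ -e "$dir/${base}_2.fastq.gz" ]; then
            echo "$base ok"
        else
            echo "$base missing_mate"
        fi
    done
}
```

Running `list_pairs .` in the directory with the reads should print one line per sample; any `missing_mate` line points at a naming problem rather than an alignment problem.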

I would be so thankful if you could help me with this.

Here is the flagstat output from one sample:

samtools flagstat EC-BacA.bam

49695006 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
94 + 0 mapped (0.00% : N/A)
49695006 + 0 paired in sequencing
24847503 + 0 read1
24847503 + 0 read2
88 + 0 properly paired (0.00% : N/A)
90 + 0 with itself and mate mapped
4 + 0 singletons (0.00% : N/A)
2 + 0 with mate mapped to a different chr
2 + 0 with mate mapped to a different chr (mapQ>=5)
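Because flagstat rounds percentages to two decimals, the mapped fraction above shows up as 0.00% even though 94 reads did map. A small sketch for printing the fraction with more digits follows; `flagstat_rate` is a hypothetical helper that only parses the text format shown above (the "in total" and "mapped (" lines).

```shell
#!/usr/bin/env bash
# flagstat_rate: hypothetical helper. Reads `samtools flagstat` text on
# stdin and prints "<mapped>/<total> mapped (<percent>%)".
flagstat_rate() {
    awk '
        / in total /                { total = $1 }   # QC-passed read count
        / mapped \(/ && !/primary/  { mapped = $1 }  # skip "primary mapped" if present
        END {
            if (total > 0)
                printf "%d/%d mapped (%.6f%%)\n", mapped, total, 100 * mapped / total
        }'
}
```

Usage would be `samtools flagstat EC-BacA.bam | flagstat_rate`. A handful of mapped reads out of tens of millions suggests the aligner ran correctly but the reads and the reference share almost no sequence.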

bowtie2 alignment paired-end reads • 926 views

3.4 years ago by v.berriosfarias ▴ 140

2

Step 1: can you get the desired results without the loop construct?

3.5 years ago by swbarnes2 14k

1

Indeed, I would try mapping a single sample, making sure that the two files really are the two mate files, and then see how it looks. It is most likely something technical: even in very poor NGS assays with lots of contamination you get at least a few percent of aligned reads, so 0% is suspicious. Try to validate that the *1 and *2 files are indeed a pair and that nothing went wrong, e.g. while renaming files.
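One way to check that two files are mates is to compare the read names at the start of each file. This is a minimal sketch, assuming gzipped four-line FASTQ records whose names differ at most by a trailing `/1` or `/2` or by a comment after whitespace; `mate_names_match` and `first_names` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# first_names FILE N: print the first N read names of a gzipped FASTQ,
# dropping any comment after whitespace and a trailing /1 or /2.
first_names() {
    gzip -cd "$1" | awk -v n="$2" '
        NR % 4 == 1 {
            sub(/[ \t].*/, "")   # drop the comment field, if any
            sub(/\/[12]$/, "")   # drop the mate suffix, if any
            print
            if (++c >= n) exit
        }'
}

# mate_names_match R1 R2 [N]: succeed only if the first N (default 1000)
# read names are identical and in the same order in both files.
mate_names_match() {
    local n=${3:-1000} names1 names2
    names1=$(first_names "$1" "$n")
    names2=$(first_names "$2" "$n")
    [ -n "$names1" ] && [ "$names1" = "$names2" ]
}
```

For example, `mate_names_match sample_1.fastq.gz sample_2.fastq.gz && echo paired` (file names here are placeholders). It only inspects the beginning of each file, so it cannot rule out problems further down.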

3.5 years ago by ATpoint 81k

0

No, I get the same results without looping (sorry for the late answer).

3.5 years ago by v.berriosfarias ▴ 140

0

Additionally, I checked that the paired-end files are indeed mates. The only thing I can think of is that the contigs I'm using as reference sequences (all gathered in a single file) were assembled with a de Bruijn graph method using 51-mers. I first thought that, instead of mapping the reads (101 bp long), I should generate all 51-mers from each sample and map those against the reference sequences (the contigs), but then I realized that this is not correct, because the assembler should still reconstruct these reads from the k-mers. So, in my opinion, it's an issue with the index of the reference sequences. Is there anything wrong with my reference file? (The stats are given in the question above, and the file is in FASTA format, with each of the 15 sequences starting with the ">" character.)
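A check that is independent of both the aligner and the index is to ask whether any reads occur verbatim in the contigs, on either strand. The sketch below assumes a plain-text (uncompressed) FASTA and gzipped FASTQ reads, and loads the whole reference into memory, which is fine for about 90 kb of contigs; `count_exact_hits` is a hypothetical helper. Exact matching ignores sequencing errors, so only the difference between zero and non-zero is informative.

```shell
#!/usr/bin/env bash
# count_exact_hits REF.fa READS.fastq.gz [N]: hypothetical helper. Prints
# "<hits>/<checked>" for the first N reads (default 200) that occur exactly
# in the reference, either as given or as the reverse complement.
count_exact_hits() {
    local ref=$1 reads=$2 n=${3:-200}
    gzip -cd "$reads" | awk -v n="$n" -v ref="$ref" '
        function revcomp(s,    i, c, out) {
            out = ""
            for (i = length(s); i >= 1; i--) {
                c = substr(s, i, 1)
                if (c in comp) c = comp[c]; else c = "N"
                out = out c
            }
            return out
        }
        BEGIN {
            comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C"
            # Join wrapped sequence lines; "#" separates contigs so that a
            # read cannot match across a contig boundary.
            while ((getline line < ref) > 0) {
                sub(/\r$/, "", line)   # tolerate Windows line endings
                if (line ~ /^>/) genome = genome "#"
                else genome = genome toupper(line)
            }
        }
        NR % 4 == 2 {
            seq = toupper($0); sub(/\r$/, "", seq); total++
            if (index(genome, seq) || index(genome, revcomp(seq))) hits++
            if (total >= n) exit
        }
        END { printf "%d/%d\n", hits, total }'
}
```

If this reports zero hits over a few thousand reads, the reads and the contigs probably do not belong together; if it reports hits while bowtie2 still aligns nothing, the index or the command line becomes the more likely suspect.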

3.5 years ago by v.berriosfarias ▴ 140