BWA mapping - only part of the reads mapped, why?

7.6 years ago

gerberd1990 ▴ 30

Hi all, I have a (probably silly) question. I am trying to map my reads to a reference using the original bwa software. The .fastq file contains ~560,000 reads (minimum length 45), but only a small fraction of them appear in the alignment, and the ones that do appear are much shorter than the input reads. I am not (that) surprised that few reads map to the reference, because the sample is from an ancient horse (~4000 BC), but the read lengths in the alignment are very short (max 10 bp or so). When I checked some of the reads in the bwa input file, their lengths were all 45 or more. So my question is: why does bwa trim the reads in the alignment? Is this because of low quality, or am I doing something wrong? Thanks in advance!
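Rather than checking a handful of reads by eye, the input lengths can be summarized over the whole file. A minimal sketch, assuming a standard 4-line-per-record FASTQ (no wrapped sequence lines); the function names `fastq_read_lengths` and `summarize` are made up for illustration and the example data is synthetic.

```python
import io


def fastq_read_lengths(handle):
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of every record holds the bases
            yield len(line.strip())


def summarize(lengths):
    """Return the read count plus the shortest and longest length seen."""
    lengths = list(lengths)
    if not lengths:
        return {"reads": 0, "min": None, "max": None}
    return {"reads": len(lengths), "min": min(lengths), "max": max(lengths)}


# Synthetic two-read example, not real data.
example = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
print(summarize(fastq_read_lengths(example)))
```

On a real file, the same functions can be fed an open file handle instead of the in-memory example.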

bwa • 2.3k views

updated 7.6 years ago by Devon Ryan 104k • written 7.6 years ago by gerberd1990 ▴ 30

Please post the command you used.

7.6 years ago by Devon Ryan 104k

bwa aln reference.fasta sample_merged_barcode/adapter-trimmed_.fastq > sample.sai
bwa samse reference.fasta sample.sai sample_merged_barcode/adapter-trimmed.fastq > sample_SE.sam

As you can see, the reads were previously quality- and adapter-trimmed (with cutadapt) and merged with leeHom.
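One way to tell whether the SAM records themselves are short, or whether only the viewer shows a short piece of each read, is to take the lengths straight from the SAM file. A minimal sketch, assuming a plain-text SAM such as the output of the samse command above; the function name `sam_length_summary` is made up for illustration. It relies only on standard SAM columns: FLAG (column 2, where bit 0x4 means unmapped) and SEQ (column 10).

```python
import io


def sam_length_summary(handle):
    """Count mapped/unmapped records and report SEQ lengths of mapped reads."""
    mapped = 0
    unmapped = 0
    mapped_lengths = []
    for line in handle:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        seq = fields[9]
        if flag & 0x4:  # 0x4 = read is unmapped
            unmapped += 1
        else:
            mapped += 1
            if seq != "*":  # "*" means the sequence is not stored
                mapped_lengths.append(len(seq))
    return {
        "mapped": mapped,
        "unmapped": unmapped,
        "min_mapped_len": min(mapped_lengths) if mapped_lengths else None,
        "max_mapped_len": max(mapped_lengths) if mapped_lengths else None,
    }


# Synthetic SAM with one mapped and one unmapped read, not real data.
example = io.StringIO(
    "@SQ\tSN:chr1\tLN:1000\n"
    "r1\t0\tchr1\t10\t37\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n"
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
)
print(sam_length_summary(example))
```

If the mapped lengths reported here match the FASTQ lengths, the reads were not trimmed by the aligner and the short fragments are more likely a display issue.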

updated 7.6 years ago by Devon Ryan 104k • written 7.6 years ago by gerberd1990 ▴ 30

Is the difference in fastq file name a typo when you posted or did that occur when you ran the command too?

7.6 years ago by Devon Ryan 104k

That happened only when I posted it; my real filenames are quite different from these :)

7.6 years ago by gerberd1990 ▴ 30

Have you QC'ed the reads for presence of adapter contamination?
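A quick way to get a first impression is to count how many reads still contain an adapter fragment. A rough sketch, assuming a 4-line-per-record FASTQ and assuming the library used standard Illumina adapters, whose common prefix is `AGATCGGAAGAGC`; if a different kit was used, the sequence has to be swapped. The function name `count_adapter_hits` is made up for illustration, and exact matching will miss adapters containing sequencing errors, so a dedicated QC tool is still the better check.

```python
import io

# Assumption: common prefix of the standard Illumina adapters.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def count_adapter_hits(handle, adapter=ILLUMINA_ADAPTER_PREFIX):
    """Return (reads containing the adapter, total reads) for a FASTQ stream."""
    hits = 0
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # sequence line of each record
            total += 1
            if adapter in line.strip().upper():
                hits += 1
    return hits, total


# Synthetic example: r1 carries the adapter prefix, r2 does not.
example = io.StringIO(
    "@r1\nACGTAGATCGGAAGAGCTT\n+\n" + "I" * 19 + "\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
)
hits, total = count_adapter_hits(example)
print(f"{hits} of {total} reads contain the adapter prefix")
```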

7.6 years ago by GenoMax 141k

Since this is a ~6000-year-old sample, you will probably have to do extensive quality checks; for example, I would guess bacterial contamination could be huge. Check the papers on the Neanderthal and mammoth genomes to see the problems you could face and the solutions they used.

Also, why use bwa aln and not bwa mem?

7.6 years ago by h.mon 35k