Hi all, I have a (probably silly) question. I am trying to map my reads to a reference using the original bwa software. The .fastq file contains ~560,000 reads (minimum length 45 bp), but only a small fraction of the reads appear in the alignment, and the aligned reads are also much shorter than expected. I am not (that) surprised that few reads map to the reference, because the sample is from an ancient horse (~4000 BC), but the read lengths in the alignment are very short (about 10 bp at most). When I check some of the reads in the bwa input file, their lengths are all 45 bp or more. So my question is: why does bwa trim the reads in the alignment? Is this because of low quality, or am I doing something wrong? Thanks in advance!
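One way to tell whether the reads are really shortened in the alignment (rather than just displayed that way by a viewer) is to compare, for each SAM record, the length of the stored sequence with the number of read bases the CIGAR string reports as aligned. The sketch below is a minimal, hypothetical helper: the function names and the example record are made up for illustration and are not part of bwa or samtools. It relies only on the CIGAR operations defined in the SAM format.

```python
import re

# CIGAR operations that consume bases of the read, per the SAM format.
QUERY_OPS = set("MIS=X")
# Read bases that are actually aligned (soft clips 'S' are excluded).
ALIGNED_OPS = set("MI=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (read_length, aligned_read_bases) implied by a CIGAR string."""
    if cigar == "*":  # unmapped read: no CIGAR available
        return 0, 0
    read_len = 0
    aligned = 0
    for count, op in CIGAR_RE.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            read_len += n
        if op in ALIGNED_OPS:
            aligned += n
    return read_len, aligned


def summarize_sam_line(line):
    """Return (name, stored_seq_length, cigar_read_length, aligned_bases)."""
    fields = line.rstrip("\n").split("\t")
    name, cigar, seq = fields[0], fields[5], fields[9]
    read_len, aligned = cigar_lengths(cigar)
    return name, len(seq), read_len, aligned


# Hypothetical record: a 45 bp read of which only 10 bp are aligned.
example = "\t".join(
    ["read1", "0", "chr1", "100", "37", "10M35S", "*", "0", "0",
     "A" * 45, "I" * 45]
)
print(summarize_sam_line(example))
```

If the stored sequence is still 45 bp or longer but only a few bases are aligned, the reads are being clipped during alignment; if both numbers match the input length, the short reads are more likely a display issue.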
Please post the command you used.
As you can see, the reads were previously quality- and adapter-trimmed (with cutadapt) and merged with leeHom.
Is the difference in the fastq file names a typo in your post, or was it also there when you ran the command?
It was only in the post; my actual file names are quite different from these :)
Have you QC'ed the reads for presence of adapter contamination?
Since this is a ~6000-year-old sample, you will probably have to do extensive quality checks. For example, I would guess bacterial contamination could be huge. Check the papers on the Neanderthal and mammoth genomes to see the problems you could face and the solutions they used.
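As a very basic first check, before looking at contamination, it can help to tally the read lengths that actually go into the aligner. The following is only a rough sketch: it assumes a plain, unwrapped FASTQ file with exactly four lines per record, and the function name and demo records are hypothetical.

```python
from collections import Counter


def read_length_counts(fastq_lines):
    """Count read lengths from an iterable of FASTQ lines.

    Assumes exactly four lines per record (no wrapped sequences).
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Made-up records for illustration: one 45 bp read and one 50 bp read.
demo = [
    "@r1", "A" * 45, "+", "I" * 45,
    "@r2", "C" * 50, "+", "I" * 50,
]
for length, n in sorted(read_length_counts(demo).items()):
    print(length, n)
```

For a real file, an open file handle can be passed in place of the demo list, since the function only iterates over lines.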
Also, why use bwa aln and not bwa mem?