Hello,
I'm using single digested and paired-end RAD-seq data. I would like to extract mitochondrial DNA data. I would expect to have some sequences derived from mtDNA, however I cannot be sure about that because the species on which I'm working has no sequenced mitogenome.
I aligned cleaned paired-end reads on the mitogenome of a "relatively" closely related species.
bwa mem -t $threads -M $mitogenome $input1 $input2 | samtools sort -@ $threads -m 7G -o ${output}.bam
The output bam files contain reads.
Then, I used the gstacks tool included in stacks v2.0. The gstacks tool builds loci from the single and/or paired-end reads before calling SNPs. The results of such analyses are the following:
Reading BAM headers...
Processing all loci...
done.
Read 79645748 BAM records:
kept 7279 primary alignments (0.0%), of which 7279 reverse reads
skipped 0 primary alignments with insufficient mapping qualities (0.0%)
skipped 8224 excessively soft-clipped primary alignments (0.0%)
skipped 79630245 unmapped reads (100.0%)
Per-sample stats (details in 'gstacks.log.distribs'):
read 4685044.0 records/sample (344176-6873442)
kept 0.0%-0.0% of these
Built 0 loci comprising 0 forward reads and 0 matching paired-end reads; mean insert length was 0.0 (sd: nan).
Would you be able to explain me what is it happening? If bwa mem detected matching reads, why gstacks does not?
1) problems related to the fact that the reference genome is not a closely related species? 2) may be I should perform some alignement which takes into account indels? or ....