9 weeks ago
QX
Hi all,
I have a template sequence that I want to map all my FASTQ reads against. The template sequence is around 300 bp long, with two regions of approximately 20 bp each that contain variable bases; the rest is fixed.
I have used BWA-MEM before, but it seems better suited to mapping reads to a full reference genome. Can anyone suggest a suitable method for this?
Where are these variable regions located in the template?
bwa-mem2 aligns reads to a small reference without problems. Whether it is the right tool depends on exactly what you are looking for, so just go ahead and try it. Otherwise, a multiple sequence alignment (MSA) will work too. The problematic part may be "all my reads", in case you have millions. You could use a clustering program like clumpify.sh from the BBMap suite to reduce that number significantly.

The template looks like this:
-----[fixed bases ~100 bp]-------[20 bp varying/diverse]------[fixed remainder]---[20 bp varying/diverse]-------[fixed bases ~100 bp]-----
Should I write each 20 bp variable region as NNNNNx5 in the template?
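A minimal Python sketch of the N-placeholder idea, assuming hypothetical segment lengths (100 + 20 + 60 + 20 + 100 = 300 bp) and placeholder fixed sequences that you would replace with the real template. The variable regions become runs of N, and an ungapped comparison counts mismatches only at fixed positions. How an aligner treats N in the reference varies from tool to tool, so check the documentation of whichever aligner you use before relying on Ns in a real mapping run.

```python
# Placeholder fixed sequences with hypothetical lengths; replace with the real template.
FIXED_5P = "ACGT" * 25        # ~100 bp fixed region (placeholder)
FIXED_MID = "TTGCA" * 12      # 60 bp fixed remainder between the variable regions (placeholder)
FIXED_3P = "GGCAT" * 20       # ~100 bp fixed region (placeholder)
VAR_LEN = 20                  # length of each variable region

def build_template():
    """Join the fixed segments with runs of N standing in for the variable regions."""
    return FIXED_5P + "N" * VAR_LEN + FIXED_MID + "N" * VAR_LEN + FIXED_3P

def fixed_mismatches(read, template):
    """Ungapped, position-by-position comparison: N in the template accepts any base,
    so only mismatches at fixed positions are returned (as template indices)."""
    if len(read) != len(template):
        raise ValueError("ungapped comparison needs read and template of equal length")
    return [i for i, (r, t) in enumerate(zip(read, template)) if t != "N" and r != t]

template = build_template()
# Simulated "good" read: fixed regions intact, arbitrary bases in the variable regions.
good = FIXED_5P + "ACGTACGTAAACCCGGGTTT" + FIXED_MID + "TTTTGGGGCCCCAAAATTTT" + FIXED_3P
# Same read with one substitution at index 10, inside the 5' fixed region.
bad = good[:10] + ("A" if good[10] != "A" else "C") + good[11:]

print(len(template), template.count("N"))   # 300 40
print(fixed_mismatches(good, template))     # []
print(fixed_mismatches(bad, template))      # [10]
```

Because this comparison is ungapped, a read with a 1-2 bp shift would show mismatches at most fixed positions downstream of the shift, so detecting indels needs a gapped alignment.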
It is not clear what you are trying to do here. Are you looking to quantify the "diverse" tags in your data, or is it something else?
So I have a template with that design, where some bases are fixed and others are deliberately diverse. However, the reads generated from this template are not always 'good': some reads align perfectly with the template in the fixed regions, but others have a 1-2 base shift or technical mismatches in the fixed regions. I want to use mapping to re-align these reads and detect the insertions, deletions, or other technical errors in them.
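To detect 1-2 bp shifts (indels) and technical mismatches in the fixed regions, one option is to globally align each read to the template. The sketch below is plain Needleman-Wunsch with a linear gap penalty, swapped in for illustration rather than a method proposed in this thread. The scores (match +1, mismatch -1, gap -2) and the rule that N in the template accepts any base at zero score are assumptions, and the toy template is hypothetical. It returns a list of events with their template positions.

```python
# Assumed scoring scheme; tune for your data.
MATCH, MISMATCH, GAP = 1, -1, -2

def score(t, r):
    """Score one aligned column; N in the template (variable position) accepts any base."""
    if t == "N":
        return 0
    return MATCH if t == r else MISMATCH

def align(template, read):
    """Global alignment; returns (op, template_pos) events. Indels are reported
    everywhere, mismatches only at fixed (non-N) template positions."""
    n, m = len(template), len(read)
    # dp[i][j] = best score aligning template[:i] with read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(template[i - 1], read[j - 1]),
                           dp[i - 1][j] + GAP,   # template base missing from the read
                           dp[i][j - 1] + GAP)   # extra base in the read
    # Trace back from the bottom-right cell, preferring the diagonal on ties.
    events, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(template[i - 1], read[j - 1]):
            if template[i - 1] != "N" and template[i - 1] != read[j - 1]:
                events.append(("mismatch", i - 1))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            events.append(("deletion", i - 1))
            i -= 1
        else:
            events.append(("insertion", i))  # extra read base before template position i
            j -= 1
    return list(reversed(events))

template = "ACGTACNNNNGATCGA"  # toy template: 6 fixed, 4 variable, 6 fixed
print(align(template, "ACGTACAAAAGATCGA"))  # []  (differs only at variable positions)
print(align(template, "ACTACTTTTGATCGA"))   # [('deletion', 2)]
print(align(template, "ACGTACTTTTGATCCA"))  # [('mismatch', 14)]
```

This pure-Python O(n·m) loop is fine for a handful of reads but slow for millions; for real data, a compiled implementation (for example Biopython's `PairwiseAligner`) or an aligner such as bwa-mem2 would be more practical. End gaps are penalized here, so reads that are simply truncated at either end are reported as deletions at the template ends; a semi-global variant with free end gaps avoids that.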
Without more details about your problem, I'd suggest you look at amplicon-analysis workflows such as dada2 or qiime2, which originate from 16S/ITS amplicon analysis.