Question: Align metagenomic shotgun sequencing reads to primers
0
gravatar for Crystal
3.3 years ago by
Crystal30
United States
Crystal30 wrote:

Hi All,

I'm trying to use BWA to align metagenomics reads (~100bp) to six primers (~15bp).

All the steps work fine but when I convert sam file to readable file, there was no results at all.

Is it appropriate to use BWA to align reads to super short primer sequences or is there other way I can do this?

Thanks

bwa alignment metagenomics • 1.4k views
ADD COMMENTlink modified 3.3 years ago by apt.university70 • written 3.3 years ago by Crystal30

Post the commands you are running. And what are you trying to accomplish? What makes sense to me is assemble reads into contigs, then map primers into contigs.

ADD REPLYlink written 3.3 years ago by h.mon24k
​bwa aln index input1p.fastq > 1p.sai; bwa aln index input2p.fastq > 2p.sai; bwa sampe -n 1000 -N 1000 index 1p.sai 2p.sai input1p.fastq input2p.fastq >1.sam;

Above is the code i used, those are just basic bwa code. After that I used a java script to convert sam file into readable file.What i was trying to do was to check if some of our reads matched to those primers. i don't know if bwa can deal with long contigs. Our reads length has already been much longer than primer length. 

 

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Crystal30
0
gravatar for h.mon
3.3 years ago by
h.mon24k
Brazil
h.mon24k wrote:

You should map the primers into your reads, not the opposite. If you have only six primers, you could try just to use grep to find the primers on the reads files.

Anyway, I would assemble the reads and then map the primers into the contigs, not against the reads. I have used bowtie for this, but bwa should work as well - but I think bwa mem is better (certainly simpler) than bwa sampe.

ADD COMMENTlink written 3.3 years ago by h.mon24k
0
gravatar for apt.university
3.3 years ago by
United States
apt.university70 wrote:

BWA is a software package for mapping reads against a large reference genome. BWA is fast but not as sensitive as other programs, ex., blast or needle. To achieve great speed, BWA requires the reads to show low-divergence from the reference (see program description at http://bio-bwa.sourceforge.net/) and uses some heuristics (tricks) that assume that there is a long stretch of DNA that matches exactly between the ref and the reads. This is called a seed. Depending on the version of BWA you are using, this can be between 19 to  32 bases by default. In your case, you are most likely not getting any hits before your references are shorter than the seed length.

If you are looking for exact matches, then you could use grep, as suggested by @h.mon. This would require putting the sequence and the id on the same line, so that you can retrieve the hits. 

 

 

 

 

ADD COMMENTlink written 3.3 years ago by apt.university70

"If you are looking for exact matches, then you could use grep, as suggested by @h.mon. This would require putting the sequence and the id on the same line, so that you can retrieve the hits."

No, just use the 'B' (before) and 'A' (after) flags to return the lines that precede and follow the match.

ADD REPLYlink written 3.3 years ago by harold.smith.tarheel4.3k

@harold.smith.tarheel: -B (or -A)  would be ideal if the sequence if formatted as one line. If the sequences are formatted on 2 lines (60 or 80 bases per line, as is often the case with Fasta files), -B 2 (or -A 2) would return DNA sequence from the previous record.

 

 

ADD REPLYlink written 3.3 years ago by apt.university70

The code from the OP shows that the input is FASTQ (not FASTA), which specifies a four-line format (identifier, sequence, optional, quality) for each sequence/read. So 'grep -B 1 -A 2' would return the entire entry for the matching sequence.

Note, this assumes that the 15bp primer string is not present in the other lines (unlikely but not impossible), so it might be preferable to add a secondary criterion, such as the initial character(s) of the identifier line (which is usually identical for every read).

Edit: Actually, Sanger PHRED scores do not include 'T', so secondary filtering is unnecessary. The primer string would match only the sequence line (unless the quality scores are old Illumina PHRED+64, or the primer contains only A/C/G).

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by harold.smith.tarheel4.3k

Well, thank you so much for your help. So now I just need to use command like "grep -B 1 -A 2 'primer sequence' file.fastq"?

ADD REPLYlink written 3.3 years ago by Crystal30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1076 users visited in the last hour