Question

Designing primers by mapping short NGS reads to reference genes

0

Entering edit mode

9.3 years ago

bioinfo ▴ 840

We would like to do some PCRs for 4 genes (e.g. blaX, tetX etc.) that were identified from the metagenomic sequencing data. In order to design primers and set up the PCRs, we need the sequences (from metagenome) of that gene present in the sample. To be able to find regions suitable for primer design, it is important for us, not only to get consensus sequence, but also to get information about positions where different bases are found in different reads due to presence of multiple variants of the respective gene in the sample.

Now I have protein (aa) sequences of the 4 reference genes (~300 aa each) and short 100 bp (nt) over 6000 reads from metagenomes that matched those genes. I used blastx to do that.

blastall -p blastx -i reads.6756.fasta -d ref.fasta -a 16 -F F -e 0.000001 -m 8 -o BLAST.out -K 1 -b 1

Then I tried to convert the blastx output to sam or gff format (couldn't make it to work) so I can get the alignments, the consensus sequences, SNPs and visualize it. Any suggestions?

blast sam samtools gff alignment • 2.5k views

ADD COMMENT • link updated 23 months ago by Ram 44k • written 9.3 years ago by bioinfo ▴ 840

0

Entering edit mode

The output of blastx is a protein pairwise alignment, how could it be converted to SAM?

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.3 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

I guess blastx output (BLAST.out) to SAM that what I need. I tried with two different scripts posted above (link) but didn't work. I remember you posted Blast2Sam script in C some time ago "written by your student". I will try with that. Where exactly did you post that one? Was that to convert blastn output to Sam or any blast output to Sam?

ADD REPLY • link updated 23 months ago by Ram 44k • written 9.3 years ago by bioinfo ▴ 840

0

Entering edit mode

https://github.com/guyduche/Blast2Bam but again, those tools expect a DNA input, not a protein.

ADD REPLY • link 9.3 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Ok. Then option 1. I have to get the Nucleotide sequences of my Reference proteins from NCBI then run the blastn followed by conversion from blast output to Sam format.

Option 2. as I already have the 6000 short reads that matched to the reference protein last time, I can directly map those short 6000 metagenomic matched reads using bowtie or similar aligner to newly obtained reference genes (nt) from NCBI and create the Sam file.

ADD REPLY • link updated 5.0 years ago by Ram 44k • written 9.3 years ago by bioinfo ▴ 840