We would like to run PCRs for 4 genes (e.g. blaX, tetX etc.) that were identified in the metagenomic sequencing data. To design primers and set up the PCRs, we need the sequences of these genes as they occur in the sample (i.e. from the metagenome). To find regions suitable for primer design, it is important for us not only to get a consensus sequence but also to know the positions where different reads carry different bases, because several variants of the respective gene are present in the sample.
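To make the goal concrete: once the reads for one gene have been put in the same orientation and aligned to one another, a "variable position" is simply an alignment column that holds more than one base. The sketch below assumes such an alignment already exists as gapped FASTA (all rows the same length, e.g. produced by a multiple-sequence aligner); the function name `variable_sites` is hypothetical, it uses a plain majority rule for the consensus, and it ignores gaps and ambiguous bases such as N.

```shell
# Hypothetical helper: given reads already aligned to each other (gapped FASTA),
# print every column with more than one distinct base (with per-base counts),
# followed by a majority-rule consensus. Gaps ('-') and non-ACGT symbols are ignored.
variable_sites() {
    awk '
    /^>/ { if (seq != "") add(seq); seq = ""; next }   # header: store previous read
    { seq = seq toupper($0) }                          # sequence lines may be wrapped
    END {
        if (seq != "") add(seq)
        cons = ""
        for (i = 1; i <= len; i++) {
            best = "-"; bestn = 0; nb = 0; detail = ""
            for (b = 1; b <= 4; b++) {
                base = substr("ACGT", b, 1)
                n = count[i, base] + 0
                if (n > 0) { nb++; detail = detail " " base ":" n }
                if (n > bestn) { bestn = n; best = base }
            }
            cons = cons best                            # "-" if the column has no bases
            if (nb > 1) print "pos", i, substr(detail, 2)
        }
        print "consensus", cons
    }
    # Count the A/C/G/T bases of one aligned read, column by column.
    function add(s,   i, c) {
        if (length(s) > len) len = length(s)
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (c ~ /[ACGT]/) count[i, c]++
        }
    }' "$1"
}
```

Usage would be something like `variable_sites blaX.aln.fasta`, where the file name is only an example. Each `pos` line is a column where primers should preferably not be placed.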
So far I have the protein (aa) sequences of the 4 reference genes (~300 aa each) and over 6000 short (100 bp) nucleotide reads from the metagenomes that matched those genes. I found the matching reads with blastx:
blastall -p blastx -i reads.6756.fasta -d ref.fasta -a 16 -F F -e 0.000001 -m 8 -o BLAST.out -K 1 -b 1
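With `-m 8` the output is tab-separated with 12 columns: query (read) ID, subject (reference) ID, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value and bit score. A possible first step is to split the hits by reference gene. The function `split_hits_by_ref` below is a hypothetical helper, not part of BLAST, and it assumes the reference IDs are safe to use as file names.

```shell
# Hypothetical helper (not part of BLAST): split legacy BLAST tabular output
# (-m 8; column 1 = read ID, column 2 = reference ID) into one read-ID list per
# reference, e.g. hits/blaX.ids. Assumes reference IDs are safe file names.
split_hits_by_ref() {
    blast_tab="$1"
    outdir="${2:-.}"
    mkdir -p "$outdir"
    # A read can have several HSPs against the same reference; list it only once.
    awk -F'\t' -v out="$outdir" '
        !seen[$1 FS $2]++ { print $1 > (out "/" $2 ".ids") }
    ' "$blast_tab"
}
```

For example, `split_hits_by_ref BLAST.out hits` writes one `.ids` file per reference into `hits/`; if seqtk is available, `seqtk subseq reads.6756.fasta hits/blaX.ids > blaX.reads.fasta` would then pull out the corresponding reads.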
Then I tried to convert the blastx output to GFF format so that I could get the alignments, the consensus sequences and the SNPs and visualize them, but I couldn't get the conversion to work. Any suggestions?
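One possible way to do the GFF conversion is sketched below; it is not a tested converter. Each hit becomes one GFF3 feature on the reference protein, so start and end are amino-acid positions (subject start/end), and the strand records whether the read matched in a reverse frame (blastx reports query start > query end in that case). The function name, the source `blastx`, the feature type `match` and the `pident`/`evalue` attribute names are my own choices, and attribute values are not escaped, so read IDs containing `;`, `=` or `,` would need extra handling.

```shell
# Sketch: convert legacy BLAST tabular output (-m 8) from blastx into GFF3.
# Each HSP becomes one feature on the reference protein (column 2), so
# start/end are amino-acid positions on the reference (columns 9-10).
# blastx reports qstart > qend when the read matched in a reverse frame;
# that is recorded as strand "-".
blast8_to_gff3() {
    awk -F'\t' 'BEGIN { print "##gff-version 3" }
        /^#/ { next }    # skip comment lines, if any
        {
            strand = ($7 <= $8) ? "+" : "-"
            s = ($9 <= $10) ? $9 : $10
            e = ($9 <= $10) ? $10 : $9
            # seqid source type start end score(bit score) strand phase attributes
            printf "%s\tblastx\tmatch\t%d\t%d\t%s\t%s\t.\tID=%s_%d;Name=%s;pident=%s;evalue=%s\n",
                $2, s, e, $12, strand, $1, NR, $1, $3, $11
        }' "$1"
}
```

Usage: `blast8_to_gff3 BLAST.out > BLAST.gff3`. Note that `-m 8` output contains only coordinates and statistics, not the aligned sequences, so a GFF built from it can show where the reads fall along each protein but cannot by itself give a consensus or SNPs; for that the read sequences themselves still have to be aligned.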