We would like to do some PCRs for 4 genes (e.g. blaX, tetX etc.) that were identified from the metagenomic sequencing data. In order to design primers and set up the PCRs, we need the sequences (from metagenome) of that gene present in the sample. To be able to find regions suitable for primer design, it is important for us, not only to get consensus sequence, but also to get information about positions where different bases are found in different reads due to presence of multiple variants of the respective gene in the sample.
Now I have protein (aa) sequences of the 4 reference genes (~300 aa each) and short 100 bp (nt) over 6000 reads from metagenomes that matched those genes. I used blastx to do that.
blastall -p blastx -i reads.6756.fasta -d ref.fasta -a 16 -F F -e 0.000001 -m 8 -o BLAST.out -K 1 -b 1
Then I tried to convert the blastx
output to sam
or gff
format (couldn't make it to work) so I can get the alignments, the consensus sequences, SNPs and visualize it. Any suggestions?
The output of blastx is a protein pairwise alignment, how could it be converted to SAM?
I guess blastx output (BLAST.out) to SAM that what I need. I tried with two different scripts posted above (link) but didn't work. I remember you posted Blast2Sam script in C some time ago "written by your student". I will try with that. Where exactly did you post that one? Was that to convert blastn output to Sam or any blast output to Sam?
https://github.com/guyduche/Blast2Bam but again, those tools expect a DNA input, not a protein.
Ok. Then option 1. I have to get the Nucleotide sequences of my Reference proteins from NCBI then run the blastn followed by conversion from blast output to Sam format.
Option 2. as I already have the 6000 short reads that matched to the reference protein last time, I can directly map those short 6000 metagenomic matched reads using bowtie or similar aligner to newly obtained reference genes (nt) from NCBI and create the Sam file.