I recently ran NCBI's AMRFinderPlus to detect antimicrobial resistance on two different datasets of clinical isolates of Mycobacterium tuberculosis.
Since the input file requires is a fasta file, I had fastq input files which I then aligned to a reference genome using BWA-MEM, used bcftools to create vcf files (variant calling), compressed and indexed them to then run bcftools consensus
to generate a consensus genome assembly fasta file for each single isolate to use as input for AMRFinder. However, the two different datasets had very similar results (which doesn't make much biological sense) and I was wondering if I should follow a different approach to generate the fasta input files from the fastq isolate files. I saw SPAdes is common in this aspect but I read it's best for de novo assembly, and since I'm interested in SNPs I have to do a reference-based assembly.
Here's one of the output tsv files from AMRFinder:
Protein id Contig id Start Stop Strand Element symbol Element name Scope Type Subtype Class Subclass Method Target length Reference sequence length % Coverage of reference % Identity to reference Alignment length Closest reference accession Closest reference name HMM accession HMM description
NA NC_000962.3 314347 314889 - aac(2')-Ic aminoglycoside N-acetyltransferase AAC(2')-Ic core AMR AMR AMINOGLYCOSIDE GENTAMICIN/TOBRAMYCIN EXACTX 181 181 100.00 100.00 181 WP_003899880.1 aminoglycoside N-acetyltransferase AAC(2')-Ic NA NA
NA NC_000962.3 2231620 2232156 + erm(37) 23S rRNA (adenine(2058)-N(6))-methyltransferase Erm(37) core AMR AMR LINCOSAMIDE/MACROLIDE AZITHROMYCIN/CLARITHROMYCIN/CLINDAMYCIN/ERYTHROMYCIN/TELITHROMYCIN EXACTX 179 179 100.00 100.00 179 WP_003900446.1 23S rRNA (adenine(2058)-N(6))-methyltransferase Erm(37) NA NA
NA NC_000962.3 2325827 2326747 - blaC class A beta-lactamase BlaC core AMR AMR BETA-LACTAM BETA-LACTAM EXACTX 307 307 100.00 100.00 307 WP_003410677.1 class A beta-lactamase BlaC NA NA
the only differences I saw among the two datasets were ONLY in the start and stop columns, the rest - subclass, closest reference accession, closest reference name... - were exactly the same. One of the datasets is multiresistant and the other is monoresistant, I've already verified with Mykrobe, TBProfiler and ResFinder.
Has anyone here used AMRFinder with clinical isolate fastq files before? If so, how did you obtain the fasta files to use as input?
Please read through this thread --> Generating consensus sequence from bam file
You may want to do
samtools consensus
instead of the procedure above and see if that helps. Michael's answer explains what happens when one follows the procedure you used above.