Hello everyone, I am fairly new to QIIME and am having trouble working with it.
I have mouse 16S data obtained with Illumina MiSeq and I want to detect the different OTUs in each sample. My problem is that the final OTU table for all my samples is almost entirely 0s, with an occasional 1 or 2, whereas I was expecting relatively large numbers of mapped reads.
Here is what I have done. I start with 45 samples, each with 2 FASTQ files of paired-end reads. I used cutadapt to trim the adapters, FLASH to merge each pair of FASTQ files into one, and Trimmomatic to filter by quality.
Starting from my merged and trimmed FASTQ file per sample (I am attaching one of them), even though my data is already demultiplexed, I use split_libraries_fastq.py to transform the FASTQ file into FASTA format with the sample ID in each FASTA file entry.
split_libraries_fastq.py -i $FASTQ -o $outDIR1 -m $mapfile --barcode_type 'not-barcoded' --sample_ids $sample -q 0
where $FASTQ is the FASTQ file input for each sample, $outDIR1 is the output directory for each sample, $sample is each sample ID (AHM041trim in this case), and $mapfile is my mapping file for each sample, which in this case looks like:
#SampleID BarcodeSequence LinkerPrimerSequence Description
AHM041trim NA NA AHM041trim
Once I obtain my properly formatted seqs.fna file for each sample (I am attaching the one for AHM041trim), I use pick_closed_reference_otus.py to detect the OTUs, using the Silva database as reference (https://www.arb-silva.de/)
pick_closed_reference_otus.py -i $outDIR1/seqs.fna -o $outDIR2 -r $silvafasta -t $silvatax -p $paramfile --parallel --jobs_to_start=2
where $outDIR2 is the output directory for OTU picking results for each sample, $silvafasta is the absolute path to the file SILVA_123.1_SSURef_Nr99_tax_silva.fasta and $silvatax is the absolute path to the taxonomy file taxonomy_all_levels.txt (SILVA123_QIIME_release/taxonomy/16S_only/99), both downloaded from https://www.arb-silva.de/download/archive/
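Since the reference FASTA and the taxonomy file come from two different downloads, it may be worth checking that they actually belong together before picking OTUs. The sketch below is only a diagnostic under stated assumptions: the taxonomy file is assumed to be tab-separated with the sequence ID in the first column, the function names are hypothetical, and the second check simply counts sequence lines written in the RNA alphabet (U instead of T), which might matter because the reads are DNA.

```shell
# Hypothetical helper: count FASTA record IDs (first word after ">") that
# have no entry in the first column of a tab-separated taxonomy file.
# Assumes the taxonomy file is not empty.
count_ids_missing_taxonomy() {
    # $1: reference FASTA, $2: taxonomy map (ID<TAB>taxonomy string)
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }   # first file: remember taxonomy IDs
        /^>/ {
            id = substr($1, 2)              # drop the leading ">"
            sub(/ .*/, "", id)              # keep only the first word
            if (!(id in known)) missing++
        }
        END { print missing + 0 }
    ' "$2" "$1"
}

# Hypothetical helper: count sequence lines that contain U or u.
count_sequence_lines_with_u() {
    # $1: reference FASTA
    awk '!/^>/ && /[Uu]/ { n++ } END { print n + 0 }' "$1"
}

# Example use with the variables from the command above:
# count_ids_missing_taxonomy "$silvafasta" "$silvatax"
# count_sequence_lines_with_u "$silvafasta"
```

If the first number is close to the total number of reference sequences, or the second number is not 0, the two files may not be a matching pair for this purpose.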
My parameters file $paramfile looks like the following and is the same for all samples:
pick_otus:enable_rev_strand_match True
pick_otus:similarity 0.97
After this, I successfully obtain an otu_table.biom per sample (I am attaching the one for AHM041trim). I merge all of them into one using merge_otu_tables.py:
merge_otu_tables.py -i $OTUtables -o $ALLbiom
where $OTUtables is a string of the absolute paths for each sample otu_table.biom, separated by "," and $ALLbiom is the final biom file merging all samples.
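For illustration, the comma-separated string could be built roughly as follows. This is a minimal sketch, not necessarily how the string was produced here: the helper name and the directory layout (one otu_table.biom somewhere under a common base directory per sample) are assumptions, and paths containing commas or newlines would break it.

```shell
# Hypothetical helper: join every otu_table.biom found under a base directory
# into one comma-separated string, the form passed to merge_otu_tables.py -i.
join_otu_tables() {
    # $1: directory that contains the per-sample OTU picking output folders
    find "$1" -type f -name 'otu_table.biom' | sort | paste -s -d ',' -
}

# Example use (the base directory is a placeholder):
# OTUtables=$(join_otu_tables /path/to/otu_picking_results)
# merge_otu_tables.py -i "$OTUtables" -o "$ALLbiom"
```

Echoing the resulting string before merging is a cheap way to confirm that all 45 tables are listed exactly once.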
Once I have my final biom file with all samples (attached), I convert it into txt for better readability:
biom convert -i $ALLbiom -o $ALLtxt -b --header-key taxonomy
where $ALLtxt is the final table with all samples in txt format
My final table (attached) has the expected format, but I am puzzled to find 0s almost everywhere, with a few 1s and 2s here and there. Is this how it is supposed to look? I was expecting relatively large mapped read counts, so my guess is that I am doing something wrong somewhere, but I cannot figure out what.
I would really appreciate it if I could get some help on this, many thanks!!
Please refer to the same question I posted in Google groups (https://groups.google.com/forum/#!topic/qiime-forum/z0ifBb8HFl0), where I was able to upload the files I mention. Note that the FASTQ and FASTA files were too big to upload, so I am just uploading the first portion of them. Thanks!
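To put a number on "almost all 0s", the reads going into OTU picking can be compared with the counts that come out. The sketch below assumes the text table has a header line starting with "#OTU ID", one column per sample, and taxonomy as the last column (as produced with --header-key taxonomy); both function names are hypothetical.

```shell
# Hypothetical helper: count the records in a FASTA file such as seqs.fna.
count_fasta_records() {
    # $1: FASTA file
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Hypothetical helper: print "sample<TAB>total count" for each sample column
# of a tab-separated OTU table whose last column is taxonomy.
sum_counts_per_sample() {
    # $1: OTU table in text format
    awk -F '\t' '
        /^#OTU ID/ { for (i = 2; i < NF; i++) name[i] = $i; n = NF; next }
        /^#/ { next }                       # skip other comment lines
        { for (i = 2; i < n; i++) total[i] += $i }
        END { for (i = 2; i < n; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1"
}

# Example use with the variables from the commands above:
# count_fasta_records "$outDIR1/seqs.fna"
# sum_counts_per_sample "$ALLtxt"
```

If a sample has tens of thousands of records in seqs.fna but only a handful of counts in the table, nearly all reads failed to hit the reference, which points at the reference files or the picking parameters rather than at the merge or conversion steps.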
EDIT: There seems to be a problem with the Silva files I use... can anybody provide some guidance on which Silva files to use (https://www.arb-silva.de/download/archive/)? Many thanks!