I download sequences from MG-RAST that already have primers removed, quality filtering performed, and forward and reverse reads paired. Each file is a fasta file of sequences. Here is some R code to get an example sequence file:
cur.dir <- 'path/to/test_dir/'
mgm.id <- 'mgm4788103.3'
mgrast.link <- paste0('http://api.metagenomics.anl.gov/1/download/', mgm.id, '?file=050.1')
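# Quote the URL and output path so the shell does not interpret the '?' in the URL
cmd <- paste0('curl ', shQuote(mgrast.link), ' > ', shQuote(file.path(cur.dir, paste0(mgm.id, '.fasta'))))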
system(cmd)
I would like to turn this fasta file into a dereplicated amplicon sequence variant (ASV) table, where the row names are the sample names (in this case there is only one sample, mgm4788103.3), the column names are the unique sequences in the .fasta file, and the entries in the table are the number of times each unique sequence was observed in each sample. This is the format of the output of the makeSequenceTable function of the dada2 package; a tutorial can be found here. Unfortunately, generating this table in dada2 requires me to start from unpaired reads that have not been quality filtered, which I don't have.
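To make the target format concrete, here is a toy sketch in base R of the layout I am after. The reads and the object names (toy.reads, seq.counts, asv.tab) are made up for illustration, and this is not dada2 output, only what I understand its shape to be: samples as rows, unique sequences as columns, counts as entries.

```r
# Toy reads standing in for the sequences of a single sample (invented for illustration)
toy.reads <- c('ACGT', 'ACGT', 'TTGA', 'ACGT', 'TTGA', 'GGCC')

# Count how many times each unique sequence occurs
seq.counts <- table(toy.reads)

# One row per sample, one column per unique sequence, integer counts as entries
asv.tab <- matrix(as.integer(seq.counts), nrow = 1,
                  dimnames = list('mgm4788103.3', names(seq.counts)))
asv.tab
```

With more samples, each additional sample would add a row, and any sequence not seen in a given sample would get a count of 0 in that row.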
How can I generate this table from this .fasta file, or a directory of similar .fasta files?