Question: Generate an amplicon sequence variant table without quality data using dada2 or similar in R
12 months ago, caverill wrote:

I have sequences downloaded from MG-RAST that already have primers removed, quality filtering applied, and forward and reverse reads paired. Each file is a fasta file of sequences. Here is some R code to download an example sequence file:

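# Directory in which to save the example file (no trailing slash).
cur.dir <- 'path/to/test_dir'
mgm.id <- 'mgm4788103.3'
# Build the MG-RAST download URL from the sample ID.
mgrast.link <- paste0('http://api.metagenomics.anl.gov/1/download/', mgm.id, '?file=050.1')
# Quote the URL so the shell does not interpret the '?'.
cmd <- paste0('curl "', mgrast.link, '" > ', cur.dir, '/', mgm.id, '.fasta')
system(cmd)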

I would like to turn this fasta file into a dereplicated amplicon sequence variant (ASV) table, where the row names are the sample names (in this case there is only one sample, mgm4788103.3), the column names are the unique sequences in the .fasta file, and each entry is the number of times that unique sequence was observed in that sample. This is the format returned by the makeSequenceTable function of the dada2 package, which is described in the dada2 tutorial. Unfortunately, generating this table in dada2 requires me to start from unpaired reads that have not been quality filtered, which I don't have.

How can I generate this table from this .fasta file, or a directory of similar .fasta files?
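
To make the target format concrete, here is a minimal base-R sketch of the dereplication step described above. It only counts identical sequences per file and arranges them with samples in rows and unique sequences in columns. It does not perform dada2's error-model denoising, so the columns are unique sequences rather than inferred ASVs. The function names read_fasta_seqs and build_seq_table are hypothetical helpers written for this illustration and are not part of dada2, and sample names are assumed to be the file names without the .fasta extension.

```r
# Hypothetical helper: read a FASTA file into a character vector of sequences.
# Handles sequences wrapped over several lines; uses base R only.
read_fasta_seqs <- function(path) {
  lines <- readLines(path, warn = FALSE)
  lines <- lines[nzchar(lines)]            # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)              # record index for every line
  seq_lines <- lines[!is_header]
  seq_record <- record[!is_header]
  # Join the lines belonging to each record into one sequence string.
  seqs <- vapply(split(seq_lines, seq_record), paste0, character(1), collapse = "")
  toupper(unname(seqs))
}

# Hypothetical helper: build a samples-by-sequences count matrix.
# Assumption: sample names default to the file names without ".fasta".
build_seq_table <- function(fasta_files,
                            sample_names = sub("\\.fasta$", "", basename(fasta_files))) {
  # Dereplicate each file: count how often each distinct sequence occurs.
  counts <- lapply(fasta_files, function(f) table(read_fasta_seqs(f)))
  all_seqs <- unique(unlist(lapply(counts, names)))
  seqtab <- matrix(0L, nrow = length(fasta_files), ncol = length(all_seqs),
                   dimnames = list(sample_names, all_seqs))
  for (i in seq_along(counts)) {
    if (length(counts[[i]]) > 0) {
      seqtab[i, names(counts[[i]])] <- as.integer(counts[[i]])
    }
  }
  seqtab
}
```

For a directory of files, something like build_seq_table(list.files('path/to/test_dir', pattern = '\\.fasta$', full.names = TRUE)) would give one row per file. Sequences absent from a sample get a count of 0.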
