Question

Phyloseq and ASV names between datasets

6 months ago

pablo ▴ 300

Hi,

I ran DADA2 on three 16S datasets, which went well.

Then I imported the output files into phyloseq in R to create abundance tables, graphs, etc.

My problem is matching the ASV names between the three abundance tables. Each ASV is named ASV1, ASV2, ... within each dataset, but ASV1 in one table is not the same sequence as ASV1 in another.

For the moment, I have my three phyloseq objects, from which I removed any contaminants. Here is one of them:

ps.pool1
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 1660 taxa and 80 samples ]
sample_data() Sample Data:       [ 80 samples by 12 sample variables ]
tax_table()   Taxonomy table:    [ 1660 taxa by 8 taxonomic ranks ]

Then I replace each ASV sequence with a generic name:

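# Store the full sequences in the refseq slot before renaming
dna <- Biostrings::DNAStringSet(taxa_names(ps.pool1))
names(dna) <- taxa_names(ps.pool1)
ps.pool1 <- merge_phyloseq(ps.pool1, dna)
# Replace the sequence-based taxa names with short ASV ids
taxa_names(ps.pool1) <- paste0("ASV", seq(ntaxa(ps.pool1)))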


taxa_names(ps.pool1)
   [1] "ASV1"    "ASV2"    "ASV3"    "ASV4"    "ASV5"    "ASV6"    "ASV7"   
   [8] "ASV8"    "ASV9"    "ASV10"   "ASV11"   "ASV12"   "ASV13"   "ASV14"  
  [15] "ASV15"   "ASV16"   "ASV17"   "ASV18"   "ASV19"   "ASV20"   "ASV21"  
  [22] "ASV22"   "ASV23"   "ASV24"   "ASV25"   "ASV26"   "ASV27"   "ASV28"

What I need is to give the same ASV name to identical ASV sequences across the three datasets, so that I can compare the three abundance tables.

Any help? Best

dada2 phyloseq 16S asv • 762 views

ADD COMMENT • link updated 6 months ago by antonioggsousa 3.2k • written 6 months ago by pablo ▴ 300

Hi,

I'm not sure I completely understood your problem. If I did, there are several ways to address it:

(1) The easiest is to assign the ASV1...ASVn names directly to the object you obtain from DADA2. Before doing so, save a mapping table that links each new ASV id to its sequence for future reference. All your downstream tables will then be consistent with this one.
(2) Create a table matching each ASV id (ASV1, ASV2, etc.) to its ASV sequence, then use this table to order or match ASVs across tables.
(3) Use the ASV sequences themselves as identifiers throughout the analysis. I know this is less convenient because of their length, but it is a possibility.
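
As a rough illustration of the first two options above, here is a minimal base R sketch of a shared mapping table. It is not taken from the tutorial mentioned below; the toy sequences and the names `asv_map` and `shared_names` are made up for the example. It assumes that identical ASVs are exactly identical strings in every dataset, which only holds if the reads were trimmed and processed the same way in all three runs.

```r
# Toy stand-ins for the sequences of each dataset. With the objects in this
# thread they would presumably come from as.character(refseq(ps.pool1)) etc.
# (assumption; the sequences below are invented for illustration).
seqs_pool1 <- c("ACGTAC", "GGTTAA", "TTGCAA")
seqs_pool2 <- c("GGTTAA", "CCCGGG", "ACGTAC")
seqs_pool3 <- c("TTGCAA", "AAAATT")

# One shared mapping table: every distinct sequence gets exactly one ASV id.
all_seqs <- unique(c(seqs_pool1, seqs_pool2, seqs_pool3))
asv_map <- data.frame(
  asv = paste0("ASV", seq_along(all_seqs)),
  sequence = all_seqs,
  stringsAsFactors = FALSE
)

# Look up the shared id for each sequence of one dataset (exact string match).
shared_names <- function(seqs, map) map$asv[match(seqs, map$sequence)]

names_pool1 <- shared_names(seqs_pool1, asv_map)
names_pool2 <- shared_names(seqs_pool2, asv_map)
names_pool3 <- shared_names(seqs_pool3, asv_map)

print(asv_map)
print(names_pool2)
```

With real phyloseq objects, the last step would presumably be something like `taxa_names(ps.pool1) <- names_pool1`, after checking that the sequences are in the same order as `taxa_names(ps.pool1)`. Saving `asv_map` to a file keeps the id-to-sequence link for later reference.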

It is hard for me to give a concrete example, as I don't have your objects.

If you want to try option (1), you may want to check the tutorial I made about DADA2 a while ago (see section 7 - link).

I hope this helps.

Best,

António

ADD REPLY • link 6 months ago by antonioggsousa 3.2k

Thanks for your reply.

I meant that, for example, "ASV1" needs to correspond to the same biological sequence across my three datasets. That's why your first option, building a mapping table, could be the solution.

I have these objects, where each sequence has an "ASVx" name within each of the three datasets. I need the ASV1 sequence in dataset 1 to be the same as the ASV1 sequence in dataset 2, for example.

> refseq(ps.pool1)
DNAStringSet object of length 1576:
       width seq                                            names
   [1]  1403 AACGAACGCTGGCGGCAGGCTT...GTAGGGTCAGCGACTGGGGTG ASV1
   [2]  1450 GATGAACGCTGGCGGCGTGCTT...GTGGGACCGGCGATTGGGACT ASV2
   [3]  1474 GATGAACGCTGGCGGCGTGCCT...GTGGGACAGATGATTGGGGTG ASV3
   [4]  1474 GATGAACGCTGGCGGCGTGCCT...GTGGGACAGATGATTGGGGTG ASV4
   [5]  1455 ATTGAACGCTGGCGGCATGCTT...GCGGGGTTCGTGACTGGGGTG ASV5
   ...   ... ...
[1572]  1446 GATGAACGCTAGCGGCAGGCCT...GTTATACCAATGACTGGGGCT ASV1572
[1573]  1442 AATGAACGTTGGCGGCGTGGAT...ATGAAACTCTTGATCGGGACT ASV1573
[1574]  1458 AACGAACGCTGGCGGCGTGCTT...GTTGTGGTCGCGATTGGGGTG ASV1574
[1575]  1460 ATTGAACGCTGGCGGAATGCTT...GTAGTATTCATGACTGGGGTG ASV1575
[1576]  1169 TCCTTTCCCCGCAGGCGTCGCA...GACGCTCTCTCACATACGATG ASV1576

refseq(ps.pool2)
DNAStringSet object of length 1194:
       width seq                                            names
   [1]  1474 GATGAACGCTGGCGGCGTGCCT...GTGGGACAGATGATTGGGGTG ASV1
   [2]  1453 ATTGAACGCTGGCGGCATGCCT...GCAGGGTTCGTGACTGGGGTG ASV2
   [3]  1459 ATTGAACGCTGGCGGCAGGCCT...GTGTGATTCATGACTGGGGTG ASV3
   [4]  1447 GACGAACGCTGGCGGCGTGCTT...GTGGGACTGGTGATTAGGACT ASV4
   [5]  1456 ATTGAACGCTGGCGGCAGGCCT...TTGTGATTCATGACTGGGGTG ASV5
   ...   ... ...
[1190]  1843 GTTCTTTTAGGGATTGTAGCCT...TACAATTGAACTACGCACTAA ASV1190
[1191]  1978 TTAACACTTCGATCTGCCCCCC...CGGTGTCAGATTGCTGCTCAT ASV1191
[1192]  1293 TTCCGGGACGAATCGTACGAGA...CGCTCCAGCCTCGGCTGCTTC ASV1192
[1193]  1197 AACCGTGCGCAGGAGTGGGATG...GTACCTGTGGGGGATCATGGC ASV1193
[1194]  1027 TGCAAGAGGCGCAATCTGACCG...TATCAATGCGATAGTTGACGT ASV1194

I'll let you know. Best

ADD REPLY • link 6 months ago by pablo ▴ 300

Hi again, I am struggling a bit. I am able to export my refseq(ps.pool1) objects as a FASTA file with:

rs <- refseq(ps.pool1)
tax <- tax_table(ps.pool1)
# One taxonomy string per ASV, ranks separated by ";"
tax_strings <- apply(tax, 1, paste, collapse = ";")
# FASTA headers: "<ASV id> <taxonomy>"
new_names <- paste(taxa_names(ps.pool1), tax_strings, sep = " ")
names(rs) <- new_names
Biostrings::writeXStringSet(rs, "./sequences_with_tax_pool1.fasta")


less sequences_with_tax_pool1.fasta
>ASV1 Bacteria;Proteobacteria;Alphaproteobacteria;Rhizobiales;Rhizobiaceae;Allorhizobium-Neorhizobium-Pararhizobium-Rhizobium;NA;NA
AACGAACGCTGGCGGCAGGCTTAACACATGCAAGTCGAGCGCCCCGCAAGGGGAGCGGCAGACGGGTGAGTAACGCGTGG
GAATCTACCCTTGACTACGGAATAACGCAGGGAAACTTGTGCTAATACCGTATGTGTCCTTCGGGAGAAAGATTTATCGG
TCAAGGATGAGCCCGCGTTGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCATAGCTGGTCTGAGAG
GATGATCAGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGC
AAGCCTGATCCAGCCATGCCGCGTGAGTGATGAAGGCCCTAGGGTTGTAAAGCTCTTTCACCGGAGAAGATAATGACGGT
ATCCGGAGAAGAAGCCCCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGTTCGGAATTACTG
GGCGTAAAGCGCACGTAGGCGGATCGATCAGTCAGGGGTGAAATCCCAGGGCTCAACCCTGGAACTGCCTTTGATACTGT
CGATCTGGAGTATGGAAGAGGTGAGTGGAATTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAGGAACACCAGTGGC

I can create a matching "ASV-sequence" FASTA file for each of my three datasets. When the same sequence is found in two or three FASTA files, I could check the headers and rename them so that the sequence has a single header. Something like that.

But I don't know how to create a "table" in R from these FASTA files and then match the ASV names across the three datasets.
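
One possible way to turn such FASTA files into a table and match names by sequence is sketched below, in base R only so that it runs without extra packages. The function `read_fasta_table` and the toy files are invented for this example. With Biostrings already loaded, `Biostrings::readDNAStringSet()` would be the more usual way to read the files. The sketch assumes well-formed FASTA (every header followed by at least one sequence line) and exact sequence identity between datasets.

```r
# Minimal FASTA reader (hypothetical helper): returns one row per record and
# handles sequences wrapped over several lines.
read_fasta_table <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]          # drop empty lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)            # record index for every line
  headers <- sub("^>", "", lines[is_header])
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")
  data.frame(
    asv = sub(" .*$", "", headers),      # "ASV1 Bacteria;..." -> "ASV1"
    sequence = unname(seqs),
    stringsAsFactors = FALSE
  )
}

# Two toy FASTA files standing in for the exported per-pool files.
f1 <- tempfile(fileext = ".fasta")
f2 <- tempfile(fileext = ".fasta")
writeLines(c(">ASV1 Bacteria;A", "ACGT", "ACGT", ">ASV2 Bacteria;B", "GGCC"), f1)
writeLines(c(">ASV1 Bacteria;B", "GGCC", ">ASV2 Bacteria;C", "TTAA"), f2)

pool1 <- read_fasta_table(f1)
pool2 <- read_fasta_table(f2)

# Join on the sequence itself; NA means the sequence is absent from that pool.
matched <- merge(pool1, pool2, by = "sequence", all = TRUE,
                 suffixes = c("_pool1", "_pool2"))
print(matched)
```

Each row of `matched` is one distinct sequence with its per-dataset name, so a single shared name can be assigned per row and used to rename the taxa in each dataset. A third file can be added with another `merge()` on `sequence`.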

ADD REPLY • link 6 months ago by pablo ▴ 300