How To Use Paired End Reads With RDP Classifier
11.5 years ago
boblowlaws ▴ 10

Hey guys, I was wondering if it's possible to use paired end reads with RDP classifier? I'm looking at the RDP multiclassifier command line tool right now and it says that it takes multiple input files, but I'm still not sure. The rRNA data I'm using is the output of Ribopicker. Can anyone help? Thanks

11.4 years ago
Josh Herr 5.8k

The answer is YES, of course you can use paired end reads. You'll have to pair them and convert them from FASTQ to FASTA. RDP takes a FASTA file as input, so your data just needs to be in that format. If you use their server for 16S, all you need is a single FASTA file.
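To illustrate the FASTQ to FASTA step, here is a minimal sketch in plain Python. It assumes unwrapped, four-line FASTQ records (header, sequence, `+`, quality); the function name `fastq_to_fasta` and the example reads are made up for illustration, and a dedicated tool is probably a better choice for real data.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from 4-line FASTQ records (header, sequence, '+', quality)."""
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, _qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            yield ">" + header[1:]  # FASTA header keeps the read name
            yield seq               # quality line is dropped
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


# Hypothetical two-read example.
example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCA", "+", "IIII"]
fasta_lines = list(fastq_to_fasta(example))
print("\n".join(fasta_lines))
```

For a real file you would pass an open file handle instead of the list, since the function only needs an iterable of lines.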

That said, I'm not sure what you mean by multiple input files; you didn't give much to go on here. For RDP you'll need your dataset (a FASTA file with your reads) and that's basically it. If you've never used the RDP classifier before, you'll need to "train" the classifier, which basically uses a reference database of your marker gene that has been carefully aligned. If you're using 16S (and now fungal LSU), you can use the training files on the RDP website: RDP classifier.

From my experience using Ribopicker, the rDNA data can be patchy. You'll have to inspect the read length and quality of your data. You didn't give us anything more to go on, so I don't know whether your data came from whole genome shotgun (metagenomic) sequencing or from one or more marker amplicons, and I can't help you further there.


Thanks for that, sorry I wasn't more specific, I'm new to all this! I just need to go find a way to pair the 16S sequences, any tips?


Can you be more specific when you say "pair"? Do you mean the processing of your paired end reads or are you referring to clustering of amplicons? What do you want to do?

11.4 years ago

If the question is "can I submit my 2 files, one 3' and one 5', and have RDP classifier understand that directly?", the answer is no ;-) You have to merge the pairs beforehand. For that to work, the sequenced fragments (inserts) must be shorter than the combined length of both reads: if the reads are, for example, 150 bp long, the fragments should be shorter than (roughly) 250 bp to merge properly. Currently (for computational time reasons) I use FLASH to perform that step. If your paired-end reads are not long enough to overlap, I really don't know what to advise: either you insert some Ns between them, though I'm not sure RDP classifier can handle sequences with Ns, or you work with the two files separately. In the latter case, I would recommend looking at the quality assessment of both files: usually one of them (I don't remember whether it's the 3' or the 5') is quite a bit worse than the other, and you may sometimes consider using only the better one. Finally, don't forget the "fastq to fasta" step ;-)
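The overlap arithmetic behind that rule of thumb can be sketched as follows. The function names and the `min_overlap=10` threshold are assumptions chosen for illustration, not settings taken from FLASH or any other merger, so check the minimum overlap your tool actually requires.

```python
def expected_overlap(read_len: int, fragment_len: int) -> int:
    """Bases shared by the two reads of a pair; negative means a gap between them."""
    # Two reads of read_len cover 2 * read_len bases of a fragment_len fragment.
    # The overlap can never exceed the fragment itself.
    return min(fragment_len, 2 * read_len - fragment_len)


def can_merge(read_len: int, fragment_len: int, min_overlap: int = 10) -> bool:
    """True if the pair overlaps by at least min_overlap bases (hypothetical threshold)."""
    return expected_overlap(read_len, fragment_len) >= min_overlap


print(expected_overlap(150, 250))  # 50 bases of overlap
print(can_merge(150, 250))         # True
print(expected_overlap(150, 320))  # -20, i.e. a 20-base gap
print(can_merge(150, 320))         # False
```

With 150 bp reads, a 250 bp fragment leaves a comfortable 50 bp overlap, while anything approaching 300 bp leaves too little for a reliable merge.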


The RDP classifier looks at the frequency of k-mers in the sequence without doing an alignment, so inserting N's should be OK. I tested it with one sequence, a mock pair of 150 bp reads joined with 10 N's in the middle, and it was classified correctly.
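A minimal sketch of that kind of mock test, assuming the reverse read should be reverse-complemented so both halves are in the same orientation. The helper names, the random mock reads, and the word size of 8 are illustrative assumptions; the sketch only shows that every N-free word of the joined sequence comes from one read or the other, and it says nothing about how the classifier itself treats words containing N.

```python
import random

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_with_spacer(forward: str, reverse: str, n_spacer: int = 10) -> str:
    """Join a non-overlapping pair as forward + N spacer + reverse-complemented reverse."""
    return forward + "N" * n_spacer + reverse_complement(reverse)


def clean_words(seq: str, k: int = 8) -> list:
    """All overlapping k-mers of seq that contain no N."""
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in words if "N" not in w.upper()]


# Mock 150 bp pair (random bases, purely illustrative).
rng = random.Random(0)
forward = "".join(rng.choice("ACGT") for _ in range(150))
reverse = "".join(rng.choice("ACGT") for _ in range(150))

joined = join_with_spacer(forward, reverse)
per_read = clean_words(forward) + clean_words(reverse_complement(reverse))

print(len(joined))                      # 310
print(len(clean_words(joined)))         # 286
print(clean_words(joined) == per_read)  # True
```

Because the spacer is longer than the word size, no N-free word spans the junction, so the join does not create artificial words that mix the two reads.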
