Question

How To Use Paired End Reads With Rdp Classifier

11.5 years ago

boblowlaws ▴ 10

Hey guys, I was wondering if it's possible to use paired-end reads with RDP classifier? I'm looking at the RDP multiclassifier command line tool right now and it says that it takes multiple input files, but I'm still not sure. The rRNA data I'm using is the output of Ribopicker. Can anyone help? Thanks

updated 11.5 years ago by Manu Prestat 4.1k • written 11.5 years ago by boblowlaws ▴ 10

score 2 · Answer 1 · 2012-11-12

The answer is YES, of course you can use paired-end reads. You'll have to pair them first, then convert them from FASTQ to FASTA. RDP takes a FASTA file as input, so your data just needs to be in that format. If you use their server for 16S, all you need is a single FASTA file.
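The FASTQ to FASTA step can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for a dedicated tool: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name `fastq_to_fasta` and the example reads are made up for illustration.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy four-line FASTQ records into FASTA format; return the record count."""
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header starting with '@': " + header)
        sequence = fastq_handle.readline().rstrip("\r\n")
        fastq_handle.readline()  # '+' separator line, not needed in FASTA
        fastq_handle.readline()  # quality line, dropped in FASTA
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count


# Hypothetical in-memory example; real data would come from open("merged.fastq").
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
out = io.StringIO()
n = fastq_to_fasta(example, out)
print(n)
print(out.getvalue())
```

With real files you would pass two file handles opened with `open(...)` instead of the `io.StringIO` objects.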

That being said, I'm not certain what you mean by multiple input files; you didn't give much to go on here. For RDP you'll need your dataset (a FASTA file with your reads) and that's basically it. If you've never used RDP classifier before, you'll need to "train" the classifier, which basically means giving it a carefully curated reference database of your marker gene. If you're using 16S (and now fungal LSU) you can use the training files on the RDP website: RDP classifier.

From my experience using Ribopicker, the rDNA data can be patchy. You'll have to inspect the read length and quality of your data. You didn't say whether your data came from whole genome shotgun (metagenomic) sequencing or from amplicons of one or several marker genes, so I can't be more specific there.
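One quick way to inspect read length and quality is to summarise the FASTQ file before classifying. The following is a rough sketch under stated assumptions: four-line FASTQ records, Phred+33 quality encoding, and a hypothetical helper name `summarize_fastq`. Dedicated QC tools give a much fuller picture.

```python
import io


def summarize_fastq(handle, phred_offset=33):
    """Return read count, length range, mean length and mean per-read quality."""
    lengths = []
    mean_quals = []
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # skip blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        lengths.append(len(seq))
        if qual:
            # Assumes Phred+33 encoding: score = ASCII code - 33.
            scores = [ord(c) - phred_offset for c in qual]
            mean_quals.append(sum(scores) / len(scores))
    if not lengths:
        return {"reads": 0}
    return {
        "reads": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "mean_quality": sum(mean_quals) / len(mean_quals) if mean_quals else None,
    }


# Hypothetical two-read example: one high-quality read, one very poor read.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\n!!!!!!\n")
summary = summarize_fastq(example)
print(summary)
```

Running it once per input file makes it easy to compare two files of a pair side by side.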

score 1 · Answer 2 · 2012-11-14

If the question is "may I submit my 2 files, one for the 5' reads and one for the 3' reads?" and expect RDP classifier to understand that directly, the answer is no ;-) You have to mate the pairs beforehand. That only works if your inserts (or whatever the sequenced fragments are) are shorter than the combined length of both reads: with 150bp reads, for example, the actual sequenced fragments should be shorter than (roughly) 250bp to mate properly. Currently (for computational time reasons) I use FLASH to perform that step.

If your paired-end reads are not long enough to overlap, I really don't know what to advise: either you insert some Ns between them, but I'm not sure RDP classifier can handle sequences with N, or you work with the two files separately. In that case, I would recommend taking a look at the quality assessment of both files: usually one of them (I don't remember if it's the 3' or the 5') is quite a bit worse than the other, and you may sometimes consider using only the better one. Finally, don't forget the "fastq to fasta" step ;-)
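The fragment-length arithmetic behind the "roughly 250bp" figure can be written out as a small check. This is only an illustration: the 50bp minimum overlap is an assumption chosen to reproduce the number quoted above, the function names are hypothetical, and the minimum overlap a merging tool actually requires is a configurable setting that may be much smaller.

```python
def max_mergeable_fragment(read_length, min_overlap):
    """Longest fragment whose two reads still overlap by at least min_overlap bases."""
    return 2 * read_length - min_overlap


def can_merge(fragment_length, read_length, min_overlap):
    """True if both reads of a fragment of this length overlap enough to be mated."""
    return fragment_length <= max_mergeable_fragment(read_length, min_overlap)


# Two 150bp reads with an assumed 50bp overlap requirement.
limit = max_mergeable_fragment(150, 50)
print(limit)
print(can_merge(240, 150, 50))
print(can_merge(320, 150, 50))
```

A 320bp fragment fails the check even with no overlap requirement at all, since two 150bp reads cover at most 300bp.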