5.7 years ago
arpankbasak
Hi, I'm struggling to work with amino acid sequences in the DECIPHER package. I've found that DECIPHER works smoothly when the input is in GenBank format, but downloading GenBank records manually is challenging for this many sequences. Some of my peptides are also unidentified. Is anyone familiar with how to use this package flexibly (perhaps in a pipeline), or with any other tool for handling amino acid sequences?
I'm not familiar with the package. What are you trying to do? Identify the genomes in which your peptides are found?
In short, I want to reverse-translate my peptides so that I can fetch, by genome coordinates, the possible exons coding for them; something like a de novo method.
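A minimal sketch of the reverse-translation idea, assuming the standard genetic code (NCBI table 1) and an exact, ungapped search: each residue is expanded into every codon that encodes it, the result is compiled into a regular expression, and both strands of a genome string are scanned. The function names are hypothetical. It only finds peptides encoded within a single exon; a peptide spanning a splice junction will not match, and for real genomes a translated aligner such as tblastn is the usual tool.

```python
import re
from collections import defaultdict

# Standard genetic code (assumed: NCBI translation table 1).
# Codons are enumerated in TCAG order; AMINO gives the residue for each, '*' = stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: residue -> every codon that encodes it.
AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    AA_TO_CODONS[aa].append(codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (non-ACGT left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def back_translate_regex(peptide):
    """Reverse-translate a peptide into a regex matching every DNA that encodes it."""
    if not peptide:
        raise ValueError("empty peptide")
    parts = []
    for aa in peptide.upper():
        codons = AA_TO_CODONS.get(aa)  # .get() does not trigger the defaultdict
        if not codons:
            raise ValueError(f"unsupported residue: {aa!r}")
        parts.append("(?:" + "|".join(sorted(codons)) + ")")
    return "".join(parts)


def find_coding_sites(peptide, genome):
    """Return (strand, start) hits; starts are 0-based forward-strand coordinates."""
    genome = genome.upper()
    # Wrapping the pattern in a lookahead also reports overlapping matches.
    pattern = re.compile("(?=" + back_translate_regex(peptide) + ")")
    span = 3 * len(peptide)
    hits = [("+", m.start()) for m in pattern.finditer(genome)]
    rc = reverse_complement(genome)
    # Map reverse-strand match positions back to forward coordinates.
    hits += [("-", len(genome) - m.start() - span) for m in pattern.finditer(rc)]
    return hits


if __name__ == "__main__":
    # Toy sequence for illustration only: ATG GCT TGG encodes "MAW" at offset 2.
    print(find_coding_sites("MAW", "CCATGGCTTGGTAA"))  # [('+', 2)]
```

The regex approach keeps the degeneracy explicit (e.g. leucine expands to six codons), which is why long peptides rich in L, R or S produce large patterns; that is the main cost of doing it this way instead of translating the genome in six frames and searching protein against protein.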
By the way, DECIPHER is quite a popular package: you can do almost everything with it, and it's really fast and light on RAM. But it has limitations: it accepts GenBank format but not GenPept, and when working with AAStringSet objects, DECIPHER fails to keep the sequence names. I'm open to any other options or pipelines.
They only mention FASTA, FASTQ, and GenBank in the manual:
[source: https://www.bioconductor.org/packages/devel/bioc/vignettes/DECIPHER/inst/doc/DECIPHERing.pdf]
If your data is GenPept, then consider converting it (?).
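If converting is the route, here is a minimal stdlib-only sketch of GenPept-to-FASTA conversion. It assumes the standard flat-file layout (a LOCUS line, an optional VERSION line, the sequence after ORIGIN, and `//` closing each record) and keeps the versioned accession as the FASTA name, so identifiers survive the conversion. The function names are hypothetical; Biopython's SeqIO also reads GenPept through its "genbank" format if you would rather use a maintained parser.

```python
def parse_genpept(text):
    """Yield (name, sequence) pairs from GenPept flat-file text.

    Assumes the standard layout: a LOCUS line, an optional VERSION line,
    the sequence after an ORIGIN line, and '//' closing each record.
    """
    name, seq_parts, in_origin = None, [], False
    for line in text.splitlines():
        if line.startswith("//"):
            if name is not None:
                yield name, "".join(seq_parts).upper()
            name, seq_parts, in_origin = None, [], False
        elif in_origin:
            # Sequence lines: a position number followed by blocks of residues.
            seq_parts.extend(line.split()[1:])
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith(("LOCUS", "VERSION")):
            fields = line.split()
            # VERSION (accession.version) overrides the LOCUS name when present.
            if len(fields) > 1 and (name is None or fields[0] == "VERSION"):
                name = fields[1]


def genpept_to_fasta(text, width=60):
    """Convert GenPept text to FASTA, wrapping sequences at `width` residues."""
    lines = []
    for name, seq in parse_genpept(text):
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Made-up record for illustration; not a real accession.
    sample = (
        "LOCUS       TEST0001                  12 aa            linear\n"
        "VERSION     TEST0001.1\n"
        "ORIGIN\n"
        "        1 mktaylllaf gs\n"
        "//\n"
    )
    print(genpept_to_fasta(sample), end="")  # >TEST0001.1 then MKTAYLLLAFGS
```

Feature tables and other annotation are deliberately skipped; only the name and residues are carried over, which is what a FASTA-based workflow needs.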
Well, it's pretty complicated even after converting (I tried). Whatever happens behind the SQL is a "black box". I'm waiting for the next update; they'll probably fix these issues with AAStringSet objects. The simple way would be to export the data to .txt format, but the deeper operations are still getting complicated.
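For the plain-text route, one minimal way to keep names attached to sequences is a two-column tab-separated file (name, sequence). The layout and function names here are assumptions for illustration, not a DECIPHER format; the sketch only shows that a name-to-sequence table round-trips without losing identifiers.

```python
import csv
import io


def write_named_sequences(pairs, handle):
    """Write (name, sequence) pairs as a tab-separated table with a header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "sequence"])
    writer.writerows(pairs)


def read_named_sequences(handle):
    """Read the table back into a {name: sequence} dict in file order.

    Duplicate names collide: a later row overwrites an earlier one.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: row["sequence"] for row in reader}


if __name__ == "__main__":
    # Hypothetical peptide names and sequences, for illustration only.
    buf = io.StringIO()
    write_named_sequences([("pep_unknown_1", "MKTAYL"), ("pep_2", "GSWW")], buf)
    buf.seek(0)
    print(read_named_sequences(buf))
```

With real files, open them with `newline=""` as the csv module documentation recommends; the csv writer quotes any name that itself contains a tab, so unusual identifiers still survive the round trip.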