Question: How to filter a peptide sequence FASTA file against a known proteome
14 months ago by
ieuangw0 wrote:

Hello all, I am stuck with the following problem.

Having performed a de novo RNA-seq assembly with Trinity followed by a six-frame translation, I would like to filter my amino acid sequence FASTA file against a known reference proteome such as UniProt, to extract all sequences that do not occur in it.

Unfortunately I cannot filter by ID because the de novo assembly assigns its own tags. Does anybody have a solution for checking each sequence (about 700,000 of them) to see whether the same sequence occurs in the reference file?

rna-seq assembly • 422 views
modified 14 months ago by evabrown950 • written 14 months ago by ieuangw0
14 months ago by
United States
genomax80k wrote:

I would suggest that you take the RNA-seq assembly (Trinity would create an assembly) and use DIAMOND to search it against the UniProt database. Use tab-delimited output for ease of parsing and keep only long/strict hits. DIAMOND does require a good bit of RAM, so make sure you do this on appropriate hardware.

You could also try using BLAT (Jim Kent, UCSC), since you are only looking for very similar (or identical?) hits.
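To illustrate the "tab-delimited output, only long/strict hits" step, here is a minimal Python sketch. It assumes the search was written in the BLAST-style tabular format (format 6) with its 12 default columns. The thresholds (`min_pident`, `min_length`), the function name `strict_hit_ids` and the example rows are made up for illustration, so adjust them to your own data.

```python
import csv
import io

# The 12 default columns of BLAST-style tabular output (format 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def strict_hit_ids(handle, min_pident=95.0, min_length=50):
    """Return the set of query IDs that have at least one long, near-identical hit.

    min_pident and min_length are illustrative cutoffs, not recommended values.
    """
    hits = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(COLUMNS, row))
        if float(rec["pident"]) >= min_pident and int(rec["length"]) >= min_length:
            hits.add(rec["qseqid"])
    return hits


# Two invented rows standing in for a real results file.
example = io.StringIO(
    "contig_1\tref_protA\t100.0\t120\t0\t0\t1\t360\t1\t120\t1e-80\t250\n"
    "contig_2\tref_protB\t62.5\t40\t15\t0\t1\t120\t5\t44\t1e-05\t45\n"
)
matched = strict_hit_ids(example)
print(sorted(matched))
```

With a real results file you would pass `open("matches.tsv")` (a hypothetical file name) instead of the in-memory example. Any query ID absent from the returned set had no strict hit in the reference.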

modified 14 months ago • written 14 months ago by genomax80k

Thanks, DIAMOND is a good idea; I shall give it a go. It is hosted on Galaxy, so processing power should not be a problem. What I am hoping to separate out is a list of all sequences that are not in the reference proteome (as I can then attempt to align these to novel MS data).
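If only exact, full-length matches need to be excluded, the separation can also be sketched in plain Python without any aligner. This is a rough sketch under that assumption: it loads every reference sequence into a set and keeps the query sequences that are not in it. The helper names (`read_fasta`, `normalise`, `novel_sequences`) and the tiny example records are hypothetical. It will not catch translated fragments that are only part of a reference protein, which is where a DIAMOND or BLAT search is still needed.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalise(seq):
    """Upper-case the sequence and drop trailing stop symbols from translation."""
    return seq.upper().rstrip("*")


def novel_sequences(query_handle, reference_handle):
    """Return query records whose full sequence is absent from the reference."""
    reference = {normalise(seq) for _, seq in read_fasta(reference_handle)}
    return [(h, s) for h, s in read_fasta(query_handle)
            if normalise(s) not in reference]


# Invented records standing in for the real proteome and translated assembly.
reference = io.StringIO(">refA\nMKTAYIAK\nQRQISFVK\n>refB\nMSLLTEVE\n")
query = io.StringIO(">q1\nMKTAYIAKQRQISFVK*\n>q2\nMPEPTIDEK\n>q3\nmsllteve\n")

novel = novel_sequences(query, reference)
for header, seq in novel:
    print(f">{header}\n{seq}")
```

A set lookup is constant time per sequence, so 700,000 queries against a reference proteome should be quick, provided the reference sequences fit in memory.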

written 14 months ago by ieuangw0