Question: How to filter a peptide sequence FASTA file against a known proteome
ieuangw wrote (10 months ago):

Hello all, I am stuck with the following problem.

Having performed a de novo RNA-seq assembly with Trinity followed by six-frame translation, I would like to filter my amino acid sequence FASTA file against a known reference proteome such as UniProt, to extract all sequences that do not occur in it.

Unfortunately I cannot filter by ID, because the de novo assembly assigns its own identifiers. Does anybody have a way to check each of my sequences (about 700,000) to see whether the same sequence occurs in the reference file?

genomax wrote (10 months ago):

I would suggest that you take the RNA-seq assembly (Trinity would create one) and use DIAMOND to search it against the UniProt database. Use tab-delimited output for ease of parsing, and keep only long/strict hits. DIAMOND does require a good bit of RAM, so make sure you run this on appropriate hardware.
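
A minimal sketch of the parsing step in Python, assuming the search was written in the 12-column BLAST-style tabular layout (qseqid, sseqid, pident, length, ..., evalue, bitscore). The function names, the thresholds (95% identity, 50 aligned residues) and the example IDs are illustrative assumptions, not values from this thread; "long/strict" should be tuned to your data.

```python
import csv
import io


def strict_hit_queries(tsv_handle, min_identity=95.0, min_length=50):
    """Return the set of query IDs with at least one hit passing both thresholds.

    Assumes 12-column tabular output: column 1 is the query ID, column 3 the
    percent identity, column 4 the alignment length. Thresholds are hypothetical.
    """
    matched = set()
    for row in csv.reader(tsv_handle, delimiter="\t"):
        if len(row) < 12:  # skip blank or malformed lines
            continue
        qseqid, pident, length = row[0], float(row[2]), int(row[3])
        if pident >= min_identity and length >= min_length:
            matched.add(qseqid)
    return matched


def unmatched_queries(all_query_ids, matched):
    """Keep the query IDs that have no strict hit, preserving input order."""
    return [q for q in all_query_ids if q not in matched]


# Made-up example rows standing in for a real results file.
example = (
    "TRINITY_1\tsp|P1|A\t100.0\t120\t0\t0\t1\t120\t1\t120\t1e-80\t250\n"
    "TRINITY_2\tsp|P2|B\t62.5\t40\t15\t0\t1\t40\t5\t44\t1e-5\t45\n"
)
matched = strict_hit_queries(io.StringIO(example))
novel = unmatched_queries(["TRINITY_1", "TRINITY_2", "TRINITY_3"], matched)
print(novel)  # ['TRINITY_2', 'TRINITY_3']
```

Queries with no hit at all never appear in the tabular file, so the list of all query IDs has to come from the original FASTA rather than from the search output.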

You could also try BLAT (by Jim Kent at UCSC), since you are only looking for very similar (or identical?) hits.
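
If only identical sequences count as "known", no aligner is strictly needed. The sketch below, in plain Python, loads every reference sequence into a set and keeps the query records whose full sequence is absent from it. The function names and the toy records are hypothetical, and it assumes both files are ordinary FASTA and that the reference proteome fits in memory.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalise(seq):
    """Upper-case the sequence and drop a trailing stop-codon asterisk."""
    return seq.upper().rstrip("*")


def novel_records(query_handle, reference_handle):
    """Return query records whose whole sequence is not in the reference set."""
    reference = {normalise(seq) for _, seq in read_fasta(reference_handle)}
    return [(h, s) for h, s in read_fasta(query_handle)
            if normalise(s) not in reference]


# Toy data standing in for the real files.
reference = io.StringIO(">sp|P1\nMKTAYIAK\nQR\n>sp|P2\nMSLLT\n")
queries = io.StringIO(">TRINITY_a\nMKTAYIAKQR*\n>TRINITY_b\nMPPLV\n")
novel = novel_records(queries, reference)
print(novel)  # [('TRINITY_b', 'MPPLV')]
```

This only detects full-length identical sequences. A translated fragment that is a substring of a reference protein, or that differs by a single residue, would still be reported as novel, which is where a similarity search such as DIAMOND or BLAT is the better fit.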


Thanks, DIAMOND is a good idea; I shall give it a go. It is hosted on Galaxy, so processing power should not be an issue. What I am hoping to separate out is a list of all sequences that are not in the reference proteome (as I can then attempt to align these to novel MS data).

Reply written 10 months ago by ieuangw