Question

How to filter a peptide sequence FASTA file against a known proteome

5.3 years ago

ieuangw • 0

Hello all, I am stuck with the following problem:

Having performed a de novo RNA-seq assembly with Trinity followed by six-frame translation, I would like to filter my amino acid sequence FASTA file against a known reference proteome such as UniProt, to extract all sequences that do not occur in it.

Unfortunately, I cannot filter by ID because of the de novo (Trinity) identifiers. Does anybody have a solution to check each of my ~700,000 sequences to see whether the same sequence occurs in the reference file?
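If only identical full-length matches matter, a plain exact-string comparison avoids alignment altogether. Below is a minimal Python sketch, assuming both files are plain FASTA and that a sequence counts as "known" only when it is identical (case-insensitive, ignoring a trailing `*` stop) to a whole reference entry. The function names and the example records are hypothetical. All reference sequences are held in a set, so memory grows with the size of the proteome.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream; multi-line records are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalise(seq: str) -> str:
    # Upper-case and drop a trailing stop '*' so "mkt*" and "MKT" compare equal
    return seq.upper().rstrip("*")


def novel_sequences(query: TextIO, reference: TextIO) -> Iterator[tuple[str, str]]:
    """Yield query records whose sequence is not identical to any reference sequence."""
    known = {normalise(seq) for _, seq in read_fasta(reference)}
    for header, seq in read_fasta(query):
        if normalise(seq) not in known:
            yield header, seq


if __name__ == "__main__":
    ref = io.StringIO(">sp|P1\nMKTAYIAK\n>sp|P2\nGGGSSS\n")
    qry = io.StringIO(">TRINITY_DN1_c0_g1_i1\nMKTAYIAK*\n>TRINITY_DN2_c0_g1_i1\nMKTAY\n")
    for h, s in novel_sequences(qry, ref):
        print(f">{h}\n{s}")
    # prints:
    # >TRINITY_DN2_c0_g1_i1
    # MKTAY
```

Note that `MKTAY` is reported as novel even though it is a fragment of `MKTAYIAK`. Six-frame ORFs are often partial, so a pure exact-match filter will overcall novelty, which is one reason an alignment-based search is worth considering.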

RNA-Seq Assembly • 850 views

score 1 · Answer 1 · 2019-01-13

5.3 years ago

GenoMax 141k

I would suggest that you take the RNA-seq assembly (Trinity creates one) and use DIAMOND to search it against the UniProt database. Use tab-delimited output for ease of parsing, and keep only long, strict hits. DIAMOND does require a good bit of RAM, so make sure you run it on appropriate hardware.
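A minimal sketch of the parsing step in Python, assuming DIAMOND was run with its default 12-column tabular output (`--outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). What counts as a "long/strict" hit is a judgment call: the thresholds below (at least 95% identity over at least 90% of the query length) are illustrative assumptions, as are the function names. Queries without a passing hit, including queries DIAMOND did not report at all, are kept as candidate novel sequences.

```python
from __future__ import annotations

import csv
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def strict_hit_ids(diamond_tsv: TextIO, query_lengths: dict[str, int],
                   min_pident: float = 95.0, min_coverage: float = 0.9) -> set[str]:
    """Return query IDs with at least one hit passing both (hypothetical) thresholds."""
    hits: set[str] = set()
    for row in csv.reader(diamond_tsv, delimiter="\t"):
        if not row:
            continue
        qseqid, pident = row[0], float(row[2])
        qstart, qend = int(row[6]), int(row[7])
        aligned = abs(qend - qstart) + 1          # query residues covered by the hit
        qlen = query_lengths.get(qseqid)
        if qlen and pident >= min_pident and aligned / qlen >= min_coverage:
            hits.add(qseqid)
    return hits


def sequences_without_hits(query_fasta: TextIO, hit_ids: set[str]) -> Iterator[tuple[str, str]]:
    """Yield query records whose ID (assumed: first word of the header) has no strict hit."""
    for header, seq in read_fasta(query_fasta):
        if header.split()[0] not in hit_ids:
            yield header, seq
```

The query lengths can be built from the same FASTA, e.g. `{h.split()[0]: len(s) for h, s in read_fasta(handle)}`, and the output of `sequences_without_hits` written back out as FASTA.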

You could also try BLAT (by Jim Kent at UCSC), since you are only looking for very similar (or identical?) hits.
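With BLAT, the same "keep whatever has no identical hit" logic can be applied to its PSL output. A minimal Python sketch, assuming a protein-vs-protein BLAT run whose PSL output (with or without the `psLayout` header) is read from a file handle. The function name and the definition of "identical" used here (every query residue aligned as a match, with no inserts on either side) are assumptions. Query sequences whose name is not in the returned set would be the candidates to carry forward.

```python
from __future__ import annotations

from typing import TextIO


def identical_query_ids(psl: TextIO) -> set[str]:
    """Return query names that BLAT aligned identically over their full length.

    PSL columns used (0-based): 0 matches, 1 misMatches, 2 repMatches,
    4 qNumInsert, 6 tNumInsert, 9 qName, 10 qSize.
    """
    ids: set[str] = set()
    for line in psl:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional psLayout header and blank lines: data rows start with an integer
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep = int(fields[0]), int(fields[1]), int(fields[2])
        q_inserts, t_inserts = int(fields[4]), int(fields[6])
        q_name, q_size = fields[9], int(fields[10])
        # Whole query matched, no mismatches, no gaps in query or target
        if (matches + rep == q_size and mismatches == 0
                and q_inserts == 0 and t_inserts == 0):
            ids.add(q_name)
    return ids
```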

Thanks, DIAMOND is a good idea; I shall give it a go. It is hosted on Galaxy, so processing power is not a problem. What I am hoping to separate out is a list of all sequences that are not in the reference proteome (so that I can then attempt to align them to novel MS data).

Reply written 5.3 years ago by ieuangw