How to filter a peptide sequence FASTA file against a known proteome
5.3 years ago
ieuangw • 0

Hello all, I am stuck with the following problem.

Having performed a de novo RNA-seq assembly with Trinity followed by a six-frame translation, I would like to filter my amino acid FASTA file against a known reference proteome such as UniProt, to extract all sequences that do not occur in it.

Unfortunately, I cannot filter by ID because of the de novo tags. Does anybody have a way to check each sequence (there are 700,000) to see whether the same sequence occurs in the reference file?

RNA-Seq Assembly • 850 views
5.3 years ago
GenoMax 141k

I would suggest that you take the RNA-seq assembly (Trinity will have created one) and use DIAMOND to search it against the UniProt database. Use tab-delimited output for ease of parsing and keep only long/strict hits. DIAMOND does require a good bit of RAM, so make sure you run it on appropriate hardware.

You could also try BLAT (by Jim Kent at UCSC), since you are only looking for very similar (or identical?) hits.
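
For the DIAMOND route, parsing the tab-delimited output could look roughly like the sketch below. It assumes the BLAST-style 12-column tabular format (`--outfmt 6`: qseqid, sseqid, pident, length, ...). The function names and the thresholds (95% identity, 50 aligned residues) are placeholders for "long/strict" and not recommended values, so tune them to your data.

```python
from typing import Iterable, List, Set


def queries_with_strict_hits(
    lines: Iterable[str],
    min_pident: float = 95.0,   # assumed threshold, adjust as needed
    min_length: int = 50,       # assumed threshold, adjust as needed
) -> Set[str]:
    """Collect query IDs that have at least one long, high-identity hit.

    Expects BLAST-style tabular lines: column 1 is the query ID,
    column 3 the percent identity, column 4 the alignment length.
    """
    hits: Set[str] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        length = int(fields[3])
        if pident >= min_pident and length >= min_length:
            hits.add(qseqid)
    return hits


def ids_without_hits(all_ids: Iterable[str], hit_ids: Set[str]) -> List[str]:
    """Return the query IDs (in input order) that had no strict hit."""
    return [seq_id for seq_id in all_ids if seq_id not in hit_ids]
```

In practice you would pass an open file handle for the DIAMOND output as `lines`, and the list of all Trinity contig IDs as `all_ids`. The IDs returned by `ids_without_hits` are the candidates that are not in the reference.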


Thanks, DIAMOND is a good idea; I shall give it a go. It is hosted on Galaxy, so processing power should not be a problem. What I am hoping to separate out is a list of all sequences that are not in the reference proteome (as I can then attempt to align these to novel MS data).
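
If "not in the reference proteome" is read in the strictest sense (the whole translated sequence is identical to a reference entry), a set lookup is enough and does not depend on IDs at all. The following is only a minimal sketch under that assumption. The function names are made up for illustration, and it will not catch translated fragments that are substrings of a longer reference protein, which is where DIAMOND or BLAT would still be needed.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalise(seq: str) -> str:
    """Upper-case the sequence and drop a trailing stop-codon '*'."""
    return seq.upper().rstrip("*")


def novel_sequences(
    query_lines: Iterable[str], reference_lines: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield query records whose full sequence is absent from the reference."""
    known = {normalise(seq) for _, seq in read_fasta(reference_lines)}
    for header, seq in read_fasta(query_lines):
        if normalise(seq) not in known:
            yield header, seq
```

Both arguments can be open file handles. Holding the reference sequences in a set makes each of the 700,000 lookups a constant-time operation on average, at the cost of keeping the whole reference proteome in memory.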
