Confusion on blastp output files
10 months ago
Riyad • 0

Hi,

My blastp output looks like the following four lines. I am looking for nuclear genes. The hits to mitochondrial, cytoplasmic, and chloroplastic proteins are clearly labeled, but there are a bunch of hits that don't mention any localization, for example the fourth one here. How can I tell which hits are nuclear gene matches in typical blastp output?

hit_1  4.62e-155  66.358  RecName: Full=Malate dehydrogenase, **mitochondrial**; Flags: Precursor [Fragaria x ananassa]
hit_2  4.05e-88   49.485  RecName: Full=Protein RETICULATA-RELATED 1, **chloroplastic**; Flags: Precursor [Arabidopsis thaliana]
hit_3  7.66e-104  48.225  RecName: Full=Phenylalanine--tRNA ligase beta subunit, **cytoplasmic**; AltName: Full=Phenylalanyl-tRNA synthetase beta subunit; Short=PheRS [Arabidopsis thaliana]
hit_4  8.10e-108  80.556  RecName: Full=Peptidyl-prolyl cis-trans isomerase CYP22; Short=PPIase CYP22; AltName: Full=Cyclophilin of 22 kDa; AltName: Full=Cyclophilin-22 [Arabidopsis thaliana]
blast homologs Blastp • 937 views

Just for clarification: are you looking for nuclear-encoded gene sequences (the DNA sequence of the gene is encoded in the nuclear vs. plastid genomes) or nuclear proteins (the protein product of the gene has sub-cellular localization in the nucleus)? These questions are very different. I think the localizations you have highlighted refer to the latter and the question of sub-cellular localization of products is more common and more interesting in my opinion.


Also, what is your BLAST database? It looks like TrEMBL, but you did not include any accessions in the output file. To solve this task, be prepared to run the BLASTP analysis again with options that include subject accessions, so that you can look them up. If you need the subcellular localization, make sure to use only SwissProt/TrEMBL or a subset thereof. You can then look up the subcellular localization of the proteins in UniProtKB: https://www.uniprot.org/uniprotkb/A0A0A0KDZ4/entry#subcellular_location
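The rerun suggested here could look roughly like the sketch below. It is only an illustration: it assumes BLAST+ is installed and a protein database has already been built, and the file names (`proteins.fasta`, `swissprot`, `hits.tsv`) and helper function names are hypothetical. The column names `qseqid`, `sacc`, `evalue`, `pident` and `stitle` are standard BLAST+ `-outfmt 6` specifiers; the command is only built here, not executed.

```python
import io

# Columns requested from BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sacc", "evalue", "pident", "stitle"]


def blastp_command(query, db, out):
    """Build (but do not run) a blastp command that reports subject accessions.

    query, db and out are placeholders supplied by the caller.
    """
    return [
        "blastp",
        "-query", query,
        "-db", db,
        "-out", out,
        "-outfmt", "6 " + " ".join(FIELDS),
    ]


def parse_hits(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for line in handle:
        parts = line.rstrip("\n").split("\t", len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed lines
        hit = dict(zip(FIELDS, parts))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        yield hit


def uniprot_url(accession):
    """Return the UniProtKB entry page (subcellular location section)."""
    return f"https://www.uniprot.org/uniprotkb/{accession}/entry#subcellular_location"


# Made-up result row for illustration; only the accession comes from the link above.
example = io.StringIO(
    "query_1\tA0A0A0KDZ4\t1e-50\t80.5\tExample protein [Example species]\n"
)
for hit in parse_hits(example):
    print(hit["qseqid"], hit["sacc"], uniprot_url(hit["sacc"]))
```

With the accession in its own column, each hit can be looked up in UniProtKB instead of relying on whatever the description line happens to say.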


Thanks for the feedback, but I am still not clear.

Here are more details. I am doing a phylogeny project. I sequenced RNA from around 30 samples, assembled the reads with Trinity, and translated the assemblies into amino acid sequences. So I now have amino acid sequences for all of my samples, and I will use them for phylogenetic analysis. Importantly, I need only those sequences that come from the nuclear genome (not from organelles) for the phylogeny.

I am running blastp with my amino acid FASTA file against the Swiss-Prot database. In the output file, many hits show a specific localization for organelles, and even for the cytoplasm, but I can't see any hits labeled "nuclear". From these BLAST hits, how can I select the target sequences for the phylogeny? As I mentioned earlier, I only need sequences that come from the nuclear genome, not from organelles.

Expecting a good solution...


I think you need to define your orthologue extraction strategy properly; BLASTP alone is insufficient. Selecting nuclear-encoded genes alone will not do the trick either. The sequences need to be aligned with their respective orthologues to create an MSA suitable for phylogenetic analysis. To devise a good solution, we need to know what your samples are. If they are from closely related strains, for example, the strategy should be based on nucleotide sequences, not protein sequences.

I suggest you run 1:1 orthologue detection, e.g. OrthoFinder, first. Then filter the orthologous groups for nuclear genes if need be and use the remainder to create a concatenated MSA.
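As a rough illustration of the filtering step, the sketch below keeps only orthogroups that have exactly one gene in every sample. It assumes a tab-separated gene count table with an `Orthogroup` column, one column per sample, and a `Total` column, modeled on the gene count table OrthoFinder writes; check the column names against your own output. The function name and the toy table are hypothetical.

```python
import csv
import io


def single_copy_orthogroups(handle, id_column="Orthogroup", total_column="Total"):
    """Return IDs of orthogroups with exactly one gene in every sample column."""
    reader = csv.DictReader(handle, delimiter="\t")
    keep = []
    for row in reader:
        # Every column other than the ID and the total is treated as a sample.
        counts = [
            int(value)
            for name, value in row.items()
            if name not in (id_column, total_column)
        ]
        if counts and all(count == 1 for count in counts):
            keep.append(row[id_column])
    return keep


# Toy table with three hypothetical samples.
table = io.StringIO(
    "Orthogroup\tsampleA\tsampleB\tsampleC\tTotal\n"
    "OG0000001\t1\t1\t1\t3\n"
    "OG0000002\t2\t1\t0\t3\n"
    "OG0000003\t1\t1\t1\t3\n"
)
print(single_copy_orthogroups(table))  # ['OG0000001', 'OG0000003']
```

Requiring one copy in all samples is strict; with around 30 transcriptome assemblies it may be necessary to relax it, for example by allowing a few missing samples per orthogroup.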

Also, the localization you are seeing is that of the protein product, which is not necessarily the same as where the gene is encoded; e.g. many mitochondrial proteins are encoded in the nuclear genome.

10 months ago
Mensur Dlakic ★ 27k

You can't do that by analyzing BLASTP hits. Fasta headers may contain cellular localization, but it is not a given that they will.
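To make the point concrete, here is a minimal sketch that scans hit descriptions for a localization word. The keyword list is an assumption limited to the three words that appear in the example hits in the question (with the bold markers removed), and the function name is hypothetical. A result of `None` only means the header carries no label; it says nothing about which genome encodes the gene.

```python
import re

# Localization words that follow a comma in the example hit descriptions.
LOCALIZATION = re.compile(r",\s*(mitochondrial|chloroplastic|cytoplasmic)\b")


def localization_label(description):
    """Return the localization word in a hit description, or None if absent."""
    match = LOCALIZATION.search(description)
    return match.group(1) if match else None


descriptions = [
    "RecName: Full=Malate dehydrogenase, mitochondrial; Flags: Precursor [Fragaria x ananassa]",
    "RecName: Full=Peptidyl-prolyl cis-trans isomerase CYP22; Short=PPIase CYP22; "
    "AltName: Full=Cyclophilin of 22 kDa; AltName: Full=Cyclophilin-22 [Arabidopsis thaliana]",
]
for text in descriptions:
    print(localization_label(text))
```

The first description yields `mitochondrial` and the second yields `None`, which is exactly the ambiguity the question ran into.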
