Question: Compare two sequences using BLAST from different identification files
3.1 years ago
kws1540 wrote:

Hi everyone,

I am trying to compare the promoter sequences of two species using blast2seq. I have a table, generated from the genome GenBank file using Python and saved as a txt file, that looks something like this. Each GeneID has a corresponding XP value and other information.

LOC101251020 XP_004228330.1 GeneID:101251020 NADH dehydrogenase [ubiquinone] 1 alpha subcomplex subunit 6

LOC101251313 XP_004228331.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-like

LOC101251313 XP_010314935.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-like

LOC101251313 XP_010315084.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-li

LOC101264245 XP_004228763.1 GeneID:101264245 NAC domain-containing protein 78-like

LOC101264547 XP_004228764.1 GeneID:101264547 uncharacterized protein LOC101264547

LOC104645410 XP_010315223.1 GeneID:104645410 probable E3 ubiquitin protein ligase DRIPH

What I am interested in here are the GeneID and the XP value, because the promoter is labelled by GeneID like this:


I also have another text file, a table showing the matched pairs of sequences. What I want is to get the corresponding promoter identification ID (GeneID) for each of these sequences through table one, and the promoter sequence through the promoter files, so that I can use BLAST to compare their promoters. Does anyone know how I can do this automatically for a lot of sequences? Thank you very much.

1 3517 S.lyco.fasta 1.000 XP_004228331.1 100%

1 3517 spen.fasta 1.000 XP_004228763.1 100%

2 3145 S.lyco.fasta 1.000 XP_004236763.1 100%

2 3145 spen.fasta 1.000 XP_004228763.1 100%

3 3078 S.lyco.fasta 1.000 XP_008522763.1 100%

3 3078 spen.fasta 1.000 XP_000753763.1 100%
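
To make the lookup I am describing concrete, here is a rough Python sketch of the first half: mapping each XP value to its GeneID through table one, then grouping the match table by its first column so that each pair ends up with one GeneID per species. The function names `load_xp_to_gene` and `pair_gene_ids` are made up for illustration, and the column positions are assumptions read off the rows shown above (XP value in the second column of table one and the fifth column of the match table). Fetching the promoter sequences is left out because that depends on how the promoter files are laid out.

```python
from collections import defaultdict


def load_xp_to_gene(lines):
    """Map each protein accession (XP_...) to its GeneID using table one."""
    xp_to_gene = {}
    for line in lines:
        # Assumed columns: LOC name, XP accession, GeneID:NNN, description
        fields = line.split(None, 3)
        if len(fields) < 3 or not fields[2].startswith("GeneID:"):
            continue  # skip blank or malformed lines
        xp_to_gene[fields[1]] = fields[2].split(":", 1)[1]
    return xp_to_gene


def pair_gene_ids(match_lines, xp_to_gene):
    """Return (pair number, {species file: GeneID}) for fully resolved pairs."""
    clusters = defaultdict(dict)
    for line in match_lines:
        # Assumed columns: pair number, score, species file, value, XP, percent
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        pair_number, species_file, xp = fields[0], fields[2], fields[4]
        clusters[pair_number][species_file] = xp

    pairs = []
    for pair_number, members in clusters.items():
        genes = {sp: xp_to_gene.get(xp) for sp, xp in members.items()}
        # Keep a pair only if both species are present and both XP values
        # were found in table one.
        if len(genes) == 2 and all(genes.values()):
            pairs.append((pair_number, genes))
    return pairs


# Small example using rows copied from the tables above.
table_one = [
    "LOC101251313 XP_004228331.1 GeneID:101251313 F-box/kelch-repeat protein At1g55270-like",
    "LOC101264245 XP_004228763.1 GeneID:101264245 NAC domain-containing protein 78-like",
]
matches = [
    "1 3517 S.lyco.fasta 1.000 XP_004228331.1 100%",
    "1 3517 spen.fasta 1.000 XP_004228763.1 100%",
    "2 3145 S.lyco.fasta 1.000 XP_004236763.1 100%",
    "2 3145 spen.fasta 1.000 XP_004228763.1 100%",
]
for pair_number, genes in pair_gene_ids(matches, load_xp_to_gene(table_one)):
    print(pair_number, genes)
```

Pairs where one XP value is missing from table one (pair 2 in this small example) are dropped rather than guessed. Once the two promoter sequences for a pair are written to separate FASTA files, BLAST+ can compare them directly with something like `blastn -query a.fa -subject b.fa`, which is the part I would like to automate.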

