Question: Merge my dataframes following the SeqID order of my FASTA files
Darill wrote, 18 months ago:

Hi all, I have two FASTA files, candidates_aa_0042.fasta and candidates_aa_0035.fasta,

and two dataframes, Best_blast_candidate_hit_0042.csv and Best_blast_candidate_hit_0035.csv.

Here is an example of their contents:

qseqid  sseqid  pident  length  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore    salltitles  staxids scientific_name scomnames   sskingdoms  Order
g44459.t1_0035_0035 XP_011687429.1  39.5    157 95  0   7   163 2   158 8.1e-27 129.8   uncharacterized protein LOC105449744 [Wasmannia auropunctata]   64793   Wasmannia auropunctata      Eukaryota   Hymenoptera
g17612.t1_0035_0042 XP_011699787.1  59.3    349 142 0   99  447 336 684 1.5e-120    442.6   uncharacterized protein LOC105457055 [Wasmannia auropunctata]   64793   Wasmannia auropunctata      Eukaryota   Hymenoptera
g29924.t1_0035_0042 XP_011871948.1  67.0    261 85  1   1   260 18  278 1.3e-100    375.6   uncharacterized protein LOC105564266, partial [Vollenhovia emeryi]  411798  Vollenhovia emeryi      Eukaryota   Hymenoptera
g47960.t1_0035_0035 XP_011860868.1  68.8    298 93  0   1   298 142 439 3.3e-116    427.6   uncharacterized protein LOC105558006 [Vollenhovia emeryi]   411798  Vollenhovia emeryi      Eukaryota   Hymenoptera
g28580.t1_0035_0042 XP_011883624.1  70.0    240 69  3   1   239 41  278 1.3e-86 328.9   uncharacterized protein LOC105570787 [Vollenhovia emeryi]   411798  Vollenhovia emeryi      Eukaryota   Hymenoptera

and

qseqid  sseqid  pident  length  mismatch    gapopen qstart  qend    sstart  send    evalue  bitscore    salltitles  staxids scientific_name scomnames   sskingdoms  Order
g34354.t1_0042_0035 XP_011699801.1  43.7    135 63  4   7   128 625 759 9.3e-17 96.3    LOW QUALITY PROTEIN 64793   Wasmannia auropunctata      Eukaryota   Hymenoptera
g34606.t1_0042_0035 XP_011871948.1  59.8    249 79  2   1   228 51  299 3.4e-81 310.8   uncharacterized protein LOC105564266, partial [Vollenhovia emeryi]  411798  Vollenhovia emeryi      Eukaryota   Hymenoptera
g13215.t1_0042_0042 XP_011883625.1  62.0    242 92  0   46  287 160 401 5.4e-82 313.9   uncharacterized protein LOC105570788, partial [Vollenhovia emeryi]  411798  Vollenhovia emeryi      Eukaryota   Hymenoptera
g35379.t1_0042_0035 XP_011858260.1  73.3    191 51  0   4   194 690 880 6.3e-76 293.1   uncharacterized protein LOC105555830 [Vollenhovia emeryi]   411798  Vollenhovia emeryi      Eukaryota   Hymenoptera
g13770.t1_0042_0042 XP_011883624.1  66.5    203 65  3   10  211 33  233 1.9e-65 258.5   uncharacterized protein LOC105570787 [Vollenhovia emeryi]   411798  Vollenhovia emeryi      Eukaryota   Hymenoptera

I need to merge them, BUT in the same order as the SeqIDs in the FASTA files.

For example, if FASTA file 1 contains:

>seq1_0035_0042
ATGGAGAGATAG
>seq6_0035_0035
ATGGATAGAGA

and FASTA file 2 contains:

>seq8_0042_0042
ATGGAGAGATAG
>seq3_0042_0035
ATGGATAGAGA

then I would like to merge my dataframes in that order:

ex:

qseqid_1       sseqid_1       qseqid_2       sseqid_2       pident_1 pident_2 etc...
seq1_0035_0042 XP_011883678.1 seq8_0042_0042 XP_011883789.1   78.9   45.9 etc...
seq6_0035_0035 XP_011566754.1 seq3_0042_0035 XP_011566754.1   67.9   78.0 etc...

PS: not all SeqIDs in the FASTA files are present in the dataframes, so if one member of a pair has no hit, could we still add the pair to the merged dataframe and put NaN in the columns of the missing part? Thanks for your help :)
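
A minimal pandas sketch of the merge described above, under some assumptions that the post does not confirm: records are paired by position (the first SeqID of FASTA file 1 with the first SeqID of FASTA file 2, and so on), each dataframe holds at most one hit per qseqid, and the `_1`/`_2` blocks are placed side by side rather than interleaved column by column. The helper names `fasta_ids`, `ordered_hits` and `merge_in_fasta_order` are hypothetical, and the toy hit tables are made up from the example IDs, with the hit for seq3_0042_0035 left out to show the NaN case.

```python
import io
import pandas as pd

def fasta_ids(handle):
    """Return the sequence IDs of a FASTA file, in file order."""
    return [line[1:].split()[0] for line in handle if line.startswith(">")]

def ordered_hits(ids, hits, suffix):
    """Put a hit table in FASTA order; IDs without a hit get a NaN row."""
    order = pd.DataFrame({"qseqid": ids})
    # Assumption: one best hit per query, so keep only the first row per qseqid.
    hits = hits.drop_duplicates("qseqid")
    # A left merge keeps every FASTA ID and preserves the order of `order`.
    return order.merge(hits, on="qseqid", how="left").add_suffix(suffix)

def merge_in_fasta_order(ids_1, hits_1, ids_2, hits_2):
    """Pair the two hit tables row by row, following each FASTA order."""
    left = ordered_hits(ids_1, hits_1, "_1")
    right = ordered_hits(ids_2, hits_2, "_2")
    # Both frames have a default 0..n-1 index, so concat pairs them by position.
    return pd.concat([left, right], axis=1)

# Toy data standing in for the real files (in-memory instead of open(path)).
fasta_1 = io.StringIO(">seq1_0035_0042\nATGGAGAGATAG\n>seq6_0035_0035\nATGGATAGAGA\n")
fasta_2 = io.StringIO(">seq8_0042_0042\nATGGAGAGATAG\n>seq3_0042_0035\nATGGATAGAGA\n")
hits_1 = pd.DataFrame({
    "qseqid": ["seq6_0035_0035", "seq1_0035_0042"],  # deliberately not in FASTA order
    "sseqid": ["XP_011566754.1", "XP_011883678.1"],
    "pident": [67.9, 78.9],
})
hits_2 = pd.DataFrame({
    "qseqid": ["seq8_0042_0042"],  # seq3_0042_0035 has no hit here
    "sseqid": ["XP_011883789.1"],
    "pident": [45.9],
})

merged = merge_in_fasta_order(fasta_ids(fasta_1), hits_1, fasta_ids(fasta_2), hits_2)
print(merged)
```

With the real data, the ID lists would come from `fasta_ids(open("candidates_aa_0035.fasta"))` and the hit tables from `pd.read_csv(...)`, using whichever separator the CSV files actually have (the excerpts above look tab-separated, but that is a guess).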

Tags: pandas, merge, python