Need to collapse rows in Pandas
2.4 years ago
Daksh • 0

Hi, please go easy on me as I am new to coding!

This is my data frame

                                               A         B       C  D 
0                                    TG-bck-full       3648   10064  0
1    pNLRep2-Caprh74kan.SGR.(S.30MAY19.N.4Dec19)      10726    4083  0
2           pHELP-KanV4.SGR.(S.04MAY19.N.4Dec19)      13269    1795  0
3                                              1  248956422     248  0
4                                             10  133797422     120  0
..                                           ...        ...     ... ..
196                                   KI270394.1        970       0  0
197                            AY601635.1:1-4344       4344       2  0
198                                   CP011113.2    4587291       9  0
199                                     cassette       4981  164534  0
200                                            *          0       0  0

I need to write code so that rows 3-196 are collapsed into one row that holds the sums of columns B and C. In the end, it should look like this:

                                               A           B       C  D
0                                    TG-bck-full        3648   10064  0
1    pNLRep2-Caprh74kan.SGR.(S.30MAY19.N.4Dec19)       10726    4083  0
2           pHELP-KanV4.SGR.(S.04MAY19.N.4Dec19)       13269    1795  0
3                (Not sure what would come here)  3099755062   25021
4                              AY601635.1:1-4344        4344       2  0
5                                     CP011113.2     4587291       9  0
6                                       cassette        4981  164534  0
7                                              *           0       0  0

Please note that the sum should cover only the values in rows 3-196.

Thanks a lot!

pandas python VGS • 2.5k views

What is the basis of the collapse?

2.4 years ago
Wayne ★ 2.0k

This approach works. (The clunky part is that the `pivot_table` step was only easy when the content of column A was not a string.)
There are probably other ways.

Approach summary:
I basically use `.iloc` to define the top and bottom portions of the dataframe that you want to keep.
Then I take the middle portion, sum it, and turn the resulting series into a dataframe.
Then I concatenate the three dataframes to get the final one you seek.
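
For readers without access to the notebook, the three steps above can be sketched roughly as follows. This is not the notebook's code: it sums the middle slice directly with `.sum()` instead of going through `pivot_table`, and the mock values, the slice boundaries, and the `"collapsed"` label are all made up for illustration.

```python
import pandas as pd

# Mock data with hypothetical values (not the asker's real table).
df = pd.DataFrame({
    "A": ["top1", "top2", "top3", "1", "10", "X", "bot1", "bot2", "bot3"],
    "B": [10, 20, 30, 100, 200, 300, 40, 50, 60],
    "C": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "D": [0] * 9,
})

top = df.iloc[:3]       # rows 0-2, kept as they are
middle = df.iloc[3:6]   # rows 3-5, the block to collapse
bottom = df.iloc[6:]    # rows 6-8, kept as they are

# Sum only the numeric columns; column A holds strings and gets a label instead.
summed = middle[["B", "C", "D"]].sum()
collapsed = pd.DataFrame([{"A": "collapsed", **summed.to_dict()}])

# Stack the three pieces and renumber the rows from 0.
result = pd.concat([top, collapsed, bottom], ignore_index=True)
print(result)
```

For the real table in the question, the slices would presumably become `df.iloc[:3]`, `df.iloc[3:197]` and `df.iloc[197:]`, since the end of an `.iloc` slice is exclusive.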

See the process illustrated in a notebook here.
I made a mock dataframe that is much shorter. The first three rows form the top and the last three the bottom in my facsimile of your data.
The first three cells in the notebook just make the example dataframe and show the 'top' and 'bottom' parts.

I tried to make it straightforward for you to adapt to your real case.
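
One of the other ways mentioned above might be a `groupby` on a relabelled copy of column A. The sketch below is only an assumption about how that could look, using the same kind of made-up mock data; the `"collapsed"` label is hypothetical. It relies on every value in column A outside the collapsed block being unique, because rows that share a label are summed together.

```python
import pandas as pd

# Mock data with hypothetical values (not the asker's real table).
df = pd.DataFrame({
    "A": ["top1", "top2", "top3", "1", "10", "X", "bot1", "bot2", "bot3"],
    "B": [10, 20, 30, 100, 200, 300, 40, 50, 60],
    "C": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "D": [0] * 9,
})

# Copy column A and give rows 3-5 one shared label so they land in one group.
key = df["A"].copy()
key.iloc[3:6] = "collapsed"

# sort=False keeps the groups in order of first appearance, so the row order survives.
result = df.groupby(key, sort=False)[["B", "C", "D"]].sum().reset_index()
print(result)
```

Compared with slicing and concatenating, this avoids building three separate dataframes, but it silently merges any rows whose labels happen to match, so the slicing approach is the safer choice when column A may contain duplicates.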


Thanks a lot Wayne, really made stuff clearer!
