Flatten DataFrame with multi-index columns

Question

I'd like to convert a Pandas DataFrame that is derived from a pivot table into a row representation as shown below.

This is where I'm at:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'goods': ['a', 'a', 'b', 'b', 'b'],
    'stock': [5, 10, 30, 40, 10],
    'category': ['c1', 'c2', 'c1', 'c2', 'c1'],
    'date': pd.to_datetime(['2014-01-01', '2014-02-01', '2014-01-06', '2014-02-09', '2014-03-09'])
})
# we don't care about year in this example
df['month'] = df['date'].map(lambda x: x.month)
piv = df.pivot_table(["stock"], "month", ["goods", "category"], aggfunc="sum")
piv = piv.reindex(np.arange(piv.index[0], piv.index[-1] + 1))
piv = piv.ffill(axis=0)
piv = piv.fillna(0)
print piv

which results in

stock            
goods        a       b    
category    c1  c2  c1  c2
month                     
1            5   0  30   0
2            5  10  30  40
3            5  10  10  40

And this is where I want to get to.

goods category month stock
    a       c1     1     5
    a       c1     2     0
    a       c1     3     0
    a       c2     1     0
    a       c2     2    10
    a       c2     3     0
    b       c1     1    30
    b       c1     2     0
    b       c1     3    10
    b       c2     1     0
    b       c2     2    40
    b       c2     3     0

Previously, I used

piv = piv.stack()
piv = piv.reset_index()
print piv

to get rid of the multi-indexes, but this results in this because I pivot now on two columns (["goods", "category"]):

      month category stock    
goods                    a   b
0         1       c1     5  30
1         1       c2     0   0
2         2       c1     5  30
3         2       c2    10  40
4         3       c1     5  10
5         3       c2    10  40

Does anyone know how I can get rid of the multi-index in the column and get the result into a DataFrame of the exemplified format?

behzad.nouri · Accepted Answer · 2014-12-20 04:11:44Z

14

>>> piv.unstack().reset_index().drop('level_0', axis=1)
   goods category  month   0
0      a       c1      1   5
1      a       c1      2   5
2      a       c1      3   5
3      a       c2      1   0
4      a       c2      2  10
5      a       c2      3  10
6      b       c1      1  30
7      b       c1      2  30
8      b       c1      3  10
9      b       c2      1   0
10     b       c2      2  40
11     b       c2      3  40

then all you need is to change last column name from 0 to stock.

answered Dec 20, 2014 at 4:11

behzad.nouri

76.3k18 gold badges127 silver badges125 bronze badges

1

hmmmm, melt (my answer) loses month from the index :(
– Andy Hayden
Dec 20, 2014 at 4:51
Thanks. It works fine with my example. However, I don't quite understand why using stack previously worked and I now should use unstack.
– orange
Dec 20, 2014 at 5:45

Add a comment |

Andy Hayden · Accepted Answer · 2014-12-20 07:42:22Z

It seems to me that melt (aka unpivot) is very close to what you want to do:

In [11]: pd.melt(piv)
Out[11]:
      NaN goods category  value
0   stock     a       c1      5
1   stock     a       c1      5
2   stock     a       c1      5
3   stock     a       c2      0
4   stock     a       c2     10
5   stock     a       c2     10
6   stock     b       c1     30
7   stock     b       c1     30
8   stock     b       c1     10
9   stock     b       c2      0
10  stock     b       c2     40
11  stock     b       c2     40

There's a rogue column (stock), that appears here that column header is constant in piv. If we drop it first the melt works OOTB:

In [12]: piv.columns = piv.columns.droplevel(0)

In [13]: pd.melt(piv)
Out[13]:
   goods category  value
0      a       c1      5
1      a       c1      5
2      a       c1      5
3      a       c2      0
4      a       c2     10
5      a       c2     10
6      b       c1     30
7      b       c1     30
8      b       c1     10
9      b       c2      0
10     b       c2     40
11     b       c2     40

Edit: The above actually drops the index, you need to make it a column with reset_index:

In [21]: pd.melt(piv.reset_index(), id_vars=['month'], value_name='stock')
Out[21]:
    month goods category  stock
0       1     a       c1      5
1       2     a       c1      5
2       3     a       c1      5
3       1     a       c2      0
4       2     a       c2     10
5       3     a       c2     10
6       1     b       c1     30
7       2     b       c1     30
8       3     b       c1     10
9       1     b       c2      0
10      2     b       c2     40
11      3     b       c2     40

@orange the index appears to be dropped, leaving this answer here as it could be useful to some (I also hope I can fix it!). — Andy Hayden, Dec 20, 2014 at 7:11
@orange I wouldn't say it's a bug, but could potentially be a feature/enhancement (to not drop index). Also, you can't pass things like col_level=[1, 2] which would be useful here. — Andy Hayden, Dec 20, 2014 at 7:31
Thanks anyway. I keep an eye on this function to use it in future. — orange, Dec 20, 2014 at 7:35

Philipe Riskalla Leal · Accepted Answer · 2019-06-17 22:50:36Z

I know that the question has already been answered, but for my dataset multiindex column problem, the provided solution was unefficient. So here I am posting another solution for unpivoting multiindex columns using pandas.

Here is the problem I had:

As one can see, the dataframe is composed of 3 multiindex, and two levels of multiindex columns.

The desired dataframe format was:

When I tried the options given above, the pd.melt function didn't allow to have more than one column in the var_name attribute. Therefore, every time that I tried a melt, I would end up losing some attribute from my table.

The solution I found was to apply a double stacking function over my dataframe.

Before the coding, it is worth notice that the desired var_name for my unpivoted table column was "Populacao residente em domicilios particulares ocupados" (see in the code below). Therefore, for all my value entries, they should be stacked in this newly created var_name new column.

Here is a snippet code:

import pandas as pd

# reading my table

df = pd.read_excel(r'my_table.xls', sep=',', header=[2,3], encoding='latin3', 
               index_col=[0,1,2], na_values=['-', ' ', '*'], squeeze=True).fillna(0)

df.index.names = ['COD_MUNIC_7', 'NOME_MUN', 'TIPO']
df.columns.names = ['sexo', 'faixa_etaria']


df.head()


# making the stacking:

df = pd.DataFrame(pd.Series(df.stack(level=0).stack(), name='Populacao residente em domicilios particulares ocupados')).reset_index()


df.head()

Another solution I found was to first apply a stacking function over the dataframe and then apply the melt.

Here is an alternative code:

df = df.stack('faixa_etaria').reset_index().melt(id_vars=['COD_MUNIC_7', 'NOME_MUN','TIPO', 'faixa_etaria'],
                  value_vars=['Homens', 'Mulheres'],
                  value_name='Populacao residente em domicilios particulares ocupados', 
                  var_name='sexo')

df.head()

Sincerely yours,

Philipe Riskalla Leal

Collectives™ on Stack Overflow

Flatten DataFrame with multi-index columns

3 Answers 3

Your Answer

Not the answer you're looking for? Browse other questions tagged
python
pandas
pivot-table
or ask your own question.

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Not the answer you're looking for? Browse other questions tagged pythonpandaspivot-table or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
python
pandas
pivot-table
or ask your own question.