Extended Iterable Unpacking (Python 3.5+): [*df] and Friends
Unpacking generalizations (PEP 448) were introduced in Python 3.5, so the following operations are all possible.
import pandas as pd
df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df
   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x
If you want a list....
[*df]
# ['A', 'B', 'C']
Or, if you want a set,
{*df}
# {'A', 'B', 'C'}
Or, if you want a tuple,
*df, # Please note the trailing comma
# ('A', 'B', 'C')
Or, if you want to store the result somewhere,
*cols, = df # A wild comma appears, again
cols
# ['A', 'B', 'C']
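All of the forms above work for the same reason: iterating a DataFrame yields its column labels, and each unpacking form simply consumes that iterator. Here is a minimal sketch that puts them side by side; the variable names are mine and purely illustrative.

```python
import pandas as pd

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))

# Iterating a DataFrame walks its column labels, not its rows.
labels_from_iter = [label for label in df]

# Each unpacking form consumes that same iterator.
as_list = [*df]
as_set = {*df}
as_tuple = (*df,)   # parenthesised spelling of `*df,`
*cols, = df         # starred assignment always produces a list

print(labels_from_iter)  # ['A', 'B', 'C']
print(as_tuple)          # ('A', 'B', 'C')
```

Note that the set form does not promise any particular order, so reach for the list or tuple form when column order matters.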
... if you're the kind of person who converts coffee to typing sounds, well, this is going to consume your coffee more efficiently ;)
P.S.: if performance is important, you will want to ditch the solutions above in favour of
df.columns.to_numpy().tolist()
# ['A', 'B', 'C']
This is similar to Ed Chum's answer, but updated for v0.24, where .to_numpy() is preferred to the use of .values. See this answer (by me) for more information.
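If you would rather measure than take my word for it, here is a rough timing sketch. The 1000-column frame, the repeat counts, and the `candidates` dictionary are assumptions I picked for illustration; the numbers depend on your pandas version and machine, so treat the ranking it prints as indicative only.

```python
import timeit

import pandas as pd

# A wide frame, so the per-column cost dominates the fixed call overhead.
df = pd.DataFrame('x', columns=[f'col{i}' for i in range(1000)], index=range(5))

candidates = {
    '[*df]': lambda: [*df],
    'list(df)': lambda: list(df),
    'df.columns.tolist()': lambda: df.columns.tolist(),
    'df.columns.to_numpy().tolist()': lambda: df.columns.to_numpy().tolist(),
}

# Take the best of a few repeats to reduce noise from other processes.
timings = {
    label: min(timeit.repeat(fn, number=200, repeat=3))
    for label, fn in candidates.items()
}

# Print fastest first.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f'{label:<32} {seconds:.6f} s per 200 calls')
```

All four candidates return the same list of labels, so whichever one you pick only affects speed and readability, not the result.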
Visual Check
Since I've seen this discussed in other answers: you can print the column names with iterable unpacking (no need for explicit loops).
print(*df)
A B C
print(*df, sep='\n')
A
B
C
Critique of Other Methods
Don't use an explicit for loop for an operation that can be done in a single line (list comprehensions are okay).
Next, using sorted(df) does not preserve the original order of the columns. For that, you should use list(df) instead.
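To make the ordering point concrete, here is a small sketch with columns that are deliberately not in alphabetical order (the C, A, B layout is my own example, not taken from the question).

```python
import pandas as pd

# Columns deliberately out of alphabetical order.
df = pd.DataFrame('x', columns=['C', 'A', 'B'], index=range(2))

in_frame_order = list(df)   # keeps the order the frame was built with
alphabetical = sorted(df)   # re-sorts the labels, losing that order

print(in_frame_order)  # ['C', 'A', 'B']
print(alphabetical)    # ['A', 'B', 'C']
```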
Next, list(df.columns) and list(df.columns.values) are poor suggestions (as of the current version, v0.24). Both Index (returned from df.columns) and NumPy arrays (returned by df.columns.values) define a .tolist() method, which is faster and more idiomatic.
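There is one more practical difference beyond speed, sketched below. I am assuming integer column labels here purely for illustration (the frame above uses strings): calling list() on a NumPy array hands back NumPy scalars, whereas .tolist() converts each element to a plain Python object.

```python
import pandas as pd

# Integer column labels, chosen only to make the element types visible.
df_int = pd.DataFrame(0, columns=[10, 20, 30], index=range(2))

via_list = list(df_int.columns.values)         # elements stay NumPy scalars
via_tolist = df_int.columns.values.tolist()    # elements become Python ints
via_index_tolist = df_int.columns.tolist()     # same, straight from the Index

print(type(via_list[0]).__name__)
print(type(via_tolist[0]).__name__)  # int
```

The values compare equal either way, but the NumPy scalars can surprise you later, for example when serialising the list to JSON.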
Lastly, listification, i.e., list(df), should only be used as a concise alternative to the aforementioned methods on Python 3.4 or earlier, where extended unpacking is not available.