When working with Excel data, reordering columns and adding new columns are common tasks. Below we do both with pandas.
1. Reordering columns
>>> import pandas as pd
>>> df = pd.read_excel(r'D:/myExcel/1.xlsx')
>>> df
        A   B   C   D
0     bob  12  78  87
1  millor  15  92  21
>>> df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
# The simplest and most common approach: list the column names
# in the order you want and let pandas select them from df
>>> df[['A', 'D', 'C', 'B']]
        A   D   C   B
0     bob  87  78  12
1  millor  21  92  15
# Repeating a column name also works
>>> df[['A', 'A', 'A', 'A']]
        A       A       A       A
0     bob     bob     bob     bob
1  millor  millor  millor  millor
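When a table has many columns, typing out every name is tedious. The sketch below shows one way to move a single column to the front while keeping the rest in place. The DataFrame is built inline as a stand-in for the Excel file, and the names `front`, `new_order`, `reordered` and `same` are illustrative only.

```python
import pandas as pd

# Stand-in for the Excel data shown above (assumption: same values, built inline)
df = pd.DataFrame({
    "A": ["bob", "millor"],
    "B": [12, 15],
    "C": [78, 92],
    "D": [87, 21],
})

# Move column "D" to the front and keep the others in their current order
front = ["D"]
new_order = front + [c for c in df.columns if c not in front]
reordered = df[new_order]

# reindex(columns=...) selects the same columns when every name already exists
same = df.reindex(columns=new_order)

print(list(reordered.columns))
print(reordered.equals(same))
```

Both forms return a new DataFrame; `df` itself keeps its original column order unless you assign the result back.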
2. Adding one or more columns
(1) Direct assignment
>>> df['E'] = [1, 2]
>>> df
        A   B   C   D  E
0     bob  12  78  87  1
1  millor  15  92  21  2
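Direct assignment always appends the new column at the end. If the position matters, `DataFrame.insert` may be a better fit. This is a minimal sketch with inline stand-in data; the column name `F` and its values are made up for illustration.

```python
import pandas as pd

# Stand-in for the Excel data (assumption: same values, built inline)
df = pd.DataFrame({
    "A": ["bob", "millor"],
    "B": [12, 15],
    "C": [78, 92],
    "D": [87, 21],
})

# Direct assignment appends the new column at the end
df["E"] = [1, 2]

# insert(loc, column, value) places the column at a 0-based position
# and modifies df in place (it returns None)
df.insert(1, "F", [10, 20])

print(list(df.columns))
```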
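(2) Using the assign method. It is well suited to deriving new columns from existing ones, either through basic arithmetic or by calling a function. Note that assign returns a new DataFrame and leaves the original unchanged.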
>>> df
        A   B   C   D
0     bob  12  78  87
1  millor  15  92  21
# E is the new column's name; its values are column B minus column C
>>> df.assign(E=df['B'] - df['C'])
        A   B   C   D   E
0     bob  12  78  87 -66
1  millor  15  92  21 -77
# Adding two columns at once also works
>>> df.assign(E=df['B'] - df['C'], F=df['B'] * df['C'])
        A   B   C   D   E     F
0     bob  12  78  87 -66   936
1  millor  15  92  21 -77  1380
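The "calling a function" form mentioned above can be sketched as follows: each keyword passed to `assign` may be a callable that receives the DataFrame and returns the new column. The data is an inline stand-in for the Excel file, and `result` is an illustrative name.

```python
import pandas as pd

# Stand-in for the Excel data (assumption: same values, built inline)
df = pd.DataFrame({
    "A": ["bob", "millor"],
    "B": [12, 15],
    "C": [78, 92],
    "D": [87, 21],
})

# Each lambda receives the DataFrame and returns the values for the new column
result = df.assign(
    E=lambda d: d["B"] - d["C"],
    F=lambda d: d["B"] * d["C"],
)

print(result)
print("E" in df.columns)  # assign did not modify df
```

Callables are handy in method chains, where the intermediate DataFrame has no variable name to refer to.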
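That covers reordering columns and adding new columns in pandas.
Supplement: renaming DataFrame columns and reordering columns in pandas
Renaming columns:
Call the method directly:
df.rename()
Here is how the method is defined in the pandas source (from the version in use when this was written; newer releases may differ):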
def rename(self, *args, **kwargs):
    """
    Alter axes labels.

    Function / dict values must be unique (1-to-1). Labels not contained in
    a dict / Series will be left as-is. Extra labels listed don't throw an
    error.

    See the :ref:`user guide <basics.rename>` for more.

    Parameters
    ----------
    mapper, index, columns : dict-like or function, optional
        dict-like or functions transformations to apply to
        that axis' values. Use either ``mapper`` and ``axis`` to
        specify the axis to target with ``mapper``, or ``index`` and
        ``columns``.
    axis : int or str, optional
        Axis to target with ``mapper``. Can be either the axis name
        ('index', 'columns') or number (0, 1). The default is 'index'.
    copy : boolean, default True
        Also copy underlying data
    inplace : boolean, default False
        Whether to return a new DataFrame. If True then value of copy is
        ignored.
    level : int or level name, default None
        In case of a MultiIndex, only rename labels in the specified
        level.

    Returns
    -------
    renamed : DataFrame

    See Also
    --------
    pandas.DataFrame.rename_axis

    Examples
    --------

    ``DataFrame.rename`` supports two calling conventions

    * ``(index=index_mapper, columns=columns_mapper, ...)``
    * ``(mapper, axis={'index', 'columns'}, ...)``

    We *highly* recommend using keyword arguments to clarify your
    intent.

    >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    >>> df.rename(index=str, columns={"A": "a", "B": "c"})
       a  c
    0  1  4
    1  2  5
    2  3  6

    >>> df.rename(index=str, columns={"A": "a", "C": "c"})
       a  B
    0  1  4
    1  2  5
    2  3  6

    Using axis-style parameters

    >>> df.rename(str.lower, axis='columns')
       a  b
    0  1  4
    1  2  5
    2  3  6

    >>> df.rename({1: 2, 2: 4}, axis='index')
       A  B
    0  1  4
    2  2  5
    4  3  6
    """
    axes = validate_axis_style_args(self, args, kwargs, 'mapper', 'rename')
    kwargs.update(axes)
    # Pop these, since the values are in `kwargs` under different names
    kwargs.pop('axis', None)
    kwargs.pop('mapper', None)

    return super(DataFrame, self).rename(**kwargs)
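Note:
A single * in a signature (*args) collects any extra positional arguments into a tuple. At a call site, * unpacks a list or tuple into separate positional arguments.
A double ** in a signature (**kwargs) collects keyword arguments into a dict. At a call site, ** unpacks a dict with string keys into keyword arguments.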
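The behavior of `*` and `**` can be checked with a few lines of plain Python. The function `show_args` is a hypothetical helper written for this illustration and is not part of pandas.

```python
def show_args(*args, **kwargs):
    # args arrives as a tuple of positional arguments,
    # kwargs as a dict of keyword arguments
    return args, kwargs


positional = [1, 2]
keywords = {"columns": {"A": "a"}, "inplace": False}

# Unpacking at the call site: * spreads the list, ** spreads the dict
collected = show_args(*positional, **keywords)

# Equivalent call written out by hand
direct = show_args(1, 2, columns={"A": "a"}, inplace=False)

print(collected)
print(collected == direct)
```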
Example:
>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
# Rename column A to a, B to b, and C to c
>>> a.rename(columns={'A': 'a', 'B': 'b', 'C': 'c'}, inplace=True)
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
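Without `inplace=True`, `rename` leaves the original untouched and returns a renamed copy, and `columns` also accepts a function in place of a dict, as the docstring states. A small sketch, with `renamed` and `lowered` as illustrative names:

```python
import pandas as pd

a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Returns a new DataFrame; a keeps its original column names
renamed = a.rename(columns={"A": "a", "B": "b", "C": "c"})

# A function is applied to every column label
lowered = a.rename(columns=str.lower)

print(list(a.columns))
print(list(renamed.columns))
print(list(lowered.columns))
```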
Reordering columns:
For example:
>>> import pandas
>>> dict_a = {'user_id': ['webbang', 'webbang', 'webbang'],
...           'book_id': ['3713327', '4074636', '26873486'],
...           'rating': ['4', '4', '4'],
...           'mark_date': ['2017-03-07', '2017-03-07', '2017-03-07']}
>>> df = pandas.DataFrame(dict_a)  # build a DataFrame from a dict
# In the older pandas version used here, the columns come out sorted
# alphabetically rather than in the dict's order
# ('user_id', 'book_id', 'rating', 'mark_date').
# pandas 0.23+ on Python 3.6+ keeps the dict's insertion order instead.
>>> df
    book_id   mark_date rating  user_id
0   3713327  2017-03-07      4  webbang
1   4074636  2017-03-07      4  webbang
2  26873486  2017-03-07      4  webbang
Select the columns in the desired order:
# Reorder the columns to 'user_id', 'book_id', 'rating', 'mark_date'
>>> df = df[['user_id', 'book_id', 'rating', 'mark_date']]
>>> df
   user_id   book_id rating   mark_date
0  webbang   3713327      4  2017-03-07
1  webbang   4074636      4  2017-03-07
2  webbang  26873486      4  2017-03-07
That is all it takes.
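As an alternative, the order can be fixed when the DataFrame is created by passing `columns=` to the constructor, so the result does not depend on how a given pandas version orders dict keys. A minimal sketch using the same data; `wanted` is an illustrative name.

```python
import pandas as pd

dict_a = {
    "user_id": ["webbang", "webbang", "webbang"],
    "book_id": ["3713327", "4074636", "26873486"],
    "rating": ["4", "4", "4"],
    "mark_date": ["2017-03-07", "2017-03-07", "2017-03-07"],
}

# Explicit column order, applied at construction time
wanted = ["user_id", "book_id", "rating", "mark_date"]
df = pd.DataFrame(dict_a, columns=wanted)

print(list(df.columns))
```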
The above is based on my own experience; I hope it serves as a useful reference. If anything is wrong or incomplete, corrections are welcome.