Extracting columns of a specific dtype from a pandas.DataFrame

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 3],
                   'b': [0.4, 1.1, 0.1, 0.8],
                   'c': ['X', 'Y', 'X', 'Z'],
                   'd': [[0, 0], [0, 1], [1, 0], [1, 1]],
                   'e': [True, True, False, True]})

df['f'] = pd.to_datetime(['2018-01-01', '2018-03-15', '2018-02-20', '2018-03-15'])

print(df)
#    a    b  c       d      e          f
# 0  1  0.4  X  [0, 0]   True 2018-01-01
# 1  2  1.1  Y  [0, 1]   True 2018-03-15
# 2  1  0.1  X  [1, 0]  False 2018-02-20
# 3  3  0.8  Z  [1, 1]   True 2018-03-15

print(df.dtypes)
# a             int64
# b           float64
# c            object
# d            object
# e              bool
# f    datetime64[ns]
# dtype: object

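This article covers the following topics.

Basic usage of select_dtypes()

Specifying the types to extract: the include parameter
Specifying the types to exclude: the exclude parameter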

Basic usage of select_dtypes()

Specifying the types to extract: the include parameter

Pass the dtype you want to extract to the include parameter.

print(df.select_dtypes(include=int))
#    a
# 0  1
# 1  2
# 2  1
# 3  3

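Python built-in types such as int and float can be passed as they are. You can also pass the string 'int', or 'int64' with an explicit bit width. (The default bit width that a bare int refers to depends on the environment.)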

print(df.select_dtypes(include='int'))
#    a
# 0  1
# 1  2
# 2  1
# 3  3

print(df.select_dtypes(include='int64'))
#    a
# 0  1
# 1  2
# 2  1
# 3  3

Naturally, when you include the bit width, a column is selected only if its width matches exactly.

print(df.select_dtypes(include='int32'))
# Empty DataFrame
# Columns: []
# Index: [0, 1, 2, 3]
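
If the goal is to pick up integer columns regardless of bit width, one option is to pass NumPy's abstract type np.integer instead of a concrete width. The following is a minimal sketch; the frame df_int and its column names are hypothetical, and the dtypes are set explicitly so the result does not depend on the platform.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with explicit widths, so the result is platform-independent.
df_int = pd.DataFrame({
    'i32': np.array([1, 2, 3], dtype='int32'),
    'i64': np.array([1, 2, 3], dtype='int64'),
    'f64': np.array([0.1, 0.2, 0.3], dtype='float64'),
})

# An exact width matches only columns of that width.
only_i32 = df_int.select_dtypes(include='int32').columns.tolist()
print(only_i32)
# ['i32']

# np.integer is the abstract parent of NumPy's integer types,
# so it matches integer columns of any width.
all_ints = df_int.select_dtypes(include=np.integer).columns.tolist()
print(all_ints)
# ['i32', 'i64']
```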

You can pass a list to specify multiple dtypes. Date and time columns of dtype datetime64[ns] can be specified with 'datetime'.

print(df.select_dtypes(include=[int, float, 'datetime']))
#    a    b          f
# 0  1  0.4 2018-01-01
# 1  2  1.1 2018-03-15
# 2  1  0.1 2018-02-20
# 3  3  0.8 2018-03-15

Numeric types such as int and float can be selected together with the special value 'number'.

print(df.select_dtypes(include='number'))
#    a    b
# 0  1  0.4
# 1  2  1.1
# 2  1  0.1
# 3  3  0.8
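
Note that bool columns are not treated as numeric by 'number'. A small sketch of this, using a hypothetical frame df_num that mirrors columns a, b and e from the example, and .columns.tolist() to obtain just the column names:

```python
import pandas as pd

# Hypothetical frame mirroring columns a, b and e of the example.
df_num = pd.DataFrame({
    'a': [1, 2, 1, 3],
    'b': [0.4, 1.1, 0.1, 0.8],
    'e': [True, True, False, True],
})

# 'number' covers the int and float columns but not the bool column.
numeric_cols = df_num.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
# ['a', 'b']

# To keep bool columns as well, list bool explicitly.
numeric_or_bool = df_num.select_dtypes(include=['number', bool]).columns.tolist()
print(numeric_or_bool)
# ['a', 'b', 'e']
```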

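A column whose elements are strings (str) has dtype object, but an object column can also hold Python objects other than str. This is uncommon in practice, but as in the example, a column whose elements are lists is also extracted by include=object.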

print(df.select_dtypes(include=object))
#    c       d
# 0  X  [0, 0]
# 1  Y  [0, 1]
# 2  X  [1, 0]
# 3  Z  [1, 1]

print(type(df.at[0, 'c']))
# <class 'str'>

print(type(df.at[0, 'd']))
# <class 'list'>

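That said, objects other than str rarely end up as elements of a pandas.DataFrame unless you put them there deliberately, so this is usually not something to worry about.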
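
When it does matter, one way to narrow object columns down to those that contain only str values is to check the element types directly. This is only a sketch: string_only_columns is a hypothetical helper, not a pandas function, and column c is created with dtype=object explicitly so the example does not rely on how strings are inferred. Missing values such as NaN are not str, so a column containing them would be rejected by this check.

```python
import pandas as pd

# Hypothetical frame: c holds strings, d holds lists; both are object columns.
df_obj = pd.DataFrame({
    'a': [1, 2],
    'c': pd.Series(['X', 'Y'], dtype=object),
    'd': [[0, 0], [0, 1]],
})

def string_only_columns(frame):
    """Return names of object columns whose elements are all str (hypothetical helper)."""
    obj = frame.select_dtypes(include=object)
    return [col for col in obj.columns
            if obj[col].map(lambda v: isinstance(v, str)).all()]

str_cols = string_only_columns(df_obj)
print(str_cols)
# ['c']
```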

Specifying the types to exclude: the exclude parameter

Pass the dtype you want to exclude to the exclude parameter. A list of multiple dtypes is also accepted.

print(df.select_dtypes(exclude='number'))
#    c       d      e          f
# 0  X  [0, 0]   True 2018-01-01
# 1  Y  [0, 1]   True 2018-03-15
# 2  X  [1, 0]  False 2018-02-20
# 3  Z  [1, 1]   True 2018-03-15

print(df.select_dtypes(exclude=[bool, 'datetime']))
#    a    b  c       d
# 0  1  0.4  X  [0, 0]
# 1  2  1.1  Y  [0, 1]
# 2  1  0.1  X  [1, 0]
# 3  3  0.8  Z  [1, 1]

include and exclude can be specified at the same time, but specifying the same type in both raises an error.

print(df.select_dtypes(include='number', exclude=int))
#      b
# 0  0.4
# 1  1.1
# 2  0.1
# 3  0.8

# print(df.select_dtypes(include=[int, bool], exclude=int))
# ValueError: include and exclude overlap on frozenset({<class 'numpy.int64'>})
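
If include and exclude are built dynamically, the overlap error can be caught rather than left to stop the script. A minimal sketch, assuming a hypothetical frame df_err; the exact wording of the error message may differ between pandas versions, so only the exception type is checked here.

```python
import pandas as pd

# Hypothetical frame with one int column and one bool column.
df_err = pd.DataFrame({'a': [1, 2], 'e': [True, False]})

try:
    # int appears in both include and exclude, so this raises ValueError.
    df_err.select_dtypes(include=[int, bool], exclude=int)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc

print(type(overlap_error).__name__)
# ValueError

# Removing the overlapping type from include avoids the error.
bool_cols = df_err.select_dtypes(include=bool, exclude=int).columns.tolist()
print(bool_cols)
# ['e']
```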
