Pandas Data Structures: Series (Part 2)

Import the necessary packages

import numpy as np
import pandas as pd

Automatic alignment of values by index

Demonstrated with the + operator


s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s7 = pd.Series([4, 3, 2, 1], index=['d', 'c', 'b', 'a'])
print(s6 + s7)

Output:

a    2
b    4
c    6
d    8
dtype: int64

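Series addition matches values by index label, not by position: s7 lists its labels in reverse order, yet each value is added to the value in s6 that carries the same label.
If a label appears in only one of the two Series, the result for that label is NaN.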
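The following is a minimal sketch of the unmatched-label case. It assumes a second Series `s8` (hypothetical, not from the original) that shares only the label 'a' with `s6`. It also shows `Series.add` with `fill_value`, which treats a label missing from one side as 0 instead of producing NaN.

```python
import pandas as pd

s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s8 = pd.Series([10, 20], index=['a', 'e'])  # hypothetical Series; only 'a' overlaps with s6

# '+' aligns on labels; labels present in only one operand become NaN
summed = s6 + s8
print(summed)

# add() with fill_value treats a label missing on one side as 0 before adding
summed_filled = s6.add(s8, fill_value=0)
print(summed_filled)
```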

Arithmetic operations

Multiply every value by a scalar

print(s6 * 2)

Output:

a    2
b    4
c    6
d    8
dtype: int64

Multiply two Series

t = pd.Series(2, index=s6.index)  # the constant 2 for every label of s6
print(s6 * t)

Output:

a    2
b    4
c    6
d    8
dtype: int64
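As the two outputs above suggest, a scalar operand is broadcast to every element, so `s6 * 2` gives the same result as multiplying by a constant Series with the same labels. The sketch below checks this and also shows the method form `mul()` and another operator. It reuses the names from the examples above.

```python
import pandas as pd

s6 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
t = pd.Series(2, index=s6.index)  # constant Series sharing s6's labels

# the scalar is broadcast, so both forms produce identical Series
print((s6 * 2).equals(s6 * t))  # True

# mul() is the method form of '*'; other operators broadcast the same way
print(s6.mul(2).tolist())  # [2, 4, 6, 8]
print((s6 ** 2).tolist())  # [1, 4, 9, 16]
```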

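Handling NaN values

Ignoring NaN values

mean() skips NaN by default (skipna=True), so the result below is the average of 1, 2, 3 and 4

s = pd.Series(np.array([1, 2, 3, 4, np.nan]))
print(s.mean())

Output: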

2.5

Not ignoring NaN values
Pass the boolean parameter skipna=False to mean(); any NaN then makes the result NaN

print(s.mean(skipna=False))

Output:

nan
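Other reductions follow the same skipna convention. The sketch below uses the same Series as above. It also contrasts `count()`, which counts only non-NaN values, with `size`, and shows replacing NaN with `fillna()` before averaging. Choosing 0 as the fill value is only an illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([1, 2, 3, 4, np.nan]))

print(s.sum())              # 10.0
print(s.sum(skipna=False))  # nan

# count() skips NaN, size counts every element
print(s.count(), s.size)    # 4 5

# filling NaN with 0 first changes the divisor from 4 to 5
print(s.fillna(0).mean())   # 2.0
```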

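Boolean selection

The Series used in this section (the int32 dtype below is what NumPy before 2.0 produces on Windows; most other setups show int64)

s = pd.Series(np.arange(0, 10))
print(s)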

0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int32

Applying a comparison operator

print(s > 5)

Output:

0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8     True
9     True
dtype: bool

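The result is still a Series, now with dtype bool

Using a boolean Series as an index

This relies on the overloaded [] operator: only the elements whose corresponding mask value is True are kept, and they are returned as a new Series

bigger_loc = s > 5
print(s[bigger_loc])

Output: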

6    6
7    7
8    8
9    9
dtype: int32

A more concise form: s[s > 5]
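The short sketch below checks that the concise form selects the same elements as the named mask. Because True counts as 1, summing a mask gives the number of matches. It also shows the `.loc` spelling, which accepts the same boolean mask.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 10))
mask = s > 5

print(mask.dtype)                # bool
print(s[mask].equals(s[s > 5]))  # True
print(int(mask.sum()))           # 4
print(s.loc[s > 5].tolist())     # [6, 7, 8, 9]
```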

Combining boolean expressions

print(s[(s > 5) & (s < 8)])

Output:


6    6
7    7
dtype: int32

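Note that the parentheses around each individual comparison are required. In Python, & binds more tightly than > and <, so s > 5 & s < 8 is parsed as the chained comparison s > (5 & s) < 8. Evaluating that chain requires a single True/False value for an intermediate boolean Series, which fails with an error like the following:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()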
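The other elementwise logical operators follow the same parenthesization rule: `|` is OR and `~` is NOT. The sketch below uses them on the same 0 to 9 Series. It also shows `between()` as an alternative for a closed range and reproduces the ValueError that the unparenthesized expression raises.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 10))

# | is elementwise OR, ~ is elementwise NOT; each comparison keeps its own parentheses
either_end = s[(s < 2) | (s > 7)]
not_bigger = s[~(s > 5)]
print(either_end.tolist())  # [0, 1, 8, 9]
print(not_bigger.tolist())  # [0, 1, 2, 3, 4, 5]

# between() includes both ends by default
print(s[s.between(6, 7)].tolist())  # [6, 7]

# without parentheses, & binds first and the chained comparison raises ValueError
try:
    s[s > 5 & s < 8]
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```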

The all() and any() functions

all(): logical AND over all elements; True only if every value is True

print((s >= 0).all())

Output:

True

any(): logical OR over all elements; True if at least one value is True

print((s < 2).any())

Output:

True
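As a quick contrast, the sketch below applies both functions to a mask where only some values are True. It also shows the edge case of an empty Series: `all()` is vacuously True there and `any()` is False.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(0, 10))
print((s > 5).all(), (s > 5).any())  # False True

# no elements: all() is vacuously True, any() finds nothing
empty = pd.Series([], dtype=bool)
print(empty.all(), empty.any())  # True False
```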

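Redefining the index of a Series

Using the index attribute

Assign a list of labels to the index attribute; the list must contain exactly one label per value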


s = pd.Series(np.random.randn(5))
s.index = ['a', 'b', 'c', 'd', 'e']
print(s)

Output:

a   -0.623722
b   -0.906193
c   -0.424337
d   -0.486135
e   -0.075073
dtype: float64
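The sketch below shows what happens when the label list has the wrong length. It also shows `set_axis()` as a non-mutating alternative that returns a relabeled copy. The random values are irrelevant here; only the labels are checked.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5))

# the new labels must match the length of the Series, otherwise pandas raises ValueError
try:
    s.index = ['a', 'b', 'c']
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True

# set_axis() returns a relabeled copy and leaves s untouched
relabeled = s.set_axis(['a', 'b', 'c', 'd', 'e'])
print(relabeled.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(s.index.tolist())          # [0, 1, 2, 3, 4]
```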

Fixing the index after concatenating Series

The index after concatenation

s1 = pd.Series(np.random.randn(3))
s2 = pd.Series(np.random.randn(3))
combined = pd.concat([s1, s2])
print(combined)

Output:

0    0.565139
1    0.033748
2   -1.315617
0    1.064170
1   -1.524376
2    0.391475
dtype: float64

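Resetting the index

pd.concat() keeps each input's original labels, so 0, 1 and 2 now each appear twice. Assign a fresh sequence 0 to n-1 to fix this:

combined.index = np.arange(0, len(combined))
print(combined)

Output: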

0    0.565139
1    0.033748
2   -1.315617
3    1.064170
4   -1.524376
5    0.391475
dtype: float64
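Assigning `np.arange` works, but pandas offers two shortcuts that are often used instead. One is `ignore_index=True` in `pd.concat()`, and the other is `reset_index(drop=True)` after concatenating. The sketch below is one way to write them and checks that both produce the same Series.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.random.randn(3))
s2 = pd.Series(np.random.randn(3))

# option 1: ask concat() to build a fresh 0..n-1 index directly
combined = pd.concat([s1, s2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3, 4, 5]

# option 2: discard the duplicated labels after concatenating
combined2 = pd.concat([s1, s2]).reset_index(drop=True)
print(combined.equals(combined2))  # True
```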

Using the reindex() function

reindex() takes a list of labels. Note that it returns a new Series instance and leaves the original Series unchanged


s1 = pd.Series(np.random.randn(4), ['a', 'b', 'c', 'd'])
s2 = s1.reindex(['a', 'c', 'g'])
print(s2)

Output:

a   -0.644808
c    1.309109
g         NaN
dtype: float64

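Labels that do not exist in the original Series are filled with NaN. To use a different fill value, pass the fill_value parameter.
Alternatively, pass the method parameter to copy values from neighboring labels: forward fill ('ffill') or backward fill ('bfill'). method only works when the index is monotonically increasing or decreasing.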


s3 = pd.Series(['red', 'green', 'blue'], index=[0, 3, 5])
s3 = s3.reindex(np.arange(0, 7), method='ffill')
print(s3)

Output:

0      red
1      red
2      red
3    green
4    green
5     blue
6     blue
dtype: object
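The text mentions `fill_value` and `'bfill'` but only demonstrates `'ffill'`. The sketch below covers the other two options. It uses fixed values instead of `np.random.randn` so that the output is predictable, and it confirms that `reindex()` leaves the original Series unchanged.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

# fill_value replaces the default NaN for labels missing from s1
s2 = s1.reindex(['a', 'c', 'g'], fill_value=0.0)
print(s2.tolist())        # [1.0, 3.0, 0.0]
print(s1.index.tolist())  # ['a', 'b', 'c', 'd'], s1 is unchanged

s3 = pd.Series(['red', 'green', 'blue'], index=[0, 3, 5])
# bfill takes the next valid value; label 6 has nothing after it, so it stays NaN
s4 = s3.reindex(range(7), method='bfill')
print(s4.tolist())
```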

In-place operations

Modifying

Assign directly through an index label:
s[index] = val

Deleting

Use the del statement (it is a statement, not a function):
del s[index]
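Here is a small concrete sketch of the two in-place operations above, using a hypothetical three-element Series. It also shows that assigning to a label that does not exist yet appends it. For comparison, `drop()` is the non-in-place counterpart that returns a new Series.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])  # hypothetical example data

s['b'] = 99  # modify the value at an existing label
s['d'] = 40  # assigning to a label that does not exist yet appends it
del s['a']   # delete the entry with label 'a'
print(s.index.tolist(), s.tolist())  # ['b', 'c', 'd'] [99, 30, 40]

# drop() returns a new Series and leaves s as it was
t = s.drop('c')
print(t.tolist())  # [99, 40]
print(len(s))      # 3
```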

Slicing

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))
print(s)

10    100
11    101
12    102
13    103
14    104
15    105
16    106
17    107
18    108
19    109
dtype: int32

Form 1: s[start:end:step]

print(s[0:6:2])

Output:

10    100
12    102
14    104
dtype: int32

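Equivalent to s.iloc[[0, 2, 4]] or s.iloc[0:6:2]. Even though the index holds the integers 10 to 19, an integer slice inside [] is interpreted by position, not by label, and the end position 6 is excluded.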
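To make the position/label distinction explicit, the sketch below performs the same selection with `.iloc` (positions) and `.loc` (labels). Note that a `.loc` slice includes its end label. A `.loc` slice over labels that are absent from this sorted index comes back empty.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

by_position = s.iloc[0:6:2]  # positions 0, 2, 4; end position 6 is excluded
by_label = s.loc[10:14:2]    # labels 10, 12, 14; end label 14 is included
print(by_position.tolist())  # [100, 102, 104]
print(by_label.tolist())     # [100, 102, 104]

# labels 0 to 6 do not occur in this index, so the label-based slice is empty
print(len(s.loc[0:6]))       # 0
```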

Form 2: s[start:end]

print(s[:5])

Equivalent to s.head(5)
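The sketch below checks the head() equivalence on the same Series. It also shows that a negative start counts from the end, which mirrors tail(), and that an open-ended slice keeps everything after a position.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

print(s[:5].equals(s.head(5)))   # True
print(s[-3:].equals(s.tail(3)))  # True
print(s[5:].index.tolist())      # [15, 16, 17, 18, 19]
```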

Slicing tricks

Reversing a Series

print(s[::-1])

Output:

19    109
18    108
17    107
16    106
15    105
14    104
13    103
12    102
11    101
10    100
dtype: int32

Starting at position 4, step backward by 2

print(s[4::-2])

Output:

14    104
12    102
10    100
dtype: int32
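The same reversals can be written with `.iloc`, which makes the positional intent explicit. In this example the index is already sorted ascending, so `sort_index(ascending=False)` gives the same result as reversing. That equivalence holds only for a sorted index, which is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 110), index=np.arange(10, 20))

rev = s.iloc[::-1]  # explicit positional form of s[::-1]
print(rev.index.tolist()[:3])  # [19, 18, 17]

# only because this index is already sorted does a descending sort match the reversal
print(rev.equals(s.sort_index(ascending=False)))  # True

print(s.iloc[4::-2].index.tolist())  # [14, 12, 10]
```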