Study Notes 5 - Advanced NumPy

Copies and views
Fancy indexing
- boolean mask
- Indexing with integer arrays
Adding more elements to a NumPy array

The previous parts covered the basic usage of NumPy. This part focuses on some of its more advanced features.

Copies and views

Reference: NumPy: creating and manipulating numerical data

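Study Notes 2 - Introduction to Matrix Operations mentioned:

Slicing a list or tuple creates a new object, but slicing an array only creates a view, a shortcut for looking at the array. Through a view you can look at exactly the data you are interested in, but if you modify what you see, the data in the original array changes too.

Here is a closer look at this behavior.

A slicing operation creates a view of the original array, which is just another way of accessing the same data: the original array is not duplicated in memory. You can use np.may_share_memory() to check whether two arrays might occupy the same block of memory.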
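Note that np.may_share_memory() only compares the memory bounds of the two arrays, so it can answer True for arrays whose address ranges overlap even though they share no element. The exact (and potentially slower) check is np.shares_memory(). A minimal sketch using two interleaved slices of the same array:

```python
import numpy as np

a = np.arange(10)
evens = a[::2]   # view on elements 0, 2, 4, 6, 8
odds = a[1::2]   # view on elements 1, 3, 5, 7, 9

# The address ranges of the two views overlap, so the cheap bounds check says "maybe"
print(np.may_share_memory(evens, odds))  # True
# No single element is actually shared, which the exact check detects
print(np.shares_memory(evens, odds))     # False
# Each slice really does share memory with the original array
print(np.shares_memory(a, evens))        # True
```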

Views

If you modify the view, the original array is modified as well:

import numpy as np

a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

b = a[::2]
b

array([0, 2, 4, 6, 8])

b[0] = 10
b  # b has changed

array([10,  2,  4,  6,  8])

a  # a has changed too

array([10,  1,  2,  3,  4,  5,  6,  7,  8,  9])

np.may_share_memory(a, b)

True
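Besides np.may_share_memory(), a view can be recognized by its base attribute, which refers to the array that owns the data, and by flags.owndata. The link also works in the other direction: writing to the original is visible through the view. A short sketch:

```python
import numpy as np

a = np.arange(10)
b = a[::2]

print(b.base is a)        # True: b is a view whose data belongs to a
print(b.flags.owndata)    # False: b does not own its memory
print(a.flags.owndata)    # True: a owns the buffer

# Writing through a is visible in b as well
a[2] = 100
print(b.tolist())         # [0, 100, 4, 6, 8]
```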

Copies

The following statement makes a copy:

c = b.copy()  # copy() allocates a new block of memory
np.may_share_memory(c, b)

False
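Because the copy has its own buffer, changes to it do not propagate back. A minimal sketch that repeats the slicing from the Views example and then modifies the copy:

```python
import numpy as np

a = np.arange(10)
b = a[::2]
c = b.copy()   # new, independent buffer

c[0] = -1
print(c.base is None)   # True: c owns its data
print(b[0], a[0])       # 0 0: neither b nor a was touched
```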

Try the following two approaches:

a = np.ones((2,2))
ref = a
ref

array([[ 1.,  1.],
       [ 1.,  1.]])

deep_copy = np.zeros((2,2))
deep_copy[:] = a
deep_copy

array([[ 1.,  1.],
       [ 1.,  1.]])

np.may_share_memory(ref, a)

True

np.may_share_memory(deep_copy, a)

False

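In short, ref = a only creates another reference (a second name) for the same array a, whereas deep_copy[:] = a copies the values of a into the separately allocated array deep_copy, i.e. a deep copy.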
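The difference shows up as soon as a is modified. The sketch below also uses a.copy(), the usual one-step way to get an independent copy; the name also_copy is just for illustration:

```python
import numpy as np

a = np.ones((2, 2))
ref = a                      # a second name for the same object
deep_copy = np.zeros((2, 2))
deep_copy[:] = a             # copies the values into deep_copy's own buffer
also_copy = a.copy()         # allocates and fills a new array in one step

print(ref is a)              # True: same object, not even a view

a[0, 0] = 5.0
print(ref[0, 0])             # 5.0: the change is seen through ref
print(deep_copy[0, 0])       # 1.0: the copies are unaffected
print(also_copy[0, 0])       # 1.0
```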

Fancy indexing

Besides slices, NumPy arrays can also be indexed with boolean arrays (masks) or with integer arrays. This is called fancy indexing, and it creates a copy rather than a view.

boolean mask

a = np.arange(10)
d = a[a % 2 == 0]
d

array([0, 2, 4, 6, 8])

np.shares_memory(a, d)

False
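Since d is a copy, modifying it leaves a alone. The copy-versus-view rule applies to reading, though: when the same mask appears on the left-hand side of an assignment, NumPy writes into the original array in place. A short sketch:

```python
import numpy as np

a = np.arange(10)
mask = a % 2 == 0        # boolean array, True where a is even
d = a[mask]              # fancy indexing on read returns a copy

d[0] = 99
print(a[0])              # 0: modifying d leaves a unchanged

# Used as an assignment target, the same mask writes into a itself
a[mask] = -1
print(a.tolist())        # [-1, 1, -1, 3, -1, 5, -1, 7, -1, 9]
```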

Indexing with an integer array

a = np.arange(0, 100, 10)
a[[1,2,2,2,3,2]]

array([10, 20, 20, 20, 30, 20])
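As the example shows, an index may appear several times. On read this simply repeats the element. On write, a buffered in-place operation such as += applies only once per distinct index, while np.add.at is the unbuffered alternative that counts every occurrence. A sketch reusing the same index list:

```python
import numpy as np

idx = [1, 2, 2, 2, 3, 2]

counts = np.zeros(5, dtype=int)
counts[idx] += 1              # buffered: each distinct index is incremented once
print(counts.tolist())        # [0, 1, 1, 1, 0]

counts_at = np.zeros(5, dtype=int)
np.add.at(counts_at, idx, 1)  # unbuffered: every occurrence is counted
print(counts_at.tolist())     # [0, 1, 4, 1, 0]
```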

When indexing with an integer array, the result has the same shape as the index array:

a = np.arange(10)
idx = np.array([[2,3], [7,9]])
a[idx]

array([[2, 3],
       [7, 9]])
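The shape rule can be checked directly. On a 2-D array, passing one integer array per axis pairs the arrays element by element and picks out individual entries. A sketch (m, rows and cols are illustrative names):

```python
import numpy as np

a = np.arange(10)
idx = np.array([[2, 3], [7, 9]])
print(a[idx].shape == idx.shape)   # True: result takes the index array's shape

# One integer array per axis: pairs (0, 1) and (2, 3) are selected
m = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])
print(m[rows, cols].tolist())      # [1, 11]: elements m[0, 1] and m[2, 3]
```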

Adding more elements to a NumPy array

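Avoid this where possible. Plan the array's size in advance and modify its values in place, rather than enlarging the array to add more elements. The former is far more efficient, because every enlargement allocates a new array and copies all the existing data.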
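The advice above can be sketched by filling an array with the first n squares in two ways (n and the squares are purely illustrative). np.append returns a new, larger array on every call, so the second loop copies the whole array each iteration, whereas the preallocated version writes each value exactly once:

```python
import numpy as np

n = 1000

# Preferred: allocate the final size once, then fill it in place
squares = np.empty(n, dtype=np.int64)
for i in range(n):
    squares[i] = i * i

# Discouraged: grow the array one element at a time
grown = np.array([], dtype=np.int64)
for i in range(n):
    grown = np.append(grown, i * i)   # allocates and copies a new array each time

print(np.array_equal(squares, grown))  # True
```

For this particular computation a fully vectorized np.arange(n) ** 2 would avoid the Python loop altogether; the loops are only there to contrast the two allocation patterns.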

If you really must add more elements, see the following article:

How to add items into a numpy array

Appending data to an existing array is a natural thing to want to do for anyone with python experience. However, if you find yourself regularly appending to large arrays, you’ll quickly discover that NumPy doesn’t easily or efficiently do this the way a python list will. You’ll find that every “append” action requires re-allocation of the array memory and short-term doubling of memory requirements. So, the more general solution to the problem is to try to allocate arrays to be as large as the final output of your algorithm. Then perform all your operations on sub-sets (slices) of that array. Array creation and destruction should ideally be minimized.

That said, it’s often unavoidable, and the functions that do this are:

for 2-D arrays:

np.vstack, np.hstack

for 3-D arrays (the above plus):

np.dstack

for N-D arrays:

np.concatenate
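A minimal sketch of how np.concatenate and the stacking helpers np.vstack, np.hstack and np.dstack combine two small 2-D arrays, plus a 3-D case for np.concatenate. Note that each call returns a freshly allocated array rather than extending an input:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.vstack((a, b)).shape)   # (4, 2): stacked along rows (axis 0)
print(np.hstack((a, b)).shape)   # (2, 4): stacked along columns (axis 1)
print(np.dstack((a, b)).shape)   # (2, 2, 2): stacked along a new third axis

# np.concatenate works for any number of dimensions via its axis argument
print(np.concatenate((a, b), axis=0).shape)  # (4, 2)
print(np.concatenate((a, b), axis=1).shape)  # (2, 4)

x = np.zeros((2, 3, 4))
print(np.concatenate((x, x), axis=2).shape)  # (2, 3, 8)

# The result is a new array; the inputs are copied, not extended
print(np.shares_memory(np.vstack((a, b)), a))  # False
```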