个人学习 学,以致用

学习心得1 - 基础介绍

2017-04-15
Geng

Python提供了大量的学习心得工具,下面做一些基本介绍。 首先是NumPy,即“Numerical Python”,是Python的一种开源的数值计算扩展。它可用来存储和处理大型矩阵,比Python自身的嵌套列表(nested list structure)结构要高效的多(该结构也可以用来表示矩阵(matrix))。

提到了Numpy,自然也要提到SciPy (Scientific Python),Scipy在Numpy基础上扩展了更多功能,比如回归,傅里叶变换等。 (素材大多取自http://www.python-course.eu/)。

还有一个基本的数据处理工具pandas,简单来说就是一个Python版的excel。

和Matlab对比

如果想要匹配Matlab这种变态的需求,那么我们需要NumPyScipyMatplotlibpandas。这一套东西熟悉了之后,那么Matlab真的不用买了,或者费劲找资源了。

上图的这些东西,我们需要从下到上安装,防止出现依赖问题,或者直接下载anaconda,一键完成。

学习路线图

由上图可见,我们学习这套工具最好是从Numpy开始,然后ScipyMatplotlib,最后Pandas

总之,为了学习心得方便,我们从Numpy开始吧。

工具

强烈建议使用Jupyter,保证你的学习效率提高数倍。如果安装的是anaconda,那么不用单独安装了。怎么用的话,网上很多教程我就不详细说了。我这里的所有内容都是用Jupyter写的。

初体验

import numpy as np
temperature_c = [25.3, 24.8, 26.9, 23.9] # 摄氏温度

如果想要计算华氏温度,原生Python方法需要:

temperature_f = [x * 9 / 5 + 32 for x in temperature_c]
print(temperature_f)
[77.54, 76.64, 80.42, 75.02]

但是如果是用numpy array就简单了

np_c = np.array(temperature_c) # 转为numpy array,ndarray
np_f = np_c * 9 / 5 + 32
print(np_f)
[ 77.54  76.64  80.42  75.02]

平均分布

我们可以使用Numpy提供的arangelinspace构建等差数列

arange

np.arange(start, stop=None, step=1, dtype=None)  

详细用法,抄一段文档如下(第一次看建议略过):

Docstring:
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use ``linspace`` for these cases.

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values, ``out[i+1] - out[i]``.  The default
    step size is 1.  If `step` is specified, `start` must also be given.
dtype : dtype
    The type of the output array.  If `dtype` is not given, infer the data
    type from the other input arguments.

Returns
-------
arange : ndarray
    Array of evenly spaced values.

    For floating point arguments, the length of the result is
    ``ceil((stop - start)/step)``.  Because of floating point overflow,
    this rule may result in the last element of `out` being greater
    than `stop`.

重点说下:

  1. arange返回等间距数组,范围是[start, stop)(包括start,但是不包括stop),与原生的range不同的是,arange返回的是ndarray,而不是iterator
  2. 如果是非整数step,结果可能不准,建议使用linspace
a = np.arange(1, 10)
print('ndarray:', a)
# compare to range:
x = range(1,10)
print('range object:', x)    # x is an iterator
print(list(x))
ndarray: [1 2 3 4 5 6 7 8 9]
range object: range(1, 10)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

linspace

linspace(start, stop, num=50, endpoint=True, retstep=False)

详细用法,抄一段文档如下(第一次看建议略过):

Return evenly spaced numbers over a specified interval.

Returns `num` evenly spaced samples, calculated over the
interval [`start`, `stop`].

The endpoint of the interval can optionally be excluded.

Parameters
----------
start : scalar
    The starting value of the sequence.
stop : scalar
    The end value of the sequence, unless `endpoint` is set to False.
    In that case, the sequence consists of all but the last of ``num + 1``
    evenly spaced samples, so that `stop` is excluded.  Note that the step
    size changes when `endpoint` is False.
num : int, optional
    Number of samples to generate. Default is 50. Must be non-negative.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included.
    Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing
    between samples.
dtype : dtype, optional
    The type of the output array.  If `dtype` is not given, infer the data
    type from the other input arguments.

    .. versionadded:: 1.9.0

Returns
-------
samples : ndarray
    There are `num` equally spaced samples in the closed interval
    ``[start, stop]`` or the half-open interval ``[start, stop)``
    (depending on whether `endpoint` is True or False).
step : float, optional
    Only returned if `retstep` is True

    Size of spacing between samples.

总的来说,就是返回一组个数为num的数列,[start, stop](如果endpoint=True),或者[start, stop)(如果endpoint=False)

# [1, 10],共十个数:
print(np.linspace(1, 10, 10))
# [1, 10),共十个数:
print(np.linspace(1, 10, 10,endpoint=False))
[  1.   2.   3.   4.   5.   6.   7.   8.   9.  10.]
[ 1.   1.9  2.8  3.7  4.6  5.5  6.4  7.3  8.2  9.1]

我们在试试retstep有什么用处

# [1, 10],共十个数:
print(np.linspace(1, 10, 10, retstep=True))
(array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.]), 1.0)

可以看到,如果设为True, 返回值变为tuple类型(samples, step), step是样本间距

速度

Nunpy往往比原生Python计算要快。知道这么多够了,有兴趣自己查好了。

下一步

这部分仅仅大概体验了一下Numpy,下一步,真正开始了。


Similar Posts

Comments

你可以请我喝喝茶,聊聊天,鼓励我

Wechat Pay
wechat

Thanks!