学习心得8 - Pandas -- Series

创建
提取和切片
- 调用方法
- 利用位置或者标签
时间序列

Series, 由一列数据及对应的标签（索引）组成，本质上就是 Numpy 数组. 每个 Series 对象实际上由两个数组组成, 具有 index 和 values 两个属性.

创建

默认创建

import pandas

s = pandas.Series([1, 2, 3, 4])
s

  1
  2
  3
  4
dtype: int64

index 从0开始的整数, 和 array 一样

空的

s1 = pandas.Series()
s1

Series([], dtype: float64)

完整创建

s2 = pandas.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e'])
s2

a    1
b    3
c    5
d    7
e    9
dtype: int64

s2.values

array([1, 3, 5, 7, 9])

s2.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

使用字典创建 Series

pandas.Series({'a': 1, 'b': 2, 'c': 3})

a    1
b    2
c    3
dtype: int64

增加新的元素

s2['f'] = 11
s2

a     1
b     3
c     5
d     7
e     9
f    11
dtype: int64

提取和切片

调用方法

s2 = pandas.Series([1, 3, 5, 7, 9, 11], index=['a', 'b', 'c', 'd', 'e', 'f'])
s2.head()

a    1
b    3
c    5
d    7
e    9
dtype: int64

s2.head(3)

a    1
b    3
c    5
dtype: int64

s2.tail()

b     3
c     5
d     7
e     9
f    11
dtype: int64

s2.tail(2)

e     9
f    11
dtype: int64

s2.take([2, 0])

c    5
a    1
dtype: int64

利用位置或者标签

s2[[1,2,3]]

b    3
c    5
d    7
dtype: int64

s2[['a', 'c']]

a    1
c    5
dtype: int64

s2[0:2]

a    1
b    3
dtype: int64

s2['a': 'c']

a    1
b    3
c    5
dtype: int64

时间序列

“index” 设置为 DatetimeIndex 对象即可.

如果采用时间序列数据, 我们经常要:

创建固定时间间隔的时间序列
将某个时间序列转变为另一种时间间隔的时间序列
基于某个非标准时间间隔, 计算相对日期(比如5个工作日之后).

import pandas as pd
import numpy as np
from datetime import datetime

创建

1. 创建一个时间范围的数据:

# 从2011-01午夜开始的72小时
rng = pd.date_range('2011-1-1', periods=72, freq='H')
rng[:5] 
# 返回类型是DatetimeIndex

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')

2. 使用`DatetimeIndex`创建时间序列

ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head()

2011-01-01 00:00:00    2.708266
2011-01-01 01:00:00   -1.981343
2011-01-01 02:00:00    0.690045
2011-01-01 03:00:00   -2.151155
2011-01-01 04:00:00    1.436621
Freq: H, dtype: float64

3. 你也可以更改时间频率(间隔)并填充

# 更改为30分钟间隔, 并且前向填充
converted = ts.asfreq('30Min', method='pad')
converted.head()

2011-01-01 00:00:00    2.708266
2011-01-01 00:30:00    2.708266
2011-01-01 01:00:00   -1.981343
2011-01-01 01:30:00   -1.981343
2011-01-01 02:00:00    0.690045
Freq: 30T, dtype: float64

4. 还可以重新抽样

# 每日均值
ts.resample('D').mean()

2011-01-01   -0.256885
2011-01-02    0.088856
2011-01-03    0.098362
Freq: D, dtype: float64

下列表格展示pandas提供的时间相关类

Class	Remarks	创建
Timestamp	Represents a single time stamp	to_datetime, Timestamp
DatetimeIndex	Index of Timestamp	to_datetime, date_range, DatetimeIndex
Period	Represents a single time span	Period
PeriodIndex	Index of Period	period_range, PeriodIndex

timeStamp 时间戳 – 一个时间点

时间戳是时间序列数据最基本的类型, 它将数据值与时间点联系起来. 对于pandas对象来说, 就是使用某个时间点.

pd.Timestamp(datetime(2012, 5, 1)) # from datetime

Timestamp('2012-05-01 00:00:00')

pd.Timestamp('2012-05-01') # from string

Timestamp('2012-05-01 00:00:00')

pd.Timestamp(2012, 5, 1) # from params

Timestamp('2012-05-01 00:00:00')

Period – 时间段

通常使用时间段建立起与某个事物或者事件的关联, 比如变化. 时间段可以直接指定, 也可以推测.

pd.Period('2011-01') # 从格式猜出来

Period('2011-01', 'M')

pd.Period('2012-05', freq='D') # 指定

Period('2012-05-01', 'D')

Timestamp, Period 和 DatetimeIndex, PeriodIndex 的联系

Timestamp 和 Period 可以作为索引. Timestamp 和 Period 的数组可以自动分别结合成为 DatetimeIndex 和 PeriodIndex.

使用 Timestamp

dates = [pd.Timestamp('2012-05-01'), pd.Timestamp('2012-05-02'), pd.Timestamp('2012-05-03')]
ts = pd.Series(np.random.randn(3), dates)
type(ts.index)

pandas.tseries.index.DatetimeIndex

ts.index # freq=None

DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

ts

2012-05-01   -0.659063
2012-05-02    0.632789
2012-05-03   -1.098881
dtype: float64

使用 Period

periods = [pd.Period('2012-01'), pd.Period('2012-02'), pd.Period('2012-03')]
ts = pd.Series(np.random.randn(3), periods)
type(ts.index)

pandas.tseries.period.PeriodIndex

ts.index # frep='M'

PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

ts

2012-01    1.073730
2012-02    2.660690
2012-03   -0.210098
Freq: M, dtype: float64

其实, pandas 使用 Timestamp 的实例代表各个时间戳, 使用 DatetimeIndex 代表一系列时间戳. 对于规则的时间跨度, pandas 使用 Period 对象代表标量, PeriodIndex 代表一系列时间跨度.

大量创建

Timestamp() 不接受列表等可迭代对象, 如果想创建一个 Series 对象,需要对每个日期单独应用一次 Timestamp() 函数, 很麻烦. 因此, 我们一般使用 to_datetime() 方法来将 Series 的 index 属性转换为 DatatimeIndex.

dates = ['2016-01-01', '2016-01-02', '2016-01-03']
ts = pd.Series([1, 2, 3], index=pd.to_datetime(dates))
ts

2016-01-01    1
2016-01-02    2
2016-01-03    3
dtype: int64

ts.index

DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'], dtype='datetime64[ns]', freq=None)

ts.index[0]

Timestamp('2016-01-01 00:00:00')

也可以直接使用 datetime 对象, pandas 会自动将其转换为 Timestamp 对象:

dates = [datetime(2016,1,1), datetime(2016,1,2), datetime(2016,1,3)]
ts = pd.Series([1, 2, 3], index=dates)
ts

2016-01-01    1
2016-01-02    2
2016-01-03    3
dtype: int64

ts.index

Index([2016-01-01, 2016-01-02, 2016-01-03], dtype='object')

时序序列之间的算数运算会自动按照时间对齐

截取时间段数据

ts['2016-01-02']

ts['2016']

2016-01-01    1
2016-01-02    2
2016-01-03    3
dtype: int64

ts['20160102']

ts['20160101':'20160102']

2016-01-01    1
2016-01-02    2
dtype: int64

滞后或者超前操作

使用 shift（）

ts.shift(1) # 滞后

2016-01-01    NaN
2016-01-02    1.0
2016-01-03    2.0
dtype: float64

ts.shift(-1) # 超前

2016-01-01    2.0
2016-01-02    3.0
2016-01-03    NaN
dtype: float64

计算收益率的话，可以：

price = pd.Series([10, 30, 40, 50, 60], 
                 index=pd.date_range('2011-1-1', periods=5, freq='D'))
(price - price.shift(1)) / price.shift(1)

2011-01-01         NaN
2011-01-02    2.000000
2011-01-03    0.333333
2011-01-04    0.250000
2011-01-05    0.200000
Freq: D, dtype: float64

学习心得8 - Pandas -- Series

创建

默认创建

空的

完整创建

使用字典创建 Series

增加新的元素

提取和切片

调用方法

利用位置或者标签

时间序列

创建

1. 创建一个时间范围的数据:

2. 使用`DatetimeIndex`创建时间序列

3. 你也可以更改时间频率(间隔)并填充

4. 还可以重新抽样

timeStamp 时间戳 – 一个时间点

Period – 时间段

Timestamp, Period 和 DatetimeIndex, PeriodIndex 的联系

使用 Timestamp

使用 Period

大量创建

截取时间段数据

滞后或者超前操作

Similar Posts

Comments

学习心得8 - Pandas -- Series

创建

默认创建

空的

完整创建

使用字典创建 Series

增加新的元素

提取和切片

调用方法

利用位置或者标签

时间序列

创建

1. 创建一个时间范围的数据:

2. 使用DatetimeIndex创建时间序列

3. 你也可以更改时间频率(间隔)并填充

4. 还可以重新抽样

timeStamp 时间戳 – 一个时间点

Period – 时间段

Timestamp, Period 和 DatetimeIndex, PeriodIndex 的联系

使用 Timestamp

使用 Period

大量创建

截取时间段数据

滞后或者超前操作

Similar Posts

Comments

2. 使用`DatetimeIndex`创建时间序列