
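Pandas Basic Methods (pandas tutorial)

Pandas basic methods by example

So far, we have covered the three pandas data structures and how to create them. Because of its importance in real-time data processing, we will focus mainly on the DataFrame object, and touch on the other data structures along the way.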

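Attribute/Method	Description
axes	Returns a list of the row axis labels.
dtype	Returns the dtype of the object.
empty	Returns True if the Series is empty.
ndim	Returns the number of dimensions of the underlying data, which is 1 by definition.
size	Returns the number of elements in the underlying data.
values	Returns the Series as an ndarray.
head()	Returns the first n rows.
tail()	Returns the last n rows.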

Next, let us create a Series and see how each of the attributes listed above works.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print(s)

Output:

 0    0.967853
 1   -0.148368
 2   -1.395906
 3   -1.758394
 dtype: float64

axes

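Returns a list of the labels of the Series.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print("The axes are:")
 print(s.axes)

Output:

 The axes are:
 [RangeIndex(start=0, stop=4, step=1)]

The result above is a compact representation of the labels 0 through 3, i.e. [0, 1, 2, 3]. The stop value 4 is exclusive.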
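As a small sketch of how the RangeIndex above maps to concrete labels (the variable names `index` and `labels` are only illustrative), the single entry of `s.axes` holds the same labels as `s.index`, and expanding it gives 0 through 3:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(4))

# A Series has one axis, so s.axes is a one-element list holding its index.
index = s.axes[0]
labels = list(index)

print(index.equals(s.index))  # True
print(labels)                 # [0, 1, 2, 3]
```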

empty

Returns a Boolean indicating whether the object is empty. True means the object is empty.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print("Is the Object empty?")
 print(s.empty)

Output:

 Is the Object empty?
 False
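The example above only shows the False case. A minimal sketch of the other cases, using two hypothetical Series named `empty_series` and `nan_series`, suggests that `empty` looks at the number of elements and not at whether the values are missing:

```python
import numpy as np
import pandas as pd

empty_series = pd.Series(dtype="float64")   # no elements at all
nan_series = pd.Series([np.nan, np.nan])    # two elements, both missing

print(empty_series.empty)          # True
print(nan_series.empty)            # False: NaN values still count as elements
print(nan_series.dropna().empty)   # True once the NaN values are dropped
```

If "empty" should mean "no usable values", dropping missing values first, as in the last line, is one way to express that.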

ndim

Returns the number of dimensions of the object. By definition, a Series is a 1D data structure, so it returns 1.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print(s)
 print("The dimensions of the object:")
 print(s.ndim)

Output:

 0    0.175898
 1    0.166197
 2   -0.609712
 3   -1.377000
 dtype: float64
 The dimensions of the object:
 1

size

Returns the size (length) of the Series.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 2 random numbers
 s = pd.Series(np.random.randn(2))
 print(s)
 print("The size of the object:")
 print(s.size)

Output:

 0    3.078058
 1   -1.207803
 dtype: float64
 The size of the object:
 2

values

Returns the data of the Series as an array.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print(s)
 print("The actual data series is:")
 print(s.values)

Output:

 0    1.787373
 1   -0.605159
 2    0.180477
 3   -0.140922
 dtype: float64
 The actual data series is:
 [ 1.78737302 -0.60515881 0.18047664 -0.1409218 ]

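Head and Tail

To view the first or last few rows of a Series or DataFrame object, use the head() and tail() methods.

head() returns the first n rows (note the index values). The default number of rows shown is 5, but you can pass a custom number.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print("The original series is:")
 print(s)
 print("The first two rows of the data series:")
 print(s.head(2))

Output:

 The original series is:
 0    0.720876
 1   -0.765898
 2    0.479221
 3   -0.139547
 dtype: float64
 The first two rows of the data series:
 0    0.720876
 1   -0.765898
 dtype: float64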

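tail() returns the last n rows (note the index values). The default number of rows shown is 5, but you can pass a custom number.

Example

 import pandas as pd
 import numpy as np
 # Create a Series with 4 random numbers
 s = pd.Series(np.random.randn(4))
 print("The original series is:")
 print(s)
 print("The last two rows of the data series:")
 print(s.tail(2))

Output:

 The original series is:
 0   -0.655091
 1   -0.881407
 2   -0.608592
 3   -2.341413
 dtype: float64
 The last two rows of the data series:
 2   -0.608592
 3   -2.341413
 dtype: float64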
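Beyond a positive n, head() and tail() also accept a negative n or an n larger than the object. The sketch below uses a small fixed Series (chosen here only so the results are reproducible) to show both cases:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

print(s.head(-1).tolist())   # [10, 20, 30]      all but the last row
print(s.tail(-1).tolist())   # [20, 30, 40]      all but the first row
print(s.head(10).tolist())   # [10, 20, 30, 40]  n larger than the length
```

Asking for more rows than exist does not raise an error; the whole Series is returned.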

DataFrame Basic Functionality

Now let us look at the basic functionality of a DataFrame. The following table lists the most important attributes and methods.

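Attribute/Method	Description
T	Transposes rows and columns.
axes	Returns a list whose only members are the row axis labels and the column axis labels.
dtypes	Returns the dtypes in this object.
empty	True if the NDFrame is entirely empty [no items], that is, if any of its axes has length 0; otherwise False.
ndim	Number of axes / array dimensions.
shape	Returns a tuple representing the dimensionality of the DataFrame.
size	Number of elements in the NDFrame.
values	NumPy representation of the NDFrame.
head()	Returns the first n rows.
tail()	Returns the last n rows.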

Let us now create a DataFrame and see how each of the attributes above works.

Example

 import pandas as pd
 import numpy as np
 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our data series is:")
 print(df)

Output:

 Our data series is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80

T (Transpose)

Returns the transpose of the DataFrame. Rows and columns are swapped.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
 # Create a DataFrame
 df = pd.DataFrame(d)
 print("The transpose of the data series is:")
 print(df.T)

Output:

 The transpose of the data series is:
          0     1       2      3      4      5       6
 Age      25    26      25     23     30     29      23
 Name     Tom   James   Ricky  Vin    Steve  Smith   Jack
 Rating   4.23  3.24    3.98   2.56   3.2    4.6     3.8
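One side effect of transposing is easy to miss in the printed result. In the sketch below (a shortened, hypothetical version of the same data), each transposed column mixes an integer, a string, and a float, so the columns appear to fall back to the generic object dtype:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 26, 25, 23],
    "Name": ["Tom", "James", "Ricky", "Vin"],
    "Rating": [4.23, 3.24, 3.98, 2.56],
})

transposed = df.T

print(df.shape)                  # (4, 3)
print(transposed.shape)          # (3, 4)
print(list(transposed.index))    # ['Age', 'Name', 'Rating']
print(transposed.dtypes.tolist())  # every column holds mixed values
```

For that reason, T tends to be most useful for display or for data that shares one dtype.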

axes

Returns a list containing the row axis labels and the column axis labels.

Example

 import pandas as pd
 import numpy as np
 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
 # Create a DataFrame
 df = pd.DataFrame(d)
 print("The row axis labels and column axis labels are:")
 print(df.axes)

Output:

 The row axis labels and column axis labels are:
 [RangeIndex(start=0, stop=7, step=1), Index(['Age', 'Name', 'Rating'], dtype='object')]
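The two entries of the list can be unpacked directly. As a brief sketch (the names `row_labels` and `column_labels` are illustrative, and the data is shortened), the first entry matches `df.index` and the second matches `df.columns`:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 26],
    "Name": ["Tom", "James"],
    "Rating": [4.23, 3.24],
})

# df.axes is a two-element list: row labels first, column labels second.
row_labels, column_labels = df.axes

print(row_labels.equals(df.index))        # True
print(column_labels.equals(df.columns))   # True
print(list(column_labels))                # ['Age', 'Name', 'Rating']
```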

dtypes

Returns the data type of each column.

Example

 import pandas as pd
 import numpy as np
 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
 # Create a DataFrame
 df = pd.DataFrame(d)
 print("The data types of each column are:")
 print(df.dtypes)

Output:

 The data types of each column are:
 Age         int64
 Name       object
 Rating    float64
 dtype: object

empty

Returns a Boolean indicating whether the object is empty. True means the object is empty.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Is the object empty?")
 print(df.empty)

Output:

 Is the object empty?
 False
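To illustrate the "any axis has length 0" rule from the table, here is a minimal sketch with two hypothetical DataFrames, `no_rows` and `only_nan`. A frame with column labels but no rows counts as empty, while a frame holding only NaN does not:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=["Age", "Name", "Rating"])  # 3 columns, 0 rows
only_nan = pd.DataFrame({"Age": [np.nan]})                 # 1 row holding NaN

print(no_rows.shape)    # (0, 3)
print(no_rows.empty)    # True: the row axis has length 0
print(only_nan.empty)   # False: a NaN cell is still a cell
```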

ndim

Returns the number of dimensions of the object. By definition, a DataFrame is a 2D object.

Example

 import pandas as pd
 import numpy as np
 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our object is:")
 print(df)
 print("The dimension of the object is:")
 print(df.ndim)

Output:

 Our object is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80
 The dimension of the object is:
 2

shape

Returns a tuple representing the dimensionality of the DataFrame. In the tuple (a, b), a is the number of rows and b is the number of columns.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our object is:")
 print(df)
 print("The shape of the object is:")
 print(df.shape)

Output:

 Our object is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80
 The shape of the object is:
 (7, 3)

size

Returns the number of elements in the DataFrame.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our object is:")
 print(df)
 print("The total number of elements in our object is:")
 print(df.size)

Output:

 Our object is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80
 The total number of elements in our object is:
 21
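The three attributes ndim, shape, and size are related: ndim is the length of the shape tuple, and size is the product of its entries. The following sketch checks this on the same 7-row, 3-column data (built from plain lists here for brevity):

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 26, 25, 23, 30, 29, 23],
    "Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack"],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8],
})

rows, cols = df.shape

print(df.ndim == len(df.shape))   # True
print(df.size == rows * cols)     # True
print(rows, cols, df.size)        # 7 3 21
```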

values

Returns the actual data in the DataFrame as an ndarray.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our object is:")
 print(df)
 print("The actual data in our data frame is:")
 print(df.values)

Output:

 Our object is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80
 The actual data in our data frame is:
 [[25 'Tom' 4.23]
  [26 'James' 3.24]
  [25 'Ricky' 3.98]
  [23 'Vin' 2.56]
  [30 'Steve' 3.2]
  [29 'Smith' 4.6]
  [23 'Jack' 3.8]]
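The pandas documentation recommends `to_numpy()` as the preferred alternative to `values`. The sketch below (shortened data; `mixed` and `numeric` are illustrative names) also shows how the element type of the resulting array depends on the columns selected:

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 26, 25],
    "Name": ["Tom", "James", "Ricky"],
    "Rating": [4.23, 3.24, 3.98],
})

mixed = df.to_numpy()                        # numbers and strings together
numeric = df[["Age", "Rating"]].to_numpy()   # int64 and float64 columns only

print(mixed.dtype)     # object
print(numeric.dtype)   # float64
print(numeric.shape)   # (3, 2)
```

Selecting only the numeric columns before converting keeps the array in a numeric dtype, which is usually what downstream NumPy code expects.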

Head & Tail

To view the first or last few rows of a DataFrame object, use the head() and tail() methods. head() returns the first n rows (note the index values). The default number of rows shown is 5, but you can pass a custom number.

Example

 import pandas as pd
 import numpy as np

 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our data frame is:")
 print(df)
 print("The first two rows of the data frame are:")
 print(df.head(2))

Output:

 Our data frame is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80
 The first two rows of the data frame are:
    Age   Name   Rating
 0  25    Tom    4.23
 1  26    James  3.24

tail() returns the last n rows (note the index values). The default number of rows shown is 5, but you can pass a custom number.

Example

 import pandas as pd
 import numpy as np
 # Create a dictionary of Series
 d = {'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
      'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
      'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

 # Create a DataFrame
 df = pd.DataFrame(d)
 print("Our data frame is:")
 print(df)
 print("The last two rows of the data frame are:")
 print(df.tail(2))

Output:

 Our data frame is:
     Age   Name    Rating
 0   25    Tom     4.23
 1   26    James   3.24
 2   25    Ricky   3.98
 3   23    Vin     2.56
 4   30    Steve   3.20
 5   29    Smith   4.60
 6   23    Jack    3.80
 The last two rows of the data frame are:
     Age   Name    Rating
 5   29    Smith    4.6
 6   23    Jack     3.8