Python: A First Look at pandas

猿友 · 2020-12-31 16:22:47 · 3273 views

Preface

pandas is a data analysis library built on top of NumPy. It bundles a number of libraries and standard data models, provides the tools needed to work efficiently with large datasets, and offers many functions and methods for processing data quickly and conveniently.

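I. The pandas workflow

Creating, reading, updating, and deleting tabular data;
Working with multiple tables;
Data cleaning: missing values, duplicates, outliers, standardization, and data transformation;
Excel-style operations such as pivot tables and cross-tabulations;
Statistical analysis.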

II. Creating pandas objects

1. Import the pandas library

import pandas as pd

2. Building a DataFrame for tabular data

columns: the column labels; index: the row labels; values: the element data

Method 1:

df = pd.DataFrame(
    data=[['alex', 20, '男', '0831'], ['tom', 30, '女', '0830']],  # '男' = male, '女' = female
    index=['a', 'b'],  # optional; defaults to 0, 1, ...; custom labels such as strings also work
    columns=['name', 'age', 'sex', 'class'],
)  # build the DataFrame
print(df)  # print the data

   name  age sex class
a  alex   20   男  0831
b   tom   30   女  0830

Method 2:

df1 = pd.DataFrame(data={'name': ['tom', 'alex'], 'age': [18, 20], 'sex': ['男', '女'], 'class': ['0831', '0831']})
print(df1)  # with no index given, rows are labelled 0, 1, ... by default

   name  age sex class
0   tom   18   男  0831
1  alex   20   女  0831

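3. DataFrame attributes

Because pandas is built on NumPy, a DataFrame exposes many of the same attributes as a NumPy ndarray.

df.shape    # shape: (number of rows, number of columns)
df.ndim     # number of dimensions
df.size     # total number of elements
df.dtypes   # data type of each column
df.columns  # column labels
df.index    # row labels
df.values   # the elements, as a NumPy array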
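
As a quick check of the attributes listed above, the following sketch rebuilds the two-row table from Method 1 and prints each attribute. It assumes only that pandas is installed; nothing here goes beyond the attributes already named in the text.

```python
import pandas as pd

# The same two-row table as in Method 1.
df = pd.DataFrame(
    data=[['alex', 20, '男', '0831'], ['tom', 30, '女', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

print(df.shape)          # (rows, columns)
print(df.ndim)           # a DataFrame has two dimensions
print(df.size)           # rows * columns
print(list(df.columns))  # column labels
print(list(df.index))    # row labels
print(df.dtypes)         # one dtype per column; unlike an ndarray, columns may differ
print(type(df.values))   # the elements as a NumPy array
```

One difference from a plain ndarray is worth noting: dtypes is reported per column, because each column of a DataFrame can hold a different type.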

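III. Looking up data in a DataFrame

1. Selecting a single column

df1['name'] selects along one dimension and returns a Series.

print(df1['name'])  # select one column

0     tom
1    alex

2. Selecting several columns

print(df1[['name', 'age']])

   name  age
0   tom   18
1  alex   20

print(type(df1[['name', 'age']]))  # <class 'pandas.core.frame.DataFrame'>; a Series, by contrast, is one-dimensional and has a single axis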
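
The difference between a single label and a list of labels can be checked directly. This is a minimal sketch on a cut-down version of df1; the variable names one_col, one_col_2d, and two_cols are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['tom', 'alex'], 'age': [18, 20]})

one_col = df1['name']            # a single label returns a Series
one_col_2d = df1[['name']]       # a list with one label returns a DataFrame
two_cols = df1[['name', 'age']]  # a list of labels returns a DataFrame

print(type(one_col).__name__, one_col.ndim)
print(type(one_col_2d).__name__, one_col_2d.shape)
print(type(two_cols).__name__, two_cols.shape)
```

Wrapping a single label in a list is a convenient way to keep the result two-dimensional when later code expects a DataFrame.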

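3. Selecting rows and columns together

Method 1:

print(df[['name', 'age']][:2])  # column selection followed by a row slice; individual rows cannot be picked this way

   name  age
a  alex   20
b   tom   30

Method 2:

Label-based selection: df.loc[row labels or condition, column labels]

print(df.loc['a', 'name'])

alex

df.loc['a', ['name']]    # <class 'pandas.core.series.Series'>: if exactly one of the two selectors is a single label, the result is one-dimensional
df.loc[['a'], ['name']]  # <class 'pandas.core.frame.DataFrame'>: if both selectors are lists, the result is two-dimensional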
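
The dimensionality rule for df.loc can be sketched as follows. The table is the one from Method 1 of the construction section; the names cell, row_part, col_part, and table are hypothetical and used only here.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, '男', '0831'], ['tom', 30, '女', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

cell = df.loc['a', 'name']               # two single labels: a scalar
row_part = df.loc['a', ['name', 'age']]  # one label and one list: a Series
col_part = df.loc[['a', 'b'], 'age']     # one list and one label: a Series
table = df.loc[['a'], ['name']]          # two lists: a DataFrame

print(cell)
print(type(row_part).__name__, list(row_part.index))
print(type(col_part).__name__, col_part.tolist())
print(type(table).__name__, table.shape)
```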

4. Conditional selection: boolean masks

mask = df['age'] > 18      # True for every student older than 18, False otherwise
mask2 = df['sex'] == '女'  # True for every female student
mask3 = mask & mask2       # combine the two masks; use the element-wise & operator, not the keyword and
print(mask3)

a    False
b     True
dtype: bool

print(df.loc[mask3, :])  # use the mask to select rows

  name  age sex class
b  tom   30   女  0830
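
Masks can also be combined with | (or) and ~ (not). The sketch below uses the same two-row table; the thresholds and variable names are illustrative assumptions. When a comparison is written inline it needs parentheses, because & and | bind more tightly than > and == in Python, and the keyword and raises a ValueError on a Series.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, '男', '0831'], ['tom', 30, '女', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

adults = df['age'] > 18
women = df['sex'] == '女'

both = df.loc[adults & women]                            # element-wise AND
either = df.loc[(df['age'] > 25) | (df['sex'] == '男')]  # element-wise OR, parenthesized
not_women = df.loc[~women]                               # element-wise NOT

print(both)
print(either)
print(not_women)
```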

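5. Position-based selection: df.iloc[row positions, column positions]  # slices are half-open: the start is included, the end is excluded

print(df.iloc[:1, :])

   name  age sex class
a  alex   20   男  0831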
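
The half-open rule applies to iloc only. As a small sketch on the same table, a label slice with loc includes both endpoints, while a position slice with iloc stops before the end position. The variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, '男', '0831'], ['tom', 30, '女', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

by_position = df.iloc[0:1, :]     # position 0 up to, but not including, 1
by_label = df.loc['a':'b', :]     # labels 'a' through 'b', both included
first_two_cols = df.iloc[:, 0:2]  # columns at positions 0 and 1
single = df.iloc[1, 1]            # second row, second column

print(len(by_position), len(by_label))
print(list(first_two_cols.columns))
print(single)
```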

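IV. Adding data to a DataFrame

1. Adding a column by key

There are two ways: assign a list with one value per row, or assign a single value, in which case every row gets that value.

# df['address'] = ['北京', '上海']  # one value per row
df['address'] = '北京'  # every row becomes '北京'
print(df)

   name  age sex class address
a  alex   20   男  0831      北京
b   tom   30   女  0830      北京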
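
Both ways of adding a column can be tried side by side. In this sketch the column age_plus_one and the deliberately wrong column bad are hypothetical; they only illustrate a derived column and what happens when the list length does not match the number of rows.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['alex', 20, '男', '0831'], ['tom', 30, '女', '0830']],
    index=['a', 'b'],
    columns=['name', 'age', 'sex', 'class'],
)

df['address'] = ['北京', '上海']       # a list: one value per row
df['address'] = '北京'                 # a scalar: broadcast to every row, overwriting the list
df['age_plus_one'] = df['age'] + 1     # a column computed from an existing one

error_raised = False
try:
    df['bad'] = ['only one value']     # wrong length for a two-row table
except ValueError:
    error_raised = True

print(df)
print(error_raised)
```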

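2. Appending rows

df_mini = pd.DataFrame(data={
    'name': ['jerry', 'make'],
    'age': [15, 18],
    'sex': ['男', '女'],
    'class': ['0831', '0770'],
    'address': ['北京', '河南'],
}, index=['a', 'b'])

# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# on older versions this line was df4 = df.append(df_mini)
df4 = pd.concat([df, df_mini])
print(df4)

    name  age sex class address
a   alex   20   男  0831      北京
b    tom   30   女  0830      北京
a  jerry   15   男  0831      北京
b   make   18   女  0770      河南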
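
Note that the combined table above carries the labels a and b twice. The following sketch, which assumes a pandas version where pd.concat is available, shows the duplicate labels and one way to avoid them with ignore_index=True. The name df4_renumbered is hypothetical.

```python
import pandas as pd

columns = ['name', 'age', 'sex', 'class', 'address']
df = pd.DataFrame(
    [['alex', 20, '男', '0831', '北京'], ['tom', 30, '女', '0830', '北京']],
    index=['a', 'b'], columns=columns,
)
df_mini = pd.DataFrame(
    [['jerry', 15, '男', '0831', '北京'], ['make', 18, '女', '0770', '河南']],
    index=['a', 'b'], columns=columns,
)

df4 = pd.concat([df, df_mini])                               # keeps the original row labels
df4_renumbered = pd.concat([df, df_mini], ignore_index=True)  # relabels rows 0..3

print(list(df4.index), df4.index.is_unique)
print(len(df4.loc['a']))  # a duplicated label selects every matching row
print(list(df4_renumbered.index))
```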

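V. Deleting data

axis: whether to delete rows (axis=0) or columns (axis=1)
inplace: whether to modify the original table

a = df4.drop(labels=['address', 'class'], axis=1)  # without inplace=True, drop returns a new table, so assign the result to a variable
df4.drop(labels=['a'], axis=0, inplace=True)  # removes every row labelled 'a' from df4 itself and returns None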
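
The effect of inplace is easiest to see by checking the table before and after each call. This sketch rebuilds the four-row table with its duplicated labels; trimmed and result are hypothetical variable names.

```python
import pandas as pd

df4 = pd.DataFrame(
    [['alex', 20, '男', '0831', '北京'],
     ['tom', 30, '女', '0830', '北京'],
     ['jerry', 15, '男', '0831', '北京'],
     ['make', 18, '女', '0770', '河南']],
    index=['a', 'b', 'a', 'b'],
    columns=['name', 'age', 'sex', 'class', 'address'],
)

trimmed = df4.drop(labels=['address', 'class'], axis=1)  # new table; df4 is untouched
columns_before = list(df4.columns)

result = df4.drop(labels=['a'], axis=0, inplace=True)    # modifies df4, returns None

print(list(trimmed.columns))
print(columns_before)
print(result)
print(df4)
```

Because both rows labelled a are removed, only tom and make remain in df4 after the in-place call.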

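VI. Modifying data

Select the target cells, then assign the new value. The output below is for the four-row df4 built in the append step, before any rows were dropped.

df4.loc[df4['name'] == 'tom', 'class'] = '有問題'  # '有問題' means "has a problem"
print(df4)

    name  age sex class address
a   alex   20   男  0831      北京
b    tom   30   女   有問題      北京
a  jerry   15   男  0831      北京
b   make   18   女  0770      河南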
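
A short sketch of the select-then-assign pattern, using a four-row table with default row labels. It also shows why the assignment should not be chained into another variable: in Python, c = target = value binds c to the value itself, not to the modified table. The new age 16 is an arbitrary illustrative value.

```python
import pandas as pd

df4 = pd.DataFrame({
    'name': ['alex', 'tom', 'jerry', 'make'],
    'age': [20, 30, 15, 18],
    'class': ['0831', '0830', '0831', '0770'],
})

# Select the cell with a condition and a column label, then assign.
df4.loc[df4['name'] == 'tom', 'class'] = '有問題'

# Chained assignment: c receives the plain value, df4 is still updated.
c = df4.loc[df4['name'] == 'jerry', 'age'] = 16

print(df4)
print(c)
```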

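VII. Statistical analysis

1. Ten statistical methods carried over from NumPy

min() argmin() max() argmax() std() var() sum() mean() cumsum() cumprod()

2. Calling them in pandas

df['age'].min()
df['age'].max()
df['age'].argsort()

Called on a whole DataFrame, these methods work column by column. For the two-row df with the address column, df.min() gives:

name       alex
age          20
sex           女
class      0830
address      北京
dtype: object

3. Mode, non-null count, and frequencies

The examples from here on use the four-row df4 built in the append step, before any rows were dropped.

df4['age'].mode()           # the most frequent value(s), returned as a Series
df4['age'].count()          # the number of non-null elements
df4['name'].value_counts()  # how often each distinct value occurs

tom      1
make     1
alex     1
jerry    1
Name: name, dtype: int64

4. On a whole DataFrame

df4.mode() returns the mode(s) of every column; columns with fewer modes are padded with NaN:

    name  age  sex class address
0   alex   15    女  0831      北京
1  jerry   18    男   NaN     NaN
2   make   20  NaN   NaN     NaN
3    tom   30  NaN   NaN     NaN

idxmax takes an axis argument only on a DataFrame, and the columns must be comparable, so select the numeric columns first:

df4[['age']].idxmax(axis=0)  # column-wise: for each column, the row label of its largest value
df4[['age']].idxmax(axis=1)  # row-wise: for each row, the column label of its largest value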
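
The statistical methods above can be tried on the four ages from the combined table. This is a minimal sketch with default row labels 0 to 3, so the positions returned by argsort and the label returned by idxmax are easy to read; it is not tied to any particular pandas version beyond the methods named in the text.

```python
import pandas as pd

df4 = pd.DataFrame({
    'name': ['alex', 'tom', 'jerry', 'make'],
    'age': [20, 30, 15, 18],
    'address': ['北京', '北京', '北京', '河南'],
})
ages = df4['age']

print(ages.min(), ages.max(), ages.sum())
print(ages.mean(), ages.var(), ages.std())   # var and std use the sample formula (n - 1)
print(ages.cumsum().tolist())                # running total
print(ages.argsort().tolist())               # positions that would sort the column
print(ages.idxmax())                         # row label of the largest age
print(ages.count())                          # non-null elements
print(df4['address'].mode().tolist())        # most frequent value(s)
print(df4['address'].value_counts().to_dict())
```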

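5. Summaries with describe

df4['age'].describe()  # numeric column; output for the four-row df4 built in the append step

#         age
# count   4.00   number of non-null values
# mean   20.75   mean
# std     6.50   standard deviation
# min    15.00   minimum
# 25%    17.25   first quartile
# 50%    19.00   median (second quartile)
# 75%    22.50   third quartile
# max    30.00   maximum

df4['name'].describe()  # non-numeric column

# count: number of non-null values
# unique: number of distinct values
# top: the mode (most frequent value)
# freq: how many times the mode occurs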
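
The individual entries of a describe() result can be read by label, since the result is itself a Series. The sketch below uses the four ages and addresses from the combined table; age_summary and addr_summary are hypothetical names.

```python
import pandas as pd

df4 = pd.DataFrame({
    'name': ['alex', 'tom', 'jerry', 'make'],
    'age': [20, 30, 15, 18],
    'address': ['北京', '北京', '北京', '河南'],
})

age_summary = df4['age'].describe()       # numeric summary
addr_summary = df4['address'].describe()  # summary of a text column

print(age_summary['mean'], age_summary['50%'])
print(addr_summary['unique'], addr_summary['top'], addr_summary['freq'])
```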

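VIII. Reading Excel files

pandas can read many data formats; this section covers reading data from an Excel file.

pd.read_excel(r'file_path')  # replace file_path with the path to your Excel file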
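
To try read_excel without an existing workbook, one option is a round trip through a temporary file. This is only a sketch under stated assumptions: the file name students.xlsx and the sheet name students are made up, and writing or reading .xlsx files needs an optional Excel engine such as openpyxl, so the example skips the round trip when none is installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['alex', 'tom'], 'age': [20, 30]}, index=['a', 'b'])

loaded = None
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'students.xlsx')  # hypothetical file name
    try:
        df.to_excel(path, sheet_name='students')
        # index_col=0 turns the first spreadsheet column back into the row labels.
        loaded = pd.read_excel(path, sheet_name='students', index_col=0)
    except ImportError:
        print('No Excel engine (for example openpyxl) is installed; skipping the round trip.')

if loaded is not None:
    print(loaded)
```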
