The 50 Most Valuable Matplotlib Charts for Data Visualization (with Full Python Source Code)

2019-01-17 Source: raincent

This article summarizes how to draw 50 kinds of charts, which is a great help when visualizing data for analysis.

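Tips:

Some of the code in the original article was inaccurate and has been corrected.

All of the corrected source code has been consolidated into a Jupyter notebook file.

Besides the matplotlib and seaborn visualization libraries, running the code in this article requires a few auxiliary libraries. They are noted in the code; see the sections below for details.

These are the 50 most useful Matplotlib charts for data analysis and visualization. The list lets you choose which visualization to show using Python's matplotlib and seaborn libraries.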

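Introduction

The charts are grouped by 7 different visualization objectives. For example, if you want to picture the relationship between two variables, look at the charts under the "Correlation" section. If you want to show how a value changes over time, look under the "Change" section, and so on.

An effective chart:

Conveys the right and necessary information without distorting the facts.

Has a simple design, so you don't have to strain to understand it.

Uses aesthetics to support the information rather than overshadow it.

Is not overloaded with information.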

Setup

Run the settings below before running the chart code. Individual charts can, of course, override these display settings.

# !pip install brewer2mpl
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import warnings; warnings.filterwarnings(action='once')

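# Default font sizes and figure size shared by all charts
large = 22; med = 16; small = 12
params = {'axes.titlesize': large,
          'legend.fontsize': med,
          'figure.figsize': (16, 10),
          'axes.labelsize': med,
          'xtick.labelsize': med,
          'ytick.labelsize': med,
          'figure.titlesize': large}
plt.rcParams.update(params)
# On Matplotlib >= 3.6 this style is named 'seaborn-v0_8-whitegrid'
plt.style.use('seaborn-whitegrid')
sns.set_style("white")
# Jupyter magic: remove this line when running as a plain Python script
%matplotlib inline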

# Version
print(mpl.__version__)  #> 3.0.0
print(sns.__version__)  #> 0.9.0
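
The remark that an individual chart can override these settings could be sketched with plt.rc_context, which applies rcParams only temporarily. This is an illustration, not part of the original setup: the helper name title_size_with_override is hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt


def title_size_with_override(size):
    """Create one chart with a temporary title size and report the size used."""
    # rc_context restores the previous rcParams when the block exits
    with plt.rc_context({"axes.titlesize": size}):
        fig, ax = plt.subplots()
        ax.set_title("Chart with its own title size")
        used = ax.title.get_fontsize()
        plt.close(fig)
    return used


default_size = plt.rcParams["axes.titlesize"]
override_size = title_size_with_override(30)
# The global default is the same after the temporary override
unchanged = plt.rcParams["axes.titlesize"] == default_size
```

Passing fontsize=... directly to plt.title(), as the charts in this article do, is the other common way to override a single element.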

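I. Correlation

Correlation charts visualize the relationship between 2 or more variables, that is, how one variable changes with respect to another.

1. Scatter plot

The scatter plot is the classic, fundamental chart for studying the relationship between two variables. If the data contains multiple groups, you may want to draw each group in a different color. In matplotlib, you can do this conveniently with plt.scatter().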

# Import dataset 
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# Prepare Data 
# Create as many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Draw Plot for Each Category
plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')

for i, category in enumerate(categories):
    # Pass the RGBA tuple via color=; c= is ambiguous for a single RGBA value
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category==category, :],
                s=20, color=colors[i], label=str(category))

# Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Scatterplot of Midwest Area vs Population", fontsize=22)
plt.legend(fontsize=12)    
plt.show()
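
The color list above samples a colormap at evenly spaced positions, one per category. A minimal sketch of that idea follows, assuming a hypothetical helper category_colors and made-up category labels; unlike the expression in the listing, it also handles a single category, where dividing by len(categories)-1 would fail.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt


def category_colors(categories, cmap_name="tab10"):
    """Map each category to one RGBA color sampled evenly from a colormap."""
    cmap = plt.get_cmap(cmap_name)
    n = len(categories)
    # Evenly spaced positions in [0, 1]; a single category just takes position 0
    positions = np.linspace(0.0, 1.0, n) if n > 1 else np.array([0.0])
    return {cat: cmap(float(pos)) for cat, pos in zip(categories, positions)}


# Hypothetical labels, for illustration only
colors = category_colors(["low", "mid", "high"])
single = category_colors(["only"])
```

Each value is an (r, g, b, a) tuple that can be passed to plt.scatter() through the color= argument.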

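2. Bubble plot with Encircling

Sometimes you want to show a group of points within a boundary to emphasize their importance. In this example, you take the records to be encircled from the dataframe and draw the boundary with the encircle() function defined in the code below.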

from matplotlib import patches
from scipy.spatial import ConvexHull
import warnings; warnings.simplefilter('ignore')
sns.set_style("white")

# Step 1: Prepare Data
midwest = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/midwest_filter.csv")

# As many colors as there are unique midwest['category']
categories = np.unique(midwest['category'])
colors = [plt.cm.tab10(i/float(len(categories)-1)) for i in range(len(categories))]

# Step 2: Draw Scatterplot with unique color for each category
fig = plt.figure(figsize=(16, 10), dpi= 80, facecolor='w', edgecolor='k')    

for i, category in enumerate(categories):
    # s='dot_size' names a column that is resolved through data=
    plt.scatter('area', 'poptotal',
                data=midwest.loc[midwest.category==category, :],
                s='dot_size', color=colors[i], label=str(category),
                edgecolors='black', linewidths=.5)

# Step 3: Encircling
# https://stackoverflow.com/questions/44575681/how-do-i-encircle-different-data-sets-in-scatter-plot
def encircle(x,y, ax=None, **kw):
    if not ax: ax=plt.gca()
    p = np.c_[x,y]
    hull = ConvexHull(p)
    poly = plt.Polygon(p[hull.vertices,:], **kw)
    ax.add_patch(poly)

# Select data to be encircled
midwest_encircle_data = midwest.loc[midwest.state=='IN', :]                         

# Draw polygon surrounding vertices    
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="k", fc="gold", alpha=0.1)
encircle(midwest_encircle_data.area, midwest_encircle_data.poptotal, ec="firebrick", fc="none", linewidth=1.5)

# Step 4: Decorations
plt.gca().set(xlim=(0.0, 0.1), ylim=(0, 90000),
              xlabel='Area', ylabel='Population')

plt.xticks(fontsize=12); plt.yticks(fontsize=12)
plt.title("Bubble Plot with Encircling", fontsize=22)
plt.legend(fontsize=12)    
plt.show()
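
The boundary drawn by encircle() is the convex hull of the selected points. The sketch below, with a hypothetical helper hull_outline and made-up coordinates, shows what scipy's ConvexHull returns: hull.vertices holds the indices of the points on the boundary, so interior points are left out of the polygon.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_outline(x, y):
    """Return the boundary points of the convex hull around (x, y)."""
    points = np.c_[x, y]          # stack x and y into an (n, 2) array
    hull = ConvexHull(points)
    # hull.vertices indexes the boundary points (counterclockwise in 2-D)
    return points[hull.vertices]


# Four corners of a unit square plus one interior point (made-up data)
x = [0.0, 1.0, 1.0, 0.0, 0.5]
y = [0.0, 0.0, 1.0, 1.0, 0.5]
outline = hull_outline(x, y)
```

Only the four corners end up in outline; the interior point (0.5, 0.5) is not part of the polygon.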

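3. Scatter plot with linear regression line of best fit

If you want to understand how two variables change with respect to each other, the line of best fit is the usual approach. The plot below shows how the line of best fit differs between the groups in the data. To disable the grouping and draw a single line of best fit for the whole dataset, remove the hue='cyl' parameter from the sns.lmplot() call below.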

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4,8]), :]

# Plot (robust=True requires the statsmodels package)
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", hue="cyl", data=df_select, 
                     height=7, aspect=1.6, robust=True, palette='tab10', 
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.title("Scatterplot with line of best fit grouped by number of cylinders", fontsize=20)
plt.show()
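
For intuition about what a line of best fit is, here is a minimal sketch using ordinary least squares via np.polyfit. Note that this is a simpler technique than the robust regression that sns.lmplot(robust=True) performs; the helper name best_fit_line and the data are made up for illustration.

```python
import numpy as np


def best_fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    # polyfit returns coefficients from the highest degree down
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


# Made-up data lying exactly on the line y = 40 - 5x
displ = np.array([1.0, 2.0, 3.0, 4.0])
hwy = 40.0 - 5.0 * displ
slope, intercept = best_fit_line(displ, hwy)
```

On real data the points scatter around the line, and outliers pull an ordinary least squares fit more strongly than a robust fit.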

Each regression line in its own column:

Alternatively, you can show the line of best fit for each group in its own column by setting the col=groupingcolumn parameter in sns.lmplot(), as follows:

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_select = df.loc[df.cyl.isin([4,8]), :]

# Each line in its own column
sns.set_style("white")
gridobj = sns.lmplot(x="displ", y="hwy", 
                     data=df_select, 
                     height=7, 
                     robust=True, 
                     palette='Set1', 
                     col="cyl",
                     scatter_kws=dict(s=60, linewidths=.7, edgecolors='black'))

# Decorations
gridobj.set(xlim=(0.5, 7.5), ylim=(0, 50))
plt.show()

4. Jittering with stripplot

Often, multiple data points have exactly the same X and Y values. As a result, the points are drawn on top of each other and hidden. To avoid this, jitter the points slightly so that you can see them all. seaborn's stripplot() makes this convenient.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Draw Stripplot
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)    
sns.stripplot(x=df.cty, y=df.hwy, jitter=0.25, size=8, ax=ax, linewidth=.5)

# Decorations
plt.title('Use jittered plots to avoid overlapping of points', fontsize=22)
plt.show()
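
The idea behind jittering can be sketched in a few lines: add a small random offset to each value so that identical points no longer coincide. This is an illustration under stated assumptions (a hypothetical jitter helper, uniform noise, a fixed seed), not seaborn's internal implementation.

```python
import numpy as np


def jitter(values, width=0.25, seed=0):
    """Shift each value by uniform noise drawn from [-width, width]."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    return values + rng.uniform(-width, width, size=values.shape)


# Four points that would be drawn exactly on top of each other
overlapping = [18, 18, 18, 18]
spread = jitter(overlapping)
```

The width is kept small relative to the spacing of the data so that the jittered points still read as belonging to the original value.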

5. Counts Plot

Another way to avoid overlapping points is to scale the size of each dot by the number of data points that fall on that spot. The larger the dot, the more points are concentrated there.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")
df_counts = df.groupby(['hwy', 'cty']).size().reset_index(name='counts')

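# Draw the counts plot
# Marker size varies per point, so use Axes.scatter (stripplot takes one scalar size)
fig, ax = plt.subplots(figsize=(16,10), dpi= 80)
ax.scatter(df_counts.cty, df_counts.hwy, s=(df_counts.counts*2)**2)
ax.set(xlabel='cty', ylabel='hwy')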

# Decorations
plt.title('Counts Plot - Size of circle is bigger as more points overlap', fontsize=22)
plt.show()
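
The counting step can be seen on a tiny table. The sketch below uses a hypothetical helper point_counts and made-up rows; it applies the same groupby().size() pattern as the listing above.

```python
import pandas as pd


def point_counts(df, x, y):
    """Count how many rows share each (x, y) combination."""
    return df.groupby([x, y]).size().reset_index(name="counts")


# Made-up sample: the pair (18, 29) occurs twice
sample = pd.DataFrame({"cty": [18, 18, 21, 18],
                       "hwy": [29, 29, 30, 26]})
counts = point_counts(sample, "cty", "hwy")
```

Each row of counts is one distinct point, and its counts value is what drives the marker size.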

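6. Marginal Histogram

A marginal histogram has histograms along the X and Y axis variables. It is used to visualize the relationship between X and Y together with the univariate distributions of X and Y individually. This plot is often used in exploratory data analysis (EDA).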

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi= 80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*4, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="tab10", edgecolors='gray', linewidths=.5)

# histogram on the bottom
ax_bottom.hist(df.displ, 40, histtype='stepfilled', orientation='vertical', color='deeppink')
ax_bottom.invert_yaxis()

# histogram on the right
ax_right.hist(df.hwy, 40, histtype='stepfilled', orientation='horizontal', color='deeppink')

# Decorations
ax_main.set(title='Scatterplot with Histograms \n displ vs hwy', xlabel='displ', ylabel='hwy')
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
    item.set_fontsize(14)

xlabels = ax_main.get_xticks().tolist()
ax_main.set_xticklabels(xlabels)
plt.show()
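
The layout above works by slicing a 4x4 GridSpec. As a small sketch, the grid cells each slice covers can be inspected through the rowspan and colspan properties of the resulting SubplotSpec (available in Matplotlib 3.2 and later); the helper name cells is hypothetical.

```python
from matplotlib.gridspec import GridSpec

grid = GridSpec(4, 4)


def cells(spec):
    """Return the row and column indices covered by a GridSpec slice."""
    return list(spec.rowspan), list(spec.colspan)


main_cells = cells(grid[:-1, :-1])    # top-left 3x3 block: main scatter
right_cells = cells(grid[:-1, -1])    # last column, top 3 rows: right margin
bottom_cells = cells(grid[-1, 0:-1])  # last row, first 3 columns: bottom margin
```

The bottom-right cell is left empty, which is why the two marginal axes line up with the main axes.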

7. Marginal Boxplot

The marginal boxplot serves a similar purpose to the marginal histogram. However, the boxplot helps to pinpoint the median, 25th and 75th percentiles of X and Y.

# Import Data
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/mpg_ggplot2.csv")

# Create Fig and gridspec
fig = plt.figure(figsize=(16, 10), dpi= 80)
grid = plt.GridSpec(4, 4, hspace=0.5, wspace=0.2)

# Define the axes
ax_main = fig.add_subplot(grid[:-1, :-1])
ax_right = fig.add_subplot(grid[:-1, -1], xticklabels=[], yticklabels=[])
ax_bottom = fig.add_subplot(grid[-1, 0:-1], xticklabels=[], yticklabels=[])

# Scatterplot on main ax
ax_main.scatter('displ', 'hwy', s=df.cty*5, c=df.manufacturer.astype('category').cat.codes, alpha=.9, data=df, cmap="Set1", edgecolors='black', linewidths=.5)

# Add a boxplot in each margin
sns.boxplot(y=df.hwy, ax=ax_right, orient="v")
sns.boxplot(x=df.displ, ax=ax_bottom, orient="h")

# Decorations ------------------
# Remove x axis name for the boxplot
ax_bottom.set(xlabel='')
ax_right.set(ylabel='')

# Main Title, Xlabel and YLabel
ax_main.set(title='Scatterplot with Boxplots \n displ vs hwy', xlabel='displ', ylabel='hwy')

# Set font size of different components
ax_main.title.set_fontsize(20)
for item in ([ax_main.xaxis.label, ax_main.yaxis.label] + ax_main.get_xticklabels() + ax_main.get_yticklabels()):
    item.set_fontsize(14)

plt.show()
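
The three values a box marks (25th percentile, median, 75th percentile) can be computed directly. A minimal sketch with np.percentile follows, using a hypothetical helper box_stats and made-up values:

```python
import numpy as np


def box_stats(values):
    """Return the quartiles that define the box of a boxplot."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {"q1": float(q1), "median": float(median), "q3": float(q3)}


stats = box_stats([1, 2, 3, 4, 5])
```

For these five values the quartiles fall exactly on 2, 3 and 4; with real data np.percentile interpolates linearly between neighboring values by default.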

8. Correlogram

A correlogram is used to visually inspect the correlation metric between all possible pairs of numeric variables in a given dataframe (or 2D array).

# Import Dataset
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")

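# Correlate numeric columns only (the data also contains text columns)
corr = df.select_dtypes(include='number').corr()

# Plot
plt.figure(figsize=(12,10), dpi= 80)
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap='RdYlGn', center=0, annot=True)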

# Decorations
plt.title('Correlogram of mtcars', fontsize=22)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()
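
What the heatmap displays is a square matrix of pairwise correlations. The sketch below builds one for a small made-up table (the column names and values are assumptions for illustration), keeping only numeric columns before calling corr():

```python
import pandas as pd

# Made-up table with one text column and two numeric columns
cars = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "mpg": [30.0, 25.0, 20.0, 15.0],
    "weight": [1.0, 2.0, 3.0, 4.0],
})

# Text columns cannot be correlated, so select numeric columns first
corr = cars.select_dtypes(include="number").corr()
mpg_vs_weight = corr.loc["mpg", "weight"]
```

Here mpg falls in a straight line as weight rises, so their correlation is -1, while every variable correlates perfectly with itself on the diagonal.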

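9. Pairwise Plot

The pairwise plot is a favorite in exploratory analysis for understanding the relationship between all possible pairs of numeric variables. It is a must-have tool for bivariate analysis.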

# Load Dataset
df = sns.load_dataset('iris')

# Plot (pairplot creates its own figure, so no plt.figure() call is needed)
sns.pairplot(df, kind="scatter", hue="species", plot_kws=dict(s=80, edgecolor="white", linewidth=2.5))
plt.show()

# Load Dataset
df = sns.load_dataset('iris')

# Plot with a regression line fitted in each panel
sns.pairplot(df, kind="reg", hue="species")
plt.show()
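
To make "all possible pairs of numeric variables" concrete, the pairs for the four numeric iris columns can be enumerated with itertools.combinations. The helper name variable_pairs is hypothetical; this is only a sketch of the combinatorics, not of how seaborn builds its grid.

```python
from itertools import combinations


def variable_pairs(columns):
    """List every unordered pair of distinct column names."""
    return list(combinations(columns, 2))


iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
pairs = variable_pairs(iris_columns)
```

Four variables give six distinct pairs; the full grid shows each pair twice, once above and once below the diagonal.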

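II. Deviation

10. Diverging Bars

If you want to see how items vary based on a single metric, and visualize the order and magnitude of that variation, diverging bars are a great tool. They help you quickly tell apart the performance of groups in your data, and they are intuitive enough to convey the point immediately.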

# Prepare Data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mtcars.csv")
mpg = df['mpg']
df['mpg_z'] = (mpg - mpg.mean())/mpg.std()  # z-score of mileage
df['colors'] = ['red' if z < 0 else 'green' for z in df['mpg_z']]
df.sort_values('mpg_z', inplace=True)
df.reset_index(inplace=True)

# Draw plot
plt.figure(figsize=(14,10), dpi= 80)
plt.hlines(y=df.index, xmin=0, xmax=df.mpg_z, color=df.colors, alpha=0.4, linewidth=5)

# Decorations
plt.gca().set(ylabel='$Model$', xlabel='$Mileage$')
plt.yticks(df.index, df.cars, fontsize=12)
plt.title('Diverging Bars of Car Mileage', fontdict={'size':20})
plt.grid(linestyle='--', alpha=0.5)
plt.show()
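
The bars diverge because each value is first standardized to a z-score: the mean is subtracted and the result is divided by the standard deviation. A minimal sketch using only the standard library follows; the helper names z_scores and bar_colors and the mileage values are made up for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation, like pandas' default std()
    return [(v - mu) / sigma for v in values]


def bar_colors(z_values):
    """Below-average items are red, the rest green."""
    return ["red" if z < 0 else "green" for z in z_values]


mpg = [10.0, 20.0, 30.0]  # made-up mileage values
z = z_scores(mpg)
colors = bar_colors(z)
```

Items below the mean get negative z-scores and extend to the left of zero; items at or above the mean extend to the right.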
