This repository has been archived on 2024-01-06. You can view files and clone it, but cannot push or open issues or pull requests.
justhomework/AIandML/e2_matchine_learning/e2.2_linearreg.ipynb

2022-11-30 02:39:17 +00:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Linear Regression Lab\n",
"\n",
"In this exercise we use the bike-sharing dataset from the Kaggle competition [Bike Sharing Demand](https://www.kaggle.com/c/bike-sharing-demand/data).\n",
"The dataset contains the hourly bike rental counts recorded by the Capital Bikeshare system in 2011 and 2012, together with the corresponding season and weather information.\n",
"\n",
"Data columns:\n",
"* **datetime** - hourly date + timestamp  \n",
"* **season** - 1 = spring, 2 = summer, 3 = fall, 4 = winter  \n",
"* **holiday** - whether the day is considered a holiday\n",
"* **workingday** - whether the day is neither a weekend nor holiday\n",
"* **weather** - one of four codes:\n",
"    * 1: Clear, Few clouds, Partly cloudy\n",
"    * 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist\n",
"    * 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds\n",
"    * 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog\n",
"* **temp** - temperature in Celsius\n",
"* **atemp** - \"feels like\" temperature in Celsius\n",
"* **humidity** - relative humidity\n",
"* **windspeed** - wind speed\n",
"* **casual** - number of non-registered user rentals initiated\n",
"* **registered** - number of registered user rentals initiated\n",
"* **count** - number of total rentals"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Load the data"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [],
"source": [
"# read the data and set the datetime as the index\n",
"import pandas as pd\n",
"\n",
"bikes = pd.read_csv('e2.2_bikeshare.csv', index_col='datetime', parse_dates=True)"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>season</th>\n",
" <th>holiday</th>\n",
" <th>workingday</th>\n",
" <th>weather</th>\n",
" <th>temp</th>\n",
" <th>atemp</th>\n",
" <th>humidity</th>\n",
" <th>windspeed</th>\n",
" <th>casual</th>\n",
" <th>registered</th>\n",
" <th>count</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2011-01-01 00:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>9.84</td>\n",
" <td>14.395</td>\n",
" <td>81</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>13</td>\n",
" <td>16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-01-01 01:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>9.02</td>\n",
" <td>13.635</td>\n",
" <td>80</td>\n",
" <td>0.0</td>\n",
" <td>8</td>\n",
" <td>32</td>\n",
" <td>40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-01-01 02:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>9.02</td>\n",
" <td>13.635</td>\n",
" <td>80</td>\n",
" <td>0.0</td>\n",
" <td>5</td>\n",
" <td>27</td>\n",
" <td>32</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-01-01 03:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>9.84</td>\n",
" <td>14.395</td>\n",
" <td>75</td>\n",
" <td>0.0</td>\n",
" <td>3</td>\n",
" <td>10</td>\n",
" <td>13</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-01-01 04:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>9.84</td>\n",
" <td>14.395</td>\n",
" <td>75</td>\n",
" <td>0.0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" season holiday workingday weather temp atemp \\\n",
"datetime \n",
"2011-01-01 00:00:00 1 0 0 1 9.84 14.395 \n",
"2011-01-01 01:00:00 1 0 0 1 9.02 13.635 \n",
"2011-01-01 02:00:00 1 0 0 1 9.02 13.635 \n",
"2011-01-01 03:00:00 1 0 0 1 9.84 14.395 \n",
"2011-01-01 04:00:00 1 0 0 1 9.84 14.395 \n",
"\n",
" humidity windspeed casual registered count \n",
"datetime \n",
"2011-01-01 00:00:00 81 0.0 3 13 16 \n",
"2011-01-01 01:00:00 80 0.0 8 32 40 \n",
"2011-01-01 02:00:00 80 0.0 5 27 32 \n",
"2011-01-01 03:00:00 75 0.0 3 10 13 \n",
"2011-01-01 04:00:00 75 0.0 0 1 1 "
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"bikes.head()"
]
},
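{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Visualize the data\n",
"\n",
"* Use matplotlib to draw a scatter plot of temperature `temp` against bike rental count `count`\n",
"* Use seaborn to draw a scatter plot of `temp` against `count` with a fitted regression line (hint: use seaborn's `lmplot`)"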
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.collections.PathCollection at 0x7f5f9314c460>"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# matplotlib\n",
"# scatter plot of temperature 'temp' against bike rental count 'count'\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.rcParams['font.size'] = 14\n",
"plt.xlabel('temp')\n",
"plt.ylabel('count')\n",
"plt.scatter(bikes['temp'], bikes['count'],s=1)\n"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<seaborn.axisgrid.FacetGrid at 0x7f5f9312e200>"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 1000x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# seaborn\n",
"import seaborn as sns\n",
"sns.lmplot(x='temp', y='count', data=bikes, aspect=2, scatter_kws={'alpha':0.2})\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Simple linear regression\n",
"\n",
"Predict the bike rental count from the temperature."
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [],
"source": [
"# create X and y\n",
"X_train=bikes[['temp']]\n",
"Y_train=bikes['count']"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [],
"source": [
"# import, instantiate, fit\n",
"from sklearn.linear_model import LinearRegression\n",
"LR=LinearRegression()\n",
"LR=LR.fit(X_train,Y_train)\n",
"Y_pred=LR.predict(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 96.2843313 88.7644881 88.7644881 ... 133.88354727 133.88354727\n",
" 126.36370408]\n"
]
}
],
"source": [
"# print the predictions for the training rows\n",
"print(Y_pred)"
]
},
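{
"cell_type": "markdown",
"metadata": {},
"source": [
"The single-feature fit above can be sketched by hand. This is a minimal illustration, assuming ordinary least squares with an intercept (which is what scikit-learn's `LinearRegression` fits by default); the helper name `fit_line` and the toy numbers are made up for this note and do not come from the bikeshare file.\n",
"\n",
"```python\n",
"def fit_line(xs, ys):\n",
"    # Closed-form least squares for one feature:\n",
"    # slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)\n",
"    n = len(xs)\n",
"    mean_x = sum(xs) / n\n",
"    mean_y = sum(ys) / n\n",
"    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))\n",
"    sxx = sum((x - mean_x) ** 2 for x in xs)\n",
"    slope = sxy / sxx\n",
"    intercept = mean_y - slope * mean_x\n",
"    return slope, intercept\n",
"\n",
"# Hypothetical toy data (not taken from the dataset)\n",
"temps = [5.0, 10.0, 15.0, 20.0]\n",
"counts = [60.0, 100.0, 150.0, 190.0]\n",
"slope, intercept = fit_line(temps, counts)\n",
"\n",
"# Predictions are just intercept + slope * x for each row\n",
"predictions = [intercept + slope * t for t in temps]\n",
"```\n",
"\n",
"After fitting with scikit-learn, the corresponding values are exposed as `LR.coef_` and `LR.intercept_`."
]
},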
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Explore multiple features"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {},
"outputs": [],
"source": [
"# explore more features\n",
"feature_cols = ['temp', 'season', 'weather', 'humidity']"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/ir/dev/justhomework/AIandML/.venv/lib/python3.10/site-packages/seaborn/axisgrid.py:2095: UserWarning: The `size` parameter has been renamed to `height`; please update your code.\n",
" warnings.warn(msg, UserWarning)\n"
]
},
{
"data": {
"text/plain": [
"<seaborn.axisgrid.PairGrid at 0x7f5f930e11b0>"
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 2240x700 with 4 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# using seaborn, draw multiple scatter plots between each feature in feature_cols and 'count'\n",
"sns.pairplot(bikes, x_vars=feature_cols, y_vars='count', height=7, aspect=0.8)"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>season</th>\n",
" <th>holiday</th>\n",
" <th>workingday</th>\n",
" <th>weather</th>\n",
" <th>temp</th>\n",
" <th>atemp</th>\n",
" <th>humidity</th>\n",
" <th>windspeed</th>\n",
" <th>casual</th>\n",
" <th>registered</th>\n",
" <th>count</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>season</th>\n",
" <td>1.000000</td>\n",
" <td>0.029368</td>\n",
" <td>-0.008126</td>\n",
" <td>0.008879</td>\n",
" <td>0.258689</td>\n",
" <td>0.264744</td>\n",
" <td>0.190610</td>\n",
" <td>-0.147121</td>\n",
" <td>0.096758</td>\n",
" <td>0.164011</td>\n",
" <td>0.163439</td>\n",
" </tr>\n",
" <tr>\n",
" <th>holiday</th>\n",
" <td>0.029368</td>\n",
" <td>1.000000</td>\n",
" <td>-0.250491</td>\n",
" <td>-0.007074</td>\n",
" <td>0.000295</td>\n",
" <td>-0.005215</td>\n",
" <td>0.001929</td>\n",
" <td>0.008409</td>\n",
" <td>0.043799</td>\n",
" <td>-0.020956</td>\n",
" <td>-0.005393</td>\n",
" </tr>\n",
" <tr>\n",
" <th>workingday</th>\n",
" <td>-0.008126</td>\n",
" <td>-0.250491</td>\n",
" <td>1.000000</td>\n",
" <td>0.033772</td>\n",
" <td>0.029966</td>\n",
" <td>0.024660</td>\n",
" <td>-0.010880</td>\n",
" <td>0.013373</td>\n",
" <td>-0.319111</td>\n",
" <td>0.119460</td>\n",
" <td>0.011594</td>\n",
" </tr>\n",
" <tr>\n",
" <th>weather</th>\n",
" <td>0.008879</td>\n",
" <td>-0.007074</td>\n",
" <td>0.033772</td>\n",
" <td>1.000000</td>\n",
" <td>-0.055035</td>\n",
" <td>-0.055376</td>\n",
" <td>0.406244</td>\n",
" <td>0.007261</td>\n",
" <td>-0.135918</td>\n",
" <td>-0.109340</td>\n",
" <td>-0.128655</td>\n",
" </tr>\n",
" <tr>\n",
" <th>temp</th>\n",
" <td>0.258689</td>\n",
" <td>0.000295</td>\n",
" <td>0.029966</td>\n",
" <td>-0.055035</td>\n",
" <td>1.000000</td>\n",
" <td>0.984948</td>\n",
" <td>-0.064949</td>\n",
" <td>-0.017852</td>\n",
" <td>0.467097</td>\n",
" <td>0.318571</td>\n",
" <td>0.394454</td>\n",
" </tr>\n",
" <tr>\n",
" <th>atemp</th>\n",
" <td>0.264744</td>\n",
" <td>-0.005215</td>\n",
" <td>0.024660</td>\n",
" <td>-0.055376</td>\n",
" <td>0.984948</td>\n",
" <td>1.000000</td>\n",
" <td>-0.043536</td>\n",
" <td>-0.057473</td>\n",
" <td>0.462067</td>\n",
" <td>0.314635</td>\n",
" <td>0.389784</td>\n",
" </tr>\n",
" <tr>\n",
" <th>humidity</th>\n",
" <td>0.190610</td>\n",
" <td>0.001929</td>\n",
" <td>-0.010880</td>\n",
" <td>0.406244</td>\n",
" <td>-0.064949</td>\n",
" <td>-0.043536</td>\n",
" <td>1.000000</td>\n",
" <td>-0.318607</td>\n",
" <td>-0.348187</td>\n",
" <td>-0.265458</td>\n",
" <td>-0.317371</td>\n",
" </tr>\n",
" <tr>\n",
" <th>windspeed</th>\n",
" <td>-0.147121</td>\n",
" <td>0.008409</td>\n",
" <td>0.013373</td>\n",
" <td>0.007261</td>\n",
" <td>-0.017852</td>\n",
" <td>-0.057473</td>\n",
" <td>-0.318607</td>\n",
" <td>1.000000</td>\n",
" <td>0.092276</td>\n",
" <td>0.091052</td>\n",
" <td>0.101369</td>\n",
" </tr>\n",
" <tr>\n",
" <th>casual</th>\n",
" <td>0.096758</td>\n",
" <td>0.043799</td>\n",
" <td>-0.319111</td>\n",
" <td>-0.135918</td>\n",
" <td>0.467097</td>\n",
" <td>0.462067</td>\n",
" <td>-0.348187</td>\n",
" <td>0.092276</td>\n",
" <td>1.000000</td>\n",
" <td>0.497250</td>\n",
" <td>0.690414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>registered</th>\n",
" <td>0.164011</td>\n",
" <td>-0.020956</td>\n",
" <td>0.119460</td>\n",
" <td>-0.109340</td>\n",
" <td>0.318571</td>\n",
" <td>0.314635</td>\n",
" <td>-0.265458</td>\n",
" <td>0.091052</td>\n",
" <td>0.497250</td>\n",
" <td>1.000000</td>\n",
" <td>0.970948</td>\n",
" </tr>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>0.163439</td>\n",
" <td>-0.005393</td>\n",
" <td>0.011594</td>\n",
" <td>-0.128655</td>\n",
" <td>0.394454</td>\n",
" <td>0.389784</td>\n",
" <td>-0.317371</td>\n",
" <td>0.101369</td>\n",
" <td>0.690414</td>\n",
" <td>0.970948</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" season holiday workingday weather temp atemp \\\n",
"season 1.000000 0.029368 -0.008126 0.008879 0.258689 0.264744 \n",
"holiday 0.029368 1.000000 -0.250491 -0.007074 0.000295 -0.005215 \n",
"workingday -0.008126 -0.250491 1.000000 0.033772 0.029966 0.024660 \n",
"weather 0.008879 -0.007074 0.033772 1.000000 -0.055035 -0.055376 \n",
"temp 0.258689 0.000295 0.029966 -0.055035 1.000000 0.984948 \n",
"atemp 0.264744 -0.005215 0.024660 -0.055376 0.984948 1.000000 \n",
"humidity 0.190610 0.001929 -0.010880 0.406244 -0.064949 -0.043536 \n",
"windspeed -0.147121 0.008409 0.013373 0.007261 -0.017852 -0.057473 \n",
"casual 0.096758 0.043799 -0.319111 -0.135918 0.467097 0.462067 \n",
"registered 0.164011 -0.020956 0.119460 -0.109340 0.318571 0.314635 \n",
"count 0.163439 -0.005393 0.011594 -0.128655 0.394454 0.389784 \n",
"\n",
" humidity windspeed casual registered count \n",
"season 0.190610 -0.147121 0.096758 0.164011 0.163439 \n",
"holiday 0.001929 0.008409 0.043799 -0.020956 -0.005393 \n",
"workingday -0.010880 0.013373 -0.319111 0.119460 0.011594 \n",
"weather 0.406244 0.007261 -0.135918 -0.109340 -0.128655 \n",
"temp -0.064949 -0.017852 0.467097 0.318571 0.394454 \n",
"atemp -0.043536 -0.057473 0.462067 0.314635 0.389784 \n",
"humidity 1.000000 -0.318607 -0.348187 -0.265458 -0.317371 \n",
"windspeed -0.318607 1.000000 0.092276 0.091052 0.101369 \n",
"casual -0.348187 0.092276 1.000000 0.497250 0.690414 \n",
"registered -0.265458 0.091052 0.497250 1.000000 0.970948 \n",
"count -0.317371 0.101369 0.690414 0.970948 1.000000 "
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# correlation matrix (ranges from 1 to -1)\n",
"bikes.corr()"
]
},
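{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each entry of the matrix above is a Pearson correlation coefficient, assuming the pandas default `method='pearson'` was used (the call passes no arguments). A minimal sketch of that formula follows; the helper name `pearson` is made up for this note, and the five `temp` and `humidity` values are the ones shown by `bikes.head()`, so the result is not comparable to the full-data matrix.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def pearson(xs, ys):\n",
"    # Covariance of x and y divided by the product of their standard deviations\n",
"    n = len(xs)\n",
"    mean_x = sum(xs) / n\n",
"    mean_y = sum(ys) / n\n",
"    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))\n",
"    var_x = sum((x - mean_x) ** 2 for x in xs)\n",
"    var_y = sum((y - mean_y) ** 2 for y in ys)\n",
"    return cov / math.sqrt(var_x * var_y)\n",
"\n",
"# First five rows of the dataset, as printed by bikes.head()\n",
"temp = [9.84, 9.02, 9.02, 9.84, 9.84]\n",
"humidity = [81, 80, 80, 75, 75]\n",
"r = pearson(temp, humidity)\n",
"```\n",
"\n",
"The coefficient always lies between -1 and 1, and a column correlated with itself gives 1, which is why the diagonal of the matrix is all ones."
]
},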
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot: >"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 2 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"sns.heatmap(bikes.corr())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Predict the bike rental count 'count' from the four features 'temp', 'season', 'weather', 'humidity'"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"# create X and y\n",
"feature_cols = ['temp', 'season', 'weather', 'humidity']\n",
"X_train=bikes[feature_cols]\n",
"Y_train=bikes['count']"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"# import, instantiate, fit\n",
"from sklearn.linear_model import LinearRegression\n",
"LR=LinearRegression()\n",
"LR=LR.fit(X_train,Y_train)\n",
"Y_pred=LR.predict(X_train)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"159.5206878612979\n",
"[ 7.86482499 22.53875753 6.67030204 -3.11887338]\n"
]
}
],
"source": [
"# print the coefficients\n",
"print(LR.intercept_)\n",
"print(LR.coef_)"
]
},
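{
"cell_type": "markdown",
"metadata": {},
"source": [
"The printed intercept and coefficients can be read as a prediction formula: the intercept plus each coefficient times its feature value. The sketch below assumes the coefficients are in the same order as `feature_cols` (`temp`, `season`, `weather`, `humidity`) and plugs in the first row shown by `bikes.head()`; the variable names are made up for this note.\n",
"\n",
"```python\n",
"# Values copied from the printed output of the fitted model\n",
"intercept = 159.5206878612979\n",
"coefs = {'temp': 7.86482499, 'season': 22.53875753, 'weather': 6.67030204, 'humidity': -3.11887338}\n",
"\n",
"# First row of the dataset (2011-01-01 00:00:00)\n",
"row = {'temp': 9.84, 'season': 1, 'weather': 1, 'humidity': 81}\n",
"\n",
"# Linear model: intercept + sum of coefficient * feature value\n",
"predicted = intercept + sum(coefs[name] * row[name] for name in coefs)\n",
"```\n",
"\n",
"For this row the formula gives roughly 13.5 rentals, against an observed `count` of 16. Each coefficient is the change in the predicted count per unit change of its feature while the other features are held fixed."
]
},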
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Compare several models using train/test split and RMSE"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"155.99832684186404\n",
"164.87950848896241\n",
"156.0458677100528\n"
]
}
],
"source": [
"# compare different sets of features\n",
"feature_cols1 = ['temp', 'season', 'weather', 'humidity']\n",
"feature_cols2 = ['temp', 'season', 'weather']\n",
"feature_cols3 = ['temp', 'season', 'humidity']\n",
"\n",
"X_train1=bikes[feature_cols1]\n",
"X_train2=bikes[feature_cols2]\n",
"X_train3=bikes[feature_cols3]\n",
"Y_train=bikes['count']\n",
"\n",
"from sklearn.linear_model import LinearRegression\n",
"LR1=LinearRegression()\n",
"LR2=LinearRegression()\n",
"LR3=LinearRegression()\n",
"LR1=LR1.fit(X_train1,Y_train)\n",
"LR2=LR2.fit(X_train2,Y_train)\n",
"LR3=LR3.fit(X_train3,Y_train)\n",
"Y_pred1=LR1.predict(X_train1)\n",
"Y_pred2=LR2.predict(X_train2)\n",
"Y_pred3=LR3.predict(X_train3)\n",
"\n",
"import numpy as np\n",
"from sklearn import metrics\n",
"print(np.sqrt(metrics.mean_squared_error(Y_train,Y_pred1)))\n",
"print(np.sqrt(metrics.mean_squared_error(Y_train,Y_pred2)))\n",
"print(np.sqrt(metrics.mean_squared_error(Y_train,Y_pred3)))\n"
]
},
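  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A hold-out version of this comparison could look like the sketch below. It assumes `bikes`, `feature_cols1`, `feature_cols2` and `feature_cols3` from the cell above are in scope. The helper name `holdout_rmse`, the 25% test size and the random seed are illustrative choices, not part of the original exercise, so the resulting numbers will differ from the training RMSE values printed above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch only: score each feature set on rows the model has not seen.\n",
    "# holdout_rmse, test_size and random_state are illustrative assumptions.\n",
    "import numpy as np\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.metrics import mean_squared_error\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "def holdout_rmse(df, feature_cols, target='count', test_size=0.25, random_state=123):\n",
    "    # Split once, fit on the training part, return RMSE on the held-out part.\n",
    "    X_tr, X_te, y_tr, y_te = train_test_split(\n",
    "        df[feature_cols], df[target], test_size=test_size, random_state=random_state)\n",
    "    model = LinearRegression().fit(X_tr, y_tr)\n",
    "    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))\n",
    "\n",
    "# The same seed gives every feature set the same train/test rows.\n",
    "for cols in (feature_cols1, feature_cols2, feature_cols3):\n",
    "    print(cols, holdout_rmse(bikes, cols))"
   ]
  },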
{
"cell_type": "markdown",
"metadata": {},
"source": [
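    "## Supplement: handling categorical features\n",
    "\n",
    "There are two kinds of categorical features:\n",
    "\n",
    "- **Ordered categories:** convert to the corresponding numeric values (e.g. small=1, medium=2, large=3)\n",
    "- **Unordered categories:** use dummy encoding (0/1 encoding)\n",
    "\n",
    "The categorical features in this dataset are:\n",
    "\n",
    "- **Ordered categories:** weather (already encoded as the numeric values 1, 2, 3, 4)\n",
    "- **Unordered categories:** season (needs dummy encoding), holiday (already dummy encoded), workingday (already dummy encoded)"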
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>season_1</th>\n",
" <th>season_2</th>\n",
" <th>season_3</th>\n",
" <th>season_4</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2011-09-05 11:00:00</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-03-18 04:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-10-14 17:00:00</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-04-04 15:00:00</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-12-11 02:00:00</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" season_1 season_2 season_3 season_4\n",
"datetime \n",
"2011-09-05 11:00:00 0 0 1 0\n",
"2012-03-18 04:00:00 1 0 0 0\n",
"2012-10-14 17:00:00 0 0 0 1\n",
"2011-04-04 15:00:00 0 1 0 0\n",
"2012-12-11 02:00:00 0 0 0 1"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create dummy variables\n",
"season_dummies = pd.get_dummies(bikes.season, prefix='season')\n",
"\n",
"# print 5 random rows\n",
"season_dummies.sample(n=5, random_state=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "We only need **three dummy variables (not four)** (why?), so we can drop the first dummy variable."
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>season_2</th>\n",
" <th>season_3</th>\n",
" <th>season_4</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2011-09-05 11:00:00</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-03-18 04:00:00</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-10-14 17:00:00</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-04-04 15:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-12-11 02:00:00</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" season_2 season_3 season_4\n",
"datetime \n",
"2011-09-05 11:00:00 0 1 0\n",
"2012-03-18 04:00:00 0 0 0\n",
"2012-10-14 17:00:00 0 0 1\n",
"2011-04-04 15:00:00 1 0 0\n",
"2012-12-11 02:00:00 0 0 1"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# drop the first column\n",
"season_dummies.drop(season_dummies.columns[0], axis=1, inplace=True)\n",
"\n",
"# print 5 random rows\n",
"season_dummies.sample(n=5, random_state=1)"
]
},
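  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to see why three columns are enough: every row has exactly one season, so the four dummy columns always sum to 1, and any one of them is fully determined by the other three (and by the model's intercept). The sketch below checks this on a small made-up Series (`toy_season` is hypothetical data, not the bike dataset) and shows that `pd.get_dummies(..., drop_first=True)` yields the reduced encoding in one call, with the first category acting as the baseline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch on toy data (toy_season is made up for illustration).\n",
    "import pandas as pd\n",
    "\n",
    "toy_season = pd.Series([1, 2, 3, 4, 2], name='season')\n",
    "\n",
    "# Full encoding: one column per category, exactly one of them set per row.\n",
    "full = pd.get_dummies(toy_season, prefix='season')\n",
    "print(full.astype(int).sum(axis=1).tolist())\n",
    "\n",
    "# Reduced encoding: the first category (season_1) is dropped and becomes the baseline.\n",
    "reduced = pd.get_dummies(toy_season, prefix='season', drop_first=True)\n",
    "print(list(reduced.columns))"
   ]
  },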
{
"cell_type": "code",
"execution_count": 80,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>season</th>\n",
" <th>holiday</th>\n",
" <th>workingday</th>\n",
" <th>weather</th>\n",
" <th>temp</th>\n",
" <th>atemp</th>\n",
" <th>humidity</th>\n",
" <th>windspeed</th>\n",
" <th>casual</th>\n",
" <th>registered</th>\n",
" <th>count</th>\n",
" <th>season_2</th>\n",
" <th>season_3</th>\n",
" <th>season_4</th>\n",
" </tr>\n",
" <tr>\n",
" <th>datetime</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2011-09-05 11:00:00</th>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>28.70</td>\n",
" <td>33.335</td>\n",
" <td>74</td>\n",
" <td>11.0014</td>\n",
" <td>101</td>\n",
" <td>207</td>\n",
" <td>308</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-03-18 04:00:00</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>17.22</td>\n",
" <td>21.210</td>\n",
" <td>94</td>\n",
" <td>11.0014</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" <td>14</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-10-14 17:00:00</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>26.24</td>\n",
" <td>31.060</td>\n",
" <td>44</td>\n",
" <td>12.9980</td>\n",
" <td>193</td>\n",
" <td>346</td>\n",
" <td>539</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2011-04-04 15:00:00</th>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>31.16</td>\n",
" <td>33.335</td>\n",
" <td>23</td>\n",
" <td>36.9974</td>\n",
" <td>47</td>\n",
" <td>96</td>\n",
" <td>143</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2012-12-11 02:00:00</th>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>16.40</td>\n",
" <td>20.455</td>\n",
" <td>66</td>\n",
" <td>22.0028</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" season holiday workingday weather temp atemp \\\n",
"datetime \n",
"2011-09-05 11:00:00 3 1 0 2 28.70 33.335 \n",
"2012-03-18 04:00:00 1 0 0 2 17.22 21.210 \n",
"2012-10-14 17:00:00 4 0 0 1 26.24 31.060 \n",
"2011-04-04 15:00:00 2 0 1 1 31.16 33.335 \n",
"2012-12-11 02:00:00 4 0 1 2 16.40 20.455 \n",
"\n",
" humidity windspeed casual registered count season_2 \\\n",
"datetime \n",
"2011-09-05 11:00:00 74 11.0014 101 207 308 0 \n",
"2012-03-18 04:00:00 94 11.0014 6 8 14 0 \n",
"2012-10-14 17:00:00 44 12.9980 193 346 539 0 \n",
"2011-04-04 15:00:00 23 36.9974 47 96 143 1 \n",
"2012-12-11 02:00:00 66 22.0028 0 1 1 0 \n",
"\n",
" season_3 season_4 \n",
"datetime \n",
"2011-09-05 11:00:00 1 0 \n",
"2012-03-18 04:00:00 0 0 \n",
"2012-10-14 17:00:00 0 1 \n",
"2011-04-04 15:00:00 0 0 \n",
"2012-12-11 02:00:00 0 1 "
]
},
"execution_count": 80,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# concatenate the original DataFrame and the dummy DataFrame (axis=0 means rows, axis=1 means columns)\n",
"bikes = pd.concat([bikes, season_dummies], axis=1)\n",
"\n",
"# print 5 random rows\n",
"bikes.sample(n=5, random_state=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "### Add the dummy variables to the regression model's features to predict the rental count"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"134.90282635847132\n",
"[ 11.18640586 -3.3905431 -41.73686071 64.41596147 -2.81948164]\n",
"154.31453057695444\n"
]
}
],
"source": [
"# include dummy variables for season in the model\n",
"feature_cols = ['temp', 'season_2', 'season_3', 'season_4', 'humidity']\n",
"X_train=bikes[feature_cols]\n",
"Y_train=bikes['count']\n",
"LR=LinearRegression()\n",
"LR=LR.fit(X_train,Y_train)\n",
"Y_pred=LR.predict(X_train)\n",
"print(LR.intercept_)\n",
"print(LR.coef_)\n",
"print(np.sqrt(metrics.mean_squared_error(Y_train,Y_pred)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
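    "Compare with the previous models: with season dummy-encoded, the training RMSE is about 154.31, compared with about 156.05 for the earlier model on 'temp', 'season', 'humidity', where season entered as a single numeric column."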
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"vscode": {
"interpreter": {
"hash": "1f0d395e06aa83586067b19165efc9b683889967164248deef4bbf1fa27cfb00"
}
}
},
"nbformat": 4,
"nbformat_minor": 1
}