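{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# K-Nearest Neighbors Classification Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this exercise we use a telecom company's customer churn dataset, `e2.1_Orange_Telecom_Churn_Data.csv` (stored in the current directory). We first load the dataset, do some preprocessing, and then use a K-nearest neighbors model to predict from each customer's characteristics whether that customer will churn." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1:\n", "* Read the dataset into the variable `data` and view its first 5 rows.\n", "* Drop the three columns `\"state\"`, `\"area_code\"`, and `\"phone_number\"`." ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/html": [ "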
| | state | account_length | area_code | phone_number | intl_plan | voice_mail_plan | number_vmail_messages | total_day_minutes | total_day_calls | total_day_charge | total_eve_minutes | total_eve_calls | total_eve_charge | total_night_minutes | total_night_calls | total_night_charge | total_intl_minutes | total_intl_calls | total_intl_charge | number_customer_service_calls | churned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | KS | 128 | 415 | 382-4657 | no | yes | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | False |
| 1 | OH | 107 | 415 | 371-7191 | no | yes | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | False |
| 2 | NJ | 137 | 415 | 358-1921 | no | no | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | False |
| 3 | OH | 84 | 408 | 375-9999 | yes | no | 0 | 299.4 | 71 | 50.90 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | False |
| 4 | OK | 75 | 415 | 330-6626 | yes | no | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | False |
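The loading and column-dropping step described in Step 1 could be sketched roughly as follows. This is a minimal sketch, not necessarily the notebook's own code: the helper name `load_churn_data` is hypothetical, and `SAMPLE_CSV` is an abbreviated stand-in (three rows and a subset of the columns from the table above) so that the example runs without the real file.

```python
import io

import pandas as pd

# Abbreviated stand-in for the CSV file (assumption): the first three rows
# shown in the table above, restricted to a handful of columns.
SAMPLE_CSV = """state,account_length,area_code,phone_number,intl_plan,voice_mail_plan,number_customer_service_calls,churned
KS,128,415,382-4657,no,yes,1,False
OH,107,415,371-7191,no,yes,1,False
NJ,137,415,358-1921,no,no,0,False
"""


def load_churn_data(source):
    """Read the churn CSV and drop the three identifier-like columns."""
    data = pd.read_csv(source)
    return data.drop(columns=["state", "area_code", "phone_number"])


# With the real file this would be:
#   data = load_churn_data("e2.1_Orange_Telecom_Churn_Data.csv")
data = load_churn_data(io.StringIO(SAMPLE_CSV))
print(data.head())  # head() shows at most the first 5 rows
```

Dropping by name with `drop(columns=...)` raises a `KeyError` if one of the columns is missing, which makes a misspelled column name visible early.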
| | account_length | intl_plan | voice_mail_plan | number_vmail_messages | total_day_minutes | total_day_calls | total_day_charge | total_eve_minutes | total_eve_calls | total_eve_charge | total_night_minutes | total_night_calls | total_night_charge | total_intl_minutes | total_intl_calls | total_intl_charge | number_customer_service_calls | churned |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 0 | 1 | 25 | 265.1 | 110 | 45.07 | 197.4 | 99 | 16.78 | 244.7 | 91 | 11.01 | 10.0 | 3 | 2.70 | 1 | 0 |
| 1 | 107 | 0 | 1 | 26 | 161.6 | 123 | 27.47 | 195.5 | 103 | 16.62 | 254.4 | 103 | 11.45 | 13.7 | 3 | 3.70 | 1 | 0 |
| 2 | 137 | 0 | 0 | 0 | 243.4 | 114 | 41.38 | 121.2 | 110 | 10.30 | 162.6 | 104 | 7.32 | 12.2 | 5 | 3.29 | 0 | 0 |
| 3 | 84 | 1 | 0 | 0 | 299.4 | 71 | 50.90 | 61.9 | 88 | 5.26 | 196.9 | 89 | 8.86 | 6.6 | 7 | 1.78 | 2 | 0 |
| 4 | 75 | 1 | 0 | 0 | 166.7 | 113 | 28.34 | 148.3 | 122 | 12.61 | 186.9 | 121 | 8.41 | 10.1 | 3 | 2.73 | 3 | 0 |
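In the table above, the `yes`/`no` columns and the boolean `churned` column appear as 0/1 integers. One way to get that encoding is sketched below. It is an assumption that the notebook did it this way (it may have used a scikit-learn encoder instead). The small frame reuses values from the five rows shown and only a few of the columns.

```python
import pandas as pd

# A few columns from the five rows shown above, before encoding.
data = pd.DataFrame({
    "account_length": [128, 107, 137, 84, 75],
    "intl_plan": ["no", "no", "no", "yes", "yes"],
    "voice_mail_plan": ["yes", "yes", "no", "no", "no"],
    "churned": [False, False, False, False, False],
})

encoded = data.copy()

# Map the two yes/no columns to 0/1.
for col in ["intl_plan", "voice_mail_plan"]:
    encoded[col] = encoded[col].map({"no": 0, "yes": 1})

# Convert the boolean target to 0/1.
encoded["churned"] = encoded["churned"].astype(int)

print(encoded)
```

Using an explicit mapping turns any unexpected value (for example `"Yes"` with a capital letter) into `NaN` rather than silently assigning it a code, so such values are easy to spot afterwards.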
KNeighborsClassifier(n_neighbors=3)
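The output above is the representation of a fitted `KNeighborsClassifier` with `n_neighbors=3`. As a rough illustration of what fitting and predicting with such a model looks like, here is a minimal sketch. The points in `X` and the labels in `y` are made-up toy values, not rows from the churn dataset, and the original notebook's feature scaling and train/test handling are not shown here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy, made-up data (assumption): two well-separated clusters in 2-D,
# standing in for scaled customer features and a 0/1 churn label.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # class 0
    [1.0, 1.0], [0.9, 0.8], [0.8, 0.9],   # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

# Same hyperparameter as in the output above: vote among the 3 nearest points.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# Each query point is assigned the majority label of its 3 nearest neighbors.
queries = np.array([[0.05, 0.05], [0.95, 0.95]])
pred = knn.predict(queries)
print(pred)
```

Because KNN relies on distances, features on very different scales (such as minutes versus 0/1 flags in the churn data) are usually rescaled to a common range before fitting.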