Chapter 3 Data transformation
We downloaded data as csv files from Zillow and transformed it based on our need. We selected the related variables only and transformed them into the format for plotting, by dplyr package functions such as mutate()
, select()
, filter()
, group_by()
, summarise()
and tidyr package functions like gather()
and spread()
.
We picked City, State, and median house prices, and used it for our EDA and other analysis. In the interactive chart we wanted to explore various data sources in d3 and we used csv and static Json created inside the html page.
Here’s the static Json we created for the interactive d3 Chart:
var data19 = [
{group: "San Jose, CA", value: 3090.5},
{group: "San Francisco, CA", value: 3015},
{group: "New York, NY", value: 2743.5},
{group: "Boston, MA", value: 2483.5},
{group: "Ventura, CA", value: 2436.5},
{group: "Urban Honolulu, HI", value: 2349.5},
{group: "Los Angeles-Long Beach-Anaheim, CA", value: 2304.5},
{group: "San Diego, CA", value: 2230},
{group: "Stamford, CT", value: 2169.5}
];
var data20 = [
{group: "San Jose, CA", value: 3027},
{group: "San Francisco, CA", value: 2936.5},
{group: "New York, NY", value: 2659},
{group: "Boston, MA", value: 2454.5},
{group: "Ventura, CA", value: 2495},
{group: "Urban Honolulu, HI", value: 2356.5},
{group: "Los Angeles-Long Beach-Anaheim, CA", value: 2322.5},
{group: "San Diego, CA", value: 2282.5},
{group: "Stamford, CT", value: 2237.5}];
var data21 = [
{group: "San Jose, CA", value: 2993.5},
{group: "San Francisco, CA", value: 2909.5},
{group: "New York, NY", value: 2667},
{group: "Boston, MA", value: 2498},
{group: "Ventura, CA", value: 2746},
{group: "Urban Honolulu, HI", value: 2487},
{group: "Los Angeles-Long Beach-Anaheim, CA", value: 2430.5},
{group: "San Diego, CA", value: 2490.5},
{group: "Stamford, CT", value: 2421}];
As the data itself is in a clean format, there was no necessity for us to transform the data much.
Here is a quick glimpse of the main data.
## RegionID RegionName SizeRank X2014.01 X2014.02 X2014.03 X2014.04
## 96 395075 Scranton, PA 95 801 803 804 805
## 97 394666 Harrisburg, PA 96 966 964 962 959
## 98 394957 Melbourne, FL 97 944 948 953 958
## 99 395113 Spokane, WA 99 936 932 928 923
## 100 395006 Provo, UT 100 986 991 995 999
## 101 394549 Durham, NC 104 1102 1105 1109 1112
## 102 394995 Port St. Lucie, FL 119 1057 1065 1073 1081
## 103 394602 Fort Collins, CO 159 1213 1219 1226 1233
## 104 394405 Boulder, CO 162 1502 1509 1517 1524
## 105 394623 Gainesville, GA 229 940 943 945 947
## X2014.05 X2014.06 X2014.07 X2014.08 X2014.09 X2014.10 X2014.11 X2014.12
## 96 807 808 809 810 811 813 813 813
## 97 957 955 954 952 951 949 950 950
## 98 963 968 973 978 983 988 993 999
## 99 920 916 913 910 907 903 904 904
## 100 1003 1008 1012 1016 1020 1024 1028 1032
## 101 1116 1119 1123 1127 1130 1134 1137 1140
## 102 1089 1097 1104 1112 1120 1127 1134 1140
## 103 1239 1246 1253 1260 1267 1273 1282 1290
## 104 1532 1540 1548 1556 1564 1572 1582 1592
## 105 950 953 956 958 961 964 968 972
## X2015.01 X2015.02 X2015.03 X2015.04 X2015.05 X2015.06 X2015.07 X2015.08
## 96 813 813 813 813 814 815 815 817
## 97 951 953 956 958 960 962 963 964
## 98 1005 1012 1019 1026 1033 1040 1047 1054
## 99 905 909 913 917 923 929 935 941
## 100 1036 1040 1044 1048 1052 1056 1061 1065
## 101 1143 1146 1148 1151 1155 1158 1161 1165
## 102 1146 1152 1158 1164 1171 1177 1184 1192
## 103 1298 1305 1313 1321 1327 1333 1338 1343
## 104 1602 1613 1624 1635 1646 1658 1669 1680
## 105 976 982 988 994 1000 1006 1013 1017
## X2015.09 X2015.10 X2015.11 X2015.12 X2016.01 X2016.02 X2016.03 X2016.04
## 96 818 819 821 823 825 827 828 830
## 97 966 967 968 970 972 974 977 979
## 98 1061 1068 1075 1082 1089 1095 1102 1109
## 99 946 952 956 961 965 969 973 977
## 100 1070 1074 1079 1085 1090 1095 1101 1106
## 101 1169 1173 1177 1181 1185 1189 1193 1197
## 102 1200 1209 1218 1227 1236 1243 1251 1258
## 103 1347 1351 1356 1360 1364 1368 1371 1374
## 104 1690 1700 1708 1716 1724 1729 1734 1739
## 105 1021 1026 1030 1033 1037 1042 1046 1051
## X2016.05 X2016.06 X2016.07 X2016.08 X2016.09 X2016.10 X2016.11 X2016.12
## 96 831 833 835 838 840 842 844 845
## 97 981 983 986 987 989 991 992 993
## 98 1115 1122 1129 1136 1143 1150 1157 1164
## 99 980 984 987 990 992 995 998 1001
## 100 1111 1116 1120 1125 1129 1134 1138 1143
## 101 1202 1206 1211 1215 1218 1222 1225 1228
## 102 1265 1271 1278 1284 1290 1296 1302 1308
## 103 1377 1380 1383 1387 1390 1394 1396 1398
## 104 1742 1744 1747 1749 1751 1753 1755 1757
## 105 1056 1061 1066 1072 1077 1083 1088 1094
## X2017.01 X2017.02 X2017.03 X2017.04 X2017.05 X2017.06 X2017.07 X2017.08
## 96 847 848 850 851 852 853 854 854
## 97 994 994 995 996 997 998 999 1001
## 98 1171 1177 1184 1191 1197 1204 1210 1217
## 99 1005 1009 1014 1018 1023 1027 1032 1037
## 100 1147 1151 1155 1159 1163 1168 1172 1177
## 101 1232 1234 1237 1240 1242 1244 1246 1248
## 102 1314 1321 1328 1334 1341 1348 1355 1362
## 103 1401 1403 1405 1407 1410 1413 1416 1419
## 104 1760 1763 1766 1770 1775 1779 1784 1789
## 105 1100 1104 1109 1114 1119 1124 1128 1134
## X2017.09 X2017.10 X2017.11 X2017.12 X2018.01 X2018.02 X2018.03 X2018.04
## 96 855 856 857 858 859 861 862 864
## 97 1002 1003 1005 1007 1008 1011 1014 1016
## 98 1224 1231 1238 1246 1253 1260 1267 1274
## 99 1042 1047 1053 1059 1064 1069 1074 1080
## 100 1182 1186 1191 1196 1201 1207 1212 1217
## 101 1250 1252 1255 1258 1261 1264 1267 1270
## 102 1369 1376 1384 1392 1399 1407 1415 1422
## 103 1422 1425 1428 1432 1436 1440 1444 1448
## 104 1794 1799 1805 1811 1816 1823 1829 1835
## 105 1139 1145 1149 1154 1158 1162 1166 1170
## X2018.05 X2018.06 X2018.07 X2018.08 X2018.09 X2018.10 X2018.11 X2018.12
## 96 866 868 870 873 875 877 880 882
## 97 1020 1024 1028 1033 1037 1042 1045 1049
## 98 1281 1288 1294 1300 1306 1312 1317 1322
## 99 1085 1090 1095 1101 1106 1112 1117 1123
## 100 1222 1227 1232 1237 1241 1246 1250 1255
## 101 1274 1278 1281 1286 1290 1295 1300 1305
## 102 1430 1437 1444 1450 1457 1463 1467 1472
## 103 1452 1457 1461 1466 1470 1475 1479 1483
## 104 1841 1847 1852 1858 1864 1870 1875 1881
## 105 1175 1180 1184 1191 1197 1203 1210 1217
## X2019.01 X2019.02 X2019.03 X2019.04 X2019.05 X2019.06 X2019.07 X2019.08
## 96 885 888 891 894 898 901 904 907
## 97 1053 1057 1060 1063 1066 1068 1071 1072
## 98 1326 1331 1335 1339 1344 1348 1353 1359
## 99 1129 1134 1140 1145 1150 1155 1160 1165
## 100 1259 1263 1267 1272 1275 1279 1282 1285
## 101 1309 1314 1319 1324 1329 1333 1337 1340
## 102 1476 1481 1486 1491 1497 1503 1509 1516
## 103 1487 1491 1495 1498 1502 1506 1510 1513
## 104 1886 1891 1896 1901 1906 1911 1916 1920
## 105 1223 1230 1237 1243 1249 1254 1260 1264
## X2019.09 X2019.10 X2019.11 X2019.12 X2020.01 X2020.02 X2020.03 X2020.04
## 96 910 913 916 918 921 924 927 930
## 97 1074 1076 1077 1079 1080 1082 1083 1085
## 98 1364 1369 1374 1379 1384 1389 1395 1400
## 99 1171 1176 1181 1185 1190 1196 1202 1208
## 100 1288 1291 1293 1296 1299 1302 1305 1308
## 101 1343 1346 1347 1348 1349 1350 1351 1352
## 102 1522 1529 1535 1541 1546 1553 1560 1567
## 103 1517 1520 1523 1526 1529 1531 1533 1535
## 104 1924 1928 1929 1931 1933 1934 1935 1937
## 105 1267 1271 1273 1275 1277 1279 1282 1285
## X2020.05 X2020.06 X2020.07 X2020.08 X2020.09 X2020.10 X2020.11 X2020.12
## 96 934 938 943 949 955 961 970 979
## 97 1087 1090 1093 1098 1103 1108 1117 1126
## 98 1406 1412 1419 1427 1436 1445 1461 1478
## 99 1216 1225 1234 1246 1259 1271 1291 1311
## 100 1315 1321 1328 1338 1348 1359 1373 1388
## 101 1354 1356 1358 1361 1365 1368 1376 1385
## 102 1577 1586 1596 1609 1622 1636 1658 1680
## 103 1538 1540 1542 1546 1550 1554 1563 1572
## 104 1940 1943 1947 1954 1961 1969 1981 1994
## 105 1290 1296 1301 1312 1322 1333 1352 1372
## X2021.01 X2021.02 X2021.03 X2021.04 X2021.05 X2021.06 X2021.07 X2021.08
## 96 988 1000 1011 1022 1033 1045 1056 1068
## 97 1136 1147 1159 1171 1183 1195 1207 1219
## 98 1494 1518 1541 1564 1588 1612 1635 1660
## 99 1332 1355 1378 1401 1424 1447 1470 1494
## 100 1402 1420 1438 1457 1475 1494 1512 1531
## 101 1393 1407 1420 1434 1448 1462 1476 1491
## 102 1703 1732 1760 1789 1818 1848 1877 1907
## 103 1581 1594 1608 1621 1634 1648 1662 1676
## 104 2006 2022 2038 2054 2070 2087 2103 2119
## 105 1392 1416 1439 1463 NA 1511 1535 1560
## X2021.09 X2021.10
## 96 1079 1091
## 97 1232 NA
## 98 1685 1710
## 99 1518 1542
## 100 1550 1569
## 101 1506 1520
## 102 1938 1968
## 103 1690 1704
## 104 2136 2152
## 105 1585 1610