Chapter 3 Data transformation

We downloaded data as csv files from Zillow and transformed it based on our need. We selected the related variables only and transformed them into the format for plotting, by dplyr package functions such as mutate(), select(), filter(), group_by(), summarise() and tidyr package functions like gather() and spread().

We picked City, State, and median house prices, and used it for our EDA and other analysis. In the interactive chart we wanted to explore various data sources in d3 and we used csv and static Json created inside the html page.

Here’s the static Json we created for the interactive d3 Chart:

var data19 = [
{group: "San Jose, CA", value: 3090.5},
{group: "San Francisco, CA", value: 3015},
{group: "New York, NY", value: 2743.5},
{group: "Boston, MA", value: 2483.5},
{group: "Ventura, CA", value: 2436.5},
{group: "Urban Honolulu, HI", value: 2349.5},
{group: "Los Angeles-Long Beach-Anaheim, CA", value: 2304.5},
{group: "San Diego, CA", value: 2230},
{group: "Stamford, CT", value: 2169.5}
];

var data20 = [
{group: "San Jose, CA", value: 3027},
{group: "San Francisco, CA", value: 2936.5},
{group: "New York, NY", value: 2659},
{group: "Boston, MA", value: 2454.5},
{group: "Ventura, CA", value: 2495},
{group: "Urban Honolulu, HI", value: 2356.5},
{group: "Los Angeles-Long Beach-Anaheim, CA", value: 2322.5},
{group: "San Diego, CA", value: 2282.5},
{group: "Stamford, CT", value: 2237.5}];

var data21 = [
{group: "San Jose, CA", value: 2993.5},
{group: "San Francisco, CA", value: 2909.5},
{group: "New York, NY", value: 2667},
{group: "Boston, MA", value: 2498},
{group: "Ventura, CA", value: 2746},
{group: "Urban Honolulu, HI", value: 2487},
{group: "Los Angeles-Long Beach-Anaheim, CA", value: 2430.5},
{group: "San Diego, CA", value: 2490.5},
{group: "Stamford, CT", value: 2421}];  

As the data itself is in a clean format, there was no necessity for us to transform the data much.

Here is a quick glimpse of the main data.

##     RegionID         RegionName SizeRank X2014.01 X2014.02 X2014.03 X2014.04
## 96    395075       Scranton, PA       95      801      803      804      805
## 97    394666     Harrisburg, PA       96      966      964      962      959
## 98    394957      Melbourne, FL       97      944      948      953      958
## 99    395113        Spokane, WA       99      936      932      928      923
## 100   395006          Provo, UT      100      986      991      995      999
## 101   394549         Durham, NC      104     1102     1105     1109     1112
## 102   394995 Port St. Lucie, FL      119     1057     1065     1073     1081
## 103   394602   Fort Collins, CO      159     1213     1219     1226     1233
## 104   394405        Boulder, CO      162     1502     1509     1517     1524
## 105   394623    Gainesville, GA      229      940      943      945      947
##     X2014.05 X2014.06 X2014.07 X2014.08 X2014.09 X2014.10 X2014.11 X2014.12
## 96       807      808      809      810      811      813      813      813
## 97       957      955      954      952      951      949      950      950
## 98       963      968      973      978      983      988      993      999
## 99       920      916      913      910      907      903      904      904
## 100     1003     1008     1012     1016     1020     1024     1028     1032
## 101     1116     1119     1123     1127     1130     1134     1137     1140
## 102     1089     1097     1104     1112     1120     1127     1134     1140
## 103     1239     1246     1253     1260     1267     1273     1282     1290
## 104     1532     1540     1548     1556     1564     1572     1582     1592
## 105      950      953      956      958      961      964      968      972
##     X2015.01 X2015.02 X2015.03 X2015.04 X2015.05 X2015.06 X2015.07 X2015.08
## 96       813      813      813      813      814      815      815      817
## 97       951      953      956      958      960      962      963      964
## 98      1005     1012     1019     1026     1033     1040     1047     1054
## 99       905      909      913      917      923      929      935      941
## 100     1036     1040     1044     1048     1052     1056     1061     1065
## 101     1143     1146     1148     1151     1155     1158     1161     1165
## 102     1146     1152     1158     1164     1171     1177     1184     1192
## 103     1298     1305     1313     1321     1327     1333     1338     1343
## 104     1602     1613     1624     1635     1646     1658     1669     1680
## 105      976      982      988      994     1000     1006     1013     1017
##     X2015.09 X2015.10 X2015.11 X2015.12 X2016.01 X2016.02 X2016.03 X2016.04
## 96       818      819      821      823      825      827      828      830
## 97       966      967      968      970      972      974      977      979
## 98      1061     1068     1075     1082     1089     1095     1102     1109
## 99       946      952      956      961      965      969      973      977
## 100     1070     1074     1079     1085     1090     1095     1101     1106
## 101     1169     1173     1177     1181     1185     1189     1193     1197
## 102     1200     1209     1218     1227     1236     1243     1251     1258
## 103     1347     1351     1356     1360     1364     1368     1371     1374
## 104     1690     1700     1708     1716     1724     1729     1734     1739
## 105     1021     1026     1030     1033     1037     1042     1046     1051
##     X2016.05 X2016.06 X2016.07 X2016.08 X2016.09 X2016.10 X2016.11 X2016.12
## 96       831      833      835      838      840      842      844      845
## 97       981      983      986      987      989      991      992      993
## 98      1115     1122     1129     1136     1143     1150     1157     1164
## 99       980      984      987      990      992      995      998     1001
## 100     1111     1116     1120     1125     1129     1134     1138     1143
## 101     1202     1206     1211     1215     1218     1222     1225     1228
## 102     1265     1271     1278     1284     1290     1296     1302     1308
## 103     1377     1380     1383     1387     1390     1394     1396     1398
## 104     1742     1744     1747     1749     1751     1753     1755     1757
## 105     1056     1061     1066     1072     1077     1083     1088     1094
##     X2017.01 X2017.02 X2017.03 X2017.04 X2017.05 X2017.06 X2017.07 X2017.08
## 96       847      848      850      851      852      853      854      854
## 97       994      994      995      996      997      998      999     1001
## 98      1171     1177     1184     1191     1197     1204     1210     1217
## 99      1005     1009     1014     1018     1023     1027     1032     1037
## 100     1147     1151     1155     1159     1163     1168     1172     1177
## 101     1232     1234     1237     1240     1242     1244     1246     1248
## 102     1314     1321     1328     1334     1341     1348     1355     1362
## 103     1401     1403     1405     1407     1410     1413     1416     1419
## 104     1760     1763     1766     1770     1775     1779     1784     1789
## 105     1100     1104     1109     1114     1119     1124     1128     1134
##     X2017.09 X2017.10 X2017.11 X2017.12 X2018.01 X2018.02 X2018.03 X2018.04
## 96       855      856      857      858      859      861      862      864
## 97      1002     1003     1005     1007     1008     1011     1014     1016
## 98      1224     1231     1238     1246     1253     1260     1267     1274
## 99      1042     1047     1053     1059     1064     1069     1074     1080
## 100     1182     1186     1191     1196     1201     1207     1212     1217
## 101     1250     1252     1255     1258     1261     1264     1267     1270
## 102     1369     1376     1384     1392     1399     1407     1415     1422
## 103     1422     1425     1428     1432     1436     1440     1444     1448
## 104     1794     1799     1805     1811     1816     1823     1829     1835
## 105     1139     1145     1149     1154     1158     1162     1166     1170
##     X2018.05 X2018.06 X2018.07 X2018.08 X2018.09 X2018.10 X2018.11 X2018.12
## 96       866      868      870      873      875      877      880      882
## 97      1020     1024     1028     1033     1037     1042     1045     1049
## 98      1281     1288     1294     1300     1306     1312     1317     1322
## 99      1085     1090     1095     1101     1106     1112     1117     1123
## 100     1222     1227     1232     1237     1241     1246     1250     1255
## 101     1274     1278     1281     1286     1290     1295     1300     1305
## 102     1430     1437     1444     1450     1457     1463     1467     1472
## 103     1452     1457     1461     1466     1470     1475     1479     1483
## 104     1841     1847     1852     1858     1864     1870     1875     1881
## 105     1175     1180     1184     1191     1197     1203     1210     1217
##     X2019.01 X2019.02 X2019.03 X2019.04 X2019.05 X2019.06 X2019.07 X2019.08
## 96       885      888      891      894      898      901      904      907
## 97      1053     1057     1060     1063     1066     1068     1071     1072
## 98      1326     1331     1335     1339     1344     1348     1353     1359
## 99      1129     1134     1140     1145     1150     1155     1160     1165
## 100     1259     1263     1267     1272     1275     1279     1282     1285
## 101     1309     1314     1319     1324     1329     1333     1337     1340
## 102     1476     1481     1486     1491     1497     1503     1509     1516
## 103     1487     1491     1495     1498     1502     1506     1510     1513
## 104     1886     1891     1896     1901     1906     1911     1916     1920
## 105     1223     1230     1237     1243     1249     1254     1260     1264
##     X2019.09 X2019.10 X2019.11 X2019.12 X2020.01 X2020.02 X2020.03 X2020.04
## 96       910      913      916      918      921      924      927      930
## 97      1074     1076     1077     1079     1080     1082     1083     1085
## 98      1364     1369     1374     1379     1384     1389     1395     1400
## 99      1171     1176     1181     1185     1190     1196     1202     1208
## 100     1288     1291     1293     1296     1299     1302     1305     1308
## 101     1343     1346     1347     1348     1349     1350     1351     1352
## 102     1522     1529     1535     1541     1546     1553     1560     1567
## 103     1517     1520     1523     1526     1529     1531     1533     1535
## 104     1924     1928     1929     1931     1933     1934     1935     1937
## 105     1267     1271     1273     1275     1277     1279     1282     1285
##     X2020.05 X2020.06 X2020.07 X2020.08 X2020.09 X2020.10 X2020.11 X2020.12
## 96       934      938      943      949      955      961      970      979
## 97      1087     1090     1093     1098     1103     1108     1117     1126
## 98      1406     1412     1419     1427     1436     1445     1461     1478
## 99      1216     1225     1234     1246     1259     1271     1291     1311
## 100     1315     1321     1328     1338     1348     1359     1373     1388
## 101     1354     1356     1358     1361     1365     1368     1376     1385
## 102     1577     1586     1596     1609     1622     1636     1658     1680
## 103     1538     1540     1542     1546     1550     1554     1563     1572
## 104     1940     1943     1947     1954     1961     1969     1981     1994
## 105     1290     1296     1301     1312     1322     1333     1352     1372
##     X2021.01 X2021.02 X2021.03 X2021.04 X2021.05 X2021.06 X2021.07 X2021.08
## 96       988     1000     1011     1022     1033     1045     1056     1068
## 97      1136     1147     1159     1171     1183     1195     1207     1219
## 98      1494     1518     1541     1564     1588     1612     1635     1660
## 99      1332     1355     1378     1401     1424     1447     1470     1494
## 100     1402     1420     1438     1457     1475     1494     1512     1531
## 101     1393     1407     1420     1434     1448     1462     1476     1491
## 102     1703     1732     1760     1789     1818     1848     1877     1907
## 103     1581     1594     1608     1621     1634     1648     1662     1676
## 104     2006     2022     2038     2054     2070     2087     2103     2119
## 105     1392     1416     1439     1463       NA     1511     1535     1560
##     X2021.09 X2021.10
## 96      1079     1091
## 97      1232       NA
## 98      1685     1710
## 99      1518     1542
## 100     1550     1569
## 101     1506     1520
## 102     1938     1968
## 103     1690     1704
## 104     2136     2152
## 105     1585     1610