The proliferation of WiFi has enabled the use of real-time location systems (RTLS) inside buildings to track critical personnel and valuable equipment. The following case study extends Nolan and Lang’s kNN analysis of data from an RTLS at the University of Mannheim. K-nearest neighbors (kNN) is used to predict the location of a device within one floor of a campus building. Four kNN models, each including either six or seven access point MAC addresses, are compared across two classification techniques, unweighted and weighted. Though the error estimates did not vary dramatically, the weighted kNN with seven access point MAC addresses attained the lowest estimated error.
The use of Global Positioning Systems (GPS) has spread vastly over the past ten years and become a common part of most people’s everyday experience. Unfortunately, GPS does not work well inside buildings. Instead, Indoor Positioning Systems (IPS) are used for locating people and objects indoors. With WiFi, an IPS can implement a real-time location system (RTLS) that keeps track of items and/or people in real time. These systems can be used for indoor location tracking of critical personnel or equipment, such as doctors, nurses, and medical material in a hospital. They can also be used for many practical purposes such as inventory management, shipping management, or locating equipment in a warehouse.
For this case study, modeling and analysis of data from an RTLS in a building at the University of Mannheim will be conducted. Measurements of signal strength between a hand-held device and 7 stationary WiFi access points were recorded within one floor of the building. At each of 166 locations, signal strengths were recorded 110 times for each of 8 orientations in 45 degree increments (0, 45, 90, etc.). That is roughly 166 × 8 × 110 ≈ 146,000 recordings, each measuring up to 7 access points, so the dataset contains over one million signal measurements.
This study will be an extension of the work previously done on this dataset by Nolan and Lang. Their aim was to build a k-nearest neighbors (kNN) model for the location of a device as a function of the signal strength between the device and each access point. They then used their model to predict the location of a new, unknown device based on the signals detected for the device. In their analysis, Nolan and Lang used only six of the seven access points. They matched access points to their locations, kept the access point with MAC address 00:0f:a3:39:e1:c0, and removed the data corresponding to MAC address 00:0f:a3:39:dd:cd. This study adds to their work by conducting the kNN analysis with the omitted MAC, 00:0f:a3:39:dd:cd, and also by fitting a model with both MAC addresses.
The task will be to process the data and prepare it for analysis, as Nolan and Lang did, then use kNN to determine locations and to ascertain which of the two MAC addresses, or both, is most appropriate for inclusion in the RTLS for the building in question. Additionally, this study will implement an alternative prediction method. While kNN proved to be a good approach to determining location for Nolan and Lang, there are simple alternatives, such as a weighted model. This technique weights each neighbor's contribution inversely proportional to its distance from the test observation: nearer points contribute more to the kNN location calculation than points that are farther away. The models will be compared using a function called calcError() that computes estimates against actuals as a measure of the size of the error.
For the analysis, we will utilize two different datasets. The “offline” dataset will be used as the training data to develop the prediction model, and the “online” dataset is used as the test set. Much of the data cleaning process involves unraveling the original format of the data and extracting the variables to create a more useful dataset. It also involves converting variables to the appropriate data types for analysis and scaling variables such as time correctly. All records for adhoc measurements are dropped from the data frame, which removes more than 100,000 rows from the dataset. Additionally, the scanMac, posZ, channel, and type variables are removed because they have only one value or are not relevant. The orientation values were rounded to the eight 45 degree angle increments, and the 12 MAC addresses were sifted for the seven applicable addresses.
| Variable Name | Definition |
|---|---|
| t | Timestamp in milliseconds since midnight, January 1, 1970 UTC |
| id | MAC address of the scanning device |
| pos | The physical coordinate of the scanning device |
| degree | The orientation of the user carrying the scanning device in degrees (45 degree increments) |
| MAC | MAC address of a responding peer (e.g., an access point or a device in adhoc mode) with the corresponding values for signal strength in dBm (Decibel-milliwatts), the channel frequency and its mode (access point = 3, device in adhoc mode = 1) |
Just as Nolan and Lang did, this analysis will use kNN with averaged X-Y coordinates for the first part of this study. The kNN algorithm can be used for classification as well as regression tasks. It has become a widely used technique due to its simplicity, ease of interpretation, and low calculation time. There are several distance measures that can be used for kNN, but for this study the Euclidean distance will be used:

$$D = \sqrt{\sum_{i=1}^{n} (S_i - S^*_i)^2},$$

where $S_i$ is the signal strength measured between the hand-held device and the $i$-th access point for a training observation taken at a specified location, and $S^*_i$ is the signal measured between the same access point and our new point whose $(x, y)$ values are what is being predicted.
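As a minimal sketch of this distance computation (the helper name and the example signal values are illustrative, not taken from the study's data):

```r
# Minimal sketch: Euclidean distance in signal-strength space between a
# new observation and each row of a training matrix (one column per
# access point). Example values are hypothetical.
signalDist = function(newSignal, trainSignals) {
  # subtract the new signal vector from each row, square, sum, take root
  sqrt(rowSums(sweep(trainSignals, 2, newSignal)^2))
}

train = rbind(c(-50, -60, -70), c(-55, -65, -75))
signalDist(c(-50, -60, -70), train)   # -> 0.000000 8.660254
```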
The last model in this study will utilize weighted X-Y coordinates instead of averaged X-Y points. A weighted kNN is designed to give more weight to the data points which are close by and less weight to the points which are farther away. Instead of averaging, this modified kNN applies weights to the X-Y positions of each of the k closest points and takes the sum of those results, which could increase the accuracy of the model. The weighted kNN again utilizes the Euclidean distance, where the weights are

$$w_i = \frac{1/d_i}{\sum_{j=1}^{k} 1/d_j}$$

for the $i$-th closest neighboring observation, where $d_i$ is the distance from the new point to the $i$-th closest reference point in terms of signal strength.
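A minimal sketch of how such inverse-distance weights could be applied to neighbor positions (the helper name weightedXY and the example values are illustrative, not part of the original code):

```r
# Sketch: inverse-distance weighting of the k nearest neighbors'
# positions. xy is a k x 2 matrix of neighbor (x, y) positions and
# dists holds their signal-space distances, sorted nearest first.
weightedXY = function(xy, dists, k = 3) {
  d = dists[1:k]
  w = (1 / d) / sum(1 / d)              # weights sum to 1
  colSums(xy[1:k, , drop = FALSE] * w)  # weighted position estimate
}

# three neighbors at signal-space distances 1, 2, and 10
xy = matrix(c(1, 1, 2, 2, 10, 10), ncol = 2, byrow = TRUE)
weightedXY(xy, dists = c(1, 2, 10))   # -> 1.875 1.875
```

The nearest neighbor (distance 1) gets weight 0.625, while the farthest (distance 10) contributes only 0.0625, pulling the estimate toward the closer points.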
The results from the four models are below. The models are compared using a function called calcError() that sums the squared differences between estimated and actual positions as a measure of the size of the error.
The graph above and the analysis indicate that for the model with six MACs, including 00:0f:a3:39:e1:c0, $k = 6$ produces the least error. The SSE is 1198 and the calcError function gives an error estimate of 272.4092.
The graph above and the analysis indicate that for the model with six MACs, including 00:0f:a3:39:dd:cd, $k = 7$ produces the least error. The SSE is 1003 and the calcError function gives an error estimate of 258.075.
The graph above and the analysis indicate that for the model with all seven MACs, including both 00:0f:a3:39:e1:c0 and 00:0f:a3:39:dd:cd, $k = 8$ produces the least error. The SSE is 1459 and the calcError function gives an error estimate of 235.196.
The graph above and the analysis indicate that for the weighted model with all seven MACs, including both 00:0f:a3:39:e1:c0 and 00:0f:a3:39:dd:cd, $k = 9$ produces the least error. The SSE is 1396 and the calcError function gives an error estimate of 227.4423.
The model that resulted in the lowest calculated error was the weighted model that includes all seven access points. Models with only six of the access points had higher estimated error than either model with all seven, weighted and unweighted.
| kNN Model | $k$ | Estimated Error |
|---|---|---|
| MAC 00:0f:a3:39:e1:c0 | 6 | 272.4092 |
| MAC 00:0f:a3:39:dd:cd | 7 | 258.0750 |
| Both MACs | 8 | 235.1960 |
| Weighted with both MACs | 9 | 227.4423 |
By mirroring Nolan and Lang’s approach of preparing, exploring and reshaping the datasets, this study was able to successfully implement their kNN methodology for location detection, as well as test additional models for comparison. Based on the results, the best course of action for the RTLS in the University of Mannheim building would be to use a weighted model with all seven access points to achieve the most accurate results. In the future, the analysis and comparison of additional distance measures, such as Manhattan, could yield even better results. As the use of IPS and RTLS proliferates, studies such as these will become more important for efficient and accurate position location.
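As a hypothetical illustration of the future-work suggestion above (not part of this study's analysis), a Manhattan-distance variant of the signal-space comparison could be sketched as:

```r
# Hypothetical sketch: Manhattan (L1) distance in signal-strength space,
# a possible alternative to the Euclidean distance used in this study.
# Example values are illustrative, not taken from the Mannheim data.
manhattanDist = function(newSignal, trainSignals) {
  # sum of absolute differences across access points, per training row
  rowSums(abs(sweep(trainSignals, 2, newSignal)))
}

train = rbind(c(-50, -60, -70), c(-55, -65, -75))
manhattanDist(c(-50, -60, -70), train)   # -> 0 15
```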
# load necessary libraries
#install.packages('fields', repos='http://cran.us.r-project.org')
#install.packages('tictoc', repos='http://cran.us.r-project.org')
library(codetools)
library(tictoc)
library(ggplot2)
library(lattice)
library(fields)
options(digits = 2)
# read in the entire file into a variable txt
# each line will be its own element
set.seed(15)
txt = readLines("offline.final.trace.txt")
#find all lines that begin with the '#' symbol
sum(substr(txt, 1, 1) == "#")
#total length of file (txt)
length(txt)
#Taking a look at the data
txt[1]
txt[2]
txt[3]
txt[4]
txt[5]
txt[6]
txt[7]
txt[8]
txt[9]
txt[10]
# split up line 4 by ";"; the [[1]] is because strsplit returns a list (of length 1) containing the tokens
print(strsplit(txt[4], ";")[[1]])
# split line 4 on multiple delimiter symbols (';', '=', ',')
tokens = strsplit(txt[4], "[;=,]")[[1]]
# list the first 10 tokens
print(tokens[1:10])
# take a look at tokens in specific columns
tokens[c(2, 4, 6:8, 10)]
# take a look at all the columns except first 10
tokens[ - ( 1:10 ) ]
# create a matrix of the data in the 'end' columns (not the first 10)
tmp = matrix(tokens[ - (1:10) ], ncol = 4, byrow = TRUE)
# now create a pseudo-pivot table using the first 10 columns as repeat labels
# and the remaining columns as individual data points
# basically take 1 row of the file and turn it into an 11 x 10 matrix
mat = cbind(matrix(tokens[c(2, 4, 6:8, 10)], nrow = nrow(tmp),
ncol = 6, byrow = TRUE),
tmp)
dim(mat)
mat
# first stab at creating a function to parse lines. Note that this causes warnings a few cells down
processLine =
function(x)
{
tokens = strsplit(x, "[;=,]")[[1]]
tmp = matrix(tokens[ - (1:10) ], ncol = 4, byrow = TRUE)
cbind(matrix(tokens[c(2, 4, 6:8, 10)], nrow = nrow(tmp),
ncol = 6, byrow = TRUE), tmp)
}
#doing some spot checking on our function
tmp = lapply(txt[4:20], processLine)
sapply(tmp, nrow)
offline = as.data.frame(do.call("rbind", tmp))
dim(offline)
# more spot checking -- this produces a warning; let's deal with it
lines = txt[ substr(txt, 1, 1) != "#" ]
tmp = lapply(lines, processLine)
# put it all together to process a line as a function
# note that the if statement returns NULL for lines with no signal records, which removes the warnings
processLine = function(x)
{
tokens = strsplit(x, "[;=,]")[[1]]
if (length(tokens) == 10)
return(NULL)
tmp = matrix(tokens[ - (1:10) ], , 4, byrow = TRUE)
cbind(matrix(tokens[c(2, 4, 6:8, 10)], nrow(tmp), 6,
byrow = TRUE), tmp)
}
# process all the data lines to build the offline data frame
options(error = recover, warn = 1)
tmp = lapply(lines, processLine)
offline = as.data.frame(do.call("rbind", tmp),
stringsAsFactors = FALSE)
dim(offline)
length(lines)
# Label our data
# assign column names to the stacked data frame
names(offline) = c("time", "scanMac", "posX", "posY", "posZ",
"orientation", "mac", "signal",
"channel", "type")
# variables that should be numeric
numVars = c("time", "posX", "posY", "posZ",
"orientation", "signal")
# convert these variables to numeric
offline[ numVars ] = lapply(offline[ numVars ], as.numeric)
offline = offline[ offline$type == "3", ]
offline = offline[ , "type" != names(offline) ]
dim(offline)
head(offline)
# scale time correctly: keep the raw value, convert ms to s, set POSIX class
offline$rawTime = offline$time
offline$time = offline$time/1000
class(offline$time) = c("POSIXt", "POSIXct")
unlist(lapply(offline, class))
head(offline)
# do a summary exam of data
summary(offline[, numVars])
# summary of labels
summary(sapply(offline[ , c("mac", "channel", "scanMac")],
as.factor))
# eliminate unused variables scanMac and posZ because they only have one value
offline = offline[ , !(names(offline) %in% c("scanMac", "posZ"))]
head(offline)
The orientation, or angle to the sensor at which the measurement was taken, also needed some cleaning to group the angles into the appropriate 8 orientations. This was accomplished by rounding the angles to the closest of the 45 degree increments.
# get unique orientations
length(unique(offline$orientation))
# Create a plot of the orientation
# not my personal choice but it works
plot(ecdf(offline$orientation), pch = 19, cex = 0.3,
xlim = c(-5, 365), axes = FALSE,
xlab = "orientation", ylab = "Empirical CDF", main = "")
box()
axis(2)
axis(side = 1, at = seq(0, 360, by = 45))
# create a function that will round off to the nearest major angle
roundOrientation = function(angles) {
refs = seq(0, by = 45, length = 9)
q = sapply(angles, function(o) which.min(abs(o - refs)))
c(refs[1:8], 0)[q]
}
# we built it, now apply it
offline$angle = roundOrientation(offline$orientation)
# plot of rounded angles
with(offline, boxplot(orientation ~ angle,
xlab = "nearest 45 degree angle",
ylab="orientation"))
# find number of macids and channels
c(length(unique(offline$mac)), length(unique(offline$channel)))
# create a table for all the count of macid readings
# default is count
table(offline$mac)
# get names of the macs (top 7)
subMacs = names(sort(table(offline$mac), decreasing = TRUE))[1:7]
offline = offline[ offline$mac %in% subMacs, ]
macChannel = with(offline, table(mac, channel))
apply(macChannel, 1, function(x) sum(x > 0))
# channel and mac are 1:1 so drop channel
offline = offline[ , "channel" != names(offline)]
# Create a list containing a dataframe for each location
locDF = with(offline,
by(offline, list(posX, posY), function(x) x))
length(locDF)
# find the number of nulls
sum(sapply(locDF, is.null))
# drop the nulls
locDF = locDF[ !sapply(locDF, is.null) ]
# how much do we have left
length(locDF)
# get obs at each location and then store the data
locCounts = sapply(locDF, nrow)
locCounts = sapply(locDF,
function(df)
c(df[1, c("posX", "posY")], count = nrow(df)))
class(locCounts)
dim(locCounts)
# take a look at the first 8 only (bottom left of following plot)
locCounts[ , 1:8]
#Create graph
locCounts = t(locCounts)
plot(locCounts, type = "n", xlab = "", ylab = "")
text(locCounts, labels = locCounts[,3], cex = .8, srt = 45)
Now that the data has been cleaned, the analysis can move forward.
# re-read our data, combining all the cleaning done above; there are SEVEN mac ids kept
readData =
function(filename = 'offline.final.trace.txt',
subMacs = c("00:0f:a3:39:e1:c0", "00:0f:a3:39:dd:cd", "00:14:bf:b1:97:8a",
"00:14:bf:3b:c7:c6", "00:14:bf:b1:97:90", "00:14:bf:b1:97:8d",
"00:14:bf:b1:97:81"))
{
txt = readLines(filename)
lines = txt[ substr(txt, 1, 1) != "#" ]
tmp = lapply(lines, processLine)
offline = as.data.frame(do.call("rbind", tmp),
stringsAsFactors= FALSE)
names(offline) = c("time", "scanMac",
"posX", "posY", "posZ", "orientation",
"mac", "signal", "channel", "type")
# keep only signals from access points
offline = offline[ offline$type == "3", ]
# drop scanMac, posZ, channel, and type - no info in them
dropVars = c("scanMac", "posZ", "channel", "type")
offline = offline[ , !( names(offline) %in% dropVars ) ]
# drop more unwanted access points
offline = offline[ offline$mac %in% subMacs, ]
# convert numeric values
numVars = c("time", "posX", "posY", "orientation", "signal")
offline[ numVars ] = lapply(offline[ numVars ], as.numeric)
# convert time to POSIX
offline$rawTime = offline$time
offline$time = offline$time/1000
class(offline$time) = c("POSIXt", "POSIXct")
# round orientations to nearest 45
offline$angle = roundOrientation(offline$orientation)
return(offline)
}
# implement our function
offlineRedo = readData()
identical(offline, offlineRedo)
Signal strength is our response variable so it is important to visually inspect the signal for each MAC address.
# plot signal strength for each device - all 7
# poor mac id 00:0f:a3:39:dd:cd. he gets dropped as he did in Nolan and Lang's analysis
library(lattice)
bwplot(signal ~ factor(angle) | mac, data = offline,
subset = posX == 2 & posY == 12,
layout = c(1,7))
# plot signal strength for the two MAC address that will be swapped in the models
# this time 00:0f:a3:39:e1:c0 is dropped
library(lattice)
bwplot(signal ~ factor(angle) | mac, data = offline,
subset = posX == 2 & posY == 12
& mac %in% c("00:0f:a3:39:e1:c0", "00:0f:a3:39:dd:cd"),
layout = c(2,1))
# examine signal strength overall
summary(offline$signal)
# signal density plots for ALL seven mac ids at X=24, Y=4
densityplot( ~ signal | mac + factor(angle), data = offline,
subset = posX == 24 & posY == 4,
bw = 0.5, plot.points = FALSE)
# code to examine signal strength distribution for all 166 locations, angles, and seven access points
#Setup all the data using the data summary
offline$posXY = paste(offline$posX, offline$posY, sep = "-")
byLocAngleAP = with(offline,
by(offline, list(posXY, angle, mac),
function(x) x))
signalSummary =
lapply(byLocAngleAP,
function(oneLoc) {
ans = oneLoc[1, ]
ans$medSignal = median(oneLoc$signal)
ans$avgSignal = mean(oneLoc$signal)
ans$num = length(oneLoc$signal)
ans$sdSignal = sd(oneLoc$signal)
ans$iqrSignal = IQR(oneLoc$signal)
ans
})
offlineSummary = do.call("rbind", signalSummary)
# look at the distribution of signal strength
breaks = seq(-90, -30, by = 5)
bwplot(sdSignal ~ cut(avgSignal, breaks = breaks),
data = offlineSummary,
subset = mac != "00:0f:a3:39:dd:cd",
xlab = "Mean Signal", ylab = "SD Signal")
oneAPAngle = subset(offlineSummary,
mac == subMacs[5] & angle == 0)
library(fields)
smoothSS = Tps(oneAPAngle[, c("posX","posY")],
oneAPAngle$avgSignal)
vizSmooth = predictSurface(smoothSS)
plot.surface(vizSmooth, type = "C")
points(oneAPAngle$posX, oneAPAngle$posY, pch=19, cex = 0.5)
surfaceSS = function(data, mac, angle = 45) {
require(fields)
oneAPAngle = data[ data$mac == mac & data$angle == angle, ]
smoothSS = Tps(oneAPAngle[, c("posX","posY")],
oneAPAngle$avgSignal)
vizSmooth = predictSurface(smoothSS)
plot.surface(vizSmooth, type = "C",
xlab = "", ylab = "", xaxt = "n", yaxt = "n")
points(oneAPAngle$posX, oneAPAngle$posY, pch=19, cex = 0.5)
}
# creating a backup of original
# drop 00:0f:a3:39:dd:cd
offlineSummary_backup = offlineSummary
offlineSummary <- offlineSummary[ offlineSummary$mac != "00:0f:a3:39:dd:cd", ]
# Look at the access points
# signal strength vs distance
AP = matrix( c( 7.5, 6.3, 2.5, -.8, 12.8, -2.8,
1, 14, 33.5, 9.3, 33.5, 2.8),
ncol = 2, byrow = TRUE,
dimnames = list(subMacs[ -2 ], c("x", "y") ))
AP
diffs = offlineSummary[ , c("posX", "posY")] -
AP[ offlineSummary$mac, ]
offlineSummary$dist = sqrt(diffs[ , 1]^2 + diffs[ , 2]^2)
xyplot(signal ~ dist | factor(mac) + factor(angle),
data = offlineSummary, pch = 19, cex = 0.3,
xlab ="distance")
# tally signal strength
macs = unique(offlineSummary$mac)
online = readData("online.final.trace.txt", subMacs = macs)
online$posXY = paste(online$posX, online$posY, sep = "-")
length(unique(online$posXY))
tabonlineXYA = table(online$posXY, online$angle)
tabonlineXYA[1:6, ]
#organize the data so that we have 6 columns of signal strengths, i.e., one for each of the access points.
keepVars = c("posXY", "posX","posY", "orientation", "angle")
byLoc = with(online,
by(online, list(posXY),
function(x) {
ans = x[1, keepVars]
avgSS = tapply(x$signal, x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 6,
dimnames = list(ans$posXY, names(avgSS)))
cbind(ans, y)
}))
onlineSummary = do.call("rbind", byLoc)
# then confirm with dim() and names()
dim(onlineSummary)
names(onlineSummary)
# create data frame and functions to aggregate/select data with similar angles
m = 3; angleNewObs = 230
refs = seq(0, by = 45, length = 8)
nearestAngle = roundOrientation(angleNewObs)
if (m %% 2 == 1) {
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
} else {
m = m + 1
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
if (sign(angleNewObs - nearestAngle) > -1)
angles = angles[ -1 ]
else
angles = angles[ -m ]
}
angles = angles + nearestAngle
angles[angles < 0] = angles[ angles < 0 ] + 360
angles[angles > 360] = angles[ angles > 360 ] - 360
offlineSubset =
offlineSummary[ offlineSummary$angle %in% angles, ]
reshapeSS = function(data, varSignal = "signal",
keepVars = c("posXY", "posX","posY")) {
byLocation =
with(data, by(data, list(posXY),
function(x) {
ans = x[1, keepVars]
avgSS = tapply(x[ , varSignal ], x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 6,
dimnames = list(ans$posXY,
names(avgSS)))
cbind(ans, y)
}))
newDataSS = do.call("rbind", byLocation)
return(newDataSS)
}
#Summarize and reshape offlineSubset
trainSS = reshapeSS(offlineSubset, varSignal = "avgSignal")
selectTrain = function(angleNewObs, signals = NULL, m = 1){
# m is the number of angles to keep between 1 and 5
refs = seq(0, by = 45, length = 8)
nearestAngle = roundOrientation(angleNewObs)
if (m %% 2 == 1)
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
else {
m = m + 1
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
if (sign(angleNewObs - nearestAngle) > -1)
angles = angles[ -1 ]
else
angles = angles[ -m ]
}
angles = angles + nearestAngle
angles[angles < 0] = angles[ angles < 0 ] + 360
angles[angles > 360] = angles[ angles > 360 ] - 360
angles = sort(angles)
offlineSubset = signals[ signals$angle %in% angles, ]
reshapeSS(offlineSubset, varSignal = "avgSignal")
}
train130 = selectTrain(130, offlineSummary, m = 3)
head(train130)
length(train130[[1]])
# here is the kNN function.
findNN = function(newSignal, trainSubset) {
diffs = apply(trainSubset[ , 4:9], 1,
function(x) x - newSignal)
dists = apply(diffs, 2, function(x) sqrt(sum(x^2)) )
closest = order(dists)
return(trainSubset[closest, 1:3 ])
}
# predict X-Y based on the nearest k neighbors (default 3)
predXY = function(newSignals, newAngles, trainData,
numAngles = 1, k = 3){
closeXY = list(length = nrow(newSignals))
for (i in 1:nrow(newSignals)) {
trainSS = selectTrain(newAngles[i], trainData, m = numAngles)
closeXY[[i]] =
findNN(newSignal = as.numeric(newSignals[i, ]), trainSS)
}
estXY = lapply(closeXY,
function(x) sapply(x[ , 2:3],
function(x) mean(x[1:k])))
estXY = do.call("rbind", estXY)
return(estXY)
}
# nearest 3 neighbors
estXYk3 = predXY(newSignals = onlineSummary[ , 6:11],
newAngles = onlineSummary[ , 4],
offlineSummary, numAngles = 3, k = 3)
# nearest 1 neighbor
estXYk1 = predXY(newSignals = onlineSummary[ , 6:11],
newAngles = onlineSummary[ , 4],
offlineSummary, numAngles = 3, k = 1)
# predict and map errors
floorErrorMap = function(estXY, actualXY, trainPoints = NULL, AP = NULL){
plot(0, 0, xlim = c(0, 35), ylim = c(-3, 15), type = "n",
xlab = "", ylab = "", axes = FALSE)
box()
if ( !is.null(AP) ) points(AP, pch = 15)
if ( !is.null(trainPoints) )
points(trainPoints, pch = 19, col="grey", cex = 0.6)
points(x = actualXY[, 1], y = actualXY[, 2],
pch = 19, cex = 0.8 )
points(x = estXY[, 1], y = estXY[, 2],
pch = 8, cex = 0.8 )
segments(x0 = estXY[, 1], y0 = estXY[, 2],
x1 = actualXY[, 1], y1 = actualXY[ , 2],
lwd = 2, col = "red")
}
trainPoints = offlineSummary[ offlineSummary$angle == 0 &
offlineSummary$mac == "00:0f:a3:39:e1:c0" ,
c("posX", "posY")]
# 3 NN
floorErrorMap(estXYk3, onlineSummary[ , c("posX","posY")],
trainPoints = trainPoints, AP = AP)
# 1 NN
floorErrorMap(estXYk1, onlineSummary[ , c("posX","posY")],
trainPoints = trainPoints, AP = AP)
To assess the models, $v$-fold cross-validation is used: the data are divided into $v$ non-overlapping subsets of equal size. For each subset, a model is built with the data that are not in that subset, and the predictive ability of the model is assessed using the subset that was left out. This process is repeated for each of the $v$ folds, and the prediction errors are aggregated across the folds. set.seed(15) was used for all models.
set.seed(15)
options(error = recover, warn = 1)
calcError =
function(estXY, actualXY)
sum( rowSums( (estXY - actualXY)^2) )
actualXY = onlineSummary[ , c("posX", "posY")]
sapply(list(estXYk1, estXYk3), calcError, actualXY)
v = 11
permuteLocs = sample(unique(offlineSummary$posXY))
permuteLocs = matrix(permuteLocs, ncol = v,
nrow = floor(length(permuteLocs)/v))
onlineFold = subset(offlineSummary, posXY %in% permuteLocs[ , 1])
reshapeSS = function(data, varSignal = "signal",
keepVars = c("posXY", "posX","posY"),
sampleAngle = FALSE,
refs = seq(0, 315, by = 45)) {
byLocation =
with(data, by(data, list(posXY),
function(x) {
if (sampleAngle) {
x = x[x$angle == sample(refs, size = 1), ]}
ans = x[1, keepVars]
avgSS = tapply(x[ , varSignal ], x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 6,
dimnames = list(ans$posXY,
names(avgSS)))
cbind(ans, y)
}))
newDataSS = do.call("rbind", byLocation)
return(newDataSS)
}
# up to 20 neighbors, 11 folds
# this one can run for a while (5-10 mins)
# this cell and the next are the same, but the angles change slightly!!
offline = offline[ offline$mac != "00:0f:a3:39:dd:cd", ]
keepVars = c("posXY", "posX","posY", "orientation", "angle")
set.seed(15)
onlineCVSummary = reshapeSS(offline, keepVars = keepVars,
sampleAngle = TRUE)
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , 1])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -1])
estFold = predXY(newSignals = onlineFold[ , 6:11],
newAngles = onlineFold[ , 4],
offlineFold, numAngles = 3, k = 3)
actualFold = onlineFold[ , c("posX", "posY")]
calcError(estFold, actualFold)
K = 20
err = rep(0, K)
for (j in 1:v) {
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , j])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -j])
actualFold = onlineFold[ , c("posX", "posY")]
for (k in 1:K) {
estFold = predXY(newSignals = onlineFold[ , 6:11],
newAngles = onlineFold[ , 4],
offlineFold, numAngles = 3, k = k)
err[k] = err[k] + calcError(estFold, actualFold)
}
}
plot(y = err, x = (1:K), type = "l", lwd= 2,
ylim = c(1000, 2100),
xlab = "Number of Neighbors",
ylab = "Sum of Square Errors")
title("SSE for k values 1 to 20 with MAC 00:0f:a3:39:e1:c0")
rmseMin = min(err)
kMin = which(err == rmseMin)[1]
rmseMin
kMin
segments(x0 = 0, x1 = kMin, y0 = rmseMin, col = gray(0.4),
lty = 2, lwd = 2)
segments(x0 = kMin, x1 = kMin, y0 = 975, y1 = rmseMin,
col = gray(0.4), lty = 2, lwd = 2)
text(x = kMin - 2, y = rmseMin + 40,
label = as.character(round(rmseMin)), col = grey(0.4))
#Estimated Error
estXYk6 = predXY(newSignals = onlineSummary[ , 6:11],
newAngles = onlineSummary[ , 4],
offlineSummary, numAngles = 3, k = 6)
rmseMin
calcError(estXYk6, actualXY)
predXY2 = function(newSignals, newAngles, trainData,
numAngles = 3, k = 3){
closeXY = list(length = nrow(newSignals))
for (i in 1:nrow(newSignals)) {
trainSS = selectTrain(newAngles[i], trainData, m = numAngles)
closeXY[[i]] = findNN(newSignal = as.numeric(newSignals[i, ]),
trainSS)
}
estXY = lapply(closeXY, function(x)
sapply(x[ , 2:3],
function(x) mean(x[1:k])))
estXY = do.call("rbind", estXY)
return(estXY)
}
#Reset data and omit the Mac 00:0f:a3:39:e1:c0
subMacs = c("00:0f:a3:39:e1:c0", "00:0f:a3:39:dd:cd", "00:14:bf:b1:97:8a",
"00:14:bf:3b:c7:c6", "00:14:bf:b1:97:90", "00:14:bf:b1:97:8d", "00:14:bf:b1:97:81")
unusedMac = "00:0f:a3:39:e1:c0"
Mac2 = subMacs[subMacs != unusedMac]
offlineRedo = readData(subMacs = Mac2)
summary(offlineRedo$signal)
offlineRedo$posXY = paste(offlineRedo$posX, offlineRedo$posY, sep = "-")
byLocAngleAP = with(offlineRedo,
by(offlineRedo, list(posXY, angle, mac),
function(x) x))
signalSummary =
lapply(byLocAngleAP,
function(oneLoc) {
ans = oneLoc[1, ]
ans$medSignal = median(oneLoc$signal)
ans$avgSignal = mean(oneLoc$signal)
ans$num = length(oneLoc$signal)
ans$sdSignal = sd(oneLoc$signal)
ans$iqrSignal = IQR(oneLoc$signal)
ans
})
offlineSummary = do.call("rbind", signalSummary)
breaks = seq(-90, -30, by = 5)
bwplot(sdSignal ~ cut(avgSignal, breaks = breaks),
data = offlineSummary,
subset = mac != unusedMac,
xlab = "Mean Signal", ylab = "SD Signal")
AP = matrix( c( 7.5, 6.3, 2.5, -.8, 12.8, -2.8,
1, 14, 33.5, 9.3, 33.5, 2.8),
ncol = 2, byrow = TRUE,
dimnames = list(subMacs[ -1 ], c("x", "y") ))
AP
diffs = offlineSummary[ , c("posX", "posY")] -
AP[ offlineSummary$mac, ]
offlineSummary$dist = sqrt(diffs[ , 1]^2 + diffs[ , 2]^2)
xyplot(signal ~ dist | factor(mac) + factor(angle),
data = offlineSummary, pch = 19, cex = 0.3,
xlab ="distance")
macs = unique(offlineSummary$mac)
online = readData("online.final.trace.txt", subMacs = macs)
online$posXY = paste(online$posX, online$posY, sep = "-")
length(unique(online$posXY))
tabonlineXYA = table(online$posXY, online$angle)
tabonlineXYA[1:6, ]
keepVars = c("posXY", "posX","posY", "orientation", "angle")
byLoc = with(online,
by(online, list(posXY),
function(x) {
ans = x[1, keepVars]
avgSS = tapply(x$signal, x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 6,
dimnames = list(ans$posXY, names(avgSS)))
cbind(ans, y)
}))
onlineSummary = do.call("rbind", byLoc)
dim(onlineSummary)
names(onlineSummary)
m = 3; angleNewObs = 230
refs = seq(0, by = 45, length = 8)
nearestAngle = roundOrientation(angleNewObs)
if (m %% 2 == 1) {
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
} else {
m = m + 1
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
if (sign(angleNewObs - nearestAngle) > -1)
angles = angles[ -1 ]
else
angles = angles[ -m ]
}
angles = angles + nearestAngle
angles[angles < 0] = angles[ angles < 0 ] + 360
angles[angles > 360] = angles[ angles > 360 ] - 360
offlineSubset =
offlineSummary[ offlineSummary$angle %in% angles, ]
reshapeSS = function(data, varSignal = "signal",
keepVars = c("posXY", "posX","posY")) {
byLocation =
with(data, by(data, list(posXY),
function(x) {
ans = x[1, keepVars]
avgSS = tapply(x[ , varSignal ], x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 6,
dimnames = list(ans$posXY,
names(avgSS)))
cbind(ans, y)
}))
newDataSS = do.call("rbind", byLocation)
return(newDataSS)
}
trainSS = reshapeSS(offlineSubset, varSignal = "avgSignal")
selectTrain = function(angleNewObs, signals = NULL, m = 1) {
# m is the number of angles to keep between 1 and 5
refs = seq(0, by = 45, length = 8)
nearestAngle = roundOrientation(angleNewObs)
if (m %% 2 == 1)
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
else {
m = m + 1
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
if (sign(angleNewObs - nearestAngle) > -1)
angles = angles[ -1 ]
else
angles = angles[ -m ]
}
angles = angles + nearestAngle
angles[angles < 0] = angles[ angles < 0 ] + 360
angles[angles > 360] = angles[ angles > 360 ] - 360
angles = sort(angles)
offlineSubset = signals[ signals$angle %in% angles, ]
reshapeSS(offlineSubset, varSignal = "avgSignal")
}
train130 = selectTrain(130, offlineSummary, m = 3)
head(train130)
length(train130[[1]])
findNN = function(newSignal, trainSubset) {
diffs = apply(trainSubset[ , 4:9], 1,
function(x) x - newSignal)
dists = apply(diffs, 2, function(x) sqrt(sum(x^2)) )
closest = order(dists)
return(trainSubset[closest, 1:3 ])
}
predXY = function(newSignals, newAngles, trainData, numAngles = 1, k = 3) {
closeXY = list(length = nrow(newSignals))
for (i in 1:nrow(newSignals)) {
trainSS = selectTrain(newAngles[i], trainData, m = numAngles)
closeXY[[i]] =
findNN(newSignal = as.numeric(newSignals[i, ]), trainSS)
}
estXY = lapply(closeXY,
function(x) sapply(x[ , 2:3],
function(x) mean(x[1:k])))
estXY = do.call("rbind", estXY)
return(estXY)
}
estXYk3 = predXY(newSignals = onlineSummary[ , 6:11],
newAngles = onlineSummary[ , 4],
offlineSummary, numAngles = 3, k = 3)
estXYk1 = predXY(newSignals = onlineSummary[ , 6:11],
newAngles = onlineSummary[ , 4],
offlineSummary, numAngles = 3, k = 1)
floorErrorMap = function(estXY, actualXY, trainPoints = NULL, AP = NULL) {
plot(0, 0, xlim = c(0, 35), ylim = c(-3, 15), type = "n",
xlab = "", ylab = "", axes = FALSE)
cap = "Figure x: Prediction vs. Real Location"
box()
if ( !is.null(AP) ) points(AP, pch = 15)
if ( !is.null(trainPoints) )
points(trainPoints, pch = 19, col="grey", cex = 0.6)
points(x = actualXY[, 1], y = actualXY[, 2],
pch = 19, cex = 0.8 )
points(x = estXY[, 1], y = estXY[, 2],
pch = 8, cex = 0.8 )
segments(x0 = estXY[, 1], y0 = estXY[, 2],
x1 = actualXY[, 1], y1 = actualXY[ , 2],
lwd = 2, col = "red")
}
trainPoints = offlineSummary[ offlineSummary$angle == 0 &
offlineSummary$mac == "00:0f:a3:39:dd:cd" ,
c("posX", "posY")]
# error map for k = 3
floorErrorMap(estXYk3, onlineSummary[ , c("posX","posY")],
trainPoints = trainPoints, AP = AP)
# error map for k = 1
floorErrorMap(estXYk1, onlineSummary[ , c("posX","posY")],
trainPoints = trainPoints, AP = AP)
To estimate prediction error, v-fold cross-validation is used: the data are divided into v non-overlapping subsets of equal size. For each subset, a model is built with the data that are not in that subset, and the predictive ability of that model is assessed on the subset that was left out. This process is repeated for each of the v folds, and the prediction errors are aggregated across the folds. set.seed(15) was used for all models.
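The fold construction can be sketched on toy data; the location labels below are hypothetical stand-ins for the posXY values, and v is reduced to 3 for illustration:

```r
# Sketch of the fold construction: shuffle the unique location IDs, then
# arrange them into a matrix with one fold of locations per column.
# The labels and v = 3 are illustrative, not values from the study.
set.seed(15)
locs = paste("loc", 1:12, sep = "-")   # hypothetical location IDs
v = 3
permuted = sample(locs)
folds = matrix(permuted, ncol = v,
               nrow = floor(length(permuted) / v))
# Column j is the held-out fold; the other columns form the training set.
heldOut = folds[ , 1]
training = folds[ , -1]
```

Because whole locations, not raw signal records, are assigned to folds, every measurement for a location lands in the same fold, so a model is never tested on a position it was trained on.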
set.seed(15)
calcError =
function(estXY, actualXY)
sum( rowSums( (estXY - actualXY)^2) )
actualXY = onlineSummary[ , c("posX", "posY")]
sapply(list(estXYk1, estXYk3), calcError, actualXY)
set.seed(15)
v = 11
permuteLocs = sample(unique(offlineSummary$posXY))
permuteLocs = matrix(permuteLocs, ncol = v,
nrow = floor(length(permuteLocs)/v))
onlineFold = subset(offlineSummary, posXY %in% permuteLocs[ , 1])
reshapeSS = function(data, varSignal = "signal",
keepVars = c("posXY", "posX","posY"),
sampleAngle = FALSE,
refs = seq(0, 315, by = 45)) {
byLocation =
with(data, by(data, list(posXY),
function(x) {
if (sampleAngle) {
x = x[x$angle == sample(refs, size = 1), ]}
ans = x[1, keepVars]
avgSS = tapply(x[ , varSignal ], x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 6,
dimnames = list(ans$posXY,
names(avgSS)))
cbind(ans, y)
}))
newDataSS = do.call("rbind", byLocation)
return(newDataSS)
}
# offlineRedo and unusedMac (the MAC excluded from this model) are
# defined in the preceding data-preparation step
offlineRedo = offlineRedo[ offlineRedo$mac != unusedMac, ]
keepVars = c("posXY", "posX","posY", "orientation", "angle")
onlineCVSummary = reshapeSS(offlineRedo, keepVars = keepVars,
sampleAngle = TRUE)
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , 1])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -1])
estFold = predXY(newSignals = onlineFold[ , 6:11],
newAngles = onlineFold[ , 4],
offlineFold, numAngles = 3, k = 3)
actualFold = onlineFold[ , c("posX", "posY")]
calcError(estFold, actualFold)
K = 20
err = rep(0, K)
for (j in 1:v) {
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , j])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -j])
actualFold = onlineFold[ , c("posX", "posY")]
for (k in 1:K) {
estFold = predXY(newSignals = onlineFold[ , 6:11],
newAngles = onlineFold[ , 4],
offlineFold, numAngles = 3, k = k)
err[k] = err[k] + calcError(estFold, actualFold)
}
}
plot(y = err, x = (1:K), type = "l", lwd= 2,
ylim = c(900, 2100),
xlab = "Number of Neighbors",
ylab = "Sum of Square Errors")
title("SSE for k values 1 to 20 with MAC 00:0f:a3:39:dd:cd")
# minimum SSE across k, and the k that attains it
sseMin = min(err)
sseMin
kMin = which(err == sseMin)[1]
kMin
segments(x0 = 0, x1 = kMin, y0 = sseMin, col = gray(0.4),
         lty = 2, lwd = 2)
segments(x0 = kMin, x1 = kMin, y0 = 870, y1 = sseMin,
         col = grey(0.4), lty = 2, lwd = 2)
text(x = kMin - 2, y = sseMin + 40,
     label = as.character(round(sseMin)), col = grey(0.4))
#Estimated Error
estXYk7 = predXY(newSignals = onlineSummary[ , 6:11],
newAngles = onlineSummary[ , 4],
offlineSummary, numAngles = 3, k = 7)
calcError(estXYk7, actualXY)
#redo data and include both Macs 00:0f:a3:39:e1:c0 and 00:0f:a3:39:dd:cd
subMacs = c("00:0f:a3:39:e1:c0", "00:0f:a3:39:dd:cd", "00:14:bf:b1:97:8a",
"00:14:bf:3b:c7:c6", "00:14:bf:b1:97:90", "00:14:bf:b1:97:8d",
"00:14:bf:b1:97:81")
offlineRedo = readData(subMacs = subMacs)  # keep all seven access point MACs
oldPar = par(mar = c(3.1, 3, 1, 1))
offlineRedo$posXY = paste(offlineRedo$posX, offlineRedo$posY, sep = "-")
byLocAngleAP = with(offlineRedo,
by(offlineRedo, list(posXY, angle, mac),
function(x) x))
signalSummary =
lapply(byLocAngleAP,
function(oneLoc) {
ans = oneLoc[1, ]
ans$medSignal = median(oneLoc$signal)
ans$avgSignal = mean(oneLoc$signal)
ans$num = length(oneLoc$signal)
ans$sdSignal = sd(oneLoc$signal)
ans$iqrSignal = IQR(oneLoc$signal)
ans
})
offlineSummary = do.call("rbind", signalSummary)
macs = unique(offlineSummary$mac)
macs
online = readData("online.final.trace.txt", subMacs = macs)
online$posXY = paste(online$posX, online$posY, sep = "-")
length(unique(online$posXY))
tabonlineXYA = table(online$posXY, online$angle)
tabonlineXYA[1:7, ]
keepVars = c("posXY", "posX","posY", "orientation", "angle")
byLoc = with(online,
by(online, list(posXY),
function(x) {
ans = x[1, keepVars]
avgSS = tapply(x$signal, x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 7,
dimnames = list(ans$posXY, names(avgSS)))
cbind(ans, y)
}))
onlineSummary = do.call("rbind", byLoc)
names(onlineSummary)
m = 3; angleNewObs = 230
refs = seq(0, by = 45, length = 8)
nearestAngle = roundOrientation(angleNewObs)
if (m %% 2 == 1) {
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
} else {
m = m + 1
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
if (sign(angleNewObs - nearestAngle) > -1)
angles = angles[ -1 ]
else
angles = angles[ -m ]
}
angles = angles + nearestAngle
angles[angles < 0] = angles[ angles < 0 ] + 360
angles[angles > 360] = angles[ angles > 360 ] - 360
offlineSubset =
offlineSummary[ offlineSummary$angle %in% angles, ]
reshapeSS = function(data, varSignal = "signal",
keepVars = c("posXY", "posX","posY")) {
byLocation =
with(data, by(data, list(posXY),
function(x) {
ans = x[1, keepVars]
avgSS = tapply(x[ , varSignal ], x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 7,
dimnames = list(ans$posXY,
names(avgSS)))
cbind(ans, y)
}))
newDataSS = do.call("rbind", byLocation)
return(newDataSS)
}
trainSS = reshapeSS(offlineSubset, varSignal = "avgSignal")
selectTrain = function(angleNewObs, signals = NULL, m = 1) {
# m is the number of angles to keep between 1 and 5
refs = seq(0, by = 45, length = 8)
nearestAngle = roundOrientation(angleNewObs)
if (m %% 2 == 1)
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
else {
m = m + 1
angles = seq(-45 * (m - 1) /2, 45 * (m - 1) /2, length = m)
if (sign(angleNewObs - nearestAngle) > -1)
angles = angles[ -1 ]
else
angles = angles[ -m ]
}
angles = angles + nearestAngle
angles[angles < 0] = angles[ angles < 0 ] + 360
angles[angles > 360] = angles[ angles > 360 ] - 360
angles = sort(angles)
offlineSubset = signals[ signals$angle %in% angles, ]
reshapeSS(offlineSubset, varSignal = "avgSignal")
}
train130 = selectTrain(130, offlineSummary, m = 3)
findNN = function(newSignal, trainSubset) {
  # columns 4:10 now hold the signal strengths for all seven access points
  diffs = apply(trainSubset[ , 4:10], 1,
                function(x) x - newSignal)
  dists = apply(diffs, 2, function(x) sqrt(sum(x^2)) )
  closest = order(dists)
  return(trainSubset[closest, 1:3 ])
}
predXY = function(newSignals, newAngles, trainData, numAngles = 1, k = 3) {
  closeXY = list(length = nrow(newSignals))
  for (i in 1:nrow(newSignals)) {
    trainSS = selectTrain(newAngles[i], trainData, m = numAngles)
    closeXY[[i]] =
      findNN(newSignal = as.numeric(newSignals[i, ]), trainSS)
  }
  estXY = lapply(closeXY,
                 function(x) sapply(x[ , 2:3],
                                    function(x) mean(x[1:k])))
  estXY = do.call("rbind", estXY)
  return(estXY)
}
# columns 6:12 of onlineSummary hold the seven signal columns
estXYk3 = predXY(newSignals = onlineSummary[ , 6:12],
                 newAngles = onlineSummary[ , 4],
                 offlineSummary, numAngles = 3, k = 3)
estXYk1 = predXY(newSignals = onlineSummary[ , 6:12],
                 newAngles = onlineSummary[ , 4],
                 offlineSummary, numAngles = 3, k = 1)
calcError =
function(estXY, actualXY)
sum( rowSums( (estXY - actualXY)^2) )
actualXY = onlineSummary[ , c("posX", "posY")]
sapply(list(estXYk1, estXYk3), calcError, actualXY)
The same v-fold cross-validation procedure is repeated for the model that includes both MAC addresses.
set.seed(15)
v = 11
permuteLocs = sample(unique(offlineSummary$posXY))
permuteLocs = matrix(permuteLocs, ncol = v,
nrow = floor(length(permuteLocs)/v))
onlineFold = subset(offlineSummary, posXY %in% permuteLocs[ , 1])
reshapeSS = function(data, varSignal = "signal",
keepVars = c("posXY", "posX","posY"),
sampleAngle = FALSE,
refs = seq(0, 315, by = 45)) {
byLocation =
with(data, by(data, list(posXY),
function(x) {
if (sampleAngle) {
x = x[x$angle == sample(refs, size = 1), ]}
ans = x[1, keepVars]
avgSS = tapply(x[ , varSignal ], x$mac, mean)
y = matrix(avgSS, nrow = 1, ncol = 7,
dimnames = list(ans$posXY,
names(avgSS)))
cbind(ans, y)
}))
newDataSS = do.call("rbind", byLocation)
return(newDataSS)
}
set.seed(15)
keepVars = c("posXY", "posX","posY", "orientation", "angle")
onlineCVSummary = reshapeSS(offlineRedo, keepVars = keepVars,
sampleAngle = TRUE)
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , 1])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -1])
estFold = predXY(newSignals = onlineFold[ , 6:11],
newAngles = onlineFold[ , 4],
offlineFold, numAngles = 3, k = 3)
actualFold = onlineFold[ , c("posX", "posY")]
calcError(estFold, actualFold)
K = 20
err = rep(0, K)
for (j in 1:v) {
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , j])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -j])
actualFold = onlineFold[ , c("posX", "posY")]
for (k in 1:K) {
estFold = predXY(newSignals = onlineFold[ , 6:12],
                 newAngles = onlineFold[ , 4],
                 offlineFold, numAngles = 3, k = k)
err[k] = err[k] + calcError(estFold, actualFold)
}
}
plot(y = err, x = (1:K), type = "l", lwd= 2,
ylim = c(1300, 2100),
xlab = "Number of Neighbors",
ylab = "Sum of Square Errors")
title("SSE for k values 1 to 20 with both MACs")
# minimum SSE across k, and the k that attains it
sseMin = min(err)
sseMin
kMin = which(err == sseMin)[1]
kMin
segments(x0 = 0, x1 = kMin, y0 = sseMin, col = gray(0.4),
         lty = 2, lwd = 2)
segments(x0 = kMin, x1 = kMin, y0 = 1280, y1 = sseMin,
         col = grey(0.4), lty = 2, lwd = 2)
text(x = kMin - 2, y = sseMin + 40,
     label = as.character(round(sseMin)), col = grey(0.4))
#Estimated Error
estXYk8 = predXY(newSignals = onlineSummary[ , 6:12],
                 newAngles = onlineSummary[ , 4],
                 offlineSummary, numAngles = 3, k = 8)
calcError(estXYk8, actualXY)
A weighted kNN gives more weight to training points that are close to the test observation and less to points that are farther away. Instead of averaging as in the previous models, this modified kNN applies a weight to the X-Y position of each of the k closest locations and sums the weighted positions. The findNN and predXY functions are modified so that each weight is inversely proportional to the neighbor's distance from the test observation.
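The weighting arithmetic can be illustrated with made-up numbers (the distances and coordinates below are hypothetical, not values from the data):

```r
# Inverse-distance weighting for k = 3 hypothetical neighbors.
dists = c(1, 2, 4)                  # distances to the 3 nearest training locations
wgts = (1 / dists) / sum(1 / dists) # normalized weights: 4/7, 2/7, 1/7
coords = matrix(c(0, 4, 8,          # neighbor X positions (made up)
                  0, 2, 4),         # neighbor Y positions (made up)
                ncol = 2)
# Each row (neighbor) is scaled by its weight and the rows are summed,
# mirroring predXY's closeXY * distWgt followed by a column sum.
est = colSums(coords * wgts)        # predicted (x, y) = (16/7, 8/7)
```

The nearest neighbor contributes 4/7 of the estimate and the farthest only 1/7, so nearby training locations dominate the predicted position.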
# changing the findNN function for weighted kNN
findNN = function(newSignal, trainSubset) {
  # columns 4:10 hold the signal strengths for all seven access points
  diffs = apply(trainSubset[ , 4:10], 1,
                function(x) x - newSignal)
  dists = apply(diffs, 2, function(x) sqrt(sum(x^2)) )
  closest = order(dists)
  # also return the sorted distances, which are needed to compute the weights
  return(list(trainSubset[closest, 1:3 ], dists[closest]))
}
# modifying the predXY function for weighted kNN (inverse Euclidean distance)
set.seed(15)
predXY = function(newSignals, newAngles, trainData,
numAngles = 1, k = 3){
closeXY = list(length = nrow(newSignals))
distance = list(length = nrow(newSignals))
for (i in 1:nrow(newSignals)) {
trainSS = selectTrain(newAngles[i], trainData, m = numAngles)
wgtNN = findNN(newSignal = as.numeric(newSignals[i, ]), trainSS)
closeXY[[i]] = wgtNN[[1]]
distance[[i]] = wgtNN[[2]]
}
  # convert the k nearest distances into normalized inverse-distance weights
  distWgt = list(length = length(distance))
  for (i in 1:length(distance)) {
    distW = list(length = k)
    for (j in 1:k) {
      distW[j] = (1/distance[[i]][j]) / sum(1/distance[[i]][1:k])
    }
    distWgt[[i]] = distW
  }
  # weight the X-Y coordinates of the k nearest locations, then sum them
  estXY1 = list(length = length(closeXY))
  for (i in 1:length(closeXY)) {
    estXY1[[i]] = as.matrix(closeXY[[i]][1:k, 2:3]) * unlist(distWgt[[i]])
  }
  estXY = lapply(estXY1,
                 function(x) apply(x, 2, sum))
estXY = do.call("rbind", estXY)
return(estXY)
}
set.seed(15)
K = 20
err = rep(0, K)
for (j in 1:v) {
onlineFold = subset(onlineCVSummary,
posXY %in% permuteLocs[ , j])
offlineFold = subset(offlineSummary,
posXY %in% permuteLocs[ , -j])
actualFold = onlineFold[ , c("posX", "posY")]
for (k in 1:K) {
estFold = predXY(newSignals = onlineFold[ , 6:12],
                 newAngles = onlineFold[ , 4],
                 offlineFold, numAngles = 3, k = k)
err[k] = err[k] + calcError(estFold, actualFold)
}
}
#plot
plot(y = err, x = (1:K), type = "l", lwd= 2,
ylim = c(1200, 2100),
xlab = "Number of Neighbors",
ylab = "Sum of Square Errors")
title("SSE for k values 1 to 20, weighted with both MACs")
# minimum SSE across k, and the k that attains it
sseMin = min(err)
sseMin
kMin = which(err == sseMin)[1]
kMin
segments(x0 = 0, x1 = kMin, y0 = sseMin, col = gray(0.4),
         lty = 2, lwd = 2)
segments(x0 = kMin, x1 = kMin, y0 = 1170, y1 = sseMin,
         col = grey(0.4), lty = 2, lwd = 2)
text(x = kMin - 2, y = sseMin + 40,
     label = as.character(round(sseMin)), col = grey(0.4))
#Estimated Error
set.seed(15)
estXYk9 = predXY(newSignals = onlineSummary[ , 6:12],
                 newAngles = onlineSummary[ , 4],
                 offlineSummary, numAngles = 3, k = kMin)
calcError(estXYk9, actualXY)