Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

Jingyu Xiao, Tsinghua Shenzhen International Graduate School & Peng Cheng Laboratory, Shenzhen, China (jy-xiao21@mails.tsinghua.edu.cn); Zhiyao Xu, Xi'an University of Electronic Science and Technology, Xi'an, China (21009200843@stu.xidian.edu.cn); Qingsong Zou, Tsinghua Shenzhen International Graduate School & Peng Cheng Laboratory, Shenzhen, China (zouqs21@mails.tsinghua.edu.cn); Qing Li, Peng Cheng Laboratory, Shenzhen, China (liq@pcl.ac.cn); Dan Zhao, Peng Cheng Laboratory, Shenzhen, China (zhaod01@pcl.ac.cn); Dong Fang, Tencent, Shenzhen, China (victordfang@tencent.com); Ruoyu Li, Tsinghua Shenzhen International Graduate School, Shenzhen, China (liry19@mails.tsinghua.edu.cn); Wenxin Tang, Tsinghua Shenzhen International Graduate School, Shenzhen, China (vinsontang2126@gmail.com); Kang Li, Tsinghua Shenzhen International Graduate School, Shenzhen, China (lk26603878@gmail.com); Xudong Zuo, Tsinghua Shenzhen International Graduate School, Shenzhen, China (zuoxd20@mails.tsinghua.edu.cn); Penghui Hu, Tsinghua University, Beijing, China (huph22@mails.tsinghua.edu.cn); Yong Jiang, Tsinghua Shenzhen International Graduate School & Peng Cheng Laboratory, Shenzhen, China (jiangy@sz.tsinghua.edu.cn); Zixuan Weng, Beijing Jiaotong University, Beijing, China (20722027@bjtu.edu.cn); Michael R. Lyu, The Chinese University of Hong Kong, Hong Kong, China (lyu@cse.cuhk.edu.hk)


Abstract.

Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations by users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effectively learn less frequent behaviors, consider temporal context, or account for the impact of noise in human behaviors. In this paper, we propose SmartGuard, an autoencoder-based unsupervised user behavior anomaly detection framework. First, we design a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors, which are often overlooked during learning. Second, we propose a Three-level Time-aware Position Embedding (TTPE) to incorporate temporal information into positional embedding to detect temporal context anomalies. Third, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) that assigns different weights to routine behaviors and noise behaviors to mitigate the interference of noise behaviors during inference. Comprehensive experiments on three datasets with ten types of anomaly behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines and also offers highly interpretable results.

User Behavior Modeling, Anomaly Detection, Transformer.

copyright: acmlicensed; journalyear: 2024; doi: 10.1145/3637528.3671708; conference: the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, July 14-18, 2024, Barcelona, Spain; isbn: 978-1-4503-XXXX-X/18/06; ccs: Security and privacy / Human and societal aspects of security and privacy

1. Introduction

The rapid growth of IoT solutions has led to an unprecedented increase in smart devices within homes, expected to reach approximately 5 billion by 2025 (Lueth, 2018). However, abnormal behaviors pose substantial security risks within smart homes. These abnormal behaviors usually originate from two primary sources. First, improper operations by users can cause abnormal behaviors, such as inadvertently activating the air conditioner's cooling mode during winter or forgetting to close a water valve. Second, malicious attackers can exploit vulnerabilities within IoT devices and platforms, taking unauthorized control of these devices. For example, hackers can compromise IoT platforms, allowing them to disable security cameras and manipulate home automation systems, creating opportunities for burglary. These security concerns emphasize the urgency of robust behavioral modeling methods and enhanced security measures to safeguard smart home environments.

Deep learning has been employed across various domains to mine correlations between behaviors for modeling user behavior sequences (Tang et al., 2022, 2023; Li et al., 2024). DeepMove (Feng et al., 2018) leverages RNNs to model both long and short-term mobility patterns of users for human mobility prediction. To capture the dynamics of user behaviors, SASRec (Kang and McAuley, 2018) proposes a self-attention based model to achieve sequential recommendation. More recent efforts (Chen et al., 2019; Sun et al., 2019; de Souza Pereira Moreira et al., 2021) primarily focus on transformer-based models for their superior ability to handle sequential behavior data.

However, these models cannot be directly applied to our scenario because of the following three challenges of user behavior modeling in smart homes.

[Figure 1: occurrences and reconstruction losses of different behaviors on the AN dataset]

First, the occurrence frequencies of different user behaviors may be imbalanced, leading to challenges in learning the semantics of these behaviors. This user behavior imbalance can be attributed to individuals' living habits. For example, cook-related behaviors (e.g., using the microwave and oven) of office workers may be infrequent, because they dine at their workplace on weekdays and only cook on weekends. On the other hand, some daily behaviors of the same users, like turning on lights and watching TV, can be more frequent. Behavior imbalance complicates the learning process for models: some behaviors, which occur frequently in similar contexts, can be easily inferred, while others that rarely appear or manifest in diverse contexts can be more challenging to infer. We train an autoencoder model on the AN dataset (shown in Table 1) and record the occurrences and reconstruction loss of different behaviors. As shown in Figure 1, as the number of occurrences of a behavior decreases, its reconstruction loss tends to increase.

Second, temporal context, e.g., the timing and duration of user behaviors, plays a significant role in abnormal behavior detection but is overlooked by existing solutions. For example, turning on the cooling mode of the air conditioner is abnormal in winter but normal in summer. Showering for 30-40 minutes is normal, but exceeding 2 hours suggests a user accident. Ignoring timing information hinders the identification of abnormal behavior patterns. As shown in Figure 2, sequence 1 represents a user's normal laundry-related behaviors. Sequences 2 and 3 follow the same order as sequence 1. However, in sequence 2, the water valve is opened at 2 o'clock at night; in sequence 3, the duration between opening and closing the water valve is excessively long. Therefore, these two sequences should be identified as abnormal behaviors, possibly conducted by attackers intending to induce water leakage.

[Figure 2: a normal laundry-related behavior sequence and two temporally abnormal variants]

Third, arbitrary intents and passive device actions can cause noise behaviors in user behavior sequences, which interfere with the model's inference. Figure 3 shows noise behaviors in a behavior sequence related to a user's behaviors after getting up. The user performs some routine behaviors like "turn on the bed light", "open the curtains", "switch off the air conditioner", "open the refrigerator", "close the refrigerator" and "switch on the oven". However, there are also some sporadic actions that are not tightly related to the behavior sequence, including 1) active behaviors, e.g., suddenly deciding to "turn on the network audio" to listen to music; 2) passive behaviors from devices, e.g., the "self-refresh" of the air purifier. These noise behaviors may also occur in other sequences with varying patterns. They introduce uncertainty that can disrupt the learning process and lead the model to misclassify sequences containing noise behaviors as anomalies. Therefore, treating noise behaviors on par with normal behaviors could potentially harm the model's performance, leading to increased losses.

[Figure 3: routine and noise behaviors in a user's after-getting-up behavior sequence]

In this paper, we propose SmartGuard to solve the above challenges. SmartGuard is an autoencoder-based architecture that learns to reconstruct normal behavior sequences during training and identifies behavior sequences with high reconstruction loss as anomalies. Firstly, we devise a Loss-guided Dynamic Mask Strategy (LDMS) to promote the model's learning of infrequent, hard-to-learn behaviors. Secondly, we introduce a Three-level Time-aware Position Embedding (TTPE) to integrate temporal information into positional embedding for detecting temporal context anomalies. Lastly, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) to assign distinct weights to routine behaviors and noise behaviors, thereby mitigating the impact of noise behaviors. Our code is released on GitHub: https://github.com/xjywhu/SmartGuard. Our contributions can be summarized as follows:

  • We design LDMS to mask the behaviors with high reconstruction loss, thus encouraging the model to learn these hard-to-learn behaviors.

  • We propose TTPE to simultaneously consider the order-level, moment-level and duration-level information of user behaviors.

  • We design NWRL to treat noisy behaviors and normal behaviors differently for learning robust behavior representations.

2. Related Work

2.1. User Behavior Modeling in Smart Homes

Some works propose to model user behavior (i.e., user-device interaction) based on deep learning. (Gu et al., 2020) uses an event transition graph to model IoT context and detect anomalies. In (Wang et al., 2023), the authors build a device interaction graph to learn the device state transition relationships caused by user actions. (Fu et al., 2021) detects anomalies through correlational analysis of device actions and the physical environment. (Srinivasan et al., 2008) infers user behavior through readings from various sensors installed in the user's home. IoTBeholder (Zou et al., 2023) utilizes an attention-based LSTM to predict user behavior from history sequences. SmartSense (Jeon et al., 2022) leverages a query-based transformer to model contextual information of user behavior sequences. DeepUDI (Xiao et al., 2023a) and SmartUDI (Xiao et al., 2023b) use relational gated graph neural networks, capsule neural networks and contrastive learning to model users' routines, intents and multi-level periodicities. However, the above methods aim at accurately predicting a user's next behavior; they cannot be applied to abnormal behavior detection.

2.2. Attacks and Defenses in Smart Homes

An increasing number of attack vectors have been identified in smart homes in recent years. In addition to cyber attacks, it is also concerning that IoT devices are often closely associated with the user's physical environment and have the ability to alter it. In this context, automation introduces more serious security risks. Prior research has revealed that adversaries can leak personal information and gain physical access to the home (Jia et al., 2017; Celik et al., 2018). In (Fernandes et al., 2016), a spoofing attack is employed to exploit automation rules and trigger unexpected device actions. (Chi et al., 2022; Fu et al., 2022) apply delay-based attacks to disrupt cross-platform IoT information exchanges, resulting in unexpected interactions and leaving IoT devices and smart homes in an insecure state. This series of attacks aims at causing smart home devices to exhibit unexpected actions, thereby posing significant security threats. Therefore, designing an effective mechanism to detect such attacks is necessary. 6thSense (Sikder et al., 2017) utilizes Naive Bayes to detect malicious behavior associated with sensors in smart homes. Aegis (Sikder et al., 2019) utilizes a Markov Chain to detect malicious behaviors. ARGUS (Rieger et al., 2023) designs an autoencoder based on Gated Recurrent Units (GRU) to detect infiltration attacks. However, these methods ignore behavior imbalance, temporal information and noise behaviors.

3. Problem Formulation

Let $\mathcal{D}$ denote a set of devices, $\mathcal{C}$ denote a set of device controls and $\mathcal{S}$ denote a set of behavior sequences.

Definition 1.

(Behavior) A behavior $b=(t,d,c)$ is a 3-tuple consisting of a timestamp $t$, a device $d\in\mathcal{D}$ and a device control $c\in\mathcal{C}$.

For example, behavior b = (2022-08-04 18:30, air conditioner, air conditioner:switch on) describes the behavior "switch on the air conditioner" at 18:30 on 2022-08-04.

Definition 2.

(Behavior Sequence) A behavior sequence $s=[b_{1},b_{2},\cdots,b_{n}]\in\mathcal{S}$ is a list of behaviors ordered by their timestamps, where $n$ is the length of $s$.

We define the User Behavior Sequence (UBS) anomaly detection problem as follows.

Problem 1.

(UBS Anomaly Detection) Given a behavior sequence $s$, determine whether $s$ is an anomalous event or a normal event.
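To make the formulation concrete, the following sketch (ours, purely illustrative) represents behaviors and sequences as plain Python data structures matching Definitions 1 and 2:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# A behavior b = (t, d, c): timestamp, device, device control (Definition 1).
@dataclass
class Behavior:
    t: datetime   # timestamp of the behavior
    d: str        # device identifier, d in D
    c: str        # device control, c in C

# A behavior sequence s = [b_1, ..., b_n], ordered by timestamp (Definition 2).
def make_sequence(behaviors: List[Behavior]) -> List[Behavior]:
    return sorted(behaviors, key=lambda b: b.t)

s = make_sequence([
    Behavior(datetime(2022, 8, 4, 18, 30), "air conditioner", "air conditioner:switch on"),
    Behavior(datetime(2022, 8, 4, 18, 25), "curtain", "curtain:open"),
])
assert s[0].c == "curtain:open"  # earliest behavior comes first
```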

[Figure 4: examples of the four types of abnormal behaviors: (a) SD, (b) MD, (c) DM, (d) DD]

In this paper, we consider four types of abnormal behaviors:

  • (SD) Single Device context anomaly (Figure 4(a)), defined as unusually high-frequency operations on a single device, e.g., frequently switching a light on and off to break it.

  • (MD) Multiple Devices context anomaly (Figure 4(b)), defined as the simultaneous occurrence of behaviors on multiple devices that are not supposed to occur in the same sequence, e.g., turning off the camera and opening the window for burglary.

  • (DM) Device control-Moment context anomaly (Figure 4(c)), defined as a device control occurring at an inappropriate time, e.g., turning on the cooling mode of an air conditioner in winter, potentially causing the user to catch a cold.

  • (DD) Device control-Duration context anomaly (Figure 4(d)), defined as device controls that last for an inappropriate duration, e.g., leaving a water valve open for 3 hours in a flood attack.

[Figure 5: overview of the SmartGuard framework]

4. Methodology

4.1. Solution Overview

To achieve accurate user behavior sequence anomaly detection in smart homes, we propose SmartGuard, depicted in Figure 5. The workflow of SmartGuard can be summarized as follows. During training, the Loss-guided Dynamic Mask Strategy (§4.2) is initially employed to mask hard-to-learn behaviors based on the loss vector $\mathcal{L}_{\text{vec}}$ from the previous epoch. Subsequently, the Three-level Time-aware Positional Encoder (§4.3.1) is applied to capture order-level, moment-level, and duration-level temporal information of the behaviors, producing the positional embedding $\overline{PE}$. This embedding is then added to the device control embedding $h_{c}$ to form the behavior embedding $\mathbf{h}$. Finally, $\mathbf{h}$ is fed into an $L$-layer attention-based encoder and decoder to extract contextual information for reconstructing the source sequence. During the inference phase, the Noise-aware Weighted Reconstruction Loss (§4.4) is utilized to assign different weights to various behaviors, determined by the loss vector from the training dataset, resulting in the final reconstruction loss $score$. If the $score$ surpasses the threshold $th$, SmartGuard triggers an alarm.

4.2. Loss-guided Dynamic Mask Strategy

Autoencoders (Zhai et al., 2018), which take complete data instances as input and aim to reconstruct the entire input, are widely used in anomaly detection. Different from traditional autoencoders, masked autoencoders randomly mask a portion of the input data, encode the partially-masked data and aim to reconstruct the masked tokens. By introducing a more meaningful self-supervised task, masked autoencoders have recently excelled in image representation learning (He et al., 2022). However, reconstruction tasks without masking or with random masking are sub-optimal in our scenario because they do not emphasize the learning of hard-to-learn behaviors that occur rarely.

[Figure 6: (a) reconstruction loss and (b) loss variance of different behaviors during training on the SP dataset under the three mask options]

We conduct experiments to verify the performance of autoencoders trained with three mask options: 1) w/o mask: no mask strategy is used and the objective is to reconstruct the input; 2) random mask: randomly masking behaviors at every epoch and reconstructing the masked behaviors; 3) top-$k$ loss mask: masking the top $k$ behaviors with the highest reconstruction loss and reconstructing them. We set the mask ratio to 20% for the latter two. Figure 6 shows the trends of the reconstruction loss and its variance across behaviors during training on the SP dataset (described in Table 1). First, as shown in Figure 6(a), the model without mask converges fastest, whereas the loss of the models with masks fluctuates. The model without mask can simultaneously learn all behaviors, facilitating rapid convergence; in contrast, a mask strategy only encourages the model to focus on the masked behaviors, which may hinder initial-stage convergence. Second, the model with the top-$k$ loss mask strategy shows the lowest variance towards the end of training, as shown in Figure 6(b), because this strategy effectively encourages the model to learn hard-to-learn behaviors (i.e., behaviors with high reconstruction loss), thereby reducing the variance of behavior reconstruction losses.

In this paper, we design a Loss-guided Dynamic Mask Strategy. Intuitively, at the beginning of training, we encourage the model to learn a relatively easy task to accelerate convergence, i.e., behavior sequence reconstruction without mask. After training $N$ epochs without mask, we adopt the top-$k$ loss mask strategy to encourage the model to learn the masked behaviors with high reconstruction loss. We continuously track the model's reconstruction loss of different behaviors by updating a loss vector in each epoch, which guides the mask strategy in the next epoch. In epoch $ep$, the loss vector $\mathcal{L}^{ep}_{\text{vec}}$ is calculated as:

(1)  $\mathcal{L}^{ep}_{\text{vec}}=\left\{\ell_{1},\ell_{2},\ldots,\ell_{c},\ldots,\ell_{|\mathcal{C}|}\right\},\quad c\in\mathcal{C},$

(2)  $\ell_{c}=\frac{1}{n_{c}}\sum_{i=1}^{n_{c}}\ell^{i}_{c},$

where $n_{c}$ is the number of times the device control $c$ occurs in epoch $ep$, and $\ell_{c}$ is the average reconstruction loss of device control $c$. In epoch $ep+1$, the mask vector for a behavior sequence sample $s=[b_{1},b_{2},\cdots,b_{n}]$ is obtained as:

(3)  $mask(i)=\begin{cases}1,&\text{if }i\in sorted\_index[:\lfloor n\cdot r\rfloor]\\0,&\text{if }i\notin sorted\_index[:\lfloor n\cdot r\rfloor]\end{cases},\quad i\in[1,n],$

(4)  $sorted\_index=\operatorname{argsort}\left(\left\{\mathcal{L}^{ep}_{\text{vec}}(b_{1}),\mathcal{L}^{ep}_{\text{vec}}(b_{2}),\ldots,\mathcal{L}^{ep}_{\text{vec}}(b_{n})\right\}\right),$

where $\operatorname{argsort}$ returns the indices of the elements sorted in descending order, $r\in[0,1]$ is the mask ratio, and $n$ is the length of behavior sequence $s$.
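The following NumPy sketch illustrates Equations 1-4; the dictionary of per-occurrence losses and the integer control indices are illustrative assumptions of ours:

```python
import numpy as np

def update_loss_vector(losses_per_control: dict, num_controls: int) -> np.ndarray:
    """Eq. (1)-(2): average reconstruction loss per device control in this epoch."""
    loss_vec = np.zeros(num_controls)
    for c, losses in losses_per_control.items():  # c indexes device controls
        loss_vec[c] = np.mean(losses)             # l_c = (1/n_c) * sum_i l_c^i
    return loss_vec

def ldms_mask(seq_controls: list, loss_vec: np.ndarray, r: float) -> np.ndarray:
    """Eq. (3)-(4): mask the floor(n*r) behaviors with the highest loss."""
    n = len(seq_controls)
    seq_losses = loss_vec[seq_controls]       # L_vec(b_i) for each behavior
    sorted_index = np.argsort(-seq_losses)    # descending order by loss
    mask = np.zeros(n, dtype=int)
    mask[sorted_index[: int(n * r)]] = 1      # 1 = masked, to be reconstructed
    return mask

# Toy usage: control 2 has the highest average loss, so it is masked first.
loss_vec = update_loss_vector({0: [0.1, 0.2], 1: [0.5], 2: [1.3, 0.9]}, num_controls=3)
print(ldms_mask([0, 2, 1, 0, 2], loss_vec, r=0.4))  # -> [0 1 0 0 1]
```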

4.3. Autoencoder with Temporal Information

4.3.1. Three-level Time-aware Positional Encoder

The temporal information in user behavior sequence data primarily resides in the timing of control behaviors, which can be examined from two perspectives: the absolute timing of each individual control behavior, and the relative timing gap between control actions on the same device. On the one hand, the relative timing gap between control actions on the same device reflects how long the device stays in a specific state and the user's operation frequency. On the other hand, user behaviors are usually time-regulated, and the functionality a device carries can determine the absolute timing at which users operate it. For example, users usually operate lights in the morning and evening, and operate the microwave and the oven at meal times. Since certain operations frequently take place nearly simultaneously, we also consider the order of behaviors to provide a more comprehensive characterization of behaviors that occur successively. Therefore, we incorporate three types of temporal information into our model. (1) Order-level temporal information: we use an integer $order\in[0,n-1]$ to denote the order-level information of a behavior, where $n$ is the length of the behavior sequence $s$. (2) Moment-level temporal information: we represent the moment as hour of day $hour$ and day of week $day$ based on the behavior's timestamp. (3) Duration-level temporal information: the duration for behavior $b$ is calculated as:

(5)  $duration_{b}=t(b_{next})-t(b),$

where $b$ and $b_{next}$ are behaviors on the same device, $b_{next}$ is the first behavior after $b$ that operates on the device, and $t(b)$ represents the occurrence time of behavior $b$.
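As a concrete illustration, here is a minimal sketch of this duration extraction, assuming durations are measured in seconds and default to 0 when a device has no later behavior in the sequence (a convention not specified above):

```python
from datetime import datetime

def durations(seq):
    """seq: list of (timestamp, device) tuples ordered by time.
    Returns duration_b = t(b_next) - t(b) in seconds, where b_next is the
    next behavior on the same device; 0.0 if no such behavior exists."""
    out = []
    for i, (t, d) in enumerate(seq):
        nxt = next((t2 for t2, d2 in seq[i + 1:] if d2 == d), None)
        out.append((nxt - t).total_seconds() if nxt else 0.0)
    return out

seq = [
    (datetime(2022, 8, 4, 18, 0), "water valve"),   # open
    (datetime(2022, 8, 4, 18, 5), "light"),
    (datetime(2022, 8, 4, 18, 40), "water valve"),  # close -> 40 min duration
]
print(durations(seq))  # [2400.0, 0.0, 0.0]
```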

Then, the positional embedding is calculated as:

(6)  $\overline{PE}=w_{order}\cdot PE(order)+w_{hour}\cdot PE(hour)+w_{day}\cdot PE(day)+w_{dur}\cdot PE(duration),$

where $w_{order}$, $w_{hour}$, $w_{day}$ and $w_{dur}$ are learnable weights, and $PE(\cdot)$ is a positional encoding function (Vaswani et al., 2017) defined as:

(7)  $PE_{(\cdot,2i)}=\sin\left(\cdot/10000^{2i/d}\right),\qquad PE_{(\cdot,2i+1)}=\cos\left(\cdot/10000^{2i/d}\right),$

where $i$ denotes the $i$-th dimension of the positional embedding and $d$ is the dimension of the temporal embedding.

To learn the representation $h_{c}$ for device control $c\in\mathcal{C}$, we first encode device control $c$ into a low-dimensional latent space through the device control encoder, i.e., an embedding layer. Finally, we add the positional embedding to the device control embedding as follows to get the behavior embedding:

(8)  $\mathbf{h}=\overline{PE}+h_{c}.$
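To make Equations 6-8 concrete, here is a minimal PyTorch sketch, under the assumptions (ours) that a single sinusoidal table is shared across the four temporal signals and that duration is already discretized into integer buckets; the class and variable names are illustrative:

```python
import torch
import torch.nn as nn

def sinusoidal_pe(positions: torch.Tensor, d: int) -> torch.Tensor:
    """Eq. (7): PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(...)."""
    i = torch.arange(d // 2, dtype=torch.float)
    div = torch.pow(10000.0, 2 * i / d)
    angles = positions.float().unsqueeze(-1) / div                        # (n, d/2)
    return torch.stack((angles.sin(), angles.cos()), dim=-1).flatten(-2)  # (n, d)

class TTPE(nn.Module):
    def __init__(self, num_controls: int, d: int):
        super().__init__()
        self.control_emb = nn.Embedding(num_controls, d)  # device control encoder
        self.w = nn.Parameter(torch.ones(4))              # w_order, w_hour, w_day, w_dur
        self.d = d

    def forward(self, controls, order, hour, day, duration):
        pe = (self.w[0] * sinusoidal_pe(order, self.d)
              + self.w[1] * sinusoidal_pe(hour, self.d)
              + self.w[2] * sinusoidal_pe(day, self.d)
              + self.w[3] * sinusoidal_pe(duration, self.d))  # Eq. (6)
        return pe + self.control_emb(controls)               # Eq. (8): h = PE + h_c

ttpe = TTPE(num_controls=141, d=64)
h = ttpe(torch.tensor([3, 7]), torch.tensor([0, 1]),
         torch.tensor([8, 8]), torch.tensor([2, 2]), torch.tensor([0, 15]))
print(h.shape)  # torch.Size([2, 64])
```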

4.3.2. Sequence Encoder

To learn the sequence embedding, we employ a transformer encoder (Vaswani et al., 2017) consisting of a multi-head attention layer, residual connections and a position-wise feed-forward network (FNN). Given an input behavior representation $\mathbf{h}$, the self-attention layer can effectively mine the global semantic information of the behavior sequence context by learning query $\mathrm{Q}$, key $\mathrm{K}$ and value $\mathrm{V}$ matrices, which are calculated as:

(9)  $\mathrm{Q}=\mathbf{h}\mathrm{W}^{Q},\quad\mathrm{K}=\mathbf{h}\mathrm{W}^{K},\quad\mathrm{V}=\mathbf{h}\mathrm{W}^{V},$

where $\mathrm{W}^{Q},\mathrm{W}^{K},\mathrm{W}^{V}$ are the transformation matrices. The attention output $\mathbf{A}$ is computed by:

(10)  $\mathbf{A}=\operatorname{Attention}(Q,K,V)=\operatorname{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V,$

where $d_{k}$ is the dimension of $K$. Multi-head attention is applied to improve the stability of the learning process and achieve higher performance. Then, the position-wise feed-forward network (FNN) and residual connections are adopted:

(11)  $\mathbf{\overline{h}}=\operatorname{Trans}(\mathbf{h})=\mathbf{h}+\mathbf{Ah}+\mathrm{FNN}(\mathbf{h}+\mathbf{Ah}),$

where $\operatorname{Trans}(\cdot)$ is the transformer and $\operatorname{FNN}(\cdot)$ is a 2-layer position-wise feed-forward network (Vaswani et al., 2017).
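A single-head PyTorch sketch of Equations 9-11 follows; it reads $\mathbf{A}\mathbf{h}$ in Equation 11 as the attention output $\operatorname{softmax}(QK^{T}/\sqrt{d_k})V$, omits multi-head attention and layer normalization for brevity, and the 4x hidden width of the FNN is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncoderLayer(nn.Module):
    """One layer of Eq. (9)-(11): h_bar = h + attn(h) + FNN(h + attn(h))."""
    def __init__(self, d: int):
        super().__init__()
        self.Wq, self.Wk, self.Wv = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))

    def forward(self, h):                                    # h: (n, d)
        Q, K, V = self.Wq(h), self.Wk(h), self.Wv(h)         # Eq. (9)
        scores = F.softmax(Q @ K.T / K.shape[-1] ** 0.5, dim=-1)  # Eq. (10)
        attended = h + scores @ V                            # h + A.h
        return attended + self.ffn(attended)                 # Eq. (11)

layer = EncoderLayer(d=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```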

4.3.3. Sequence Decoder

The decoder has the same architecture as the encoder. We input $\mathbf{\overline{h}}$ into the decoder to reconstruct the entire sequence; the probabilities of the target device controls are calculated as:

(12)  $\mathbf{\widetilde{h_{i}}}=\operatorname{decoder}\left(\mathbf{\overline{h_{i}}}\right),$

(13)  $\hat{\mathbf{y_{i}}}=\operatorname{softmax}\left(\mathbf{W}_{h}\mathbf{\widetilde{h_{i}}}\right),$

where $\hat{\mathbf{y_{i}}}$ is the predicted probability distribution of the $i$-th device control, $\mathbf{W}_{h}\in\mathbb{R}^{|\mathcal{C}|\times len(h)}$ is a learnable transformation matrix, $|\mathcal{C}|$ is the number of device controls, and $len(h)$ is the length of $h$.
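A short sketch of the reconstruction head in Equations 12-13; since the paper only states that the decoder mirrors the encoder, a standard transformer encoder layer is used here as a stand-in for the architecture of §4.3.2:

```python
import torch
import torch.nn as nn

d, num_controls = 64, 141
# Stand-in decoder with the same architecture family as the encoder.
decoder = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
W_h = nn.Linear(d, num_controls, bias=False)  # W_h in R^{|C| x len(h)}

h_bar = torch.randn(1, 10, d)                 # encoder output for a length-10 sequence
h_tilde = decoder(h_bar)                      # Eq. (12): h~ = decoder(h_bar)
y_hat = torch.softmax(W_h(h_tilde), dim=-1)   # Eq. (13): per-position control probabilities
print(y_hat.shape)                            # torch.Size([1, 10, 141])
```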

4.3.4. Objective Function

We optimize the model to minimize the average reconstruction loss measured by cross-entropy loss:

(14)  $\mathcal{L}_{rec}=\begin{cases}-\frac{1}{|\mathcal{S}|}\sum_{s\in\mathcal{S}}\sum^{|s|}_{i=1}\mathbf{y}_{i}\log\hat{\mathbf{y}}_{i},&\text{if }epoch\leq N\\-\frac{1}{|\mathcal{S}|}\sum_{s\in\mathcal{S}}\sum^{|s|}_{i=1}mask_{s}(i)\,\mathbf{y}_{i}\log\hat{\mathbf{y}}_{i},&\text{if }epoch>N\end{cases}$

where $\mathcal{S}$ is the set of behavior sequences, $|s|$ is the length of sequence $s$, $\mathbf{y}_{i}$ is the one-hot vector of the ground-truth label, $mask_{s}$ is the mask vector for sequence $s$, and $N$ is the number of training epochs without mask.
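A sketch of Equation 14 for a single sequence; normalizing the masked branch by the number of masked positions (rather than summing them) is an implementation assumption of ours:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(logits, targets, mask, epoch, N):
    """Eq. (14). logits: (n, |C|); targets: (n,) ground-truth control ids;
    mask: (n,) LDMS mask (1 = masked behavior)."""
    per_pos = F.cross_entropy(logits, targets, reduction="none")  # -y_i log(y_hat_i)
    if epoch <= N:                 # warm-up phase: reconstruct the full sequence
        return per_pos.mean()
    # masked phase: only masked positions contribute to the loss
    return (mask * per_pos).sum() / mask.sum().clamp(min=1)

logits = torch.randn(5, 141)
targets = torch.randint(0, 141, (5,))
mask = torch.tensor([0.0, 1.0, 0.0, 0.0, 1.0])
print(reconstruction_loss(logits, targets, mask, epoch=10, N=5))
```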

4.4. Noise-aware Weighted Reconstruction Loss

Although LDMS encourages the model to focus on learning behaviors with high reconstruction losses, it remains challenging to reconstruct noise behaviors due to their inherent uncertainty. The significant reconstruction loss associated with noise behaviors can overshadow other aspects during anomaly detection, potentially leading to the misclassification of normal sequences containing noise behaviors as anomalies.

To eliminate the interference of noise behaviors, we propose a Noise-aware Weighted Reconstruction Loss as the anomaly score. After training, we obtain the final loss vector:

(15)  $\mathcal{L}_{\text{vec}}=\left\{\ell_{1},\ell_{2},\ldots,\ell_{c},\ldots,\ell_{|\mathcal{C}|}\right\},\quad c\in\mathcal{C},$

which is converted into the corresponding weight vector:

(16)  $\mathcal{W}_{vec}=\left\{w_{1},w_{2},\ldots,w_{c},\ldots,w_{|\mathcal{C}|}\right\},\quad w_{k}\in(0,1),$

by the following equation:

(17)  $\mathcal{W}_{vec}=\operatorname{sigmoid}\left(-\frac{\operatorname{relu}\left(\mathcal{L}_{vec}-\mathbb{E}\left(\mathcal{L}_{vec}\right)\right)}{\sqrt{\operatorname{Var}\left(\mathcal{L}_{vec}\right)}\cdot\mu}\right),$

where $\mu$ is a coefficient to adjust the input of the sigmoid function, and $\mathbb{E}$ and $\operatorname{Var}$ calculate the expectation and variance of the loss distribution, respectively. The ReLU function ensures that behaviors with losses less than $\mathbb{E}(\mathcal{L}_{vec})$ (routine behaviors) are equally weighted. The sigmoid function assigns small weights to behaviors with high losses (potential noise behaviors). For each behavior $b_{i}$ in a sequence $s=\{b_{1},b_{2},\cdots,b_{n}\}$, we compute the weight $p_{i}$ as follows:

(18)  $p_{i}=\frac{\mathcal{W}_{vec}(b_{i})}{\sum_{j=1}^{n}\mathcal{W}_{vec}(b_{j})}.$
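The NumPy sketch below follows Equations 16-18: controls with at-or-below-average loss get weight sigmoid(0) = 0.5 and so, after the per-sequence normalization of Equation 18, end up weighted equally, while high-loss controls (likely noise) are sharply down-weighted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weight_vector(loss_vec: np.ndarray, mu: float = 0.1) -> np.ndarray:
    """Eq. (16)-(17): down-weight device controls with above-average loss."""
    centered = np.maximum(loss_vec - loss_vec.mean(), 0.0)      # relu(L - E[L])
    return sigmoid(-centered / (np.sqrt(loss_vec.var()) * mu))  # weights in (0, 1)

def behavior_weights(seq_controls: list, w_vec: np.ndarray) -> np.ndarray:
    """Eq. (18): normalize the weights within the sequence."""
    w = w_vec[seq_controls]
    return w / w.sum()

loss_vec = np.array([0.1, 0.2, 0.15, 2.5])  # control 3 looks like a noise behavior
w_vec = weight_vector(loss_vec)
print(behavior_weights([0, 1, 3], w_vec))   # control 3 gets a near-zero weight
```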

Then, we can get the anomaly score of $s$ as the weighted sum of the reconstruction losses of the behaviors in $s$:

(19)  $score(s)=-\frac{1}{|s|}\sum^{|s|}_{i=1}p_{i}\,\mathbf{y}_{i}\log\hat{\mathbf{y}}_{i}.$

SmartGuard can infer whether a behavior sequence $s_{i}$ is normal or abnormal based on the anomaly score:

(20)  $s_{i}=\begin{cases}\text{Normal},&\text{if }score(s_{i})\leq th\\\text{Abnormal},&\text{if }score(s_{i})>th\end{cases}$

where $th$ is the anomaly threshold. We take the 95% quantile of the reconstruction loss distribution on the validation set as $th$.
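Finally, a sketch of Equations 19-20 together with the threshold calibration just described; the gamma-distributed validation scores are a stand-in for scores computed on real validation sequences:

```python
import numpy as np

def anomaly_score(per_pos_ce: np.ndarray, p: np.ndarray) -> float:
    """Eq. (19): score(s) = (1/|s|) * sum_i p_i * CE_i."""
    return float(np.mean(p * per_pos_ce))

def calibrate_threshold(val_scores: np.ndarray, q: float = 0.95) -> float:
    """th = 95% quantile of scores on the (normal) validation set."""
    return float(np.quantile(val_scores, q))

def is_abnormal(score: float, th: float) -> bool:
    """Eq. (20): flag the sequence when its score exceeds the threshold."""
    return score > th

val_scores = np.random.default_rng(0).gamma(2.0, 0.05, size=1000)  # stand-in scores
th = calibrate_threshold(val_scores)
s = anomaly_score(np.array([0.1, 0.2, 3.0]), np.array([0.4, 0.4, 0.2]))
print(s, th, is_abnormal(s, th))
```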

5. Experiments

In this section, we conduct comprehensive experiments on three real-world datasets to answer the following key questions:

  • RQ1. Performance. Compared with other methods, does SmartGuard achieve better anomaly detection performance?

  • RQ2. Ablation study. How will model performance change if we remove key modules of SmartGuard?

  • RQ3. Parameter study. How do key parameters affect the performance of SmartGuard?

  • RQ4. Interpretability study. Can SmartGuard give reasonable explanations for the detection results?

  • RQ5. Embedding space analysis. Does SmartGuard successfully learn useful embeddings of behaviors and correct correlations between device controls and time?

5.1. Experimental Setup

5.1.1. Datasets

We train SmartGuard on three real-world datasets consisting of only normal samples: two (FR/SP) from public datasets (https://github.com/snudatalab/SmartSense) and one anonymous dataset (AN) collected by ourselves. The dataset descriptions are shown in Table 1. All datasets are split into training, validation and testing sets with a ratio of 7:1:2. To evaluate the performance of SmartGuard, we construct ten categories of abnormal behaviors as shown in Table 2 and insert them among normal behaviors to simulate real anomaly scenarios.

Table 1. Dataset statistics.

Name | Time period (Y-M-D) | Size | # Devices | # Device controls
AN | 2022-07-31 ~ 2022-08-31 | 1,765 | 36 | 141
FR | 2022-02-27 ~ 2022-03-25 | 4,423 | 33 | 222
SP | 2022-02-28 ~ 2022-03-30 | 15,665 | 34 | 234

Table 2. Ten categories of abnormal behaviors.

Anomaly | Type
Light flickering | SD
Camera flickering | SD
TV flickering | SD
Open the window while the smart lock is locked | MD
Close the camera while the smart lock is locked | MD
Open the air conditioner's cool mode in winter | DM
Open the window at midnight | DM
Open the water valve at midnight | DM
Shower for a long time | DD
Microwave runs for a long time | DD

5.1.2. Baselines

We compare SmartGuard with existing general unsupervised anomaly detection methods and unsupervised abnormal behavior detection methods in smart homes:

  • Local Outlier Factor (LOF) (Cheng et al., 2019) calculates the density ratio between each sample and its neighbors to detect anomalies.

  • Isolation Forest (IF) (Liu et al., 2008) builds binary trees; instances with short average path lengths are detected as anomalies.

  • 6thSense (Sikder et al., 2017) utilizes Naive Bayes to detect malicious behavior associated with sensors in smart homes.

  • Aegis (Sikder et al., 2019) utilizes a Markov Chain-based machine learning technique to detect malicious behavior in smart homes.

  • OCSVM (Amraoui and Zouari, 2021) builds a One-Class Support Vector Machine model to prevent malicious control of smart home systems.

  • Autoencoder (Chen et al., 2018) learns to reconstruct normal data and then uses the reconstruction error to determine whether the input data is abnormal.

  • ARGUS (Rieger et al., 2023) designs an autoencoder based on Gated Recurrent Units (GRU) to detect IoT infiltration attacks.

  • Transformer Autoencoder (TransAE) (Vaswani et al., 2017) uses the self-attention mechanism in the encoder and decoder to achieve context-aware anomaly detection.

5.1.3. Evaluation metrics

We use common metrics such as False Positive Rate, False Negative Rate, Recall, and F1-Score to evaluate the performance of SmartGuard.

5.1.4. Complexity analysis

Suppose the embedding size is $d$ and the behavior sequence length is $n$. The computational complexity of SmartGuard is mainly due to the self-attention layer and the feed-forward network, i.e., $O(n^{2}d+nd^{2})$. The dominant term is typically $O(n^{2}d)$ from the self-attention layer. Inference with SmartGuard takes only 0.0145s, which shows that it can detect abnormal behaviors in real time.

Table 3. Anomaly detection performance of SmartGuard and all baselines.

Dataset | Type | Metric | LOF | IF | 6thSense | Aegis | OCSVM | Autoencoder | ARGUS | TransAE | SmartGuard
AN | SD | Recall | 0.0275 | 0.4105 | 0.4680 | 0.2902 | 0.5399 | 0.9832 | 0.9858 | 0.9882 | 0.9986
AN | SD | F1 Score | 0.0519 | 0.4972 | 0.5196 | 0.3672 | 0.5862 | 0.9915 | 0.9928 | 0.9908 | 0.9967
AN | MD | Recall | 0.0745 | 0.4039 | 0.5941 | 0.4431 | 0.6039 | 0.5156 | 0.5666 | 0.6216 | 0.9745
AN | MD | F1 Score | 0.1357 | 0.4824 | 0.6215 | 0.4718 | 0.6553 | 0.6692 | 0.7135 | 0.7557 | 0.9832
AN | DM | Recall | 0.0784 | 0.4373 | 0.3745 | 0.5647 | 0.3510 | 0.5196 | 0.5313 | 0.6078 | 0.9961
AN | DM | F1 Score | 0.1418 | 0.5174 | 0.4817 | 0.5647 | 0.4257 | 0.6725 | 0.6843 | 0.7452 | 0.9941
AN | DD | Recall | 0.0961 | 0.3451 | 0.1980 | 0.7804 | 0.4961 | 0.5137 | 0.5117 | 0.5294 | 0.9980
AN | DD | F1 Score | 0.1713 | 0.4282 | 0.3108 | 0.7044 | 0.5967 | 0.6675 | 0.6675 | 0.6818 | 0.9951
FR | SD | Recall | 0.3541 | 0.2444 | 0.2907 | 0.3915 | 0.5918 | 0.9816 | 0.9796 | 0.9864 | 0.9979
FR | SD | F1 Score | 0.4804 | 0.3655 | 0.4167 | 0.4542 | 0.6612 | 0.9907 | 0.9897 | 0.9921 | 0.9932
FR | MD | Recall | 0.4275 | 0.2980 | 0.6567 | 0.7098 | 0.4384 | 0.9726 | 0.9875 | 0.9782 | 0.9984
FR | MD | F1 Score | 0.5192 | 0.4230 | 0.6092 | 0.3827 | 0.5534 | 0.9861 | 0.9783 | 0.9874 | 0.9907
FR | DM | Recall | 0.3825 | 0.3191 | 0.5461 | 0.7619 | 0.3920 | 0.4952 | 0.6676 | 0.6529 | 0.9985
FR | DM | F1 Score | 0.4830 | 0.4494 | 0.6124 | 0.6822 | 0.4940 | 0.6508 | 0.7867 | 0.7779 | 0.9912
FR | DD | Recall | 0.3572 | 0.1850 | 0.5358 | 0.9743 | 0.6267 | 0.4397 | 0.7329 | 0.6098 | 0.9981
FR | DD | F1 Score | 0.4375 | 0.2806 | 0.5880 | 0.4481 | 0.6422 | 0.6013 | 0.8382 | 0.7479 | 0.9921
SP | SD | Recall | 0.2197 | 0.2643 | 0.6979 | 0.1618 | 0.5332 | 0.9824 | 0.9795 | 0.9172 | 0.9862
SP | SD | F1 Score | 0.3350 | 0.3857 | 0.7248 | 0.2164 | 0.6155 | 0.9911 | 0.9896 | 0.9489 | 0.9831
SP | MD | Recall | 0.2786 | 0.3399 | 0.6317 | 0.7445 | 0.3840 | 0.5645 | 0.9696 | 0.9936 | 0.9961
SP | MD | F1 Score | 0.3916 | 0.4632 | 0.6440 | 0.6636 | 0.5026 | 0.7095 | 0.9845 | 0.9866 | 0.9830
SP | DM | Recall | 0.2780 | 0.3465 | 0.6080 | 0.8121 | 0.5351 | 0.3074 | 0.5297 | 0.5451 | 0.9198
SP | DM | F1 Score | 0.4112 | 0.4918 | 0.6935 | 0.7758 | 0.6341 | 0.4649 | 0.6847 | 0.6962 | 0.9498
SP | DD | Recall | 0.2109 | 0.1763 | 0.5449 | 0.8001 | 0.8293 | 0.6455 | 0.6455 | 0.6456 | 0.9961
SP | DD | F1 Score | 0.3052 | 0.2627 | 0.6343 | 0.6545 | 0.7311 | 0.7685 | 0.7658 | 0.7653 | 0.9788

5.2. Performance Comparison (RQ1)

We use grid search to adjust the parameters of SmartGuard and report the overall performance of SmartGuard and all baselines in Table 3. Bold values indicate the optimal performance among all schemes, and underlined values indicate the second-best performance. First, SmartGuard outperforms all competitors in most cases. This is because SmartGuard simultaneously considers temporal information, behavior imbalance and noise behaviors. Second, SmartGuard significantly improves the performance on DM- and DD-type anomaly detection. We ascribe this superiority to TTPE's effective mining of the temporal information of behaviors. Third, LOF, IF and 6thSense show the worst performance. Aegis and OCSVM outperform LOF, IF and 6thSense, which benefits from the Markov Chain's modeling of behavior transitions and the SVM's powerful kernel function. The Autoencoder outperforms the traditional models because of its stronger sequence modeling capability, and ARGUS outperforms the Autoencoder because of the stronger sequence modeling capability of the GRU. By exploiting the transformer to mine contextual information, TransAE achieves better performance than all other baselines, but is still inferior to our proposed scheme.

5.3. Ablation Study (RQ2)

Table 4. Ablation study (F1 Score) with different combinations of LDMS, TTPE and NWRL (Y = included, X = removed).

Variant | LDMS | TTPE | NWRL | SD | MD | DM | DD
C0 | X | X | X | 0.9908 | 0.7557 | 0.7452 | 0.6818
C1 | Y | Y | X | 0.9877 | 0.9708 | 0.9767 | 0.9817
C2 | Y | X | Y | 0.9883 | 0.8716 | 0.8783 | 0.8799
C3 | X | Y | Y | 0.9902 | 0.9766 | 0.9835 | 0.9855
C4 | Y | Y | Y | 0.9967 | 0.9832 | 0.9941 | 0.9951

SmartGuard mainly consists of three components: the Loss-guided Dynamic Mask Strategy (LDMS), the Three-level Time-aware Position Embedding (TTPE) and the Noise-aware Weighted Reconstruction Loss (NWRL). To investigate the effectiveness of each component, we implement 5 variants of SmartGuard for the ablation study ($C_0$-$C_4$). Y denotes adding the corresponding component, and X denotes removing it. $C_4$ is SmartGuard with all three components. As shown in Table 4, each component of SmartGuard has a positive impact on the results. The combination of all components brings the best results, which are much better than using any subset of the three components.

5.4. Parameter Study (RQ3)

5.4.1. The mask ratio $r$ and the training steps $N$ without mask

Figure 7 illustrates that SmartGuard achieves the optimal performance when $r=0.4$ and $N=5$. The parameter $r$ (Equation 3) determines the difficulty of the model's learning task. A smaller $r$ fails to effectively encourage the model to learn hard-to-learn behaviors, while a larger $r$ increases the learning burden on the model, consequently diminishing performance. As for the training steps without mask, a smaller $N$ hinders the model from converging effectively at the beginning stage, whereas a larger $N$ impedes the model's ability to learn hard-to-learn behaviors towards the end, resulting in degraded performance.

[Figure 7: anomaly detection performance under different mask ratios r and different numbers of warm-up epochs N]

5.4.2. The coefficient $\mu$ of the Noise-aware Weighted Reconstruction Loss

The parameter $\mu$ (Equation 17) controls the weights assigned to potential noise behaviors: a smaller $\mu$ results in a smaller weight for noise behaviors, while a larger $\mu$ leads to a greater weight. As illustrated in Figure 8(a), the False Positive Rate gradually decreases as $\mu$ decreases, benefiting from the reduced loss weight assigned to noise behaviors. However, as depicted in Figure 8(b), the False Negative Rate slightly increases as $\mu$ decreases. When $\mu=0.1$, SmartGuard achieves a balance, minimizing both the False Positive Rate and the False Negative Rate.

[Figure 8: (a) False Positive Rate and (b) False Negative Rate under different values of μ]

5.4.3. The embedding size $d$

We tune the embedding size for time and device control, ranging from 8 to 512. As depicted in Figure 9(a), an initial increase in the embedding dimension results in a notable performance improvement, which is attributed to the larger dimensionality enabling the behavior embedding to capture more comprehensive contextual information, thereby furnishing valuable representations for the other modules of SmartGuard. Nevertheless, excessively large sizes (e.g., > 256) can lead to performance degradation due to over-fitting.

[Figure 9: F1-Score under (a) different embedding sizes and (b) different numbers of encoder/decoder layers]

5.4.4. The number of layers $L$ of the encoder and decoder

Figure 9(b) shows the performance of SmartGuard with different numbers of layers. As $L$ increases, the F1-Score first increases and then decreases, reaching the optimal value at 3 layers: fewer layers lead to under-fitting, while too many layers lead to over-fitting.

5.5. Case Study (RQ4)

To assess the interpretability of SmartGuard, we select a behavior sequence from the test set of the AN dataset and visualize its attention weights and reconstruction loss. As illustrated in Figure 10, the user initiated a sequence of actions: turning off the TV, stopping the sweeper, closing the curtains, switching off the bed light, and locking the smart lock before going to sleep. Subsequently, an attacker took control of the IoT devices, turning off the camera and opening the window for potential theft. Examining Figure 10(a), we observe that the attention weights between behaviors $b_6$, $b_7$, $b_8$ and the other behaviors in the sequence are relatively small. This suggests that $b_6$, $b_7$ and $b_8$ lack contextual relevance to the other behaviors and are likely abnormal. Turning to Figure 10(b), the reconstruction losses for behaviors $b_6$, $b_7$, and $b_8$ are notably high. SmartGuard identifies these anomalies in the sequence, triggering an immediate alarm.

[Figure 10: (a) attention weights and (b) reconstruction losses for the case-study sequence]

5.6. Embedding Space Analysis (RQ5)

We visualize the similarity between device control embeddings and time embeddings (i.e., hour, day and duration embeddings) to analyze whether the model effectively learns the relationships between behaviors. As shown in Figure 11(a), opening the curtains usually occurs between 6-9 and 9-12 o'clock because users usually get up during this period, while closing the curtains generally occurs between 21-24 o'clock because users usually go to bed during this period. The dishwasher usually runs between 12-15 and 18-21 o'clock, which means that the user has lunch and dinner during these periods and then washes the dishes. As shown in Figure 11(b), users generally watch TV and do laundry on Saturdays and Sundays. As shown in Figure 11(c), users usually take a bath for about 1-2 hours; bath times longer than this may indicate that an abnormality occurs.

[Figure 11: similarity between device control embeddings and (a) hour, (b) day and (c) duration embeddings]

6. Conclusion

In this paper, we introduce SmartGuard for unsupervised user behavior anomaly detection. We first devise a Loss-guided Dynamic Mask Strategy (LDMS) to encourage the model to learn less frequent behaviors that are often overlooked during the learning process. Additionally, we introduce Three-level Time-aware Position Embedding (TTPE) to integrate temporal information into positional embedding, allowing for the detection of temporal context anomalies. Furthermore, we propose a Noise-aware Weighted Reconstruction Loss (NWRL) to assign distinct weights to routine behaviors and noise behaviors, thereby mitigating the impact of noise. Comprehensive experiments conducted on three datasets encompassing ten types of anomaly behaviors demonstrate that SmartGuard consistently outperforms state-of-the-art baselines while delivering highly interpretable results.

Acknowledgements.

We thank the anonymous reviewers for their constructive feedback and comments. This work is supported by the Major Key Project of PCL under grant No. PCL2023A06-4, the National Key Research and Development Program of China under grant No. 2022YFB3105000, and the Shenzhen Key Lab of Software Defined Networking under grant No. ZDSYS20140509172959989.

References

  • Amraoui and Zouari (2021) Noureddine Amraoui and Belhassen Zouari. 2021. An ML behavior-based security control for smart home systems. In Risks and Security of Internet and Systems: 15th International Conference, CRiSIS 2020, Paris, France, November 4–6, 2020, Revised Selected Papers 15. Springer, 117–130.
  • Celik et al. (2018) Z. Berkay Celik, Leonardo Babun, Amit Kumar Sikder, Hidayet Aksu, Gang Tan, Patrick D. McDaniel, and A. Selcuk Uluagac. 2018. Sensitive Information Tracking in Commodity IoT. In 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15-17, 2018, William Enck and Adrienne Porter Felt (Eds.). USENIX Association, 1687–1704. https://www.usenix.org/conference/usenixsecurity18/presentation/celik
  • Chen et al. (2019) Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, and Wenwu Ou. 2019. Behavior sequence transformer for e-commerce recommendation in Alibaba. In Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data. 1–4.
  • Chen et al. (2018) Zhaomin Chen, Chai Kiat Yeo, Bu Sung Lee, and Chiew Tong Lau. 2018. Autoencoder-based network anomaly detection. In 2018 Wireless Telecommunications Symposium (WTS). IEEE, 1–5.
  • Cheng et al. (2019) Zhangyu Cheng, Chengming Zou, and Jianwei Dong. 2019. Outlier detection using isolation forest and local outlier factor. In Proceedings of the Conference on Research in Adaptive and Convergent Systems. 161–168.
  • Chi et al. (2022) Haotian Chi, Chenglong Fu, Qiang Zeng, and Xiaojiang Du. 2022. Delay Wreaks Havoc on Your Smart Home: Delay-based Automation Interference Attacks. In 43rd IEEE Symposium on Security and Privacy, SP 2022, San Francisco, CA, USA, May 22-26, 2022. IEEE, 285–302. https://doi.org/10.1109/SP46214.2022.9833620
  • de Souza Pereira Moreira et al. (2021) Gabriel de Souza Pereira Moreira, Sara Rabhi, Jeong Min Lee, Ronay Ak, and Even Oldridge. 2021. Transformers4Rec: Bridging the gap between NLP and sequential/session-based recommendation. In Proceedings of the 15th ACM Conference on Recommender Systems (RecSys). 143–153.
  • Feng et al. (2018) Jie Feng, Yong Li, Chao Zhang, Funing Sun, Fanchao Meng, Ang Guo, and Depeng Jin. 2018. DeepMove: Predicting Human Mobility with Attentional Recurrent Networks. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW '18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1459–1468. https://doi.org/10.1145/3178876.3186058
  • Fernandes et al. (2016) Earlence Fernandes, Jaeyeon Jung, and Atul Prakash. 2016. Security Analysis of Emerging Smart Home Applications. In Proceedings of the IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA.
  • Fu et al. (2022) Chenglong Fu, Qiang Zeng, Haotian Chi, Xiaojiang Du, and Siva Likitha Valluru. 2022. IoT Phantom-Delay Attacks: Demystifying and Exploiting IoT Timeout Behaviors. In 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2022, Baltimore, MD, USA, June 27-30, 2022. IEEE, 428–440. https://doi.org/10.1109/DSN53405.2022.00050
  • Fu et al. (2021) Chenglong Fu, Qiang Zeng, and Xiaojiang Du. 2021. HAWatcher: Semantics-Aware Anomaly Detection for Appified Smart Homes. In 30th USENIX Security Symposium, USENIX Security 2021, August 11-13, 2021, Michael D. Bailey and Rachel Greenstadt (Eds.). USENIX Association, 4223–4240. https://www.usenix.org/conference/usenixsecurity21/presentation/fu-chenglong
  • Gu et al. (2020) Tianbo Gu, Zheng Fang, Allaukik Abhishek, Hao Fu, Pengfei Hu, and Prasant Mohapatra. 2020. IoTGaze: IoT Security Enforcement via Wireless Context Analysis. In 39th IEEE Conference on Computer Communications, INFOCOM 2020, Toronto, ON, Canada, July 6-9, 2020. IEEE, 884–893. https://doi.org/10.1109/INFOCOM41043.2020.9155459
  • He et al. (2022) Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. 2022. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16000–16009.
  • Jeon et al. (2022) Hyunsik Jeon, Jongjin Kim, Hoyoung Yoon, Jaeri Lee, and U Kang. 2022. Accurate action recommendation for smart home via two-level encoders and commonsense knowledge. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM). 832–841.
  • Jia et al. (2017) Yunhan Jack Jia, Qi Alfred Chen, Shiqi Wang, Amir Rahmati, Earlence Fernandes, Zhuoqing Morley Mao, and Atul Prakash. 2017. ContexIoT: Towards Providing Contextual Integrity to Appified IoT Platforms. In 24th Annual Network and Distributed System Security Symposium, NDSS 2017, San Diego, California, USA, February 26 - March 1, 2017. The Internet Society. https://www.ndss-symposium.org/ndss2017/ndss-2017-programme/contexlot-towards-providing-contextual-integrity-appified-iot-platforms/
  • Kang and McAuley (2018) Wang-Cheng Kang and Julian McAuley. 2018. Self-attentive sequential recommendation. In 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 197–206.
  • Kingma and Ba (2014) Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
  • Li et al. (2024) Fan Li, Xu Si, Shisong Tang, Dingmin Wang, Kunyan Han, Bing Han, Guorui Zhou, Yang Song, and Hechang Chen. 2024. Contextual Distillation Model for Diversified Recommendation. arXiv preprint arXiv:2406.09021 (2024). https://arxiv.org/abs/2406.09021
  • Liu et al. (2008) Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. 2008. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining. IEEE, 413–422.
  • Lueth (2018) Knud Lasse Lueth. 2018. State of the IoT 2018: Number of IoT devices now at 7B – Market accelerating. https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/.
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems (NIPS) 32 (2019).
  • Rieger et al. (2023) Phillip Rieger, Marco Chilese, Reham Mohamed, Markus Miettinen, Hossein Fereidooni, and Ahmad-Reza Sadeghi. 2023. ARGUS: Context-Based Detection of Stealthy IoT Infiltration Attacks. In Proceedings of the 32nd USENIX Conference on Security Symposium (Anaheim, CA, USA) (SEC '23). USENIX Association, USA, Article 241, 18 pages.
  • Sikder et al. (2017) Amit Kumar Sikder, Hidayet Aksu, and A. Selcuk Uluagac. 2017. 6thSense: A context-aware sensor-based attack detector for smart devices. In 26th USENIX Security Symposium (USENIX Security 17). 397–414.
  • Sikder et al. (2019) Amit Kumar Sikder, Leonardo Babun, Hidayet Aksu, and A. Selcuk Uluagac. 2019. Aegis: A Context-Aware Security Framework for Smart Home Systems. In Proceedings of the 35th Annual Computer Security Applications Conference (San Juan, Puerto Rico, USA) (ACSAC '19). Association for Computing Machinery, New York, NY, USA, 28–41. https://doi.org/10.1145/3359789.3359840
  • Srinivasan et al. (2008) Vijay Srinivasan, John A. Stankovic, and Kamin Whitehouse. 2008. Protecting your daily in-home activity information from a wireless snooping attack. In UbiComp 2008: Ubiquitous Computing, 10th International Conference, UbiComp 2008, Seoul, Korea, September 21-24, 2008, Proceedings (ACM International Conference Proceeding Series, Vol. 344), Hee Yong Youn and We-Duke Cho (Eds.). ACM, 202–211. https://doi.org/10.1145/1409635.1409663
  • Sun et al. (2019) Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM). 1441–1450.
  • Tang et al. (2022) Shisong Tang, Qing Li, Xiaoteng Ma, Ci Gao, Dingmin Wang, Yong Jiang, Qian Ma, Aoyang Zhang, and Hechang Chen. 2022. Knowledge-based temporal fusion network for interpretable online video popularity prediction. In Proceedings of the ACM Web Conference 2022. 2879–2887.
  • Tang et al. (2023) Shisong Tang, Qing Li, Dingmin Wang, Ci Gao, Wentao Xiao, Dan Zhao, Yong Jiang, Qian Ma, and Aoyang Zhang. 2023. Counterfactual Video Recommendation for Duration Debiasing. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4894–4903.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems (NIPS) 30 (2017).
  • Wang et al. (2023) Jincheng Wang, Zhuohua Li, Mingshen Sun, Bin Yuan, and John C.S. Lui. 2023. IoT Anomaly Detection Via Device Interaction Graph. In 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2023, Porto, Portugal, June 27-30, 2023. IEEE, 494–507. https://doi.org/10.1109/DSN58367.2023.00053
  • Xiao et al. (2023a) Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Wenxin Tang, Runjie Zhou, and Yong Jiang. 2023a. User Device Interaction Prediction via Relational Gated Graph Attention Network and Intent-aware Encoder. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (AAMAS). 1634–1642.
  • Xiao et al. (2023b) Jingyu Xiao, Qingsong Zou, Qing Li, Dan Zhao, Kang Li, Zixuan Weng, Ruoyu Li, and Yong Jiang. 2023b. I Know Your Intent: Graph-enhanced Intent-aware User Device Interaction Prediction via Contrastive Learning. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp) 7, 3 (2023), 1–28.
  • Zhai et al. (2018) Junhai Zhai, Sufang Zhang, Junfen Chen, and Qiang He. 2018. Autoencoder and its various variants. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 415–419.
  • Zou et al. (2023) Qingsong Zou, Qing Li, Ruoyu Li, Yucheng Huang, Gareth Tyson, Jingyu Xiao, and Yong Jiang. 2023. IoTBeholder: A Privacy Snooping Attack on User Habitual Behaviors from Smart Home Wi-Fi Traffic. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT/UbiComp) 7, 1 (2023), 1–26.

Appendix A Appendices

A.1. Notations

Key notations used in the paper and their definitions are summarized in Table 5.

Table 5. Key notations.
$d$, $\mathcal{D}$: a device / the set of devices
$c$, $\mathcal{C}$: a device control / the set of device controls
$s$, $\mathcal{S}$: a sequence / the set of sequences
$n$: the length of a sequence
$b$: a behavior
$t$: the timestamp of a behavior
$hour$: the hour of day of a behavior
$day$: the day of week of a behavior
$duration$: the duration of a behavior
$PE$: the positional embedding function
$PE(order)$, $w_{order}$: the order embedding and its weight
$PE(hour)$, $w_{hour}$: the hour embedding and its weight
$PE(day)$, $w_{day}$: the day embedding and its weight
$PE(duration)$, $w_{dur}$: the duration embedding and its weight
$\overline{PE}$: the integrated positional embedding
$h_c$: the device control embedding
$\mathbf{h}$: the behavior embedding
$\mathcal{L}_{rec}$: the reconstruction loss
$\ell_i$, $\mathcal{L}_{vec}$: the loss of the $i$-th behavior / the loss vector
$w_i$, $\mathcal{W}_{vec}$: the weight of the $i$-th behavior / the weight vector
$p_i$: the normalized weight of the $i$-th behavior
$mask$: the mask vector
$score(s)$: the anomaly score of sequence $s$
$th$: the anomaly threshold

A.2. Device information of different datasets

The AN, FR, and SP datasets contain 36, 33, and 34 devices, respectively, as shown in Table 6, Table 7, and Table 8.

Table 6. Devices in the AN dataset (No. Device).
0 AC; 1 heater; 2 dehumidifier; 3 humidifier_1; 4 fan; 5 standheater; 6 aircleaner; 7 humidifier_2; 8 desklight; 9 bedlight_1; 10 camera; 11 sweeper; 12 LED; 13 locker; 14 bathheater; 15 water_cooler; 16 curtains; 17 outlet; 18 audio; 19 plug; 20 bulb_2; 21 soundbox_1; 22 soundbox_2; 23 refrigerator; 24 projector; 25 washing_machine; 26 kettle; 27 dishwasher; 28 bulb_1; 29 TV; 30 pet_feeder; 31 hair_dryer; 32 window_cleaner; 33 bedlight_2; 34 bedlight_3; 35 cooler
Table 7. Devices in the FR dataset (No. Device).
0 AirConditioner; 1 AirPurifier; 2 Blind; 3 Camera; 4 ClothingCareMachine; 5 Computer; 6 ContactSensor; 7 CurbPowerMeter; 8 Dishwasher; 9 Dryer; 10 Elevator; 11 Fan; 12 GarageDoor; 13 Light; 14 Microwave; 15 MotionSensor; 16 NetworkAudio; 17 None; 18 Other; 19 Oven; 20 PresenceSensor; 21 Projector; 22 Refrigerator; 23 RemoteController; 24 RobotCleaner; 25 Siren; 26 SmartLock; 27 SmartPlug; 28 Switch; 29 Television; 30 Thermostat; 31 Washer; 32 WaterValve
Table 8. Devices in the SP dataset (No. Device).
0 AirConditioner; 1 AirPurifier; 2 Blind; 3 Camera; 4 ClothingCareMachine; 5 Computer; 6 ContactSensor; 7 CurbPowerMeter; 8 Dishwasher; 9 Dryer; 10 Elevator; 11 Fan; 12 GarageDoor; 13 Light; 14 Microwave; 15 MotionSensor; 16 NetworkAudio; 17 None; 18 Other; 19 Oven; 20 PresenceSensor; 21 Projector; 22 Refrigerator; 23 RemoteController; 24 RobotCleaner; 25 SetTop; 26 Siren; 27 SmartLock; 28 SmartPlug; 29 Switch; 30 Television; 31 Thermostat; 32 Washer; 33 WaterValve

A.3. Data collection

Testbed and Participants. To create a practical and viable smart home model, we implemented our experimental platform within an apartment setting to gather usage data for various devices, forming our smart home user behavior dataset (AN). Three volunteers were recruited to simulate the typical daily activities of a standard family, assuming the roles of an adult male, an adult female, and a child. The experimental platform comprises a comprehensive selection of 36 popular market-available devices, detailed in Table 6, with their deployment illustrated in Figure 12.

Figure 12. Deployment of the 36 devices on the testbed.

Normal Behavior Collection. We enlisted volunteers to reside in the apartment and encouraged them to use the equipment in accordance with their individual habits. Throughout the designated period of occupancy, we refrained from actively or directly intervening in the users' behavior; however, users consistently logged their activities. Following the conclusion of the data collection phase, we reviewed the device usage logs via the smart home app and combined these logs with the users' behavior records to compile a comprehensive user behavior dataset. To mitigate potential biases arising from acclimating to a new living environment, participants were required to inhabit the experimental setting for a minimum of two weeks before the formal commencement of data collection. All users possessed comprehensive knowledge of the IoT devices and applications in use. After check-in, control of all devices was relinquished to the users, who were informed in advance that their device usage would subsequently be reviewed and analyzed by our team.

Anomaly Behavior Injection. We insert the abnormal behaviors listed in Table 2 into normal behavior sequences to construct abnormal behavior sequences. The abnormal behavior sequences and the normal behavior sequences together form the test dataset. Examples of anomaly behavior sequences are shown in Figure 13.

Figure 13. Examples of anomaly behavior sequences.
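For concreteness, a minimal sketch of the injection step is given below. Behaviors are represented here as illustrative (device_control, hour, day, duration) tuples; the anomaly behaviors and the splice position are hypothetical examples, not the contents of Table 2.

```python
import random

def inject_anomaly(normal_seq, anomaly_behaviors, position=None):
    """Splice attacker/misuse behaviors into a normal sequence.

    Returns the abnormal sequence and the insertion position, so the
    injected steps can be labeled for evaluation.
    """
    seq = list(normal_seq)
    pos = random.randrange(len(seq) + 1) if position is None else position
    return seq[:pos] + list(anomaly_behaviors) + seq[pos:], pos

# Hypothetical example: bedtime routine followed by an injected attack.
normal = [("tv_off", 22, 5, 1), ("sweeper_stop", 22, 5, 1), ("lock_close", 23, 5, 1)]
attack = [("camera_off", 23, 5, 1), ("window_open", 23, 5, 30)]
abnormal, pos = inject_anomaly(normal, attack, position=len(normal))
```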

A.4. Detailed experimental settings

All models (including baselines and SmartGuard) are implemented in PyTorch (Paszke et al., 2019) and run on a GeForce RTX 3090 Ti graphics card. All models are trained with the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.001. We train SmartGuard to minimize $\mathcal{L}_{rec}$ in Equation (14). During training, we monitor the reconstruction loss and stop training if there is no performance improvement on the validation set within 10 steps. For the model hyperparameters of SmartGuard, we set the batch size to 512, and the initial weights of TTPE are $w_{order}=0.1$, $w_{hour}=0.4$, $w_{day}=0.4$, and $w_{duration}=0.7$. For the mask ratio and the mask step, we search in $\{0.2, 0.4, 0.6, 0.8\}$ and $\{3, 4, 5, 6\}$, respectively. We choose the number of encoder and decoder layers from $\{1, 2, 3, 4\}$ and the embedding size from $\{8, 16, 32, 64, 128, 256, 512\}$.
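These settings translate into a standard early-stopped training loop. The following runnable sketch mirrors them (Adam, learning rate 0.001, batch size 512, patience of 10 validation checks without improvement); the tiny autoencoder and random data are stand-ins for SmartGuard and the behavior datasets.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Stand-in autoencoder and data; replace with SmartGuard and real sequences.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
train_x = torch.randn(2048, 32)
val_x = torch.randn(512, 32)
loader = DataLoader(train_x, batch_size=512, shuffle=True)

best_val, patience, bad = float("inf"), 10, 0
for epoch in range(100):
    model.train()
    for batch in loader:
        optimizer.zero_grad()
        loss = ((model(batch) - batch) ** 2).mean()   # reconstruction loss
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val = ((model(val_x) - val_x) ** 2).mean().item()
    if val < best_val - 1e-6:
        best_val, bad = val, 0
    else:
        bad += 1
        if bad >= patience:   # stop: no improvement in 10 validation checks
            break
```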

A.5. Mask strategy deep dive

To verify the effectiveness of LDMS, we compare it with the three baselines (w/o mask, random mask, and top-$k$ loss mask) mentioned in Section 4.2. As illustrated in Figure 14(a), LDMS consistently outperforms all other mask strategies across four types of anomalies. The results in Figure 14(b) further show that LDMS exhibits the smallest variance in reconstruction loss throughout the training process, demonstrating that SmartGuard learns both easy-to-learn and hard-to-learn behaviors well. We also plot the loss distribution under different mask strategies. As shown in Figure 15, LDMS yields the smallest reconstruction loss and variance, demonstrating that our mask strategy better learns hard-to-learn behaviors. Even after applying LDMS, we can still observe behaviors with high reconstruction loss (indicated by the red dashed arrow), which are likely noise behaviors; it is therefore necessary to assign small weights to these noise behaviors during anomaly detection, to avoid identifying normal sequences containing noise as abnormal.
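To make the contrast with top-$k$ loss masking concrete, the sketch below samples mask positions with probability proportional to each behavior's current reconstruction loss, so hard-to-learn behaviors are masked more often while easy ones are still masked occasionally. This follows the idea of LDMS; the exact normalization and update schedule in the paper may differ.

```python
import torch

def topk_loss_mask(step_losses, mask_ratio):
    """Deterministic baseline: always mask the highest-loss positions."""
    k = max(1, int(mask_ratio * step_losses.numel()))
    return step_losses.topk(k).indices

def loss_guided_mask(step_losses, mask_ratio):
    """Loss-guided sampling: mask probability proportional to loss."""
    k = max(1, int(mask_ratio * step_losses.numel()))
    probs = step_losses / step_losses.sum()            # normalized weights
    return torch.multinomial(probs, k, replacement=False)

losses = torch.tensor([0.1, 0.9, 0.2, 1.5, 0.3, 0.8])
print(topk_loss_mask(losses, 0.4))      # always the 2 highest-loss steps
print(loss_guided_mask(losses, 0.4))    # stochastic, loss-weighted choice
```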

Figure 14. Comparison of mask strategies: (a) detection performance across four anomaly types; (b) variance of reconstruction loss during training.
Figure 15. Loss distribution under different mask strategies.