Human Activity Recognition (HAR) plays a critical role in identifying and
distinguishing human actions in data generated from video and numerous other
sensing modalities, such as accelerometers, gyroscopes, GPS, and magnetometers. HAR is
a rapidly growing field that has transformed numerous areas, including
healthcare, manufacturing, security, and smart homes. Manual extraction of features in
traditional machine learning approaches makes it difficult to handle the spatial and
temporal complexities of real-world datasets, creating a need for deep
learning algorithms that offer automatic feature extraction to effectively capture both
spatial and temporal patterns. This chapter provides a review of deep learning models
for HAR, focusing on advancements in CNN and LSTM architectures and their variants,
which play a significant role in handling complex, multivariate datasets gathered from
wearable devices and smartphones. Furthermore, attention mechanisms, such as
self-attention and squeeze-and-excitation modules, have significantly enhanced model
performance by focusing on relevant feature maps and recalibrating them adaptively.
These mechanisms improve not only the accuracy but also the interpretability of the
model by concentrating on the most important aspects of the data. This
chapter also highlights hybrid models that combine CNN and LSTM and their variants
for more accurate HAR, especially when working with sensor-based datasets.
It further discusses how incorporating attention mechanisms not only
boosts accuracy but also optimizes the complexity of the models. Key trends in
attention-driven deep learning methods are examined, indicating their growing
importance in real-world human activity recognition applications.
Keywords: Artificial intelligence (AI), Deep learning (DL), Human activity, Wearable sensor.