This chapter presents the classification and analysis of fashion data, consisting of
90 images belonging to one class, using deep learning techniques. The dataset is
pre-processed with data augmentation. Features are extracted using
Convolutional Neural Networks (CNNs), VGG16, and ResNet50. These models are
trained on the styles and patterns in the images so that they can be recognized. For
styles and subtitles, a second dataset of 144 audio files is used. Speech is converted
into text using Machine Learning (ML) and Natural Language Processing (NLP)
techniques. The audio files are pre-processed by extracting Mel-Frequency Cepstral
Coefficients (MFCC) and normalizing them to reduce noise. A Recurrent
Neural Network (RNN) then converts each audio file into text. The
proposed work is evaluated based on accuracy, reliability, and adaptability.
Keywords: Convolutional neural network, Image recommendation system, Subtitles recommendation system, VGG16, ResNet50, Mel-Frequency Cepstral Coefficients (MFCC), Recurrent neural networks.
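
As an illustration of the normalization step applied to MFCC features, the following is a minimal sketch and not the chapter's actual implementation. It assumes per-coefficient cepstral mean and variance normalization (CMVN), which is one common choice; the chapter does not state which normalization is used. The function name `cmvn` and the synthetic `fake_mfcc` matrix are hypothetical stand-ins for real MFCC frames.

```python
import numpy as np


def cmvn(mfcc, eps=1e-8):
    """Cepstral mean and variance normalization (assumed normalization scheme).

    mfcc: array of shape (n_frames, n_coeffs), one MFCC vector per frame.
    Returns an array of the same shape in which each coefficient has
    approximately zero mean and unit variance across frames.
    """
    mfcc = np.asarray(mfcc, dtype=np.float64)
    mean = mfcc.mean(axis=0, keepdims=True)  # per-coefficient mean over time
    std = mfcc.std(axis=0, keepdims=True)    # per-coefficient spread over time
    return (mfcc - mean) / (std + eps)       # eps guards against division by zero


# Synthetic stand-in for the MFCC matrix of one audio file:
# 200 frames, 13 coefficients (both sizes are illustrative assumptions).
rng = np.random.default_rng(0)
fake_mfcc = rng.normal(loc=5.0, scale=3.0, size=(200, 13))
normalized = cmvn(fake_mfcc)
```

Subtracting the per-coefficient mean removes constant channel offsets, such as a fixed microphone response, and scaling by the standard deviation puts all coefficients on a comparable range before they are passed to a recurrent model.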