All discoveries made by humans in any discipline, such as the physical sciences, biological sciences, social sciences, and engineering, are based on past experience or collected data. In other words, human beings solve problems by drawing on past experience or collected data. Since one aspect of Artificial Intelligence, as pointed out in chapter 1, unit 1, is to design systems that act like humans, computer systems should be designed to solve problems the way human beings do, that is, by using past experience or previously stored data. These data are called training data because they are used to train the computer to learn the trend or pattern they contain. Having learnt this pattern or trend as a rule, the system applies the rule to solve subsequent problems using test data that have the same structure as the training data.

The data available in machine learning are usually divided into two sets: the training set and the test set. The training set is used to develop a model, while the test set is used to evaluate the model's performance. Data splitting in machine learning refers to the technique used to divide the data into a training set and a test set. The aim is to guard against poor generalization, i.e., overfitting (also called overtraining), by measuring the model's performance on data it did not see during training. Using more training data generally improves the accuracy of the model, while using more test data improves the accuracy of the error estimate. A training/test ratio of 70:30 is commonly considered appropriate.

Machine learning, therefore, is an aspect of Artificial Intelligence that deals with the design of systems that use a large set of data, called training data, to solve a particular problem. It is a broad area of Artificial Intelligence and will be considered in the various units of this chapter.
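The cycle described above (split the data 70:30, learn a rule from the training set, then measure the error on the unseen test set) can be sketched in plain Python. This is an illustrative sketch only: the helper names `split_train_test`, `fit_line` and `mean_squared_error` are hypothetical, the straight-line least-squares model stands in for whatever learning algorithm is actually used, and the data set is synthetic. It is not a library API (scikit-learn, for example, provides its own `train_test_split` with a different signature).

```python
import random


def split_train_test(data, train_ratio=0.7, seed=42):
    """Shuffle a copy of the data and split it into training and test sets.

    train_ratio=0.7 gives the 70:30 split; seed makes the shuffle repeatable.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_ratio)
    return items[:cut], items[cut:]


def fit_line(points):
    """Learn the rule y = a*x + b from training points by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_error(model, points):
    """Estimate the model's error on points it was not trained on."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


if __name__ == "__main__":
    # Hypothetical data set: y = 2x + 1 plus a little random noise.
    rng = random.Random(0)
    data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(100)]

    train, test = split_train_test(data, train_ratio=0.7)
    print(len(train), len(test))  # 70 30

    model = fit_line(train)                # learn the rule from training data only
    print("learnt rule (a, b):", model)
    print("test MSE:", mean_squared_error(model, test))
```

Shuffling before cutting matters: if the data were ordered (here, by `x`), taking the first 70% without shuffling would give a test set drawn from a different region than the training set, and the error estimate would be misleading.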
Keywords: Classification algorithm, Data pre-processing, Decision tree algorithm, Feature engineering, K-means clustering algorithm, Learner’s input, Learner’s output, Naive Bayes algorithm, Regression algorithm.