Extract, Transform, and Load (ETL) Technique for Pre-Processing of Agricultural Pest Dataset


  • J. Cruz Antony, J. Refonaa, S. L. Jany Shabu, S. Dhamodaran, P. Asha


Data Warehouse, ETL, Pest population dataset, Pre-processing, Talend Open Studio


Data pre-processing, the preliminary step of data mining, is taking on a new dimension: Extract, Transform, and Load (ETL) based pre-processing. Pre-processing is a crucial step in knowledge discovery through data mining; it shapes raw data by cleaning, integrating, and transforming it using predefined techniques. Studying pest population dynamics has never been easy, because knowledge discovery and forecasting of pest occurrence in crops require correlating the pest population with abiotic and biotic features. ETL, a Data Warehouse concept, was chosen for data pre-processing because it handles large datasets quickly and efficiently. The dataset is also heterogeneous: it comes from five major districts of Maharashtra and covers the pest population on the crop over five years along with the corresponding abiotic features. Talend Open Studio (TOS) was used to design an ETL job that performs data extraction, integration, and discretization. The designed ETL job exhibited good performance and accuracy in pre-processing the pest population dataset. This paper provides insight into building an ETL tool with Talend Open Studio and reviews the issues of data pre-processing in the field of agricultural pest management.
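The three stages named in the abstract (extraction, integration, discretization) can be illustrated with a minimal Python sketch. This is not the Talend Open Studio job described in the paper. The column names (`district`, `week`, `pest_count`, `max_temp_c`), the sample rows, and the discretization thresholds are all hypothetical and chosen only to show the shape of such a pipeline.

```python
import csv
import io

# Hypothetical sample inputs; the paper's actual schema and values are not given here.
PEST_CSV = """district,week,pest_count
A,1,4
A,2,12
B,1,0
"""

WEATHER_CSV = """district,week,max_temp_c
A,1,31.5
A,2,34.0
B,1,29.0
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def integrate(pest_rows, weather_rows):
    """Integrate: join pest and abiotic rows on (district, week)."""
    weather_by_key = {(r["district"], r["week"]): r for r in weather_rows}
    merged = []
    for row in pest_rows:
        match = weather_by_key.get((row["district"], row["week"]))
        if match is not None:  # keep only rows present in both sources
            merged.append({**row, **match})
    return merged


def discretize(count, low=5, high=10):
    """Discretize: map a pest count to a label (thresholds are assumed)."""
    if count < low:
        return "low"
    if count < high:
        return "medium"
    return "high"


def run_job():
    """Run the extract, integrate, and discretize stages in order."""
    rows = integrate(extract(PEST_CSV), extract(WEATHER_CSV))
    for row in rows:
        row["pest_level"] = discretize(int(row["pest_count"]))
    return rows


if __name__ == "__main__":
    for record in run_job():
        print(record)
```

In a graphical ETL tool the same flow would typically be expressed as input components feeding a join or mapping step and then a transformation step; the functions above only mirror that ordering.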


