Artificial intelligence approach in crude distillation unit operation

This study applies artificial intelligence to the monitoring of the atmospheric distillation unit of a large-scale refining operation using Google AutoML Tables, Jupyter, and Python software. The process involved training, evaluation, improvement, and deployment of the models based on the input data. The predicted yields (vol %) were as follows. AutoML model: liquefied petroleum gas (LPG) 1.41, straight run gasoline (SRG) 4.96, straight run naphtha (SRN) 17.87, straight run kerosene (SRK) 14.5, light diesel oil (LDO) 26.47, heavy diesel oil (HDO) 2.7, and atmospheric residue (AR) 30.03. Jupyter model: LPG 0.93, SRG 4.69, SRN 17.24, SRK 14.39, LDO 26.43, HDO 2.7, and AR 30.18. Python model: LPG 1.66, SRG 7.58, SRN 11.68, SRK 14.92, LDO 24.77, HDO 4.59, and AR 24.59. Coefficient of determination (R²) values of 0.99981, 0.99943, and 0.93078 and standard error values of 0.240918, 0.419291, and 3.536064 were obtained for the three models, respectively. All three gave good predictions, with a range of product yields similar to the actual yield, although Google AutoML Tables gave the best prediction. The training of the model is fundamental to its performance and precision.


Introduction
The oil and gas industry plays a crucial role in energy generation and is a mainstay and major driver of most economies around the world. Despite research into and discoveries of other forms of energy, the petroleum industry remains crucial, hence the need for further innovation to enhance its processes. The crude distillation unit, being the first stage of refining, is a fundamental process in the value chain. It is highly energy intensive, demands high efficiency, and faces a constant need to improve its operation through due diligence [1]. Al-Dunainawi & Abbod [2] evaluated the relationship between process performance and product composition in distillation columns, describing the process as nonlinear, transient, and complex.
The need to improve the refining process and its products calls for more sophisticated technologies and methods, including the application of artificial intelligence (AI). Stephen [3] described AI as a process evolving at an ever-increasing pace. It is imperative that AI also achieves major breakthroughs in the oil and gas industry, especially in the downstream sector, to improve its efficiency and effectiveness. This approach involves data observation, preparation, planning, and model building [4].
Frankenfield [5] described AI as any mechanism that displays traits associated with the human mind, such as learning and problem solving. It is built on the principle that a machine can imitate human intelligence and carry out tasks, from the simplest to the most complex. A vital aspect of AI is machine learning, which can be grouped into four categories: supervised, unsupervised, semi-supervised, and reinforcement learning, which use labeled data, unlabeled data, partially labeled data, and interaction with the environment, respectively. Machine learning can learn and improve from experience without being explicitly programmed [6].
Several models have been applied in the study of the distillation process [2, 7-13]. Additionally, Abbasi et al. [14] highlighted the importance of machine learning in the prediction of solutions to large-scale industrial processes. These models focused on the need to improve the efficiency of several aspects of the distillation operation.
Although much progress has been made in applying automated machine learning (AutoML) to distillation, as cited in these studies, the rigorous integration of neural networks and other optimization systems is still emerging. This study evaluates the application of AutoML in predicting the output of a commercial distillation process and compares the performance of different software.

Methods
This study used an analytical research design. Google AutoML Console, Jupyter, and Python software were used to model data obtained from the atmospheric distillation unit of a commercial process over a 6-month period of continuous operation. The structured data comprised numeric and alphabetic components that describe the input data.

Mathematical modeling
A multiple linear regression equation was used to model the data as follows. This model considered the relationship between the dependent variable and the independent variables as shown in Equation (1).
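Equation (1) itself is not reproduced in this text. For reference, a multiple linear regression model generally takes the form below. The symbol names are assumptions made for illustration, as is the assignment of x_1, x_2, and x_3 to temperature, flow rate, and specific gravity (the independent variables named in the Results), with y as the product yield.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon
```

Here \beta_0 is the intercept, \beta_1 to \beta_3 are the regression coefficients, and \varepsilon is the residual error.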

Programming and software development
Visual Studio Code (VS Code) version 1.49.3, developed by Microsoft, was used to generate the linear regression models for the predictions. This involved installing a language interpreter with the appropriate extension (.py for Python and .ipynb for Jupyter), setting up the libraries, importing the data, plotting the relationships between the variables using matplotlib, and applying the linear regression model. The developed model was used to make predictions on the training data and was then tested on new input data to obtain a new set of predicted data.
Google Cloud AutoML uses a simple graphical user interface to train, evaluate, improve, and deploy models based on the data provided. The available custom models include natural language, tables, vision intelligence, and translation, and the choice depends on the data provided. Google AutoML Tables was used for this study on the structured data, following the approach shown in Fig. 1. Training the model on the data involved normalization and encoding.
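The fit-then-predict workflow described above can be sketched as follows. This is a minimal illustration, not the study's code: it solves the least-squares normal equations with the Python standard library only, whereas the libraries actually used in the study are not named in the text. All data values, column meanings, and function names are hypothetical.

```python
# Minimal multiple linear regression by ordinary least squares, standard library only.
# All data values below are hypothetical placeholders, not the study's plant data.


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_regression(features, targets):
    """Return [intercept, b1, b2, ...] from the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in features]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Evaluate the fitted linear model for one row of feature values."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Hypothetical rows: (temperature, flow rate, specific gravity).
train_x = [
    (120.0, 10.0, 0.70),
    (180.0, 14.0, 0.78),
    (250.0, 9.0, 0.82),
    (320.0, 16.0, 0.86),
    (360.0, 11.0, 0.93),
    (200.0, 12.0, 0.75),
]
# Targets generated from a known linear rule so the fit can be checked.
train_y = [2.0 + 0.05 * t + 0.3 * f + 4.0 * sg for t, f, sg in train_x]

coefficients = fit_linear_regression(train_x, train_y)
new_row = (280.0, 13.0, 0.84)  # unseen input used to test the fitted model
predicted_yield = predict(coefficients, new_row)
print(coefficients, predicted_yield)
```

Because the hypothetical targets follow an exact linear rule, the fitted coefficients should recover that rule up to rounding error; real plant data would leave a nonzero residual.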
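AutoML Tables performs normalization and encoding internally, and the exact transformations are not described in the text. As a rough illustration only, the sketch below assumes two common choices: z-score scaling for numeric columns and one-hot encoding for alphabetic columns. The column contents and function names are hypothetical.

```python
from statistics import mean, pstdev


def z_score(values):
    """Scale numeric values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Map each label to a 0/1 vector with one position per distinct label (sorted)."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


temperatures = [120.0, 180.0, 250.0, 320.0]  # hypothetical numeric column
products = ["LPG", "SRK", "LPG", "AR"]       # hypothetical alphabetic column

scaled = z_score(temperatures)
encoded = one_hot(products)
print(scaled)
print(encoded)
```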

Figure 1 Flow diagram of AutoML Process
The following criteria were met for the use of the software: a. The input data was ≤ 100 GB. b. The target column was included. c. There were between 2 and 1,000 columns. d. One column was the target, with at least one feature column available to train the model. e. There were at least 1,000 and no more than 100,000,000 rows.
The prepared dataset was then imported into AutoML as CSV files. For this study, the smaller dataset, with between 2 and 1,000 columns, was used. The imported dataset was split into training, validation, and test sets. In splitting the data, the time and target columns were included (Fig. 2).
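The criteria listed above could be checked before import with a small helper such as the one below. This is a sketch under stated assumptions: the function name, the column names, and the interpretation of 100 GB as decimal gigabytes are hypothetical and are not taken from the study or from the AutoML tooling.

```python
MAX_BYTES = 100 * 10**9  # 100 GB limit (decimal gigabytes assumed)
MIN_COLUMNS, MAX_COLUMNS = 2, 1_000
MIN_ROWS, MAX_ROWS = 1_000, 100_000_000


def check_dataset(size_bytes, column_names, row_count, target):
    """Return a list of violated criteria; an empty list means the dataset qualifies."""
    problems = []
    if size_bytes > MAX_BYTES:
        problems.append("input data exceeds 100 GB")
    if target not in column_names:
        problems.append("target column is missing")
    if not MIN_COLUMNS <= len(column_names) <= MAX_COLUMNS:
        problems.append("column count outside 2 to 1,000")
    feature_columns = [c for c in column_names if c != target]
    if not feature_columns:
        problems.append("no feature column available for training")
    if not MIN_ROWS <= row_count <= MAX_ROWS:
        problems.append("row count outside 1,000 to 100,000,000")
    return problems


# Hypothetical dataset description.
columns = ["timestamp", "temperature", "flow_rate", "specific_gravity", "yield"]
issues = check_dataset(
    size_bytes=5_000_000, column_names=columns, row_count=4_320, target="yield"
)
print(issues)  # an empty list means every criterion is met
```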
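A split that respects the time column can be sketched as follows. The 80/10/10 proportions, the function name, and the row layout are assumptions for illustration; the text does not state the proportions used in the study.

```python
def chronological_split(rows, time_key, train_pct=80, validation_pct=10):
    """Split rows into training, validation, and test sets in time order.

    The earliest rows go to training and the latest to the test set, so the
    model is always evaluated on data that is newer than its training data.
    """
    ordered = sorted(rows, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = n * train_pct // 100
    validation_end = train_end + n * validation_pct // 100
    return ordered[:train_end], ordered[train_end:validation_end], ordered[validation_end:]


# Hypothetical rows with a time column and a target column.
rows = [{"time": t, "yield": 14.0 + 0.1 * t} for t in range(20)]
train_set, validation_set, test_set = chronological_split(rows, time_key="time")
print(len(train_set), len(validation_set), len(test_set))
```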

Figure 2 Auto ML Split Page
The target column was numerical and not null. The target column is what the model was trained to predict. The data type determined the resulting model as a regression (Numeric) model. A representation of the numeric model is shown (Fig. 3).

Figure 3 AutoML data type column

Statistical analysis
Model accuracy was used to evaluate how precisely the model predicted the test dataset. The precision of the regression evaluations was determined using the correlation in Eq. 2, where TPn and FPn are the true positives and false positives for each of the n classes, respectively.
In addition, Google AutoML Tables provided two other ways to interpret the model: the confusion matrix and a feature importance graph. The confusion matrix (Fig. 4) explained the misclassifications within the model, whereas the feature importance (Fig. 5) indicated how much each independent variable affected the model. The coefficient of determination (R²) and standard error values for these models were also obtained.
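Equation (2) itself is not reproduced in this text. Given the definitions of TPn and FPn, a commonly used micro-averaged form of precision over the classes is shown below; that the study used exactly this form is an assumption.

```latex
\text{Precision} = \frac{\sum_{n} TP_n}{\sum_{n} \left( TP_n + FP_n \right)}
```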
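The two statistics can be computed as sketched below, using the actual yields and the AutoML predicted yields quoted in this paper. The exact formulas and degrees of freedom used in the study are not stated, so the textbook definitions here are assumptions and the results need not reproduce the reported values.

```python
from math import sqrt


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def standard_error(actual, predicted, fitted_parameters=2):
    """Standard error of the estimate: sqrt(SS_res / (n - fitted_parameters)).

    fitted_parameters=2 assumes a slope and an intercept were fitted; this
    choice is an assumption, not something stated in the study.
    """
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(ss_res / (len(actual) - fitted_parameters))


# Yields (vol %) quoted in this paper, ordered LPG, SRG, SRN, SRK, LDO, HDO, AR.
actual = [1.5, 5.0, 18.0, 14.5, 27.0, 3.5, 30.5]
automl = [1.41, 4.96, 17.87, 14.5, 26.47, 2.7, 30.03]

r2 = r_squared(actual, automl)
se = standard_error(actual, automl)
print(round(r2, 5), round(se, 4))
```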

Results and discussion
The structured technical data (Table 1) highlight the product composition, the dependent variable (product yield), and the independent variables (temperature, flow rate, and specific gravity). The actual yields of products in this unit were LPG 1.5%, SRG 5%, SRN 18%, SRK 14.5%, LDO 27%, HDO 3.5%, and AR 30.5%. The predicted yield of these products (
Al-Dunainawi and Abbod [2] highlighted the substantial impact of quality in the analysis of the products of the atmospheric distillation unit and the need for a reliable analysis of the product composition. The effectiveness of the AutoML software in predicting the product yield can be attributed to its training and its ability to learn the input-output relationship of the complex nonlinear system of this unit, which effectively reduced the error between the predicted and actual yields (Table 3). Training of the AutoML software is fundamental to its performance and effectiveness, and sufficient effort must be devoted to it. The closeness of the predicted values to the actual values indicates that the model is adequate for this purpose. A multivariable linear model was chosen to give sufficient consideration to some of the key parameters governing the process and to simplify the complex process [15,16]. Moreover, studies have shown that considering multiple variables improves model predictions [17], although the importance of maintaining good start-up behavior for these model types has been highlighted, and adequate care was taken to achieve this during configuration [10,18]. In the statistical data (Table 3) [19], the standard error values show that the AutoML model was more accurate than the Jupyter and Python models, which indicates that Google AutoML Tables is an improved model. Further training of the model with the test dataset improved its prediction.
The iterations and training that the software used to develop the best regression model were automatic. Additionally, all the identified variables were considered in the process. The high accuracy can be attributed to the high precision associated with automated machine learning, in agreement with the work of Al-Dunainawi & Abbod [2]. The correlation of the target with temperature and specific gravity (Figs. 6 and 7), observed through the line of best fit, indicates a linear relationship and a good fit of these variables with the target. The feature importance (Fig. 3), however, shows the impact of these variables on the process and on the outcome of the prediction, as shown by the VS Code software.

Conclusion
The prediction model from the AutoML software was established as the most accurate method of prediction in this study, based on the predicted yield and statistical analysis. The introduction of artificial intelligence, as proposed, would help mitigate losses during operation and improve process efficiency. Being able to predict the expected output before the actual process runs can go a long way toward accurate logistics planning of refining operations. With the advance of big data and the Internet of Things, developing this method and incorporating it in the downstream sector is imperative.