Statistical analysis: An approach by applying nonparametric tests to a Brazilian commercial egg production line

This study presents a statistical analysis of commercial egg production at a poultry company in the interior of the State of São Paulo, Brazil. Concern with consumer demands is growing in the food sector, leading companies to seek higher product quality. Statistical analysis was applied to data obtained from the software of a crack detection accessory, the Crack Detector, installed on 12 egg production lines, over a period of 6 months. The nonparametric Spearman and Kruskal-Wallis tests led to the conclusion that the data are correlated and that the cracks recorded differ among the quality control equipment. As a result of this research, a calibration plan for the measurement equipment was suggested in order to increase the reliability of the quality control inspection process and reduce the number of nonconforming products delivered to the consumer market.


Introduction
Statistical analyses are extremely important for validating the results obtained in studies, and one of the central objectives of science is to identify and justify the best explanation for a given phenomenon [1]. Statistical analysis is therefore an important tool for companies to assess the impact of certain variables on the manufacturing of their products or the provision of their services.
Consumers in the food industry are influenced by the added value of the product. In Brazil, a large part of egg production is sold on the domestic market; however, the sector is increasingly adapting in order to raise exports [AVISITE. Brazilian Poultry Union [UBA], https://avisite.com.br/legislacao/anexos/protocolo_de_boas_praticas_de_producao_de_ovos.pdf, Last accessed on: 22/05/2019]. Brazilians consume 192 eggs per capita, and exports of this product generated US$ 7,490 million for the sector in 2017 [AHORA DO OVO. Brazilian egg consumption is 192 eggs per capita increasing 1% in Brazil, http://www.ahoradoovo.com.br/no-mundo-do-ovo/noticias/?id=1320%7Cbrasileiro-jaconsome-192-ovos-per-capita-consumo-cresce-1-no-brasil, Last accessed on: 20/06/2019, BRAZILIAN ANIMAL PROTEIN ASSOCIATION [ABPA], Annual Report 2018, www.abpa-br.com.br Last accessed on: 15/05/2019]. This leads the poultry industry to greater concern with the quality of its products, and applying statistical studies to meet consumer demands is essential to ensure competitiveness [2].
In poultry farms, there is still a lack of technologies to automate and improve quality processes in egg production. Quality control at these companies is still essentially based on human inspection, which occurs at low rates (12 eggs per person) and is subject to human error [3].
In July 1973, George N. Bliss invented the Crack Detector, an accessory that can be integrated into production lines and recognizes cracks from the difference in vibration between a cracked egg and an egg in marketable condition; it is currently widely used in poultry companies [4]. In addition to preventing cracked eggs from reaching consumers, the Crack Detector provides data that can be analyzed statistically in order to raise the level of product quality.
This study aims to improve knowledge about statistical analysis and its importance as a tool for raising the production quality index in poultry companies. It also aims to verify whether the Crack Detectors give the same or different readings and whether they need repairs in order to deliver better products to consumers.

Statistical Analysis
Statistical methods are efficient direct approaches that offer objectivity and accuracy, which leads to giving more importance to facts than to abstract concepts [5]. Statistics provides tools that formalize and standardize procedures to obtain certain conclusions [6].
In studies involving statistical analysis, the methods and tests that best suit the analysis must be chosen to achieve the objective of the work.

Hypothesis Test
To reach an objective decision, a hypothesis test is carried out, establishing hypotheses that can be rejected, reviewed or accepted.
A hypothesis test examines two hypotheses: the null hypothesis, which is the hypothesis to be tested, and the alternative hypothesis. Based on the sample data, the test determines whether or not to reject the null hypothesis [7].

Nonparametric Tests
Nonparametric methods are described as "distribution-free": they do not assume that the analyzed data were drawn from a normally distributed population. Nonparametric tests are often identified as "ranking tests", since they can be used with scores that are not numerically exact but that can be placed in rank order.
These tests are commonly used with small samples, which makes them convenient for pilot studies or for studies in which large samples cannot be obtained [6].

Spearman's Ordinal Correlation Test
The main requirement of Spearman's ordinal correlation test is that the paired data of the variables under study originate from a simple random sample of the population. The Spearman coefficient (rs) is then used to test the association of the variables in the population: the rs value is used to test whether the variables are independent or whether there is an association between them [8].
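As an illustration of how rs can be obtained, the following minimal Python sketch ranks each variable (tied values receive the average of their positions) and computes the Pearson correlation of the ranks, which is the usual definition of the Spearman coefficient. The function names and the readings for the two lines are hypothetical and are not data from this study; the analyses reported here were carried out in Minitab.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rs is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


# Hypothetical crack percentages for two production lines (illustration only).
line_a = [2.1, 2.4, 1.9, 3.0, 2.7]
line_b = [1.8, 2.2, 1.7, 2.3, 2.9]
rs = spearman_rs(line_a, line_b)
print(f"rs = {rs:.3f}")
```

A value of rs close to +1 or -1 suggests a strong monotonic association between the two lines, while a value close to 0 suggests none.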

Kruskal-Wallis Test
The Kruskal-Wallis test, developed in 1952 by William H. Kruskal and Wilson A. Wallis, is used to determine whether or not three or more independent groups come from different populations, as is the one-way analysis of variance (ANOVA). However, ANOVA is used for parametric data, with normally distributed populations, whereas Kruskal-Wallis is used for nonparametric data, selected at random.
In addition, the Kruskal-Wallis test is used to determine whether three or more independent groups are equal or differ in some variable of interest measured at the ordinal, interval or ratio level [9].
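The Kruskal-Wallis statistic can be sketched in a few lines of Python. The sketch below pools all observations, assigns average ranks, and computes H = 12 / (N(N + 1)) · Σ(Rj² / nj) − 3(N + 1), divided by the usual correction for ties. The function name and the three small groups are hypothetical and serve only as an illustration; they are not the study data, and the study itself relied on Minitab.

```python
def kruskal_wallis_h(groups):
    """Return (H, degrees of freedom) for a list of samples, corrected for ties."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Collect the 1-based positions of every distinct value in the pooled
    # sample; the mean of those positions is the average rank of the value.
    positions = {}
    for pos, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(pos)
    rank_of = {v: sum(p) / len(p) for v, p in positions.items()}

    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N), t = size of each tie group.
    tie_term = sum(len(p) ** 3 - len(p) for p in positions.values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("H is undefined when all observations are identical")
    return h / correction, len(groups) - 1


# Hypothetical crack percentages for three lines (illustration only).
line_1 = [1.1, 1.3, 1.2]
line_2 = [1.6, 1.8, 1.7]
line_3 = [2.4, 2.2, 2.3]
h_stat, dof = kruskal_wallis_h([line_1, line_2, line_3])
print(f"H = {h_stat:.2f} with {dof} degrees of freedom")
```

The larger H is, the stronger the evidence that at least one group comes from a different population; in large samples H is commonly compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups.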

Material and methods
The present work is a case study carried out by statistically analyzing the data from the software of the Crack Detectors installed on 12 production lines, over a period of 6 months. The average of each Crack Detector for each month was also compared, giving a month-to-month analysis.
Data collection was carried out in a poultry company located in the interior of the state of São Paulo, Brazil.
Initially, an outlier test was performed to remove values outside the usual data distribution. Next, the Graphical Summary test was performed and the P-value of each production line was analyzed.
As this is a study with a small sample size, nonparametric tests were used. The Kruskal-Wallis test was performed to verify whether the production lines differed from each other, and the Spearman Ordinal Correlation Test was performed to verify whether there was a correlation. All analyses were based on the graphics and information obtained from the Minitab statistical software, version 2017.

Results and discussion
The equipment under study, Crack Detector, has an internal defect detection module, based on the vibrations emitted when the egg passes through the device, which allows checking whether an egg is broken or not. This detection module is important, as it replaces the human work of visual inspection of defects, allowing the automation of this task, producing agility, efficiency in the egg screening stage and standardization in the classification [10].
The company has a production plant with 12 production lines; at the end of each line, a Crack Detector identifies whether the eggs have defects. All units work in the same way.
The Crack Detector identifies four common types of defects in chicken eggs: dirt, blood stains, cracks and yolk leakage. In this work, the data collected from the 12 production lines over 6 months were analyzed to identify these defects, as shown in Figures 1 and 2. The results of the automatic visual inspection performed by the Crack Detector can be seen in the series of data discussed below.
The first test to be done is the outlier test and its data are contained in Figure 3.

Figure 4 Variance of the Outliers
Any value identified as an outlier by the outlier test was replaced by the average of the remaining data.
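The replacement step can be sketched as follows. The text does not state which outlier test was used, so this sketch assumes Tukey's rule (values beyond 1.5 times the interquartile range from the quartiles) purely for illustration; the function name, the parameter k and the readings are likewise hypothetical.

```python
from statistics import mean, quantiles


def replace_outliers_with_mean(values, k=1.5):
    """Replace values outside the Tukey fences by the mean of the other values.

    The 1.5 * IQR rule is an assumption of this sketch; it is not necessarily
    the outlier test applied in the study.
    """
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    if not kept:
        return list(values)  # nothing left to average; return data unchanged
    fill = mean(kept)
    return [v if low <= v <= high else fill for v in values]


# Hypothetical readings with one clearly unusual value (illustration only).
readings = [2.0, 2.1, 1.9, 2.2, 2.0, 9.5, 2.1, 1.8]
cleaned = replace_outliers_with_mean(readings)
print(cleaned)
```

Replacing the flagged value by the mean of the remaining values keeps the sample size unchanged, which matters when each group has few observations.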
The data from the Graphical Summary Test for the month of September are contained in Figure 5.

Figure 5 Graphic Summary
Figure 5 shows that the Summary Report provides several data that are useful for various statistical analyses; in the present case, however, only the P-value was analyzed. The P-value was greater than 0.05, which would indicate that the data follow a normal distribution. Because of the small amount of data analyzed (21 values for each Crack Detector face), this indication must be assumed to be a false positive, since the β error is very high for this sample size. The data therefore need to be treated as nonparametric.
The data for the Kruskal-Wallis test for the month of September are contained in Figure 6.

Figure 6 Kruskal-Wallis Test
Treating the data as nonparametric, the Kruskal-Wallis test was performed in place of the one-way analysis of variance (ANOVA). The hypotheses tested in the Kruskal-Wallis test, at the 5% significance level, are: H0: the populations from which the samples came are identical; H1: the populations from which the samples came are not identical.
The P value provided in Figure 6 indicates that, at the 5% significance level, there is sufficient statistical evidence to reject the hypothesis (H0) that there are no significant differences between the Crack Detectors installed in the 12 different egg production lines.
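The decision rule can be illustrated with a short Python sketch. It assumes the common large-sample approximation in which the Kruskal-Wallis statistic H follows a chi-square distribution with k − 1 degrees of freedom (11 for the 12 lines); whether Minitab applies exactly this approximation is not stated in the text. The function names and the values of H used in the example are hypothetical and are not results from Figure 6.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1 * 3 * ... * (2r - 1))
    term, total = sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(half)) + sqrt(2.0 / pi) * exp(-half) * total


def kruskal_decision(h, n_groups, alpha=0.05):
    """Return (p_value, reject_h0) using the chi-square approximation."""
    p_value = chi2_sf(h, n_groups - 1)
    return p_value, p_value < alpha


# Hypothetical H value for 12 production lines (illustration only).
p_value, reject = kruskal_decision(25.0, n_groups=12)
print(f"p = {p_value:.4f}, reject H0: {reject}")
```

When the P-value is below 0.05, H0 is rejected and the groups are taken to differ; otherwise the data do not provide sufficient evidence of a difference.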
The data from the Spearman Correlation Test for the month of September are contained in Figures 7 and 8.

Figure 8 Spearman Correlation for all data
After the result obtained by the Kruskal-Wallis test, the Spearman Correlation Test was performed in order to analyze the interaction between the Crack Detectors. The Spearman coefficient was well suited to this analysis because, being computed from ranks, it is less sensitive to outliers in continuous variables than a coefficient computed from the raw values. The result confirmed, once again, that there are differences between the samples, and it supports the Kruskal-Wallis results of the month-to-month analysis.
The same tests and analyses were used to compare the monthly averages, and the results were similar to those obtained in the analysis of the Crack Detectors within each month.
The Outlier Test and its data are contained in Figure 9.  The Graphical Summary test data for the month-to-month analysis is contained in Figure 11.

Figure 11 Graphic Summary month-to-month
In the same way as for the month of September (Figure 5), Figure 11 provides the Summary Report for the month-by-month analysis of the observed months. As for a single month, the P-value of the sample was analyzed; although it was again greater than 0.05, which would indicate that the data follow a normal distribution, this indication had to be assumed to be a false positive because of the β error, so the data were treated as nonparametric.
The data from the Kruskal-Wallis test for the month-to-month analysis are contained in Figure 12.

Figure 12 Kruskal-Wallis Test month-to-month
Analogously to the test shown in Figure 6, the Kruskal-Wallis test was performed in place of the one-way analysis of variance (ANOVA) for the month-to-month analysis. The hypotheses tested in the month-to-month Kruskal-Wallis test are the same as those tested for a single month, namely: H0: the populations from which the samples came are identical; H1: the populations from which the samples came are not identical.
The P value provided in Figure 12 provides the same result as found previously, that, at the 5% level of significance, there is sufficient statistical evidence to reject the hypothesis (H0) that there are no significant differences between the Crack Detectors installed in the different 12 egg production lines.
The data from the Spearman Correlation Test for the month-to-month analysis are contained in Figures 13 and 14. The objective of the month-to-month analysis is to assess whether any of the analyzed months stands out, positively or negatively. The H0 hypothesis of the Kruskal-Wallis test was rejected, indicating that, at the 5% significance level, there is statistical evidence that the 6 months analyzed differ significantly.

Conclusion
This study first sought to improve knowledge about statistical analysis applied to the quality of commercial egg production, showing the importance of applying statistical tests to verify and ensure the quality of the final product.
Analysis of the data provided by the Crack Detector software over the six months made it possible to conclude, through statistical tests, that both the equipment and the months analyzed differ from each other. This points to calibration as a limitation of the study for data validation: calibration should follow a single standard for all equipment.
Although the present research used only the Crack Detector as equipment, its practical implication is that statistical tests such as those presented are important tools in support of quality assurance.
For future work, it is recommended to apply different statistical tools to the same data, followed by a critical comparison of the results, and to apply the same tests to other models of production systems in order to validate the results of this work.