In this section we compute the ID3 values for the many-valued temperature-rain data presented in Table 7. Readers are advised to refer to18,21,22 for the computation process of the entropy values for the attributes. Using Eqs. (1) and (2) from Section 4, we obtain the following results.
Information entropy H(S)=0.9501
IG (Winter): 0.9501–0.7254 = 0.2247
IG (Summer): 0.9501–0.8102 = 0.1399
IG (Monsoon): 0.9501–0.8917 = 0.0584
IG (Autumn): 0.9501–0.8455 = 0.1046
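The gain computation above can be sketched in a few lines of Python. The overall entropy and the season-wise conditional entropies are the values reported above; the variable names are ours, not from the original tables:

```python
# Overall entropy H(S) and conditional entropies H(S | season),
# as reported above; IG(A) = H(S) - H(S|A) per Eqs. (1) and (2).
H_S = 0.9501
cond_entropy = {"Winter": 0.7254, "Summer": 0.8102,
                "Autumn": 0.8455, "Monsoon": 0.8917}

gains = {a: round(H_S - h, 4) for a, h in cond_entropy.items()}

# Rank seasons by descending information gain.
ranking = sorted(gains, key=gains.get, reverse=True)
print(gains)    # {'Winter': 0.2247, 'Summer': 0.1399, ...}
print(ranking)  # ['Winter', 'Summer', 'Autumn', 'Monsoon']
```

The ranking produced here matches the precedence order W > S > A > M derived in the next subsection.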
ID3 based precedence order of attributes
An attribute with a greater information gain value is ranked higher, since it is more important in the decision-making context. According to this rule, we get Winter > Summer > Autumn > Monsoon. The precedence order of attributes according to the gain measure is therefore W > S > A > M, where the notation W > S means that W is more important than S.
The information gain measure quantifies the amount of information present in a dataset: a higher information gain value for an attribute corresponds to a higher level of useful information. So the priority of seasons W > S > A > M indicates that the winter season has the highest impact on rainfall, whereas the monsoon season has the least.
The deviation of this ranking (Winter > Summer > Autumn > Monsoon) from intuition is explained by each season's contribution to the overall rain yield for the year. It is true that the monsoon season has heavy rainfall compared with the other seasons, but this happens almost uniformly every year, so it plays a less important role in deciding whether a given year has high rainfall. The other seasons, by contrast, play a vital role in producing a high-rainfall year. For example, although summer has few rainy days, its contribution to the year's rainfall is significant, since it can shift the yearly rainfall yield from a lower to a higher level. The same explanation applies to the other seasons.
The sets of concepts discarded under each precedence relation are listed next. In a precedence relation, A > M means that attribute A is the direct predecessor of attribute M.
Precedence relation-based discarded concepts
A>M
C20, C21, C26, C27, C28, C29, C30, C31, C32, C33, C36, C37, C38, C39, C40, C41, C42, C43, C44, C45, C46, C47, C48, C49.
S>A
C15, C18, C19, C20, C21, C22, C24, C26, C28, C29, C30, C31, C32, C33, C34, C35, C36, C37, C38, C41, C42, C43, C44, C45, C46, C47, C48, C49.
W>S
C16, C17, C18, C19, C22, C23, C24, C25, C28, C29, C30, C31, C32, C33, C34, C35, C36, C37, C38, C39, C40, C42, C43, C44, C45, C46, C47, C48, C49.
The following are the concepts retained under each pairwise precedence relation.
Concepts retained after precedence relation
A>M
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C22, C23, C24, C25, C34, C35, C46.
S>A
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C16, C17, C23, C25, C27, C39, C40.
W>S
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C20, C21, C26, C27, C41.
The final reduced collection of concepts includes only those concepts whose intents obey the precedence order relation with respect to at least one precedence order pair of attributes. Concepts are eliminated from the set if they do not follow the precedence order relation for any of the precedence order pairs.
The reduced set of 33 concepts is as follows:
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C34, C35, C39, C40, C41, C46.
In any given situation, we expect an object's attributes to adhere to at least one precedence order pair relation; the same must therefore hold for every concept's attributes, or intent. The union of the concepts retained under each precedence order pair is then the desired set of final compressed concepts. Thus, out of the 50 original concepts, 33 concepts remain after reduction.
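The union step described above can be checked with a short Python sketch. The index sets transcribe the retained-concept lists given earlier, with concept C_i represented by the integer i:

```python
# Retained concept indices for each precedence pair, transcribed
# from the lists above (C_i -> i).
AM = set(range(1, 20)) | {22, 23, 24, 25, 34, 35, 46}  # A > M
SA = set(range(1, 15)) | {16, 17, 23, 25, 27, 39, 40}  # S > A
WS = set(range(1, 16)) | {20, 21, 26, 27, 41}          # W > S

# A concept survives if it is retained under at least one pair,
# i.e. the final compressed set is the union of the three sets.
final = sorted(AM | SA | WS)
print(len(final))  # 33
```

Running this confirms that exactly 33 of the original 50 concepts survive, in agreement with the list above.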
Measurement of quality of compressed concepts
In this section, we determine each attribute's weight, probability, and expectation using Eqs. (3) and (4); the results are given in Table 9.
Next, we use Eqs. (5) and (6) to calculate the relevance of the concepts, namely the Intent Average Value (IAV), weight, and deviation for each concept. The outcomes are shown in Table 10.
In Table 11, we tabulate the calculations related to the reduced concepts.
Validation process: calculation of percentage error
The results are validated with the help of an error measure. The error rate represents the percentage error of deviance of the compressed concepts with respect to the original ones for the temperature-rain data.
\(\sum D(I) = 0.7760\), so \(d = 0.7760/50 = 0.0155\), and \(\sum D(I^{\prime }) = 0.5840\), so \(d^{\prime } = 0.5840/33 = 0.0177\), where \(I\) and \(I^{\prime }\) stand for the intents of the original and compressed concepts, respectively.
Here v = \(max\{d,d^{\prime }\}\) = \(max\{0.0155, 0.0177\}\) = 0.0177.
Using these values in Eq. (7), we get
\(\delta\) = \(\left| \frac{d-d^{\prime }}{v}\right| \times 100\)
\(= 0.1248 \approx 12\%\)
It is observed that there is a negligible percentage error \(\delta = 12\%\) of weight deviance between the original and compressed concepts. Since the percentage error \(\delta\) is below the threshold level for a better model, which we discussed earlier, we can conclude that the removed concepts have higher deviations in their intent weights than the retained (compressed) concepts. Consequently, we can draw the conclusion that the compressed concepts can be regarded as more relevant.
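The validation arithmetic can be reproduced directly from the totals reported above; any small difference from the quoted figure comes only from rounding of the intermediate averages:

```python
# Totals of intent-weight deviations reported above.
sum_D_I, n_original = 0.7760, 50     # original concepts
sum_D_Ic, n_compressed = 0.5840, 33  # compressed concepts

d = sum_D_I / n_original             # average deviation, ~0.0155
d_prime = sum_D_Ic / n_compressed    # average deviation, ~0.0177
v = max(d, d_prime)

# Percentage error of weight deviance, per Eq. (7).
delta = abs(d - d_prime) / v * 100
print(round(delta))  # 12
```

The computed value rounds to the 12% figure used in the discussion above.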
FCA can be an effective technique for investigating and displaying correlations, particularly with smaller, precisely defined datasets having distinct categories. However, when the dataset is huge, representing the data with FCA becomes more complex and difficult to understand. To overcome such limitations, preprocessing (such as discretisation) and supplementary techniques may be needed when applying it to continuous, sizable, or noisy datasets; the authors adopt such a preprocessing stage in this article as well. Similarly, the ID3 algorithm has limitations with regard to scalability, handling continuous data, and preventing overfitting, even though it can be helpful in some situations, particularly with smaller, simpler datasets and classification problems. Such issues are resolved here in the preprocessing stages.
Climate data is not uniform throughout any region or country, so the accuracy of the data degrades as the region is expanded. We have therefore collected data from a particular place in order to understand and analyse the data more accurately. If needed, the analysis can be carried out for any specific region.
We have considered 15 years of temperature and rain data. Table 11 contains the important concepts that are retained. In our analysis, we find that the attribute ML (Monsoon Low temperature) occurs 21 times, making it the most prevalent attribute over the 15-year period; this implies that it is a vital attribute in rainfall prediction. The next most important attribute is SH (Summer High temperature), which occurs 16 times. The importance of the remaining attributes can be understood in the same way. We believe that the analysis of the reduced concepts can help farmers and other service providers who need rainfall prediction data to understand the climatic changes that occur before rainfall.