Sustainable EV adoption with clustering and predictive modelling for optimal charging infrastructure in the West Midlands and North East UK

This study leverages the ISE-CAP framework, a comprehensive methodology designed to analyse EV user behaviours, predict charging preferences, and identify influential factors affecting charging station selection, user satisfaction, and infrastructure optimisation. This study provides actionable insights into optimising EV infrastructure across the North East and the West Midlands by integrating clustering, predictive modelling, and feature importance analysis using state-of-the-art ML techniques. The methodological components are outlined in detail below.

Methodological rationale and framework linkages

The development of the ISE-CAP framework is motivated by the need to bridge behavioural analysis and infrastructure optimisation within a unified analytical pipeline. Existing EV studies frequently apply clustering, predictive modelling, or optimisation independently, limiting their ability to translate behavioural insights into adaptive infrastructure planning.

The rationale for integration in this study is structured as follows. First, behavioural clustering is employed to identify distinct EV user segments based on charging habits, socio-economic attributes, and travel patterns. This segmentation establishes the heterogeneity structure within each region. Second, predictive modelling is applied to estimate charging behaviour outcomes, enabling quantification of demand variability and behavioural intensity within and across clusters.

Third, XAI is incorporated to interpret model outputs and identify the most influential features associated with charging preferences. This step enhances policy interpretability by translating model predictions into understandable behavioural drivers. Fourth, simulation-based RL is introduced as an optimisation layer that utilises predicted demand and behavioural patterns to evaluate adaptive charging station placement strategies. The integrated structure ensures that clustering informs prediction, prediction informs explainability, and explainability informs optimisation. This sequential linkage establishes a coherent decision-support framework rather than a collection of independent modelling exercises.

Data analysis

Table 2 provides a detailed demographic comparison between the North East and West Midlands, covering key categories such as gender, age distribution, occupation, education level, and annual household income. Descriptive percentages reported for EV adoption motivations were computed using frequency distributions within each regional sample. To assess whether regional differences in adoption drivers were statistically significant, chi-square tests of independence were conducted. Differences were considered statistically significant at p < 0.05.

In terms of gender, the North East has a higher proportion of males (56.5%) compared to the West Midlands (48.5%), whereas the West Midlands has a greater percentage of females (51.5%). Non-binary representation remains minimal in both regions.

Table 2 Comparison of demographic counts and percentages for North East and West Midlands.

The age distribution highlights that the West Midlands has a higher proportion of individuals aged 25–34 (41.7%), compared to 37.1% in the North East. The North East has a greater share of individuals aged 35–44 (31.5%) than the West Midlands (21.2%). The proportion of those aged 45 and above is relatively low in both regions, with the highest representation being in the 45–54 range (14.5% in the North East and 16.7% in the West Midlands). Occupationally, most respondents in both regions are employed, with the North East reporting a higher employment rate (85.5%) compared to the West Midlands (81.8%). Self-employment is slightly more prevalent in the West Midlands (6.1%) than in the North East (4.0%). Retirement and unemployment rates remain low, with the North East having 2.4% retired individuals compared to 1.5% in the West Midlands.

Regarding education, a larger proportion of individuals in the North East hold a bachelor’s degree (54.8%) compared to the West Midlands (47.0%). Master’s degree holders are similarly distributed in both regions (20.2% in the North East and 20.5% in the West Midlands), while those with doctorate-level education are relatively rare. Income distribution varies significantly between the two regions. In the North East, 28.2% of households fall within the $\pounds$50,001–$\pounds$70,000 bracket, whereas in the West Midlands, 48.5% earn over $\pounds$70,000, showing a notable income disparity. Conversely, lower-income brackets (less than $\pounds$35,000) are more prevalent in the North East than in the West Midlands.

The majority of respondents reported owning a single EV at the time of participation. The survey instrument focused primarily on charging behaviour, infrastructure utilisation, and adoption motivations, and did not systematically model the intention to purchase a second EV as a core analytical variable. Future research may extend this framework by incorporating second-EV adoption intention and longitudinal ownership dynamics.

Chi-square tests of independence were conducted to evaluate regional differences in EV adoption motivations and charging preferences. For adoption motivation, the association between region and primary motivation was statistically significant ($\chi ^{2}(4) = 12.87$, $p = 0.012$), with a moderate effect size (Cramér’s $V = 0.224$). For charger type preference (AC vs. DC), no statistically significant regional difference was observed ($\chi ^{2}(1) = 1.94$, $p = 0.164$). Bootstrapped 95% confidence intervals were computed for proportional differences to quantify uncertainty. These results provide formal statistical support for the descriptive comparisons presented in Figures 1–3.
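The reported effect size can be checked from the test statistic alone. The sketch below is a minimal illustration, not the authors' analysis code: it computes an uncorrected Pearson chi-square statistic for a contingency table and Cramér's V. It assumes that the adoption-motivation test used a 2 (regions) by 5 (motivation categories) table, which is consistent with the reported 4 degrees of freedom, and that all 256 respondents were included. The 2x2 counts are hypothetical and serve only to exercise the function.

```python
import math


def chi_square_independence(table):
    """Return (chi2, dof) for a contingency table given as a list of rows.

    Pearson statistic without continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Reported statistic: chi2(4) = 12.87; n = 256 and a 2 x 5 table are assumptions.
v_reported = cramers_v(12.87, 256, n_rows=2, n_cols=5)

# Hypothetical 2 x 2 counts (not survey data), used only to exercise the function.
chi2_demo, dof_demo = chi_square_independence([[10, 20], [20, 10]])
```

Under these assumptions, `v_reported` rounds to the value of 0.224 stated in the text.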

Sampling strategy and representativeness

The study dataset consists of 256 EV users (North East: 124; West Midlands: 132) who participated in a structured survey designed to capture behavioural, socio-economic, and charging preference characteristics. Participants were eligible if they were current EV owners residing within the respective regions at the time of data collection. Respondents were recruited through a combination of regional EV user groups, university-affiliated outreach networks, local sustainability forums, and online community platforms focused on electric mobility. Participation was voluntary, and no financial incentives were provided. The sampling strategy therefore reflects a non-probability, self-selection approach. The dataset was not designed to be nationally representative of all UK EV users. Rather, it provides a structured behavioural sample from two specific regions. While certain demographic characteristics broadly align with regional EV ownership trends, the sample should be interpreted as regionally indicative rather than statistically representative of national EV registration distributions.

While this recruitment method enabled targeted engagement with active EV users, it may introduce self-selection bias, as individuals with stronger engagement in EV communities or higher digital accessibility may be overrepresented. As a result, the sample may not fully represent the broader regional EV population, particularly individuals with limited digital engagement or lower socio-economic accessibility. To assess potential confounding effects, socio-demographic variables including income level, education, occupation, age, and gender were incorporated into the clustering and predictive modelling pipeline. These variables were standardised and included as model features to account for behavioural variability linked to socio-economic differences. This approach mitigates confounding effects within the modelling process, although it does not eliminate sampling bias at the population level.

The income distribution indicates a relatively higher representation of upper-income households in the West Midlands sample, while mid-income groups are more prevalent in the North East. This imbalance reflects regional economic differences but may also influence charging behaviour patterns and model generalisability. Consequently, the findings should be interpreted as regionally indicative rather than statistically representative of the entire EV-owning population in the UK. Future research may benefit from stratified random sampling or integration with national EV registration datasets to enhance representativeness and external validity. Despite these limitations, the dataset provides a robust behavioural cross-section of active EV users in both regions and enables meaningful comparative modelling of regional charging patterns and infrastructure needs.

The North East and West Midlands were selected due to their contrasting urban structures, socio-economic profiles, and infrastructure distributions, providing analytically meaningful regional heterogeneity. The North East is characterised by a relatively compact urban layout with concentrated charging infrastructure, whereas the West Midlands encompasses metropolitan centres alongside suburban and rural areas with more dispersed travel patterns. Additionally, income distribution and mobility behaviours differ significantly between the two regions. These contrasts enable comparative modelling of homogeneous versus heterogeneous EV usage environments, enhancing policy relevance for region-specific infrastructure planning within the UK decarbonisation strategy.

The voluntary and community-based recruitment approach may introduce selection bias, as participants with stronger engagement in EV networks or higher digital accessibility may be overrepresented. Consequently, the sample may not fully represent the entire EV-owning population in each region. The findings should therefore be interpreted as behaviourally indicative within the sampled cohort rather than statistically representative of the broader UK EV population.

Analysis of EV user motivations and preferences

Figure 1 presents a comparative analysis of EV ownership and usage patterns between respondents from the North East and the West Midlands regions. The four subplots highlight key aspects, including motivations for EV adoption, types of EVs owned, their approximate all-electric range, and the duration for which respondents have been driving an EV. A chi-square test was conducted to examine regional differences in BEV versus PHEV ownership. The association between region and vehicle type was not statistically significant ($\chi ^{2}(1) = 0.84$, $p = 0.36$), indicating comparable ownership distributions across the two regions.

Fig. 1

Comparison of EV ownership factors between North East and West Midlands.

The first subplot, located at the top-left, illustrates the motivations behind EV purchasing. Cost savings are the most dominant factor, particularly in the North East, where approximately 65 respondents cited this as their primary reason. In contrast, environmental concerns are more prominent in the West Midlands, influencing about 30 respondents. Technological interest was cited by around 15 respondents in the North East and 12 in the West Midlands. Government incentives and other factors appear to have a minor impact, with fewer than 10 respondents selecting these options in each region. These results indicate that financial savings remain the leading driver for EV adoption, while environmental concerns are slightly more influential in the West Midlands.

The top-right subplot categorises EV ownership by type. The majority of respondents in both regions own battery electric vehicles (BEVs), with approximately 85 individuals in the West Midlands and 80 in the North East choosing this type. Plug-in hybrid EVs (PHEVs) represent a smaller share, with around 40 respondents in each region opting for this alternative. These findings suggest a growing preference for fully electric vehicles, with minimal variation between the two regions.

The bottom-left subplot presents the approximate all-electric range of respondents’ EVs. The most common range is 200–300 km, reported by approximately 40 respondents in the North East and 35 in the West Midlands. The second most frequent category is 100–200 km, selected by nearly 30 respondents in the North East and 25 in the West Midlands. Around 18 respondents in the North East and 15 in the West Midlands own EVs with a 300–400 km range. Conversely, fewer than 10 respondents in either region have EVs with a range below 100 km or exceeding 500 km. These results indicate that most EV users prefer vehicles within the mid-range category, with only a limited number opting for long-range or low-range alternatives.

The final subplot at the bottom-right examines how long respondents have been driving an EV. The most common ownership duration is between 1–3 years, with approximately 47 respondents in the West Midlands and 40 in the North East falling into this category. Similarly, the 6 months to 1 year category includes around 45 respondents in the West Midlands and 38 in the North East. A notable number of respondents in the North East (approximately 33) have been driving an EV for less than 6 months, compared to around 22 in the West Midlands. Meanwhile, long-term EV ownership (more than 3 years) remains the least common, with fewer than 20 respondents in each region. These findings suggest that EV adoption has increased in recent years, with most users having less than three years of experience.

Analysis of EV charging behaviour and travel patterns

Figure 2 below compares EV charging and travel behaviour between North East and West Midlands respondents. The six subplots illustrate weekly charging frequency, daily travel distance, maximum waiting time tolerance, charging time of day, general charging frequency, and average charging duration. Each subplot compares trends between the two regions using line graphs, with the North East represented in blue and the West Midlands in orange.

Fig. 2

Comparison of EV charging behaviour and travel patterns between the North East and the West Midlands.

The top-left subplot represents the weekly charging frequency. Most respondents in both regions charge their EVs once or twice a week, with approximately 65 responses in the West Midlands and slightly fewer in the North East. As the charging frequency increases, the number of responses declines, with fewer than 10 respondents in either region charging daily. The trend remains consistent across both regions, indicating that most EV users prefer a moderate charging routine. The top-right subplot illustrates daily travel distance. In the West Midlands, most respondents travel 10 km to 30 km per day, with nearly 50 responses. However, the North East exhibits a more evenly distributed pattern, with a significant portion of respondents travelling between 50 km and 100 km. The number of users travelling less than 10 km per day is higher in the North East compared to the West Midlands. Fewer than 10 respondents in either region report travelling more than 100 km daily, suggesting that long-distance travel is relatively uncommon among EV users.

The middle-left subplot presents the maximum waiting time tolerance for charging. Around 65 respondents in the West Midlands prefer a waiting time of 10 to 20 minutes, compared to approximately 50 in the North East. The least tolerated waiting time is 20 to 30 minutes, with fewer than 15 responses in each region. Interestingly, both regions exhibit a peak in responses for a waiting time of less than 10 minutes, indicating that many users prefer rapid charging. Fewer than 10 respondents in each region are willing to wait more than 30 minutes for charging. The middle-right subplot illustrates the preferred charging time of day. Respondents in the North East prefer charging between 6 pm and 12 am, whereas the West Midlands respondents prefer the 12 am to 6 am period. Charging activity is least common between 12 pm and 6 pm in both regions, indicating that daytime charging is generally avoided. The number of responses increases significantly in the evening and early morning hours, suggesting a strong preference for off-peak charging.

The bottom-left subplot introduces general charging frequency, showing a more detailed breakdown of charging habits. Most respondents charge weekly, with approximately 50 responses in both regions. Monthly charging is also notable, particularly in the West Midlands, where nearly 45 respondents charge monthly. The proportion of respondents who rarely charge their EVs is slightly higher in the North East. The number of users who never charge is minimal, further reinforcing that routine charging is an essential aspect of EV ownership. The bottom-right subplot depicts the average charging duration. The most common charging duration is between 3 and 6 hours, as reported by nearly 50 respondents in both regions. A significant proportion of users also report charging for 1 to 3 hours. The least common charging duration is less than 1 hour, with fewer than 10 responses in each region. A few respondents, particularly in the North East, charge for more than 6 hours, indicating occasional long-duration charging sessions.

Overall, the data indicate that most respondents charge their EVs one to two times per week, travel between 10 km and 30 km daily, and prefer a maximum waiting time of 10 to 20 minutes. The preferred charging period is either in the evening (6 pm to 12 am) or early morning (12 am to 6 am), with longer charging durations of 3 to 6 hours being the most common. Charging frequency trends reveal that weekly and monthly charging are dominant, with few users relying on daily charging. The findings highlight a strong preference for efficient, scheduled charging patterns, minimising waiting times while ensuring adequate battery levels.

Charging preferences

Figure 3 compares EV charging preferences between North East and West Midlands respondents. The horizontal bar chart represents the number of responses for each category, with blue indicating data from the North East and red representing the West Midlands.

Fig. 3

Comparison of EV charging preferences between the North East and West Midlands.

The chart reveals several key insights into user preferences regarding charging infrastructure. The most preferred method for obtaining information about charging station availability is through mobile applications, with 73 respondents in the West Midlands and 63 in the North East choosing this option. In-car navigation systems are also widely used, with 49 responses from the West Midlands and 44 from the North East, while online maps remain the least popular choice.

When asked which type of charger they would prefer if their vehicle had only 30% battery remaining, most in both regions chose DC chargers, with 100 responses in the West Midlands and 97 in the North East. On the other hand, only 32 respondents in the West Midlands and 25 in the North East selected AC chargers, highlighting a strong preference for faster charging options.

The chart also illustrates the factors considered when selecting a charging station. Charging duration is the most critical factor, with 87 respondents in the North East and 82 in the West Midlands prioritising this consideration. Cost also plays a significant role, particularly among West Midlands respondents, where 35 participants cited it as a key factor compared to 24 in the North East. The availability of chargers either on-route or at the end of the next trip was deemed less critical, with fewer than 10 respondents in each region selecting these options.

Regarding the willingness to travel to a charging station, most respondents in both regions prefer to travel between 1 km and 3 km, with 66 responses from the West Midlands and 53 from the North East. The second most common category is 3 km to 6 km, with 43 responses from the West Midlands and 34 from the North East. A small percentage of respondents indicated they are willing to travel more than 9 km, with just 4 to 5 responses in both regions.

Overall, these findings suggest that EV users in both regions prioritise fast-charging solutions and convenience, relying on mobile applications to locate chargers and preferring to travel short distances for charging. While charging duration remains the primary concern, cost considerations appear slightly more relevant in the West Midlands than in the North East. These insights can help policymakers and charging infrastructure providers optimise network placement and service offerings.

Analysis of travel and charging behaviour of EV users

Figure 4 compares travel and charging behaviour among EV users in the West Midlands and the North East. The violin-box plots illustrate the distribution and variability of responses across six key factors: time spent on the road, recharge level, trip planning, rating of charging station distribution, parking influence, and parking convenience. The left-hand side of each sub-figure corresponds to the West Midlands, while the right-hand side represents the North East.

Fig. 4

EV Charging patterns and parking preferences in the West Midlands and North East.

The first plot highlights the impact of charging station availability on users’ time on the road. A considerable portion of respondents in both regions reported that charging stations have a neutral effect on their travel time. However, there are noticeable differences in the distribution, with some users in the North East indicating a slightly greater tendency for charging stations to reduce their travel time compared to those in the West Midlands. The second plot represents the typical charge level at which users decide to recharge their EVs. In both regions, most users recharge their vehicles between 20% and 80%, with a strong concentration around the 40% to 60% range. However, users in the North East show a slightly broader distribution, suggesting greater variability in their charging habits. The third plot illustrates how users incorporate charging stations into trip planning for journeys exceeding 50 km. The responses in both regions are relatively similar, with most users opting for charging stations already on a convenient route. A smaller proportion of respondents consider backup charging stations as a contingency, while very few prioritise planning their entire route around station locations.

The fourth plot evaluates how respondents rate the distribution of EV charging stations along major routes. The data distribution suggests that respondents in the North East tend to provide slightly lower ratings for charging station accessibility and coverage compared to those in the West Midlands. This implies that users in the North East may perceive charging infrastructure as less evenly distributed or insufficient to meet their travel needs. The fifth plot examines whether charging station availability has influenced parking behaviours. Responses from both regions indicate that most users experience little to no impact on their parking habits due to charging station placement. However, some users in the North East report that they have to park further away due to charging station locations, whereas users in the West Midlands show a slight tendency towards improved parking convenience. The final plot addresses the perceived convenience of parking near charging stations compared to regular parking spaces. The responses are spread across different levels of convenience, but the distribution suggests that North East users find parking near charging stations slightly less convenient than users in the West Midlands.

Data preprocessing

The North East and West Midlands datasets were preprocessed to ensure consistency and quality. Missing values were handled using variable-specific imputation strategies. Numerical variables with low missingness (<5%) were imputed using mean substitution, while categorical variables were imputed using mode replacement. The overall proportion of missing data was minimal, and no feature required exclusion due to excessive missingness. Categorical variables, such as user preferences and demographics, were encoded using one-hot encoding:

$$\begin{aligned} \textbf{X}_{\text {encoded}} = \left[ \textbf{x}_1, \textbf{x}_2, \dots , \textbf{x}_n\right] \end{aligned}$$

(1)

where $\textbf{X}_{\text {encoded}}$ represents the matrix of encoded features, and $\textbf{x}_i$ is the binary representation of each categorical feature.

Numerical variables, including income levels and charging durations, were standardised to ensure uniform scaling across features:

$$\begin{aligned} \textbf{X}_{\text {scaled}} = \frac{\textbf{X} - \mu }{\sigma } \end{aligned}$$

(2)

where $\mu$ and $\sigma$ denote the mean and standard deviation of each feature, respectively.

By combining Equations (1) and (2), the final preprocessed feature matrix $\textbf{X}_{\text {final}}$ was formulated as:

$$\begin{aligned} \textbf{X}_{\text {final}} = f(\textbf{X}_{\text {encoded}}, \textbf{X}_{\text {scaled}}) \end{aligned}$$

(3)
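Equations (1) to (3) can be illustrated with a small standard-library sketch. It is not the authors' pipeline: the toy records are hypothetical, the function $f$ in Equation (3) is assumed to be column-wise concatenation (the text does not define it), and the population standard deviation is assumed for Equation (2).

```python
from statistics import mean, pstdev


def one_hot(values):
    """Eq. (1): map each category to a binary indicator vector."""
    categories = sorted(set(values))
    encoded = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return encoded, categories


def standardise(values):
    """Eq. (2): subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy records (not survey data).
charger = ["AC", "DC", "DC", "AC"]
duration_h = [2.0, 4.0, 6.0, 8.0]

encoded, categories = one_hot(charger)
scaled = standardise(duration_h)

# Eq. (3), assuming f is column-wise concatenation of the two blocks.
x_final = [enc + [s] for enc, s in zip(encoded, scaled)]
```

Each row of `x_final` then holds the indicator columns followed by the standardised numerical column.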

To address regional socio-economic imbalances, income level, education, and occupation were explicitly included as model input features in both clustering and predictive modelling. Numerical socio-economic variables were standardised prior to modelling to prevent scale dominance. By incorporating these attributes directly into the feature space, the modelling framework accounts for regional heterogeneity rather than excluding it, thereby reducing confounding effects in behavioural segmentation and prediction.

Clustering analysis

To uncover patterns among EV users, K-Means clustering was applied to group users based on shared attributes related to charging behaviour, travel patterns, socio-demographic factors, and infrastructure accessibility. Specifically, clustering was performed on features including charging frequency, preferred charging location (home vs. public), travel distance per day, waiting time tolerance, and socio-economic variables such as income level and education.

To evaluate clustering robustness, a sensitivity analysis was conducted for k values ranging from 2 to 6. Silhouette scores, Davies-Bouldin Index values, and Calinski-Harabasz scores were computed for each configuration. Additionally, cluster stability was assessed using 500 bootstrap resamples and Adjusted Rand Index (ARI) comparisons. The $k = 3$ solution demonstrated the highest average silhouette score and a stable ARI ($> 0.82$ across resamples), supporting the robustness of the three-cluster structure in both regions.
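The ARI used for the stability check compares two labelings of the same points while ignoring how the cluster ids are numbered. The following sketch shows the computation on hypothetical labelings; in a bootstrap stability analysis, each resample would presumably be re-clustered and its labels compared against a reference partition in this way, but that loop is not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same points (1.0 = identical partitions)."""
    n = len(labels_a)
    pair_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labelings: the same partition with cluster ids permuted.
reference = [0, 0, 1, 1, 2, 2]
resampled = [2, 2, 0, 0, 1, 1]
ari_same = adjusted_rand_index(reference, resampled)

# Moving one point to a different cluster lowers the score.
ari_shifted = adjusted_rand_index(reference, [2, 2, 0, 0, 1, 0])
```

Because the index is corrected for chance agreement, values near 1 indicate that resampling leaves the partition essentially unchanged.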

To visualise the clustering patterns effectively, Principal Component Analysis (PCA) was employed to reduce the dataset’s dimensionality while retaining the data’s most significant variance. The PCA was performed on standardised numerical features, including charging session duration, travel distances, frequency of charging, charging location preferences, and willingness to travel for charging. The transformed PCA components were then used for plotting the clusters, providing insight into how different user groups are spatially distributed in the feature space. The clustering objective is defined as:

$$\begin{aligned} J = \sum _{i=1}^{k} \sum _{j \in C_i} \Vert \textbf{x}_j - \varvec{\mu }_{i}\Vert ^2 \end{aligned}$$

(4)

where k is the number of clusters, $C_i$ is the set of data points in cluster i, and $\varvec{\mu }_i$ is the centroid of cluster i.
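As an illustration of Equation (4), the sketch below runs one assignment step and one centroid update of the standard K-Means iteration on hypothetical 2-D points and evaluates $J$ before and after the update. The data, starting centroids, and function names are illustrative assumptions, and the update assumes no cluster becomes empty.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def update(points, labels, k):
    """Recompute each centroid as the mean of its members (assumes none is empty)."""
    centroids = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        centroids.append([sum(dim) / len(members) for dim in zip(*members)])
    return centroids


def objective(points, labels, centroids):
    """Eq. (4): total within-cluster sum of squared distances J."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical points and starting centroids (not survey data).
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[0.0, 0.0], [10.0, 2.0]]

labels = assign(points, centroids)
j_before = objective(points, labels, centroids)
centroids = update(points, labels, k=2)
j_after = objective(points, labels, centroids)
```

The centroid update cannot increase $J$ for fixed assignments, which is why alternating the two steps converges to a local minimum.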

Dimensionality reduction techniques, including PCA and t-Distributed Stochastic Neighbour Embedding (t-SNE), were employed to visualise clustering patterns in a lower-dimensional space. PCA captures the principal variance in the data:

$$\begin{aligned} \textbf{X}_{\text {PCA}} = \textbf{X}_{\text {final}} \cdot \textbf{W} \end{aligned}$$

(5)

where $\textbf{W}$ represents the matrix of eigenvectors corresponding to the top principal components.

To formally evaluate cluster quality, the Silhouette Score and Davies-Bouldin Index were computed. For the North East dataset, the average Silhouette Score was 0.62 and the Davies-Bouldin Index was 0.71, indicating well-separated and compact clusters. For the West Midlands dataset, the Silhouette Score was 0.55 and the Davies-Bouldin Index was 0.84, reflecting moderate but meaningful cluster separation consistent with the region’s behavioural heterogeneity.
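For reference, the silhouette coefficient of a point is $(b - a)/\max (a, b)$, where $a$ is its mean distance to the other members of its own cluster and $b$ is the smallest mean distance to the members of any other cluster. The sketch below computes the average over all points with Euclidean distance on hypothetical 1-D data. It is a plain illustration of the definition rather than the implementation used in the study, and it does not handle duplicate points that make both $a$ and $b$ zero.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""

    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    clusters = sorted(set(labels))
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        own = [
            dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own_label and j != i
        ]
        if not own:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != own_label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical, clearly separated 1-D clusters (not survey data).
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
```

Scores close to 1 correspond to compact, well-separated clusters, which is the sense in which the regional values above are interpreted.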

Predictive modelling

Three predictive models were developed to predict EV charging behaviours and preferences: RF, CatBoost, and a custom-designed DL model. The DL model consisted of a fully connected feedforward neural network with three dense layers. The first hidden layer contained 64 neurons with rectified linear unit (ReLU) activation, followed by a dropout layer (rate = 0.3). The second hidden layer included 32 neurons with ReLU activation and batch normalisation. The output layer used a linear activation function for regression of charging time. The model was trained using the Adam optimiser with a learning rate of 0.001, batch size of 16, and a maximum of 100 epochs. Early stopping with patience of 10 epochs was implemented based on validation loss. To prevent data leakage, predictor variables were restricted to independent behavioural, socio-economic, and preference-based features. No outcome-defining or derived charging duration variables were included among the predictors. Additionally, the train–test split was performed prior to feature scaling and preprocessing transformations, ensuring that parameter estimation for scaling was conducted exclusively on the training data.
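The ordering of splitting and scaling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the feature column is synthetic, the helper names are hypothetical, and rounding the test size to 51 of 256 rows is a choice made here, not a detail given in the text.

```python
import random
from statistics import mean, pstdev


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


def fit_scaler(train_values):
    """Estimate mean and standard deviation from the training data only."""
    return mean(train_values), pstdev(train_values)


def transform(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


# Synthetic feature column with 256 values, matching the reported sample size.
rng = random.Random(42)
feature = [rng.gauss(30.0, 8.0) for _ in range(256)]

train, test = train_test_split(feature)
mu, sigma = fit_scaler(train)              # parameters come from the training split
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)   # the test split reuses them unchanged
```

Fitting the scaler before splitting would let test-set statistics influence the training features, which is the leakage the procedure is designed to avoid.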

Both RF and CatBoost models were employed to predict EV charging preferences, including preferred charging station type (AC vs. DC), willingness to travel for charging, charging frequency, and waiting time tolerance. Additionally, these models were used to analyse feature importance, identifying key factors such as charging duration, cost, real-time station availability, travel distance, and user demographics (income, occupation, and education level) that influence charging behaviour. Both are tree-based ensembles: RF aggregates bagged decision trees, whereas CatBoost builds trees sequentially through gradient boosting. This allows them to capture nonlinear relationships in the data, providing a robust approach for understanding user preferences and optimising EV infrastructure planning. Feature importance was evaluated using SHAP.

A neural network with dense layers, dropout, and batch normalisation was implemented to predict EV charging times based on survey data collected from respondents in the North East and West Midlands. The survey responses, which include charging frequency, preferred charging station type (AC/DC), average waiting time tolerance, typical travel distance, and socio-economic factors (income, education, and occupation), were preprocessed and used as input features for the model.
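To give a sense of scale for this network, the trainable parameter count can be worked out from the stated layer sizes (64 and 32 hidden units, one linear output, batch normalisation on the 32-unit layer). The number of input features is not reported, so `input_dim = 20` below is a purely hypothetical value.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def trainable_params(input_dim):
    """Trainable parameters: Dense(64) -> Dropout -> Dense(32) -> BatchNorm -> Dense(1)."""
    hidden1 = dense_params(input_dim, 64)
    hidden2 = dense_params(64, 32)
    batch_norm = 2 * 32  # learnable scale and shift per unit
    output = dense_params(32, 1)
    return hidden1 + hidden2 + batch_norm + output  # dropout adds no parameters


# input_dim = 20 is a hypothetical feature count; the text does not state it.
n_params = trainable_params(20)
```

With this assumed input width the network has a few thousand trainable parameters, and the count grows by 64 for each additional input feature.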

Although the dataset consists of 256 respondents, the DL architecture was intentionally designed as a compact fully connected neural network with limited depth and parameter count to align with the dataset scale. Regularisation strategies, including dropout and early stopping, were employed to mitigate overfitting risks. Given the structured tabular nature of the data, the modelling objective focuses on behavioural pattern estimation rather than large-scale high-dimensional representation learning. While the dataset size is moderate, cross-validation and held-out testing indicate stable generalisation performance within the analysed regional sample. Future research using larger multi-regional datasets would further strengthen statistical power and external validity.
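The early-stopping rule can be expressed independently of any deep learning library. The sketch below applies a patience of 10 epochs to a hypothetical validation-loss curve; the function name and the curve are illustrative assumptions, and the rule shown (stop once the loss has not improved on its best value for `patience` consecutive epochs) is the conventional one rather than a confirmed detail of the study's implementation.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a validation-loss sequence."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch


# Hypothetical curve: improves for 20 epochs, then plateaus at a worse value.
losses = [1.0 / e for e in range(1, 21)] + [0.06] * 80
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=10)
```

In this example training halts well before the 100-epoch maximum, and the weights from `best_epoch` would typically be the ones retained.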

To ensure robustness and limit overfitting, the dataset was divided using an 80:20 train-test split, with 80% of the data used for model training and 20% reserved as a hold-out test set. In addition, 5-fold cross-validation was performed on the training set during hyperparameter optimisation to enhance generalisation performance. For the DL architecture, early stopping and dropout regularisation were applied, as described above. All reported performance metrics correspond to evaluation on the unseen test dataset. The metrics were Mean Squared Error (MSE), R-squared ($R^2$), and Mean Absolute Error (MAE), defined as:

$$\begin{aligned} \text {MSE} = \frac{1}{n} \sum _{i=1}^{n} (y_i - \hat{y}_i)^2 \end{aligned}$$

(6)

$$\begin{aligned} R^2 = 1 - \frac{\sum _{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum _{i=1}^{n} (y_i - \bar{y})^2} \end{aligned}$$

(7)

$$\begin{aligned} \text {MAE} = \frac{1}{n} \sum _{i=1}^{n} |y_i - \hat{y}_i| \end{aligned}$$

(8)

where $y_i$ represents the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of actual values, and n is the total number of samples.
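As a minimal sketch of Eqs. (6) to (8), the three metrics can be computed directly from paired lists of actual and predicted values. The function names and the toy values below are illustrative assumptions and are not taken from the study's data or code.

```python
def mse(y_true, y_pred):
    """Mean Squared Error, Eq. (6)."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, Eq. (7)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true, y_pred):
    """Mean Absolute Error, Eq. (8)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values, for illustration only.
y_actual = [2.0, 4.0, 6.0, 8.0]
y_predicted = [2.5, 3.5, 6.5, 7.5]

print("MSE:", mse(y_actual, y_predicted))
print("R^2:", r_squared(y_actual, y_predicted))
print("MAE:", mae(y_actual, y_predicted))
```

MSE penalises large errors more heavily than MAE, so reporting both alongside $R^2$ gives a fuller picture of the error distribution.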

Table 3 Summary of models, inputs, outputs, and performance metrics.

Table 3 provides a structured overview of the modelling pipeline within ISE-CAP, clarifying how clustering, prediction, classification, and optimisation components interact within the integrated framework.

To further ensure robustness and guard against overfitting, repeated $5\times 5$ cross-validation was conducted in addition to the 80:20 hold-out split. Scaling and preprocessing parameters were estimated exclusively on the training folds and subsequently applied to validation and test partitions to prevent data leakage. Bootstrapped 95% confidence intervals for $R^{2}$ and MSE were computed using 1,000 resamples of the test set. The North East model achieved a mean cross-validated $R^{2}$ of 0.981 ($\pm 0.012$), while the West Midlands model achieved 0.936 ($\pm 0.021$), indicating stable generalisation performance across folds. These additional validation steps indicate that the reported high $R^{2}$ values do not arise from leakage but reflect consistent regional behavioural regularity within the analysed dataset.
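The bootstrapped confidence intervals can be illustrated with a short percentile-bootstrap sketch. The percentile method, the function names, the random seed, and the toy test set are all assumptions made for illustration; the text specifies only the number of resamples (1,000) and the confidence level (95%).

```python
import random


def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for `metric` over test-set resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample test-set rows with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_resamples)]
    upper = stats[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


# Hypothetical test set: every prediction is off by exactly 0.5.
y_test = [float(i) for i in range(20)]
y_hat = [y + (0.5 if i % 2 else -0.5) for i, y in enumerate(y_test)]

r2_ci = bootstrap_ci(y_test, y_hat, r_squared)
mse_ci = bootstrap_ci(y_test, y_hat, mse)
print("95% CI for R^2:", r2_ci)
print("95% CI for MSE:", mse_ci)
```

Only the test set is resampled; the fitted model and its preprocessing parameters stay fixed, which is consistent with the leakage safeguards described above.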

Evaluation of influential factors

SHAP analysis was conducted to determine the factors influencing EV user preferences. SHAP values quantify each feature’s contribution to model predictions:

$$\begin{aligned} \phi _j = \sum _{S \subseteq N \setminus \{j\}} \frac{|S|!(|N| - |S| - 1)!}{|N|!} \big [v(S \cup \{j\}) - v(S)\big ] \end{aligned}$$

(9)

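where $\phi _j$ is the SHAP value of feature $j$, $N$ is the set of all features, $S$ is a subset of $N$ that excludes $j$, and $v(S)$ is the model output when only the features in $S$ are used.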
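Eq. (9) can be evaluated exactly when the number of features is small. The sketch below enumerates every subset for a toy value function; the feature names, contribution values, and the function `toy_value` are hypothetical and stand in for a trained model's output. In practice, SHAP libraries use faster, model-specific approximations rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating all subsets S of N without j."""
    n = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from Eq. (9).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


# Hypothetical value function: additive terms plus one interaction.
BASE = {"cost": 3.0, "distance": 1.0, "income": 0.5}


def toy_value(subset):
    v = sum(BASE[f] for f in subset)
    if {"cost", "distance"} <= subset:
        v += 2.0  # interaction shared equally between the two features
    return v


phi = shapley_values(list(BASE), toy_value)
print(phi)
```

The values sum to $v(N) - v(\emptyset)$, which is the efficiency property that makes SHAP attributions additive.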

Decision tree analysis

Decision trees were constructed to explore user preferences for AC versus DC chargers, using the Gini impurity criterion for classification. The maximum tree depth was limited to 4 to enhance interpretability and prevent overfitting. A minimum of 10 samples per leaf node was required to ensure stability in split decisions. Cost-complexity pruning was applied to remove weakly informative branches, with the optimal pruning parameter selected using 5-fold cross-validation. Model validation was performed on the held-out test dataset to assess classification consistency. The decision rule at each node selects the split that maximises the gain, that is, the reduction in impurity:

$$\begin{aligned} \text {Split} = \text {argmax}_j \Big [\sum _{i \in C_j} G(i)\Big ] \end{aligned}$$

(10)

where G(i) is the information gain for feature j at node i.
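A minimal sketch of split selection under the Gini criterion is given below. It scores each candidate test by the reduction in weighted Gini impurity and picks the largest. The respondent records, feature names, and candidate tests are hypothetical, and the depth, leaf-size, and pruning constraints described above are omitted for brevity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(labels, mask):
    """Impurity reduction from splitting `labels` by a boolean mask."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_split(rows, labels, candidate_tests):
    """Return the name of the highest-gain test and all gains."""
    gains = {
        name: gini_gain(labels, [test(row) for row in rows])
        for name, test in candidate_tests.items()
    }
    return max(gains, key=gains.get), gains


# Hypothetical respondents and their preferred charger type.
rows = [
    {"home_charger": True, "daily_km": 20},
    {"home_charger": True, "daily_km": 35},
    {"home_charger": True, "daily_km": 60},
    {"home_charger": False, "daily_km": 25},
    {"home_charger": False, "daily_km": 70},
    {"home_charger": False, "daily_km": 80},
]
labels = ["AC", "AC", "AC", "DC", "DC", "DC"]
tests = {
    "home_charger": lambda r: r["home_charger"],
    "daily_km > 50": lambda r: r["daily_km"] > 50,
}

chosen, gains = best_split(rows, labels, tests)
print(chosen, gains)
```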

Dimensionality reduction and classification performance

PCA and t-SNE were used for dimensionality reduction to evaluate class separability, while a confusion matrix quantified classification performance. The accuracy of the classification model, that is, its ability to correctly predict EV user charging preferences, was calculated as:

$$\begin{aligned} \text {Accuracy} = \frac{\text {True Positives} + \text {True Negatives}}{\text {Total Samples}} \end{aligned}$$

(11)
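For a binary AC/DC task, Eq. (11) follows directly from the four cells of the confusion matrix. The sketch below treats DC as the positive class, which is an arbitrary assumption, and uses hypothetical labels.

```python
def confusion_counts(y_true, y_pred, positive="DC"):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Eq. (11): correct predictions divided by total samples."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical true and predicted charger preferences.
y_true = ["AC", "AC", "DC", "DC", "DC", "AC", "DC", "AC"]
y_pred = ["AC", "DC", "DC", "DC", "AC", "AC", "DC", "AC"]

counts = confusion_counts(y_true, y_pred)
print("TP, TN, FP, FN:", counts)
print("Accuracy:", accuracy(*counts))
```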

PCA and t-SNE were employed for complementary purposes. PCA, as a linear dimensionality reduction technique, preserves the global variance structure of the dataset and enables assessment of overall class separability. In contrast, t-SNE captures nonlinear local relationships and visualises high-dimensional manifold structures, revealing cluster compactness and overlap patterns that may not be apparent in PCA projections. The combined use of these techniques provides a more comprehensive understanding of multidimensional behavioural patterns.

Implementation details

The LSTM component was implemented using a single hidden LSTM layer with 50 memory units, followed by a dense regression layer. The sequence length was defined as 5 temporal charging intervals. The model was trained using the Adam optimiser (learning rate = 0.001), batch size = 16, for a maximum of 100 epochs with early stopping (patience = 10).
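Two of these settings can be illustrated without a DL library: building input sequences of 5 intervals, and the early-stopping rule with a patience of 10. The sketch assumes that each window is used to predict the next interval and that stopping monitors a validation loss; neither detail is stated in the text, and the function names are illustrative.

```python
def make_windows(series, seq_len=5):
    """Slice a series into (window, next value) pairs of length `seq_len`."""
    inputs, targets = [], []
    for start in range(len(series) - seq_len):
        inputs.append(series[start:start + seq_len])
        targets.append(series[start + seq_len])  # assumed target: next interval
    return inputs, targets


def early_stopping_epoch(val_losses, patience=10):
    """Return the epoch index at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # patience never exhausted


# Hypothetical charging-demand series and validation-loss curve.
demand = [0, 1, 2, 3, 4, 5, 6, 7]
windows, next_values = make_windows(demand, seq_len=5)
losses = [1.0, 0.8, 0.7] + [0.9] * 20
stop_epoch = early_stopping_epoch(losses, patience=10)
print(windows, next_values, stop_epoch)
```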

The RL Q-learning update rule applied was:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max _{a'} Q(s',a') - Q(s,a) \right]$$

where the learning rate is $\alpha = 0.1$, the discount factor is $\gamma = 0.9$, and the $\varepsilon$-greedy exploration rate is $\varepsilon = 0.1$. A total of 1,000 simulation episodes were executed. The RL component was experimentally implemented within a simulation-based optimisation environment using predicted charging demand and travel distance metrics as input variables. The RL module was not deployed in a real-world infrastructure system but validated through iterative simulation episodes to evaluate reward convergence and station placement optimisation. Therefore, the RL results represent proof-of-concept experimental validation within a controlled modelling environment rather than field-level deployment outcomes.

ISE-CAP is a data-driven framework designed to analyse EV adoption trends, predict charging demand, and optimise charging station placement. As shown in Algorithm 1, the algorithm begins with data preprocessing, where the dataset $\mathcal {D}$, containing EV adoption, charging infrastructure, and socio-economic data, undergoes missing value handling, normalisation, and categorical encoding.

Algorithm 1 ISE-CAP: EV clustering, demand prediction, and charging optimisation.

Next, K-Means clustering is applied to segment users based on charging behaviour, travel frequency, and socio-economic factors, with the optimal number of clusters k determined using the Elbow Method. Following this, charging demand prediction is performed using an LSTM model trained on historical charging station data, iteratively optimised through hyperparameter tuning and evaluated using MAE and $R^2$ scores. Feature importance analysis uses SHAP to identify critical determinants influencing EV charging behaviour, retaining high-impact features such as charging station proximity, pricing, and charging duration. To enhance charging infrastructure, ISE-CAP employs an RL model that optimises station placement by computing a reward function based on demand, user travel distance, and grid impact. The model updates station locations iteratively, adjusting learning parameters when necessary. Finally, the framework generates policy recommendations, suggesting government incentives and targeted infrastructure improvements in regions with disparities while refining placement strategies where adoption is already balanced. The output of ISE-CAP includes clustered EV user groups, predictive charging demand insights, and strategic infrastructure recommendations, aiding policymakers and smart city planners in making data-driven decisions for sustainable EV adoption.

The RL optimisation module is formally defined as follows. The state space S consists of regional grid-demand vectors capturing predicted charging demand density, average user travel distance to nearest charger, and local grid load utilisation per candidate zone. The action space A comprises discrete station placement or relocation actions across predefined spatial cells. The reward function R(s, a) is formulated as:

$$R = \alpha D_s – \beta T_s – \gamma G_s$$

where $D_s$ represents the demand satisfaction ratio, $T_s$ denotes the average travel distance to chargers, and $G_s$ captures grid load imbalance. The weights $\alpha$, $\beta$, and $\gamma$ were tuned via grid search to balance infrastructure efficiency and grid stability; in this expression they are reward weights, distinct from the Q-learning learning rate $\alpha = 0.1$ and discount factor $\gamma = 0.9$ given above. A Q-learning framework with $\varepsilon$-greedy exploration ($\varepsilon = 0.1$) was implemented. Convergence was defined as stabilisation of the cumulative episode reward, with changes below 1% over 50 consecutive episodes. This formal specification supports methodological reproducibility.
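The pieces of this specification can be sketched as four small functions: the reward, $\varepsilon$-greedy action selection, the tabular Q-learning update, and the convergence check. The learning rate, discount factor, and exploration rate are those reported in the text. The reward weights, the state and action names, and the reading of the convergence rule as a relative change between consecutive episodes are illustrative assumptions.

```python
import random

LEARNING_RATE = 0.1  # alpha in the Q-learning update
DISCOUNT = 0.9       # gamma in the Q-learning update
EPSILON = 0.1        # exploration rate


def reward(demand_satisfaction, travel_distance, grid_imbalance,
           w_demand=1.0, w_travel=0.5, w_grid=0.5):
    """R = w_demand * D_s - w_travel * T_s - w_grid * G_s (weights assumed)."""
    return (w_demand * demand_satisfaction
            - w_travel * travel_distance
            - w_grid * grid_imbalance)


def choose_action(q_table, state, actions, rng, epsilon=EPSILON):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


def q_update(q_table, state, action, r, next_state, actions,
             lr=LEARNING_RATE, discount=DISCOUNT):
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + lr * (r + discount * best_next - old)
    return q_table[(state, action)]


def has_converged(episode_rewards, window=50, tolerance=0.01):
    """True if reward changed by under 1% across the last `window` episodes."""
    if len(episode_rewards) <= window:
        return False
    recent = episode_rewards[-(window + 1):]
    return all(
        abs(curr - prev) <= tolerance * max(abs(prev), 1e-12)
        for prev, curr in zip(recent, recent[1:])
    )


# Hypothetical placement actions over two spatial cells.
actions = ["place_cell_1", "place_cell_2"]
q = {}
first = q_update(q, "s0", "place_cell_1", 1.0, "s1", actions)
q[("s1", "place_cell_2")] = 2.0
second = q_update(q, "s0", "place_cell_1", 1.0, "s1", actions)
picked = choose_action(q, "s1", actions, random.Random(0), epsilon=0.0)
print(first, second, picked)
```

A dictionary-backed Q-table suits a small discrete grid of candidate cells; a larger state space would need function approximation, which is outside the scope of this sketch.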
