This study was conducted in Iran during the 2024–2025 period, using a methodological design for instrument development. The researchers aimed to design, evaluate, and validate a tool for measuring the extent of AI use in decision-making processes within healthcare organizations. The authors first constructed items for a preliminary questionnaire, informed by a previously published study relevant to the field. To establish the instrument’s validity and reliability, a thorough evaluation was performed using multiple methods, including assessments of face validity, content validity, construct validity, and reliability analysis. The study’s data reporting adhered to the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) guidelines21.
Research question
The research question was formulated as: “What is the level of AI utilization in healthcare domains of healthcare organizations?”.
Study sample and sampling method
In this study, purposive and convenience sampling techniques were employed. To evaluate the quantitative face validity, the developed items were presented to a panel of 10 experts and 10 stakeholders, comprising professors and professionals in healthcare management, as well as employees from various organizations within the Iranian healthcare sector. Furthermore, input from four experts in the Persian language was incorporated to ensure the linguistic precision of the instrument’s items. During the content validity phase, 20 experts affiliated with the Iranian health system, including professors and health management professionals, participated in the assessment. For construct validity, a sample size equivalent to ten times the number of instrument items (120 individuals), composed of employees from organizations associated with the health system, was utilized. Finally, in the assessment of reliability, a group of 30 employees from health system-affiliated organizations was examined.
Phase one: item construction
In this phase of the study, the design of the questionnaire items for psychometric evaluation was informed by findings from a literature review conducted by the current authors and published in 2024 (details provided in Table 1). This literature review systematically examined review articles published in English between 2000 and 2024, sourced from databases including PubMed, Scopus, ProQuest, and Cochrane. Following a rigorous quality assessment using the Critical Appraisal Skills Programme (CASP) checklist, the final set of studies was chosen for inclusion. Thematic analysis was then applied to synthesize and present the findings. Further details are available in the full article published on the journal’s website. To clarify, the CASP checklist is a structured tool for systematically assessing the quality and validity of studies, ensuring a critical appraisal of research evidence. Thematic analysis, in turn, is a qualitative method that identifies and interprets patterns or themes within the data through a systematic process, facilitating a rich and organized presentation of insights9.
Phase two: validity
Instrument validity generally encompasses three primary types: face validity, content validity, and construct validity. Collectively, these forms of validity provide robust evidence that the instrument accurately measures the intended construct. Establishing validity is a crucial aspect of the instrument development process, ensuring that the measure is appropriate, relevant, and meaningful for its intended purpose and target population22.
Face validity
Face validity pertains to a subjective evaluation of a research instrument’s relevance, format, readability, clarity, and appropriateness for its intended audience. It represents the most fundamental form of validity, primarily grounded in the instrument’s appearance and overall presentation23,24,25.
In this phase of the study, four experts proficient in the local language (Persian) conducted a thorough review of the instrument items, and the items were revised in accordance with their recommendations to improve clarity and readability and to correct inappropriate wording and grammatical errors. Following this, a panel consisting of 10 professors and professionals in healthcare management, together with 10 employees from various organizations within the Iranian healthcare sector, was engaged.
Subsequently, the ‘impact score’ method was applied to eliminate unsuitable items and to quantitatively assess the validity of each item within the preliminary instrument. The impact score in face validity evaluations measures the relative importance of individual items within a measurement tool. The item impact score was computed according to the following formula25:
$$\text{Item Impact Score} = \text{Frequency (percentage)} \times \text{Importance}$$
In this context, “frequency” refers to the percentage of respondents who rated an item as important on a Likert scale, while “importance” corresponds to the mean rating assigned to that item. Typically, an impact score of 1.5 or greater is regarded as the minimum acceptable threshold for retaining an item within the instrument. This criterion signifies that at least 50% of respondents evaluated the item as important, providing a rating of 4 or 5 on a 5-point scale25.
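As a worked illustration, the short Python sketch below computes impact scores for hypothetical 5-point Likert ratings and applies the 1.5 retention threshold described above; the item names and ratings are invented for demonstration, and frequency is expressed as a proportion so that the 1.5 cut-off corresponds to 50% of respondents rating an item 4 or 5.

```python
import pandas as pd

# Hypothetical 5-point Likert ratings: one row per respondent, one column per item.
ratings = pd.DataFrame({
    "item_01": [5, 4, 4, 3, 5, 4, 2, 5, 4, 4],
    "item_02": [2, 3, 1, 2, 3, 2, 1, 2, 3, 2],
})

def item_impact_score(col: pd.Series) -> float:
    """Impact score = frequency (proportion rating 4 or 5) x mean importance rating."""
    frequency = (col >= 4).mean()   # share of respondents rating the item 4 or 5
    importance = col.mean()         # mean rating assigned to the item
    return frequency * importance

scores = ratings.apply(item_impact_score)
retained = scores[scores >= 1.5]    # keep items at or above the 1.5 threshold
print(scores.round(2))              # item_01: 3.20, item_02: 0.00
print("Retained items:", list(retained.index))
```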
The inclusion criteria for experts participating in this stage were as follows:
For experts proficient in the local language (Persian):
For professionals in healthcare management:
For employees within healthcare organizations:
The exclusion criterion was defined as:
Content validity
Content validity denotes the extent to which the selected items comprehensively and accurately represent the construct under measurement. The evaluation of content validity generally involves assembling a panel of experts who assess each item’s relevance and representativeness concerning the specific content domain being examined26.
In this study, content validity was evaluated utilizing Lawshe’s methodology. This method involves distributing questionnaires to a panel of experts, who assess the necessity and importance of each item within the instrument. Specifically, the Content Validity Ratio (CVR) and the Content Validity Index (CVI) were computed for each item to quantify the extent of expert agreement regarding the essentiality and relevance of the items. The CVR is computed using the following formula27:
$$\text{CVR} = \frac{N_e - N/2}{N/2}$$
Where: $N_e$ is the number of experts who rated the item as ‘essential’, and $N$ is the total number of experts on the panel.
An item with a CVR of 0.42 or above is considered relevant for inclusion28. Moreover, an item with a CVI of 0.7 or higher is considered relevant for inclusion, whereas items with CVI values below this threshold are typically recommended for removal29. During this phase of the study, items with borderline CVI scores were either revised or consolidated with other items, adhering to evidence-based guidelines. The CVI is calculated using the following formula25:
$$\text{CVI} = \frac{\text{Number of experts rating an item as relevant}}{\text{Total number of experts}}$$
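For illustration, the Python sketch below applies both formulas to hypothetical tallies from a 20-expert panel and flags items meeting the 0.42 (CVR) and 0.7 (CVI) thresholds; all counts are invented for demonstration.

```python
import pandas as pd

def cvr(n_essential: int, n_experts: int) -> float:
    """Lawshe's Content Validity Ratio: CVR = (Ne - N/2) / (N/2)."""
    return (n_essential - n_experts / 2) / (n_experts / 2)

def cvi(n_relevant: int, n_experts: int) -> float:
    """Content Validity Index: share of experts rating the item as relevant."""
    return n_relevant / n_experts

# Hypothetical tallies from a 20-expert panel (one row per item).
panel = pd.DataFrame(
    {"essential": [18, 13, 10], "relevant": [19, 15, 12]},
    index=["item_01", "item_02", "item_03"],
)
n = 20
panel["CVR"] = panel["essential"].apply(cvr, n_experts=n)
panel["CVI"] = panel["relevant"].apply(cvi, n_experts=n)
panel["retain"] = (panel["CVR"] >= 0.42) & (panel["CVI"] >= 0.7)
print(panel)  # only item_01 (CVR 0.80, CVI 0.95) clears both thresholds
```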
A total of 20 professors and professionals specializing in health management participated in this stage. The inclusion criteria for expert participation were as follows:
The exclusion criterion was defined as:
Construct validity
Construct validity refers to the degree to which an instrument or test accurately measures the theoretical construct it is intended to assess. This type of validity is established by examining the relationships between the instrument and other measures of the same construct (convergent validity), as well as its associations with different, theoretically distinct constructs (discriminant validity), in alignment with established theoretical frameworks. Construct validity ensures that the measure behaves as expected according to theory, providing confidence that the instrument truly reflects the intended abstract concept rather than unrelated constructs30.
In the present study, exploratory factor analysis (EFA) was utilized to establish the construct validity of the instrument through the principal component analysis technique. EFA is a statistical method that enables researchers to condense a large set of variables or items into a smaller number of latent factors, thereby clarifying the underlying structure of the relationships among these variables. This technique offers evidence of construct validity by illuminating the internal composition of the instrument and the theoretical constructs it is intended to measure31.
The exploratory factor analysis (EFA) procedure typically commences with the selection of items grounded in theoretical justification and expert consultation. This is followed by statistical analysis to ascertain the extent to which these items load onto distinct factors. Factor loadings represent the magnitude of the association between each item and a specific factor; a loading exceeding 0.30 is commonly regarded as indicative of a moderate correlation31. In this study, a sample of 120 employees from healthcare organizations was employed, which was considered appropriate given the 12 items included in the instrument at this stage. This sample size conforms to the widely accepted guideline recommending a minimum of 5 to 10 participants per variable32,33.
Prior to conducting exploratory factor analysis (EFA), the suitability of the data for factor analysis was assessed using the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity. The KMO statistic quantifies the proportion of variance among variables that may be common variance, with values ranging from 0 to 1. A KMO value above 0.5 is deemed acceptable for factor analysis, while values exceeding 0.8 indicate a high degree of suitability34. Bartlett’s test of sphericity evaluates whether the correlation matrix significantly differs from an identity matrix, thus determining if the variables exhibit sufficient correlation for factor analysis. A statistically significant result (typically p < 0.05) indicates that the variables are sufficiently intercorrelated to proceed with factor analysis35,36. All statistical analyses in this stage were performed using IBM SPSS version 27.0.1.
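Although the analyses reported here were run in IBM SPSS, the following Python sketch shows how the same two adequacy checks could be reproduced with the open-source factor_analyzer package; the input file name is hypothetical.

```python
import pandas as pd
from factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

# Hypothetical item responses: 120 respondents x 12 items.
responses = pd.read_csv("instrument_responses.csv")  # hypothetical file name

# Bartlett's test: a significant p-value (< 0.05) means the correlation matrix
# differs from an identity matrix, i.e. the items are sufficiently intercorrelated.
chi_square, p_value = calculate_bartlett_sphericity(responses)

# KMO measure of sampling adequacy: > 0.5 acceptable, > 0.8 highly suitable.
kmo_per_item, kmo_total = calculate_kmo(responses)

print(f"Bartlett: chi2 = {chi_square:.2f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_total:.3f}")
```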
Upon confirmation of data suitability through the results of the Kaiser–Meyer–Olkin (KMO) measure and Bartlett’s test of sphericity, principal components were extracted and summarized using a Varimax rotated component matrix. During the process, the number of final components was determined based on the eigenvalues observed in the scree plot. Two authors independently reviewed the distribution of each item across the extracted components based on factor loadings and proposed labels for each component according to the thematic content of the items. Any discrepancies between the two authors were resolved through consultation with a third author. Moreover, during the data analysis process, missing data were addressed by calculating the mean of the available values for each item and imputing the missing values with the corresponding item mean.
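A corresponding sketch of the extraction step described above, again using factor_analyzer in place of SPSS: item-mean imputation of missing values, principal-component extraction with Varimax rotation, and retrieval of the eigenvalues that would populate the scree plot. The three-component solution and the file name are assumptions made purely for illustration.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

responses = pd.read_csv("instrument_responses.csv")  # hypothetical file name

# Impute missing values with the mean of each item's available responses,
# mirroring the missing-data handling described above.
responses = responses.fillna(responses.mean())

# Principal-component extraction with Varimax rotation; the number of retained
# components (assumed here to be 3) would be read off the scree plot in practice.
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="varimax")
fa.fit(responses)

eigenvalues, _ = fa.get_eigenvalues()              # eigenvalues for the scree plot
loadings = pd.DataFrame(fa.loadings_, index=responses.columns)
print(eigenvalues.round(2))
print(loadings.round(2))   # loadings above 0.30 indicate at least moderate correlation
```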
The inclusion criterion for participants in this phase was:
The exclusion criterion was:
Phase three: reliability
Reliability pertains to the precision and consistency of the data obtained22. To assess the reliability of the study instrument, two statistical measures were employed: Cronbach’s alpha and the intraclass correlation coefficient (ICC).
Cronbach’s alpha (α) is a statistical measure of internal consistency, indicating the extent to which a set of items are interrelated and collectively form a cohesive scale. A Cronbach’s alpha value above 0.70 is generally considered acceptable, reflecting adequate homogeneity among the items. Values exceeding 0.80 suggest good reliability, while those above 0.90 indicate excellent internal consistency37. The ICC, in turn, is a reliability metric that assesses both the degree of correlation and agreement among measurements. ICC values below 0.5 are indicative of poor reliability; values between 0.5 and 0.75 denote moderate reliability; values from 0.75 to 0.9 represent good reliability; and values above 0.9 signify excellent reliability37. For this phase of the study, a sample of 30 employees from healthcare organizations was used to assess the reliability of the instrument, consistent with the research published by Bujang et al. (2024), which identifies this number as the minimum sample size required for such analyses38.
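As an illustrative sketch, the Python code below computes Cronbach’s alpha directly from its definition and obtains ICC estimates with the pingouin package, here treating items as raters; the file name and data layout are hypothetical, and in a test-retest design the repeated administrations would serve as the raters instead.

```python
import pandas as pd
import pingouin as pg  # assumed available for the ICC computation

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical reliability sample: 30 respondents x 12 items.
responses = pd.read_csv("reliability_sample.csv")  # hypothetical file name
print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")  # > 0.70 acceptable

# ICC: reshape to long format (one row per respondent-item rating).
long_data = responses.rename_axis("respondent").reset_index().melt(
    id_vars="respondent", var_name="item", value_name="score"
)
icc = pg.intraclass_corr(data=long_data, targets="respondent",
                         raters="item", ratings="score")
print(icc[["Type", "ICC"]])  # < 0.5 poor, 0.5-0.75 moderate, 0.75-0.9 good, > 0.9 excellent
```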
The inclusion criterion for participants in this phase was:
The exclusion criterion was:
Ethical considerations
All study data will be retained and securely stored for a minimum period of one year following the publication of the article. The data analysis at all stages was performed by researchers who declared no conflicts of interest concerning the study topic or the organizations involved. Furthermore, in accordance with established ethical standards for research employing such methodologies, the analysis of the collected data was carried out in a manner that preserved participant anonymity and ensured strict confidentiality of all information. Informed consent was obtained from all participants prior to their inclusion in the study. Additionally, consistent with ethical principles, participants were afforded the right to withdraw from the study at any time without incurring any penalties or loss of benefits.