In 2024, a Riyadh-based healthcare provider implemented an AI-powered diagnostic tool from an international vendor. The system initially performed well, identifying early-stage conditions with impressive accuracy. But six months later, clinicians noticed something disturbing: the model's performance had degraded significantly for Middle Eastern patients, while maintaining accuracy for Western demographics. When the hospital raised the issue, the vendor couldn't explain why — their development team had no visibility into how the model was making decisions, and no clear path to fix the problem.
The hospital had done what any organization would do: they checked references, reviewed security certifications, and negotiated standard service-level agreements. But they missed what has become the defining challenge of AI procurement: AI vendor due diligence differs profoundly from traditional IT due diligence, and getting it wrong carries risks that most organizations aren't prepared for.
Traditional software vendors sell you a product with defined behavior. AI vendors sell you systems that learn, evolve, and sometimes fail in ways nobody anticipated. In Saudi Arabia, where Vision 2030 is driving massive AI adoption across government and enterprise, the stakes are particularly high. The Kingdom has positioned itself as a global AI leader, but that ambition creates a pressing need for robust vendor governance frameworks that align with local regulations, cultural values, and national priorities.
Why AI Due Diligence Is Different
When you procure enterprise software, you're buying predictability. The vendor promises specific features, guarantees certain uptime, and commits to delivering patches on a schedule. If something breaks, there's a clear path to troubleshooting, and ultimately, someone to hold accountable.
AI systems turn this model on its head. A machine learning model isn't built through deterministic coding — it's trained on data, and its behavior emerges from patterns that even its creators may not fully understand. This creates three categories of risk that traditional due diligence doesn't address: opacity, drift, and emergent behavior.
Opacity means you can't simply audit the code to understand why the system made a particular decision. In traditional software, if a feature works incorrectly, developers can trace through the logic and fix it. With deep learning systems, especially black-box models, the internal reasoning is often inscrutable. This matters enormously for regulated industries in Saudi Arabia — healthcare, finance, and government sectors all require explainability for compliance.
Model drift refers to the degradation of performance over time as the data distribution in the real world diverges from the training data. The Riyadh hospital's experience is a textbook example. The model worked initially because the deployment environment matched its training conditions. But as patient demographics shifted, or as medical practices evolved, the model's performance eroded. Traditional software doesn't spontaneously get worse; AI systems do, and they do it silently.
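To make drift concrete: for a single numeric input, divergence between the training distribution and live traffic can be flagged with a two-sample statistical test. The following is a minimal sketch, assuming you retained a reference sample from training; the feature (patient age), the toy numbers, and the significance level are all illustrative rather than prescriptive.

```python
# Minimal sketch: detect input drift on one numeric feature with a
# two-sample Kolmogorov-Smirnov test. Assumes a retained reference
# sample from training; alpha = 0.05 is an illustrative choice.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_sample: np.ndarray,
                    live_sample: np.ndarray,
                    alpha: float = 0.05) -> bool:
    """True if the live distribution differs significantly from the
    training-era distribution for this feature."""
    statistic, p_value = ks_2samp(train_sample, live_sample)
    return p_value < alpha

# Example: patient age at training time vs. recent intake.
rng = np.random.default_rng(seed=0)
train_ages = rng.normal(45, 12, size=5_000)   # training-era sample
live_ages = rng.normal(52, 14, size=1_200)    # last month's patients
print(feature_drifted(train_ages, live_ages))  # True: the mix shifted
```

In practice you would run a check like this per feature on a schedule, which is exactly the kind of monitoring discussed later in this piece.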
Emergent behavior is perhaps the most alarming. When AI systems are deployed in novel contexts, they can develop capabilities or tendencies that weren't present in testing. A hiring AI might learn to discriminate based on zip code patterns, even when explicitly told not to consider location. A content moderation system might develop biases against certain dialects or cultural expressions. These aren't bugs in the traditional sense — they're properties that emerge from the interaction between the model and the environment.
For Saudi organizations, these risks compound with specific regulatory and cultural considerations. The Saudi Data & AI Authority (SDAIA) has been building a comprehensive governance framework, including the Personal Data Protection Law (PDPL) and sector-specific AI guidelines. But compliance isn't just about following rules — it's about ensuring that AI systems align with Islamic values, cultural norms, and national priorities. An AI system that is technically flawless but produces outputs inconsistent with Saudi cultural sensitivities is, in practice, a failure.
Pre-Engagement Assessment
Before you even start talking to vendors, you need clarity on your own requirements and constraints. This sounds obvious, but most organizations skip it, diving straight into vendor demos and technical specs. In the AI context, skipping this step is especially dangerous because it's easy to get seduced by impressive demos without understanding the underlying assumptions and limitations.
Start by defining your risk tolerance. Not all AI applications carry equal risk. A customer service chatbot that sometimes gives unhelpful answers is fundamentally different from a medical diagnosis AI that might misidentify a tumor. Saudi regulators will view these differently, and your internal governance should too. Map out the potential harm scenarios: what's the worst thing that could happen if the system fails or makes a mistake? Who gets hurt? What are the legal, financial, and reputational consequences?
Next, articulate your data sovereignty requirements. Under Saudi Arabia's PDPL, certain categories of personal data must remain within the Kingdom. But the requirements go deeper than geography. You need to understand what data the AI vendor will need access to, how they'll use it, and whether it will be used to improve their models for other customers. Many vendors train on customer data by default, under permissions buried in terms of service that nobody reads. For Saudi organizations, especially in government and critical infrastructure, this can be a non-starter.
Define your explainability requirements. How much do you need to understand about how the system reaches its conclusions? In some contexts, "it just works" might be acceptable. In others — healthcare, judicial decision support, credit scoring — you may need detailed explanations that can withstand regulatory scrutiny. This requirement dramatically affects which vendors and which types of models are suitable. Deep neural networks generally offer less transparency than decision trees or rule-based systems.
Finally, establish your performance monitoring framework before you commit to any vendor. You can't monitor what you haven't defined. What metrics will you track? How will you detect performance degradation? What thresholds trigger intervention? In traditional IT, uptime and response time are usually sufficient. For AI, you need domain-specific quality metrics that measure whether the system is actually doing what it's supposed to do, not just whether it's running.
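One way to force this definition exercise is to write the framework down as a machine-readable specification before you ever sit through a demo. The sketch below is purely hypothetical; every metric name, threshold, and action is a placeholder for your own requirements, not a recommended value.

```python
# Hypothetical monitoring specification, defined before any vendor is
# chosen. All metric names, thresholds, and actions are placeholders.
MONITORING_SPEC = {
    "diagnostic_recall": {        # domain-specific quality metric
        "floor": 0.92,            # minimum acceptable value
        "action": "suspend automated triage; escalate to clinical lead",
    },
    "per_group_accuracy_gap": {   # fairness across patient groups
        "ceiling": 0.05,          # maximum tolerated gap
        "action": "notify vendor; open remediation ticket",
    },
    "p95_latency_ms": {           # operational metric
        "ceiling": 800,
        "action": "page the on-call engineer",
    },
}

def breached(metric: str, value: float) -> bool:
    """Check a measured value against the agreed specification."""
    spec = MONITORING_SPEC[metric]
    if "floor" in spec and value < spec["floor"]:
        return True
    return "ceiling" in spec and value > spec["ceiling"]
```

Writing it this way has a side benefit: the specification doubles as a concrete artifact to put in front of vendors during evaluation.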
The Critical Questions
When you engage with AI vendors, the conversation needs to go beyond features and pricing. You're entering a partnership that will determine whether the AI system enhances your operations or creates new problems. The following questions aren't meant to be exhaustive — they're a starting point that you should adapt to your specific context and risk profile.
Data Handling and Governance
What data was used to train the model, and where did it come from?
This is foundational. Models trained on predominantly Western data will often perform poorly for Saudi contexts — they may miss cultural nuances, struggle with Arabic language variations, or reflect values that don't align with local norms. Ask for a data sheet or model card that documents the training dataset composition. If the vendor can't provide this, that's a red flag.
Will our data be used to improve your models?
Many AI vendors use customer deployment data for ongoing model improvement. This can create powerful network effects, but it also means your proprietary data could benefit your competitors. For Saudi organizations, especially those handling sensitive government or personal data, this is often unacceptable. Negotiate clear terms about data usage, and consider requiring data isolation — your data used only for your instance, never for the vendor's general model training.
How is data handled during training and inference?
Data sovereignty is non-negotiable for many Saudi use cases. Verify where data is stored, processed, and transmitted. Does the vendor have data centers in Saudi Arabia? If not, do they have clear mechanisms to ensure compliance with PDPL? Consider requiring a data processing agreement that explicitly addresses Saudi data residency requirements.
Model Transparency and Explainability
Can you explain how the model reaches specific decisions?
For high-stakes applications, you need more than a black box. Ask for documentation on the model's architecture, key features, and decision logic. Some vendors provide interpretable model explanations — tools that highlight which factors influenced a particular prediction. If these aren't available, ask whether the vendor can provide them.
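For illustration, here is roughly what per-prediction attribution looks like with the open-source shap library. This is a sketch of the general technique, not any particular vendor's tooling; it assumes a tree-based model on tabular data and uses a public dataset so the example is self-contained.

```python
# Sketch: per-prediction feature attribution with the open-source
# shap library. Assumes a tree-based model on tabular data; shown on
# a public regression dataset purely for illustration.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:1])  # explain one prediction

# Rank the features that most influenced this single decision.
contribs = sorted(zip(X.columns, shap_values[0]),
                  key=lambda t: abs(t[1]), reverse=True)
for name, value in contribs[:5]:
    print(f"{name}: {value:+.3f}")
```

If a vendor cannot offer at least this level of per-decision visibility, ask what they can provide instead and whether it will satisfy your regulator.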
What are the model's known limitations and failure modes?
Every model has weaknesses. Responsible vendors will document these explicitly. What types of inputs does it struggle with? What are the edge cases where performance degrades? In what scenarios should the system defer to human judgment? If a vendor claims their model works perfectly for everything, they're either lying or they haven't tested thoroughly enough.
How do you detect and address bias?
Bias in AI systems can reflect and amplify existing inequalities. Ask about the vendor's bias testing methodology — what metrics do they use, how do they measure performance across different demographic groups, and what's their process for addressing disparities? For Saudi organizations, this includes ensuring fair treatment across different regions, nationalities, and other relevant dimensions.
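The core of credible bias testing is a measurable disparity, as in the minimal sketch below. It computes a demographic-parity-style gap in positive-outcome rates across groups; the column names and toy data are hypothetical.

```python
# Sketch: measure outcome disparity across groups. Column names and
# data are hypothetical; substitute the dimensions relevant to your
# deployment (region, nationality, gender, dialect, ...).
import pandas as pd

def selection_rate_gap(df: pd.DataFrame,
                       group_col: str,
                       outcome_col: str) -> float:
    """Largest difference in positive-outcome rate between any two
    groups (a demographic parity difference)."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return float(rates.max() - rates.min())

decisions = pd.DataFrame({
    "region":   ["riyadh", "riyadh", "jeddah", "jeddah", "dammam", "dammam"],
    "approved": [1,        1,        1,        0,        0,        0],
})
print(selection_rate_gap(decisions, "region", "approved"))  # 1.0: maximal gap
```

A serious vendor evaluation would go further: confusion-matrix-based metrics such as equal opportunity, and sample sizes per group large enough to make the comparison meaningful.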
Compliance and Certification
How does your system align with Saudi regulations, including PDPL and sector-specific guidelines?
This isn't a yes/no question — you need specifics. Does the vendor have experience with Saudi compliance requirements? Can they provide examples of how they've addressed PDPL provisions in similar deployments? For highly regulated sectors like healthcare and finance, ask about relevant certifications and audit trails.
What third-party audits or certifications does your organization hold?
Certifications and attestations like ISO 27001 for information security management or SOC 2 reports for service organization controls can provide assurance, but they're not sufficient on their own. AI-specific certifications are still emerging, but responsible vendors should be engaging with frameworks like the EU AI Act risk categories or NIST's AI Risk Management Framework, even if they're not formally certified.
How do you handle regulatory changes?
AI regulation is evolving rapidly, globally and in Saudi Arabia. What's the vendor's process for monitoring regulatory changes and updating their systems accordingly? Can they provide examples of how they've adapted to previous regulatory shifts?
Liability and Accountability
Who is responsible when the system causes harm?
This is the question nobody wants to ask but everyone needs to. If an AI system makes a decision that causes financial loss, physical harm, or regulatory penalties, who bears liability? The contract should clearly delineate responsibilities. Many vendors try to limit liability significantly — push back on this, especially for high-risk applications.
What insurance coverage do you have for AI-related incidents?
Traditional general liability policies often exclude AI-related claims. Ask whether the vendor has specialized coverage for AI incidents, including errors and omissions insurance that covers algorithmic decisions. This is particularly important for healthcare, financial services, and other high-risk sectors.
What's your incident response process?
When things go wrong, time matters. What's the vendor's SLA for responding to incidents? Do they have a dedicated security and incident response team? Can they provide examples of how they've handled previous incidents?
Red Flags
Some warning signs should cause you to pause or walk away entirely. These aren't deal-breakers in every context — low-risk applications may tolerate some of these — but they warrant serious scrutiny.
Overpromising on accuracy: If a vendor claims 99% accuracy without extensive documentation and third-party validation, be skeptical. AI performance is highly context-dependent, and impressive numbers in controlled settings often don't translate to real-world deployment. Ask for validation in contexts similar to yours.
Lack of transparency: Vendors who can't or won't provide information about training data, model architecture, or evaluation methodologies are either hiding something or unable to document their own system; neither inspires confidence. Responsible AI development requires documentation and transparency. If they treat their model as a proprietary black box that you must trust blindly, that's a problem.
No human-in-the-loop capability: For any consequential application, there should be a mechanism for human review and override. If a vendor's system doesn't allow for human intervention or audit trails, they haven't designed for real-world deployment where things will inevitably go wrong.
Vague or nonexistent ethics policy: Every serious AI vendor should have a documented approach to ethical AI development and deployment. This doesn't need to be perfect, but it should exist and demonstrate serious thought about the social implications of their technology.
Pressure to sign without review: If a vendor tries to rush you through the contracting process, claiming their terms are non-negotiable, that's a red flag. AI contracts are complex and merit careful legal review, especially in regulated industries.
One-size-fits-all solutions: Saudi organizations have unique regulatory, cultural, and operational contexts. Vendors who claim their system works perfectly everywhere without customization either don't understand the local context or are overselling. The best vendors will engage with your specific requirements and constraints.
Contract Considerations
Standard software contracts aren't sufficient for AI procurement. You need provisions that address the unique characteristics of AI systems and the specific Saudi regulatory environment.
Performance metrics and SLAs: Move beyond uptime and response time. Define domain-specific quality metrics that actually measure whether the AI is doing its job effectively. These might include accuracy, precision, recall, fairness metrics across demographic groups, or other domain-appropriate measures. The contract should specify what happens if these metrics fall below thresholds — is there a right to terminate? Are there financial penalties? Who bears the cost of remediation?
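To show how mechanical these checks can become once metrics are contracted, the sketch below scores a labeled audit sample against hypothetical SLA floors; the floors and the metric selection are illustrative, not recommendations.

```python
# Sketch: evaluate contractual quality floors on a labeled audit
# sample each reporting period. Floors are hypothetical examples.
from sklearn.metrics import accuracy_score, precision_score, recall_score

SLA_FLOORS = {"accuracy": 0.90, "precision": 0.85, "recall": 0.88}

def sla_report(y_true, y_pred) -> dict:
    measured = {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
    }
    return {m: {"value": round(v, 3), "breach": v < SLA_FLOORS[m]}
            for m, v in measured.items()}

# A small labeled sample of this period's decisions (toy data).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(sla_report(y_true, y_pred))  # every metric breaches its floor here
```

The hard part isn't the code; it's agreeing in the contract on who labels the audit sample, how often, and what a breach triggers.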
Model update and versioning control: AI models evolve. The contract should specify who controls when and how models are updated. Can the vendor push updates without your approval? Do you have the right to roll back if an update causes problems? What testing is required before updates are deployed? These questions are especially critical in regulated industries where changes to decision-making systems may require regulatory approval.
Data ownership and usage rights: Be explicit about who owns what. Your data remains yours — ensure the contract reflects this. If the vendor uses your data for model improvement, define the scope and get appropriate compensation. Consider requirements for data deletion when the relationship ends.
Liability and indemnification: AI systems can cause harm in unexpected ways. The contract should clearly allocate liability for different types of harm. Consider requiring broad indemnification for AI-related incidents, especially if the vendor is training models that learn from your data.
Audit rights: You need visibility into how the AI system is operating. The contract should give you the right to audit the vendor's compliance with the agreement, including access to relevant logs, metrics, and documentation. For highly regulated industries, consider regular third-party audits.
Termination and transition: If you need to switch vendors, what happens? Can you export your data and any custom models? What about the vendor's pre-trained components that you've fine-tuned? The transition plan should be documented in the contract, including data portability and service continuity requirements.
Regulatory compliance representations: The vendor should represent that their system complies with applicable Saudi regulations, including PDPL and any sector-specific requirements. These representations should be backed with warranties and, where appropriate, third-party certifications.
Ongoing Monitoring
Signing the contract is just the beginning. AI systems require continuous monitoring to ensure they're performing as expected and not developing harmful behaviors. This isn't optional — it's a fundamental part of responsible AI deployment.
Establish a baseline: Immediately after deployment, establish a performance baseline across all your key metrics. This includes accuracy measures but also operational metrics like throughput, latency, and resource utilization. Document the baseline formally — it's essential for detecting drift over time.
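A baseline only helps if it is recorded in a form you can diff against later. A minimal sketch, assuming you score an audit sample at go-live; the metric values, version tag, and file path are placeholders.

```python
# Sketch: persist a formal post-deployment baseline for later drift
# comparison. Metric values, version tag, and path are placeholders.
import json
from datetime import datetime, timezone

baseline = {
    "captured_at": datetime.now(timezone.utc).isoformat(),
    "model_version": "vendor-model-1.0",   # hypothetical version tag
    "sample_size": 2_000,                  # audit sample used at go-live
    "metrics": {
        "accuracy": 0.94,
        "recall": 0.91,
        "p95_latency_ms": 310,
    },
}
with open("ai_baseline.json", "w") as f:
    json.dump(baseline, f, indent=2)
```

Version the file alongside the model so every later comparison runs against a known, dated reference.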
Implement automated monitoring: Set up automated systems that continuously track performance metrics and alert you to anomalies. The specific alerts depend on your application, but common ones include accuracy degradation, changes in output distribution, unexpected error rates, and bias indicators. For high-stakes applications, consider implementing shadow mode testing where the AI runs in parallel with human decision-makers to compare performance.
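Shadow mode can start very simply: log the human decision and the model's suggestion side by side, act only on the human's, and measure agreement. A minimal sketch with a hypothetical go-live threshold:

```python
# Sketch: shadow-mode comparison. The model runs alongside human
# decision-makers, its output is never acted on, and only agreement
# is measured. The 95% threshold is a hypothetical example.
def shadow_agreement(human_decisions: list, model_decisions: list) -> float:
    assert len(human_decisions) == len(model_decisions)
    matches = sum(h == m for h, m in zip(human_decisions, model_decisions))
    return matches / len(human_decisions)

human = ["approve", "reject", "approve", "refer", "approve"]
model = ["approve", "reject", "reject",  "refer", "approve"]

rate = shadow_agreement(human, model)
print(f"agreement: {rate:.0%}")  # 80%
if rate < 0.95:                  # hypothetical go-live threshold
    print("stay in shadow mode; review every disagreement with experts")
```

Disagreements are as informative as the rate itself: each one is a labeled case for the human review described next.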
Conduct regular human reviews: Automated monitoring catches quantitative changes, but qualitative issues require human judgment. Regularly sample outputs and have domain experts review them for quality, appropriateness, and potential issues. This is especially important for generative AI systems where output quality is more subjective.
Monitor for concept drift: Concept drift occurs when the relationship between inputs and outputs changes over time. This can happen because the underlying patterns in the world shift, because your user base evolves, or because the operational context changes. Statistical process control techniques can help detect drift, but domain expertise is essential for interpreting whether drift is problematic.
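One concrete statistical process control approach: treat the weekly error rate as a monitored proportion and alert when it leaves p-chart control limits. The numbers below are illustrative, and as noted above, a domain expert still decides whether a flagged shift actually matters.

```python
# Sketch: p-chart-style control limits on a weekly error rate, a
# simple statistical-process-control check. All numbers are toy data.
import math

def control_limits(baseline_rate: float, sample_size: int,
                   sigmas: float = 3.0) -> tuple[float, float]:
    """Lower/upper control limits for a monitored proportion."""
    se = math.sqrt(baseline_rate * (1 - baseline_rate) / sample_size)
    return (max(0.0, baseline_rate - sigmas * se),
            min(1.0, baseline_rate + sigmas * se))

baseline = 0.06  # error rate recorded in the post-deployment baseline
weekly = [(0.055, 900), (0.062, 870), (0.071, 910), (0.104, 880)]

for week, (rate, n) in enumerate(weekly, start=1):
    low, high = control_limits(baseline, n)
    status = "ALERT" if not (low <= rate <= high) else "ok"
    print(f"week {week}: rate={rate:.3f} limits=({low:.3f}, {high:.3f}) {status}")
```

Week four trips the upper limit, which is a signal to investigate rather than proof of a problem; the shift may be benign, or it may echo the Riyadh hospital's experience.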
Track user feedback: End users are often the first to notice problems. Implement systematic feedback collection and establish a process for reviewing and acting on feedback. Consider creating a cross-functional AI governance committee that meets regularly to review performance, user feedback, and emerging issues.
Plan for periodic re-evaluation: Schedule periodic comprehensive reviews of the AI system, including whether it still meets your needs, whether better alternatives have emerged, and whether the risk profile has changed. For fast-moving AI technologies, annual reviews may be insufficient — consider quarterly or more frequent reviews for high-impact systems.
The Saudi Context
Saudi Arabia's AI ecosystem is developing rapidly, driven by Vision 2030 and significant investment from the government and private sector. This creates both opportunities and challenges for organizations procuring AI systems.
SDAIA's Role: The Saudi Data & AI Authority has become the central body for AI governance in the Kingdom. They've developed guidelines, frameworks, and regulations that organizations must navigate. Stay informed about SDAIA's evolving requirements — what was acceptable last year may not be today. For major deployments, engage with SDAIA proactively to ensure alignment with national priorities.
Data Residency Requirements: Saudi Arabia's PDPL includes provisions that restrict the transfer of personal data outside the Kingdom without appropriate safeguards. This affects AI vendors who process or store personal data. Many international vendors have established regional data centers in response, but you need to verify compliance explicitly in contracts and technical architecture.
Local vs. Global Vendors: Both local and international vendors have strengths and weaknesses. Local vendors understand the Saudi context — cultural nuances, regulatory requirements, local business practices. They may also have easier access to relevant datasets and talent. However, they may have less mature technology, smaller R&D budgets, and fewer reference implementations.
International vendors often have more mature technology and extensive experience globally, but they may lack deep understanding of Saudi context. They may also be less flexible in accommodating local requirements. The best approach depends on your specific use case — for highly contextual applications, local vendors may be preferable. For more generic applications with lower cultural sensitivity, international vendors with strong local presence may be appropriate.
Cultural Alignment: Beyond regulatory compliance, AI systems must align with Saudi cultural values and Islamic principles. This includes considerations around modesty, gender interactions, content appropriateness, and other cultural factors. Vendor selection should include evaluation of cultural sensitivity — ask about their experience in similar cultural contexts and their process for ensuring cultural alignment.
National Priority Alignment: Vision 2030 identifies specific sectors for AI development and adoption, including healthcare, energy, transportation, and government services. Consider how vendor solutions align with these national priorities. Vendors who demonstrate commitment to Saudi Arabia's AI ecosystem — through local partnerships, investment, or contributions to national initiatives — may offer better long-term value.
Moving Forward
AI vendor due diligence in Saudi Arabia isn't just about risk mitigation — it's about building the foundation for successful AI adoption that supports Vision 2030's goals. Organizations that approach vendor selection thoughtfully, with clear requirements and robust governance frameworks, will be better positioned to realize AI's benefits while managing its risks.
The technology will continue to evolve rapidly. What matters most is building organizational capabilities — the processes, expertise, and relationships — that enable effective AI procurement and governance. Start small, learn iteratively, and scale what works. Document your decisions and lessons learned, not just for your own organization but to contribute to the broader Saudi AI ecosystem.
The most successful AI deployments will be those where organizations and vendors work as true partners, with shared commitment to responsible development, transparent communication, and continuous improvement. In this partnership model, due diligence isn't a hurdle to overcome — it's the foundation for a relationship that can adapt and grow as the technology evolves.
Saudi Arabia has the opportunity to demonstrate global leadership in responsible AI adoption. By approaching vendor diligence with rigor and foresight, organizations in the Kingdom can build AI systems that deliver real value while upholding the highest standards of safety, ethics, and cultural alignment.
Published by PeopleSafetyLab — AI safety and governance research for KSA organizations.