Overview
Business process automation accumulates over time. What begins as a single integration between two systems grows into a network of scheduled jobs, webhook handlers, data transformation pipelines, and synchronisation processes that collectively keep an organisation's operational data consistent and its manual workload manageable. Each addition makes sense in isolation. The collective result, over months and years of incremental development, is often a system that nobody has a complete picture of — where the dependencies between automated processes are poorly documented, where failure modes are unknown, where a single process failure can silently cascade into data inconsistencies that are difficult to detect and expensive to correct.
An automation audit is a structured review of an organisation's automation and integration landscape — the processes that run without direct human initiation, the systems they connect, the data they move, and the failure modes that current monitoring and error handling do or do not address. The audit produces an accurate map of the automation estate, an assessment of its reliability, and a prioritised set of improvements that reduce operational risk and improve visibility.
The audit is not limited to technically complex automation. Simple scheduled jobs, basic email forwarding rules, and manual-trigger integrations all count — the complete picture of what runs automatically matters because incomplete visibility is itself an operational risk. The process that runs automatically and nobody is monitoring is the process that fails silently for months before someone notices that the data it was maintaining has been wrong.
We perform automation audits for businesses with established operational systems — organisations that have been using software for their operations long enough that automation has accumulated organically, teams that have experienced automation failures and want to understand their exposure, and businesses that are onboarding new technical leadership who need to understand what is already running.
What an Automation Audit Covers
Automation inventory. The comprehensive catalogue of everything that runs automatically — the first output of the audit and the foundation for everything else.
Scheduled processes: the cron jobs, scheduled tasks, Windows Task Scheduler jobs, Azure Functions timer triggers, AWS Lambda scheduled events, and any other time-triggered processes. Each process documented with its schedule, its purpose, its inputs and outputs, the systems it interacts with, and who owns it. The scheduled processes that run at 3am and everyone has forgotten about.
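These documentation fields can be captured as a structured record rather than free text, which keeps the inventory queryable as it grows. A minimal sketch in Python — the process name, schedule, and owner below are illustrative examples, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass
class ScheduledProcess:
    """One entry in the automation inventory for a time-triggered process."""
    name: str
    schedule: str        # cron expression, e.g. "0 3 * * *" for 3am daily
    purpose: str
    inputs: list[str]    # systems or data sources read
    outputs: list[str]   # systems or data stores written
    owner: str           # person or team accountable for the process

# Hypothetical example entry — the 3am job that would otherwise be forgotten.
nightly_sync = ScheduledProcess(
    name="erp-ecommerce-stock-sync",
    schedule="0 3 * * *",
    purpose="Push ERP stock levels to the e-commerce platform",
    inputs=["ERP stock table"],
    outputs=["e-commerce inventory API"],
    owner="ops-team",
)
```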
Webhook receivers: the HTTP endpoints that external systems call when events occur — the Shopify webhook that fires when an order is created, the Stripe webhook that fires when a payment succeeds, the GitHub webhook that fires when code is pushed. Each webhook receiver documented with its source system, the events it handles, the actions it takes, and the downstream effects.
Data synchronisation processes: the jobs that keep data consistent between systems — the nightly sync between the ERP and the e-commerce platform, the hourly inventory update, the customer data sync between the CRM and the support system. Each sync documented with its frequency, the direction of data flow, the transformation logic applied, and the systems on each end.
Event-driven processes: the processes that fire in response to events within the system — the order that triggers a fulfilment request, the invoice that triggers an accounting entry, the customer registration that triggers a welcome email sequence. The event chain documented from trigger to final outcome.
Manual-trigger integrations: the processes that require a human to initiate but then run automatically — the export that a user triggers that then runs a complex transformation and imports into another system. These are included because they have the same failure characteristics as fully automated processes once initiated.
Third-party automation platforms: the Zapier zaps, the Make (Integromat) scenarios, the Microsoft Power Automate flows, the HubSpot workflows, and any other no-code or low-code automation platform configurations. The platform-based automation that often runs without developer visibility.
Dependency mapping. The relationships between automated processes and the systems they interact with.
System dependency graph: the visual representation of which automated processes connect which systems — the map that shows that a failure in system A will affect processes B, C, and D, which in turn affect systems E and F. The dependency graph that makes cascade failure risk visible.
Critical path identification: the automated processes that sit on the critical path for business operations — the automation failure that would immediately impact revenue, customer experience, or regulatory compliance. The critical path processes that require higher reliability standards than background data synchronisation.
Circular dependency detection: the automation chains where system A updates system B which triggers an update back to system A — the circular dependency that can cause infinite loops, duplicate records, or data oscillation. The circular dependencies that may have been dormant for months and could activate under specific conditions.
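Once the inventory records which systems each automation writes to, circular chains can be found mechanically with a depth-first search. A minimal sketch, assuming the dependency graph is a dict mapping each system to the systems its automation updates — the CRM/ERP example is hypothetical:

```python
def find_cycles(graph):
    """Find circular update chains in a system dependency graph.

    graph maps each system to the systems its automation writes to.
    Returns each distinct cycle as a list of system names.
    """
    cycles, seen = [], set()

    def dfs(node, path):
        if node in path:
            cycle = path[path.index(node):]
            key = frozenset(cycle)           # dedupe rotations of the same cycle
            if key not in seen:
                seen.add(key)
                cycles.append(cycle + [node])
            return
        for target in graph.get(node, []):
            dfs(target, path + [node])

    for start in graph:
        dfs(start, [])
    return cycles

# Hypothetical estate: the CRM sync writes to the ERP, which syncs back.
deps = {"CRM": ["ERP"], "ERP": ["CRM", "Warehouse"], "Warehouse": []}
cycles = find_cycles(deps)   # the CRM -> ERP -> CRM chain is reported
```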
External system dependencies: the external APIs, third-party services, and vendor platforms that the automation depends on. The external dependency availability that the organisation does not control and for which no fallback exists. The automation that silently stops working when a third-party API changes its response format or authentication mechanism.
Failure mode analysis. The systematic examination of how each automated process can fail and what happens when it does.
Failure detection: whether each process has monitoring that will detect when it fails. The process that runs silently — no logging, no alerting, no monitoring — and whose failures go undetected until someone notices downstream effects. The gap between what the team believes is monitored and what is actually monitored.
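A common remedy for the silently failing process is a dead-man's-switch: every successful run records a heartbeat, and a separate monitor alerts when a heartbeat goes stale — including for processes that have never reported at all. A minimal in-memory sketch; the process names are illustrative, and a production version would persist heartbeats and page someone:

```python
import time

HEARTBEATS = {}  # process name -> epoch seconds of last successful run

def record_heartbeat(process):
    """Called at the end of every successful run."""
    HEARTBEATS[process] = time.time()

def overdue(process, max_silence_seconds, now=None):
    """True when the process has not reported within its expected window,
    including when it has never reported at all."""
    last = HEARTBEATS.get(process)
    if last is None:
        return True
    current = now if now is not None else time.time()
    return (current - last) > max_silence_seconds

record_heartbeat("nightly-stock-sync")
assert not overdue("nightly-stock-sync", max_silence_seconds=26 * 3600)
assert overdue("forgotten-3am-job", max_silence_seconds=3600)  # never reported
```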
Error handling: the error handling within each process — what happens when an API call fails, when a data transformation produces an unexpected value, when a required field is missing from an incoming webhook payload. The process that catches all exceptions and continues, silently swallowing errors that should be investigated. The process that has no error handling and terminates on the first unexpected input.
Retry logic: whether failed operations are retried, with what backoff strategy and maximum retry count, and what happens after retries are exhausted. The process that retries forever and fills the queue. The process that does not retry and permanently loses data on transient failures.
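A bounded retry with exponential backoff, plus an explicit hand-off after exhaustion, avoids both failure modes. A hedged sketch — `on_exhausted` stands in for whatever dead-letter mechanism the estate actually has:

```python
import time

def with_retries(operation, max_attempts=5, base_delay=1.0, on_exhausted=None):
    """Retry an operation prone to transient failures, with exponential backoff.

    After max_attempts failures the final error is handed to on_exhausted
    (e.g. a dead letter queue) instead of being retried forever or dropped.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts - 1:      # retries exhausted
                if on_exhausted:
                    on_exhausted(exc)
                    return None
                raise
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s...

# Illustrative usage: an API call that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result = with_retries(flaky, base_delay=0.0)  # succeeds on the third attempt
```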
Duplicate processing: the protections against processing the same event or record more than once. The webhook that is delivered twice by the source system and processed twice, creating duplicate records. The job that can overlap with its previous run if the previous run takes longer than the schedule interval. The idempotency gaps that cause data quality problems under specific timing conditions.
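The standard protection is to key processing on the source system's event id and record what has already been handled. A minimal sketch — in production the processed-id store must be durable and checked transactionally, not an in-memory set:

```python
PROCESSED = set()  # in production: durable storage with a transactional check

def handle_event(event_id, payload, apply):
    """Process an event at most once, keyed on the source system's event id.

    Returns True if the event was applied, False for a duplicate delivery.
    """
    if event_id in PROCESSED:
        return False      # second delivery of the same event: acknowledge, do nothing
    apply(payload)
    PROCESSED.add(event_id)
    return True

# Illustrative double delivery of the same hypothetical webhook event.
orders = []
handle_event("evt_1", {"order": 42}, orders.append)
handle_event("evt_1", {"order": 42}, orders.append)  # ignored as a duplicate
```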
Data loss scenarios: the conditions under which data can be permanently lost. The process that reads from a source and deletes the source data before confirming successful processing. The process that has no dead letter queue and drops messages that fail processing. The backup and recovery coverage for the data that automated processes create and manage.
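The delete-before-confirm ordering can be inverted so the source record survives until processing is confirmed, and failures land in a dead letter queue rather than disappearing. A sketch, with `read_batch`, `process`, `delete`, and `dead_letter` as stand-ins for the estate's actual operations:

```python
def drain_source(read_batch, process, delete, dead_letter):
    """Move records out of a source without a window for permanent loss.

    A source record is deleted only after its processing is confirmed;
    failed records go to a dead letter queue instead of being dropped.
    """
    for record in read_batch():
        try:
            process(record)
        except Exception as exc:
            dead_letter(record, exc)  # keep the record for later investigation
            continue
        delete(record)                # safe: processing already succeeded

# Illustrative run against an in-memory source with one bad record.
source = [{"id": 1}, {"id": 2, "bad": True}]
done, dlq = [], []

def process(r):
    if r.get("bad"):
        raise ValueError("unexpected input")
    done.append(r)

drain_source(lambda: list(source), process, source.remove,
             lambda r, exc: dlq.append(r))
# done holds the good record; the bad one stays in both source and dlq
```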
Data quality assessment. The condition of the data that automation has been creating and maintaining.
Consistency verification: the comparison of data across systems that automated synchronisation is supposed to keep consistent. The customer record in the CRM that differs from the same customer in the ERP because a sync failed six months ago and was never reconciled. The inventory level in the e-commerce platform that differs from the warehouse system because an update was lost.
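Consistency verification is often just a keyed comparison of the fields a sync is supposed to keep identical. A minimal sketch, with hypothetical CRM and ERP records keyed by customer id:

```python
def find_drift(system_a, system_b, fields):
    """Compare records that a sync should keep identical across two systems.

    Returns ids missing from each side, plus per-field mismatches for
    records present in both.
    """
    missing_in_b = sorted(system_a.keys() - system_b.keys())
    missing_in_a = sorted(system_b.keys() - system_a.keys())
    mismatched = {}
    for rid in system_a.keys() & system_b.keys():
        diffs = {f: (system_a[rid].get(f), system_b[rid].get(f))
                 for f in fields
                 if system_a[rid].get(f) != system_b[rid].get(f)}
        if diffs:
            mismatched[rid] = diffs
    return missing_in_b, missing_in_a, mismatched

# Illustrative data: one mismatch, one record missing on each side.
crm = {"c1": {"email": "a@example.com"}, "c2": {"email": "b@example.com"}}
erp = {"c1": {"email": "A@example.com"}, "c3": {"email": "c@example.com"}}
missing_b, missing_a, mismatches = find_drift(crm, erp, ["email"])
```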
Accumulated errors: the data anomalies that have built up over time as a result of automation bugs, timing issues, or incomplete error handling. The duplicate records, the orphaned references, the mismatched totals that indicate automation has been processing incorrectly.
Audit trail completeness: the records of what automation has done — the transaction log that allows reconstructing why a record is in its current state. The automation that modifies data without creating an audit trail, making it impossible to investigate data anomalies retrospectively.
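At its simplest, an audit trail is a before/after snapshot written alongside every automated modification. A sketch, where the in-memory log and the `nightly-stock-sync` actor name are illustrative — production versions write to append-only storage:

```python
import time

AUDIT_LOG = []  # in production: append-only storage, not a list in memory

def audited_update(record, changes, actor):
    """Apply changes to a record while recording who changed what, from what.

    The before/after snapshot is what makes data anomalies investigable
    retrospectively.
    """
    before = {k: record.get(k) for k in changes}
    record.update(changes)
    AUDIT_LOG.append({
        "at": time.time(),
        "actor": actor,          # e.g. the name of the automated process
        "before": before,
        "after": dict(changes),
    })
    return record

row = {"sku": "A-1", "stock": 10}
audited_update(row, {"stock": 7}, actor="nightly-stock-sync")
# AUDIT_LOG now explains why row["stock"] is 7 and what it was before
```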
Security and access review. The credentials and permissions that automated processes use.
Credential inventory: the API keys, service accounts, database credentials, and OAuth tokens used by automated processes. The credentials stored in environment variables, configuration files, or secrets management systems — and the credentials stored in code or committed to version control. The credentials that have not been rotated since the system was set up.
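Part of the credential inventory is mechanical: scanning code and configuration for strings that look like credentials. A deliberately small sketch — real secret scanners use far more patterns than the two illustrative ones below:

```python
import re

# Illustrative patterns only; dedicated secret-scanning tools cover many more.
SECRET_PATTERNS = {
    "AWS access key id": re.compile(r"AKIA[0-9A-Z]{16}"),
    "api key assignment": re.compile(
        r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{12,}['\"]"),
}

def scan_for_credentials(text):
    """Flag lines that look like credentials committed to code or config."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), 1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append(f"line {lineno}: possible {label}")
    return hits

sample = 'API_KEY = "sk_live_not_a_real_key_123"\nname = "ok"'
findings = scan_for_credentials(sample)  # flags the first line only
```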
Permission scope: the permissions granted to each automation's credentials. The service account with administrator privileges because it was easier to set up than a correctly scoped account. The API key with write access to every resource because the integration only needs to read three specific endpoints. The principle of least privilege applied or violated.
Credential sharing: the credentials shared between multiple automated processes — where a single compromised or expired credential can affect multiple systems simultaneously. The individual credential per process versus shared credentials and the operational trade-offs.
Documentation and knowledge assessment. The state of documentation for the automation estate.
Documentation coverage: what is documented and what is not. The process that exists only in the memory of the one person who set it up two years ago. The integration that is documented in an old Confluence page that nobody has updated since the integration last changed.
Bus factor: the processes whose continued operation depends on knowledge held by a single person. The automation that will break when that person leaves and nobody else knows how to fix it.
Runbook completeness: the operational runbooks for handling automation failures. The steps that a non-expert could follow to diagnose and resolve the most common failure scenarios. The gap between the incidents that occur and the documented procedures for handling them.
Audit Process
Discovery phase. Interviews with technical and operational stakeholders to build the initial inventory. Review of infrastructure configuration, deployment scripts, scheduler configurations, and any existing documentation. Access to logging and monitoring infrastructure to understand current observability.
Technical review phase. Code review for the most critical automation processes. Examination of error handling, retry logic, and failure detection. Review of credentials, permissions, and security posture. Analysis of monitoring and alerting coverage.
Data quality sampling. Spot-checks of data consistency between connected systems. Identification of specific data anomalies that indicate historical automation problems. Assessment of audit trail completeness.
Findings consolidation. Categorisation of findings by severity and impact — the critical gaps that represent immediate operational risk, the significant issues that should be addressed in the near term, and the improvement opportunities that would increase reliability and visibility over time.
Audit report delivery. A structured report covering the automation inventory, the dependency map, the failure mode analysis, the data quality assessment, the security findings, and the prioritised improvement recommendations. The report that serves as both an executive summary of automation risk and a technical workplan for the remediation effort.
Common Findings
Based on typical automation audit engagements, the most common significant findings include:
Unmonitored critical processes. Processes that the business depends on operationally — order processing, inventory synchronisation, financial reconciliation — running without any monitoring or alerting.
Silent failure accumulation. Processes that catch errors and log them without alerting, resulting in hundreds or thousands of logged errors that have been silently accumulating for months.
Missing idempotency. Webhook handlers and event processors that process the same event multiple times, creating duplicate records or applying the same transaction more than once.
Credential sprawl. API keys and service account credentials that are years old, shared across multiple systems, stored insecurely, and scoped with far more permission than the integration requires.
Undocumented dependencies. Critical automation running with no documentation of what it does, what it connects, or what happens when it fails.
Stale automation. Processes created to support business workflows that no longer exist, but that are still consuming resources and occasionally producing errors that nobody investigates because nobody knows what the process is supposed to do.
Technologies Covered
The automation audit covers processes built across any technology stack:
- Scheduled jobs — cron, Windows Task Scheduler, cloud scheduler services (AWS EventBridge, Azure Scheduler, GCP Cloud Scheduler)
- Webhook infrastructure — any language and framework handling inbound HTTP webhook delivery
- ETL and data pipelines — custom and platform-based data movement and transformation processes
- No-code/low-code platforms — Zapier, Make, Power Automate, HubSpot workflows, ActiveCampaign automations
- Cloud functions — AWS Lambda, Azure Functions, Google Cloud Functions used for automation
- Message queues — automation built on SQS, RabbitMQ, or Azure Service Bus
- API integration services — any server-side code that calls external APIs on a scheduled or triggered basis
From Unknown to Understood
The most immediate value of an automation audit is not the improvements it recommends — it is the accurate picture it creates of what is actually running. Most organisations that have been operating automated systems for more than a year have less complete knowledge of their automation estate than they believe. The audit converts implicit, distributed, and incomplete knowledge into an explicit, documented, and current picture that the organisation can reason about, plan around, and hand over to new team members.
The improvements that follow from that picture — better monitoring, cleaner error handling, documented runbooks, correctly scoped credentials, eliminated stale processes — reduce the operational risk that accumulates invisibly in automation estates that are allowed to grow without periodic review.