Ansible “until” loop

Continuing on with Amateur Ansible Fumbling Hour, here’s what I wanted to do, and what wound up working, with commentary on the errors I got and what’s going on for those unable to make heads or tails of the Ansible documentation.

I have an Ansible playbook that runs updates on servers, and then reboots them after the updates finish. However, I have noticed that after rebooting, an essential service on a server isn’t starting automatically, despite the service being set to start automatically. This seems like a great situation to add something into my playbook to check the status of that service and start it if it’s not started. However, I also wanted some level of error handling, just in case the service didn’t start automatically because something was stopping it just after reboot.

Here is the playbook that finally worked.

---
- hosts: hosts
  tasks:
  - name: Check and start Service
    win_service:
      name: "service_name"
      state: started
    register: result
    until: (result is not failed) and (result.state == "running")
    retries: 5
    delay: 10

Now let me explain a couple of roadblocks I ran into trying to get this to work.

As you can see, I’ve got the results of the service start command being registered to result. Then, I use until to check the contents of the result variable, and specifically the running object. Below is the output of result after a successful run of the playbook.

changed: [server.domain.com] => {
    "attempts": 3,
    "can_pause_and_continue": false,
    "changed": true,
    "depended_by": [],
    "dependencies": [
    ],
    "description": "Service description",
    "desktop_interact": false,
    "display_name": "Service Name",
    "exists": true,
    "invocation": {
        "module_args": {
            "dependencies": null,
            "dependency_action": "set",
            "description": null,
            "desktop_interact": false,
            "display_name": null,
            "error_control": null,
            "failure_actions": null,
            "failure_actions_on_non_crash_failure": null,
            "failure_command": null,
            "failure_reboot_msg": null,
            "failure_reset_period_sec": null,
            "force_dependent_services": false,
            "load_order_group": null,
            "name": "service",
            "password": null,
            "path": null,
            "pre_shutdown_timeout_ms": null,
            "required_privileges": null,
            "service_type": null,
            "sid_info": null,
            "start_mode": null,
            "state": "started",
            "update_password": null,
            "username": null
        }
    },
    "name": "Service",
    "path": "C:\Windows\System32\service.exe",
    "start_mode": "manual",
    "state": "running",
    "username": "LocalSystem"
}

So of course, I can use until to run the playbook until result.state == running, but when I only checked against that, I got an error message saying that dict object has no attribute 'state'. This took me a while to puzzle out, but the issue was that when the playbook failed (let’s say because the service was set to disabled), then nothing was being written to the result variable. Then, when the playbook went to check the contents of that vairable, of course there was no attribute ‘state.’ This is why I added the other check result is not failed. So now the playbook can’t fail, and the service has to be running for the playbook to end.

Server Not Found in Kerberos Database

I recently started trying to use Ansible to manage all of the disparate systems I have at the office, and in trying to set up Ansible to communicate with our Windows systems, I ran into this (among other) issues. Despite having all of the right ports open to communicate with WinRM, a couple of systems were giving the error

Server not found in Kerberos database

After some digging, I discovered that Kerberos is highly dependent on DNS to be able to perform both a forward and reverse lookup. In my case, my reverse lookup zone had not correctly populated for the servers that were giving this error.