In my previous blog post I talked about using Azure Endpoint Monitoring on a VM and tying it to an alert rule for notification. It’s pretty easy and works well; however, the monitoring is tied to the machine you are configuring. So yes, you can monitor all the nifty VM stuff (disk, memory, CPU) and be alerted if those things exceed thresholds you define, and you can also define any arbitrary custom endpoint (HTTP) on the same machine that will be invoked on a schedule and be alerted about that.
But with ‘Endpoint Monitoring’ you can’t get Azure to monitor any arbitrary endpoint URL you dream up – it has to be an endpoint belonging to the VM (or website) you are working on. That’s a bit restrictive, and I was hoping the Azure Scheduler (a new base piece of infrastructure) would help as an alternative.
After a bit of research it became clear that the Azure Scheduler is not configurable through the portal yet, only through an SDK or REST service. This post was pretty helpful, but after a few minutes I realised that configuring my schedule of jobs in .NET code, specifying things like alerting, schedule times, what to run and so on, was quite painful. All easy stuff, but pretty tedious through C# code really.
So I decided to park Azure Scheduler until there is a nice UI for it on the Azure Management Portal. I needed a Plan B.
Enter SetCronJob.com. Big ups to the maker of this service for building something simple that works well, is easy to sign up for, and is priced at a point where I just can’t say no.
SetCronJob is a simple website service that lets you log in, create a job (essentially an HTTP URL), set a schedule, and get emailed if something goes wrong. Done!
Like all well-designed services, it entices you in with a free offering, which I had up and running in seconds. You can create an account with your Google ID, so with a couple of clicks I was set up and running.
Here’s a screenshot.
I basically create a job with a URL, tell it whether it’s GET/POST/PUT etc., how often to run, and what to do if it fails. As per my previous post, I had already configured a permanent Web API health-check action to return 200 if all is good, or 503 if my end-to-end test failed. That is exactly what SetCronJob is looking for.
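My actual health check is an ASP.NET Web API action (covered in the previous post), but the contract SetCronJob cares about is language-neutral: return 200 when the end-to-end test passes, 503 when it doesn’t. Here’s a minimal sketch of that contract using Python’s standard library, where `end_to_end_check` is a hypothetical stand-in for the real test, not my actual code:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def end_to_end_check() -> bool:
    """Placeholder for the real end-to-end test of the deployed app
    (e.g. database reachable, key workflow completes)."""
    return True  # swap in real checks here


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            healthy = end_to_end_check()
            status = 200 if healthy else 503  # 503 = Service Unavailable
            body = b"OK" if healthy else b"End-to-end check failed"
        else:
            status, body = 404, b"Not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet

# To serve it: HTTPServer(("", 8080), HealthHandler).serve_forever()
```

You’d then point the SetCronJob URL at `/health`; anything in the 4xx/5xx range counts as a failed run.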
Here is one of my jobs, I only needed to set 3 things:
You can manually invoke the job whenever you want to test that it’s working. I did a few positive tests, then killed something in my deployed app to make my end-to-end test fail. Here is the log below; you can see the invocation of the job after I had killed my app. It came back with a 503. You can even drill into each result to see exactly what was in the HTTP response body and headers.
How much does it cost? The free tier that I signed up for allows 50 job executions per day. Slice that any way you want, but two jobs executing hourly will take you to 48; one job executing every 30 minutes, same thing. After playing around for about 10 minutes I realised I have more than one environment to monitor, and I’d kind of like to know within 5 minutes if I’ve screwed something up and either some deployed code broke an environment, or a machine is just having some weird problems.
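The quota arithmetic above is easy to sanity-check: a job firing every N minutes runs 24 × 60 / N times a day, summed over all jobs. A throwaway sketch (the figures are the free-tier numbers from this post):

```python
def daily_executions(interval_minutes: int, jobs: int = 1) -> int:
    """Runs per day consumed by `jobs` jobs each firing every `interval_minutes`."""
    return (24 * 60 // interval_minutes) * jobs


# Two hourly jobs: 2 * 24 = 48 runs, just under the 50/day free quota.
# One job every 30 minutes: also 48 runs.
# One job every 5 minutes: 288 runs/day, well past the free tier.
```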
Next plan up? The Silver plan: 3,600 job executions per day for $10/year… Yup, that’s not a misprint – I love these people. (Azure Scheduler will apparently be about $10/month when it goes live.) SetCronJob will also wait up to 5 minutes for the job to finish on this plan, which could be handy.
After digesting this and realising the debate was over within three cups of coffee, it was a no-brainer: Silver all the way. But just to tempt the wary person in me, they even let me try Silver for a few days before buying and spending my $10. People this nice actually exist on the net? I’m kind of shocked. So I’m on the Silver trial.
One small feature improvement that could be handy relates to the response time of the job. For example, the job may just be checking that an endpoint is alive, not really running a job on the server at all. We might expect it to always come back with 200 in less than 500 ms; if it takes longer than that, it could indicate high server load. So an alert based on a response-time threshold would be handy. At the moment it’s just pass = 200, fail = anything 400 or above (I presume that’s what they define as a fail); it doesn’t look at the response time (although, as you can see above, it does record it).
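To make the suggestion concrete, the pass/fail rule I’d like would look something like this. The 500 ms budget and the function names are mine, and the 200/400 split is only my reading of their current behaviour:

```python
import time
import urllib.request
import urllib.error


def evaluate(status: int, elapsed_ms: float, budget_ms: float = 500) -> bool:
    """Today's rule (as I read it): 200 = pass, 400+ = fail.
    The proposed extension: also fail a 200 that arrives too slowly."""
    return status == 200 and elapsed_ms <= budget_ms


def check(url: str, budget_ms: float = 500, timeout: float = 5.0):
    """Fetch `url` and return (passed, status, elapsed_ms)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.getcode()
    except urllib.error.HTTPError as e:
        status = e.code  # e.g. the 503 from my failed end-to-end test
    elapsed_ms = (time.monotonic() - start) * 1000
    return evaluate(status, elapsed_ms, budget_ms), status, elapsed_ms
```

A slow 200 then fails the run just like a 503 would, which is the alert I’m after.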
And as with the Azure stuff, a long-term history of the results, with graphs, would be handy. Maybe your outages follow a pattern; maybe you could correlate them with attempted DoS attacks, etc. With only the last 10-20 runs recorded, you lose that info.
But I love this thing. It works like a dream, I signed up through Google, it has an interface so simple I could create a new job with only 3 bits of info, and a pricing plan from heaven. I’m impressed.