
Pager Team   

pagerteam.com


Geographies: Worldwide

Pager Team is a simple, robust, zero-config on-call rotation management system that you only pay for when you need it. It's easy to use, so you can't misconfigure it and leave nobody on-call. It respects shift swaps between two members, and it won't shift the schedule and leave you scrambling to get coverage. It works with your existing metrics and alarms. Three key considerations differentiate Pager Team from the existing competition: cost, configuration, and consistency.

Cost
VictorOps, Pager Duty, and Ops Genie all charge monthly by user. The more people on your rotation, in some cases even as mere observers, the more you pay. As a paging mechanism, you should want to use Pager Team as little as you can, and I want to align your financial incentives with that goal. Fortunately, the costs Pager Team has to bear are rather minimal: keeping a list of users in a rotation costs effectively nothing. Rather than charging by the user, Pager Team charges by the incident, since that's where the costs are. I have two plans: a pay-as-you-go plan that charges $1/mo per rotation plus $1/incident, or a flat rate plan at $200/mo per rotation which provides unlimited incidents. Both plans allow unlimited rotation members, unlimited viewers, unlimited swaps, and unlimited email, SMS, and voice notifications.

Every on-call escalation starts with an email, followed by an SMS, another SMS, a phone call, and another phone call, each one minute after the previous step. If the primary on-call isn't reached within those 5 minutes, Pager Team will try the previous on-call, repeating the same process. If that person can't be reached either, Pager Team tries a third person before trying the rotation owner as a last resort.
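
To compare the two plans, here is a minimal sketch of the arithmetic for a single rotation. The prices come from the plan description above; the function names are made up for illustration, and the sketch assumes no taxes or other fees.

```python
# Prices taken from the plan description; everything else is illustrative.
PAYG_BASE_PER_MONTH = 1    # dollars per rotation per month
PAYG_PER_INCIDENT = 1      # dollars per incident
FLAT_RATE_PER_MONTH = 200  # dollars per rotation per month, unlimited incidents


def payg_monthly_cost(incidents: int) -> int:
    """Monthly pay-as-you-go cost in dollars for one rotation."""
    return PAYG_BASE_PER_MONTH + PAYG_PER_INCIDENT * incidents


def cheaper_plan(incidents: int) -> str:
    """Name the cheaper plan for a given number of incidents in a month."""
    payg = payg_monthly_cost(incidents)
    if payg < FLAT_RATE_PER_MONTH:
        return "pay-as-you-go"
    if payg > FLAT_RATE_PER_MONTH:
        return "flat rate"
    return "either"


if __name__ == "__main__":
    for incidents in (0, 10, 199, 250):
        print(incidents, payg_monthly_cost(incidents), cheaper_plan(incidents))
```

Under these assumptions the two plans cost the same at 199 incidents in a month, so the flat rate only pays off for rotations that are paged several times a day.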

Configuration
The configuration of all three incumbents is complex. You can make them do just about anything! It's wonderful, in the same way JIRA is wonderful. However, because you can do just about anything, configuring them is so complicated that it's easy to get wrong. Maybe you're confused by the terms on-call, escalation policy, team, and rotation -- perhaps the UI isn't quite clear. Fortunately VictorOps has this helpful 14-page guide on how to set things up: https://help.victorops.com/knowledge-base/team-escalation-policy/. If you read closely you'll find this gem buried pretty deep:

> There are some instances where a step within an Escalation Policy will reference paging an on-call user at a time when there’s no one on-call. [...] the escalation policy will page no one and then wait however long is specified before executing step two.

I don't know about you, but "no one on-call" is the last feature I want out of my rotation management system. Pager Duty's on-boarding process for a new company will have you create a new rotation with a user who has entered only their email address. Good luck waking someone up at 3am by repeatedly emailing them. (I do think they set the important flag though.) Pager Team won't let any user join any rotation unless they have a verified email and a verified phone number.


Consistency
Consistency has two really important facets. Most important for the business is the consistency of response time. Regardless of the individual currently on-call, when an incident comes in, you want to see the same prompt response. Every Pager Team user has the same notification path, which they can't change, so you know what to expect regardless of who is on-call. (The incumbents will let users configure things like "email me, if not ack'd wait 4 minutes then call my work phone, then wait 2 minutes and call my house phone then text my cell phone", which both gives inconsistent results and means that, invariably, some folks will fail to configure anything at all.)
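
Because the notification path is the same for everyone, the full escalation described under Cost can be written down as a single timeline. The sketch below is only an illustration: the names are hypothetical, and it assumes that nobody acknowledges, that each person's sequence starts one minute after the previous person's last call, and that the rotation owner gets the same five steps as everyone else.

```python
from dataclasses import dataclass

# The fixed path: email, SMS, SMS, call, call, one minute apart.
STEPS = ("email", "sms", "sms", "call", "call")


@dataclass(frozen=True)
class Attempt:
    minute: int   # minutes since the incident was opened
    person: str
    channel: str


def escalation_plan(primary: str, previous: str, third: str, owner: str) -> list[Attempt]:
    """List every notification attempt if nobody ever acknowledges."""
    plan = []
    for position, person in enumerate((primary, previous, third, owner)):
        for offset, channel in enumerate(STEPS):
            plan.append(Attempt(position * len(STEPS) + offset, person, channel))
    return plan


if __name__ == "__main__":
    for attempt in escalation_plan("primary", "previous", "third", "owner"):
        print(f"t+{attempt.minute:02d}m  {attempt.person:<8}  {attempt.channel}")
```

The only input is who is in which position. There is no per-user configuration to get wrong, which is the point of a fixed path.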

The second part of consistency is for the members of the rotation. Everyone has a life outside of work and they need to be able to plan around their on-call schedule in a consistent, predictable manner. Invariably someone will get scheduled to be on-call during their birthday, or vacation, or whatever. They'll want to trade with a colleague -- I cover your shift, you cover mine. Again, you can do this with the existing tools just fine. What the incumbents don’t account for is that teams are dynamic. With any luck, you're going to see success and grow, and onboard new people on the team. And the reality is that sooner or later someone will leave the team. When person A and B have traded shifts and someone joins (or leaves) the rotation, the incumbent services handle this by shifting the rotation by a week -- so while A is covering next week as agreed, B's shift has actually moved to the week after, and A ends up covering for someone else altogether. Here's a recent blog post bemoaning this very problem: http://rachelbythebay.com/w/2019/01/14/rotation/.
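
One way to see why a membership change can reshuffle future shifts is a toy model that picks the on-call person by week number modulo team size. This model is an assumption made for illustration, not a description of how any particular product is implemented.

```python
def naive_on_call(members: list[str], week: int) -> str:
    """Toy round-robin: the week number modulo team size picks the person."""
    return members[week % len(members)]


def schedule(members: list[str], weeks: range) -> list[str]:
    """On-call person for each week in the given range."""
    return [naive_on_call(members, week) for week in weeks]


upcoming = range(6, 12)  # six future weeks, numbered arbitrarily

before = schedule(["A", "B", "C"], upcoming)
after = schedule(["A", "B", "C", "D"], upcoming)  # D joins the rotation

# Weeks whose on-call person changed just because the team grew.
changed = [week for week, old, new in zip(upcoming, before, after) if old != new]

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
    print("changed weeks:", changed)
```

In this toy model, adding one member changes who is on-call in every one of the six upcoming weeks, so any trade pinned to calendar dates no longer lines up with the shifts it was meant to exchange.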

Pager Team provides schedule consistency by breaking down the on-call schedule into three phases. For the immediate future (the next 30 to 90 days, based on your preference), we lock the rotation schedule, so people can plan around their shifts with as ironclad a guarantee as we can make (sadly we can't promise nobody will leave, but if someone does leave we don't move everyone up by a week). Beyond the immediate future we show a predicted schedule, so people have a sense of what to expect, but without an explicit commitment that it won’t change. Finally, for the very distant future, the likelihood of the rotation schedule changing between now and, say, January 1, 2020, is so high that we don't even show what we think the schedule might be. This is intentional, to discourage people from worrying about next summer’s Fourth of July and trying to preemptively get the best shifts.
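
The three phases amount to classifying a shift by how far away it starts. In the sketch below, the 30 to 90 day lock window comes from the description above. The point at which the predicted schedule stops being shown (`predicted_days`), the inclusive boundaries, and all names are assumptions made for illustration.

```python
from datetime import date, timedelta


def schedule_phase(shift_start: date, today: date,
                   lock_days: int = 30, predicted_days: int = 180) -> str:
    """Classify a shift as 'committed', 'predicted', or 'hidden'.

    lock_days is the team's chosen lock window (30 to 90 days).
    predicted_days is a hypothetical cutoff; the source does not state one.
    """
    if not 30 <= lock_days <= 90:
        raise ValueError("lock_days must be between 30 and 90")
    days_ahead = (shift_start - today).days
    if days_ahead <= lock_days:
        return "committed"   # locked: membership changes won't move it
    if days_ahead <= predicted_days:
        return "predicted"   # shown, but may still change
    return "hidden"          # too far out to be worth showing


if __name__ == "__main__":
    today = date(2019, 2, 1)
    for days in (10, 60, 400):
        start = today + timedelta(days=days)
        print(start, schedule_phase(start, today))
```

A team that prefers a longer guarantee would pass a larger `lock_days`, which moves more shifts from predicted to committed.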

Of course, if someone is scheduled for an on-call shift they can’t do, they are responsible for finding coverage. There are two schedule exception concepts in Pager Team. The first is an override, which works just like the incumbents' version: from time X to time Y, person C will be on-call. The second is a swap, where person A and B agree to swap their next two shifts -- and when their shifts move from the predicted schedule into the committed schedule, the swap turns into two overrides. (And since one person's next shift will be committed first, the swap turns into one override first, then another one later.) The committed schedule also means you can re-order the rotation and make other changes without worrying that you'll inadvertently change it. In existing tools, it's crazy easy to accidentally change next week's on-call person and have no idea how to revert to the previous state.
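
The way a swap gradually turns into overrides can be sketched as follows. This is a minimal model under stated assumptions: a shift counts as committed once its start date falls within `lock_days` of today, and the `Shift` and `Override` types and the `materialize_swap` function are hypothetical names, not Pager Team's actual data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Shift:
    start: date
    end: date
    person: str   # who the rotation originally scheduled


@dataclass(frozen=True)
class Override:
    start: date
    end: date
    person: str   # who will actually be on-call


def materialize_swap(shift_a: Shift, shift_b: Shift, today: date,
                     lock_days: int = 30) -> list[Override]:
    """Return an override for each swapped shift that is already committed."""
    horizon = today + timedelta(days=lock_days)
    overrides = []
    for shift, cover in ((shift_a, shift_b.person), (shift_b, shift_a.person)):
        if shift.start <= horizon:
            # The other party to the swap covers this shift's exact dates.
            overrides.append(Override(shift.start, shift.end, cover))
    return overrides


if __name__ == "__main__":
    a = Shift(date(2019, 3, 4), date(2019, 3, 11), "A")
    b = Shift(date(2019, 4, 15), date(2019, 4, 22), "B")
    print(materialize_swap(a, b, today=date(2019, 2, 10)))  # only A's shift is committed
    print(materialize_swap(a, b, today=date(2019, 3, 20)))  # both shifts are committed
```

Because the swap is resolved against whatever dates the shifts have when they become committed, a membership change that moves a still-predicted shift does not leave either person covering the wrong week.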

Startup Features:
  • Robust configuration you can't mess up to get notifications when system outages occur
Categories / Subcategories:
  • Technology
Target Audience:
  • monitoring
  • devops
  • alert
  • pager
  • on-call
  • system outage
  • incident management