Our head of development (let’s call him T), who was also 33% of the development team, left us in March, which makes me the new head of development. It’s not something I was aiming for at all (I’ve always been more drawn to the technical specialist career track than to management) but since there was no one else available, I’m now the manager of our two-person team.

In a way it’s nice to be manager, as it does give me greater say in how we develop (and to some extent what we develop). There will be even greater focus on design and architecture, on code quality and automated testing, and on paying down our technical debt.

The handover was very undramatic. I just slipped into the vacated shoes. We had three weeks of handover time, during which I was getting used to thinking and acting as head of development, and T worked part-time as an ordinary developer. Very smooth.

Last week was our first week without T, and it was as if the place was jinxed. All kinds of things kept going wrong, and today was no better.

Monday started with login problems at our intranet site. I inherited the responsibility of supporting the intranet from T. The site is based on Drupal + phpBB, and we use an LDAP integration module for user authentication. The LDAP module needs the credentials of one user in order to log in and get the details for all other users. We’d used T’s credentials in the past but now that he wasn’t an employee any longer, his login didn’t work. Easy-peasy, we’ll swap his credentials for mine and off we go. Except that when we did that, everything stopped working. We spent several hours on it on Monday and got nowhere. It didn’t help that we had only basic knowledge of PHP, knew nothing about Drupal development, and even less about LDAP. By noon we’d gotten rid of one detailed error message, only to see it replaced with the not-so-useful “invalid credentials”. It was release week, which meant that the intranet was not a high priority. But of course it couldn’t stay down for too long, so on Thursday I spent another half-day digging. It turned out that another Drupal module was interfering with the LDAP module, by transparently inserting <p> tags around the settings we had entered. What a stupid error to lose time on!

On Friday, a few minutes past midnight, our production site went down for 15 minutes, for no known reason. The web host has so far ignored our requests for log files.

Our monthly release was due on Friday afternoon. Ingrid woke up with a fever and I had to stay at home, and Eric couldn’t get home before 5. T came in for part of the release process, and I had to phone in when Eric got home, to support my junior colleague during the remaining testing tasks. Luckily everything went well, even though it was a semi-complicated release. Less fortunately, our testing showed that the release included several avoidable bugs, but nothing too severe.

Saturday morning I got a call telling me that it was impossible to log in to our site, even though the site was available, and had worked perfectly well on Friday. Luckily it was easy to diagnose with the help of Google, and almost as easy to fix. It turned out that we’d missed a quirk about web.config files, which for some still unknown reason hadn’t caused any problems in our test environment. Every time the ASP.NET worker process was restarted (i.e. every morning, because of the overnight idle period) the application effectively got corrupted. My first task on both Sunday and Monday morning, before any user had time to log on, was to restart the application in just the right way to avoid that corruption, and when I got into the office today we put in a proper fix.

Then, yesterday evening, our office network collapsed because of the server (or firewall, or some other piece of hardware in that stack) overheating. Turns out there is no AC in the server room, and the server cabinet is of the wrong type, and someone (probably the cleaners) had closed the door during the weekend. Network access was restored quickly, but we still spent half the day without access to incoming email. This was one problem I didn’t have to fix, or even feel any responsibility for, but it meant we were flying blind as far as production support goes – much of our monitoring is, unfortunately, based on email notifications. This was doubly unfortunate since today was the first weekday after release, and it’s not uncommon for a few bugs to arise. Luckily we had no urgent issues today.

It’s all been a flood of unfortunate events, which were all resolved in the end before causing any major issues. But it was a precarious balance, and now I feel exhausted after a week of firefighting. Stress doesn’t arise from having lots of work, but from the feeling of having no control over the situation.