The Penn Mobile app release day was an absolute fiasco. We had an App Store review running behind schedule, API crashes, a DDoS attack, and no sleep. What a week.
Things started to go wrong on Tuesday, when we decided to push a new build to Apple ahead of the Wednesday release deadline. The Dining API providers had made a few requests the weekend before, and we decided to squeeze them into the app at the last minute. Bad decision.
The one thing going for us was that Apple had approved our last 5 builds within 1-2 hours of submission. Turns out Apple decided to blow past that timeframe for the one release that mattered. After a 5-hour work session Tuesday evening, we stayed up all night prepping for release. Wednesday came, 3am came, 6am came, 9am came, nothing.
Then we saw the DP article. Misspellings, inaccuracies, and more focus on the negative than the positive. That was a great addition to the day. I should mention that by this point in the day (11am), posters were hanging all over campus. Now those were wrong too.
Android came through in the early afternoon. A silver lining! Finally!
Turns out the Android app had plenty of functional problems (not crashes) that left many sections unusable. Cue damage control from the developers. Soon we had members of Labs, the UA, and the DP all working on damage control. Hours went by. Classes were missed. At some point I considered writing a script to record how many times I had refreshed the iTunes Connect App Submission page.
We had blown our launch. Android was in shambles. iOS was MIA with no alibi. Hundreds of dollars in marketing materials down the drain. 5 cups of coffee didn’t help either.
At about 9:30pm, Apple came through. We had approval! I took down the countdown timer, we deployed site changes with updated links, and we were good to go!
By 10:30, we had 40+ users. By 11, 80+. Then I pulled up the app on my own device and was met with an error.
I ask Adel if he sees the same. He does. That's a BOOM moment. We know the issue has to be with the server. We try to SSH into the server to find the problem. No luck. The VPS is totally dead. So we restart it. Adel begins to dive into /var/log. That's not a place you want to dive into. Soon after, I try to help, but by the time I do, he's already found the culprit.
Someone had hit us with thousands of SSH login attempts from various IPs at http://www.poneytelecom.eu. This was a deliberate DDoS attack.
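For anyone who ends up in the same spot: spotting this kind of attack in /var/log mostly comes down to counting failed SSH logins per source address. Below is a minimal Python sketch, not necessarily what Adel ran. It assumes Debian/Ubuntu-style OpenSSH logging to /var/log/auth.log (RHEL-family systems use /var/log/secure) and the usual "Failed password for ... from <ip>" line format. The function names are hypothetical, and rotated, gzipped logs would need to be opened separately.

```python
import re
from collections import Counter

# Matches OpenSSH "Failed password" lines, including attempts on
# nonexistent accounts, e.g.:
#   "sshd[1234]: Failed password for root from 203.0.113.5 port 4242 ssh2"
#   "sshd[1234]: Failed password for invalid user admin from 203.0.113.5 ..."
# Only IPv4 source addresses are captured in this sketch.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed SSH logins."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_failed_logins_in_file(path="/var/log/auth.log"):
    """Hypothetical helper: scan one (uncompressed) log file."""
    # errors="replace" keeps odd bytes in the log from aborting the scan
    with open(path, errors="replace") as log:
        return count_failed_logins(log)
```

Calling `count_failed_logins_in_file().most_common(10)` lists the ten noisiest addresses; thousands of hits spread across one provider's address range is the pattern described above.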
After going through all the forensic data we can find, we set about getting the server back up and running. Whoever launched the attack didn't leave much of a trace. But we have our suspicions. Very few people know our API server details, and even fewer would DDoS it.
Adel manages to bring the server back. Meanwhile, I work on the patch for update 1.1.1. I track down the 5 most common bugs (responsible for 90% of crashes) and fix them. Adel does the same. Midnight comes, 1am comes, 2am comes, and we're now past 40 hours without sleep. We are delirious.
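Finding "the top 5 bugs causing 90% of crashes" is a simple Pareto cut: group crash reports by signature, sort by frequency, and keep the smallest prefix that reaches the target share. A minimal Python sketch, assuming each crash report has already been reduced to a single signature string (for example, exception type plus top stack frame). The function name and the 0.9 default are illustrative, not taken from our actual tooling.

```python
from collections import Counter


def bugs_covering(signatures, target_share=0.9):
    """Return the most frequent crash signatures whose combined count
    reaches target_share of all crashes, plus the share they cover."""
    counts = Counter(signatures)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    chosen, covered = [], 0
    # most_common() yields (signature, count) pairs, highest count first;
    # ties keep first-seen order
    for signature, n in counts.most_common():
        if covered / total >= target_share:
            break
        chosen.append(signature)
        covered += n
    return chosen, covered / total
```

The returned list is the fix-first queue; everything after it is the long tail that can wait until after some sleep.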
But now it's 3am, and the app is out on both platforms. It's not perfect, but it's there. We've learned our launch lessons. We secured our server. But we made mistakes. Lots of them. The goal of this blog is to make sure no future launch ever matches the stress and frenzy of today.