Shifting bottlenecks
Today is a great day! I finally finished moving all our data to the new servers. I love finishing a project — especially one that involves working until midnight every night until it’s done.
As part of this migration, I outsourced our email hosting to Rackspace, and I couldn’t be happier. They have much better spam filters than we had on our server, and I don’t have to think about it anymore. It’s not that hosting our own email was difficult, but it was a pain to migrate to a new server because you have to time everything just right so you don’t lose mail, or have it directed to the wrong location. Rackspace does it better, and it’s one less thing for me to think about. Well worth the money.
For the new server setup, we’ve got a couple of dedicated file servers, and we use Windows Distributed File System (DFS) Replication to continuously replicate files between them. It’s wonderful! I’ve been trying to get to this point for many years, but I was using the wrong tools. Previously I used rsync or SyncBack, but the problem was always the time it takes to scan all the files for changes. (We currently have 700,000 files, so this is a big deal.) DFS instead uses the NTFS change journal to track all the changed files, and doesn’t have to scan the file system to be sure things are synced up. If there’s another third-party tool that does this, I’d love to know about it, because the only problem I have with DFS is that it requires all the servers to be on the same domain (or in the same forest on a WAN). The other tools I’ve seen that use the NTFS change journal still say you have to scan all the files periodically to make sure you haven’t lost any changes somewhere along the way.
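To show why the change journal matters so much at this scale, here’s a minimal sketch in Python. It is a toy in-memory model, not how DFS or the NTFS change journal actually work; the names `Volume`, `sync_by_scan`, and `sync_by_journal` are made up for illustration, and deletes are ignored to keep it short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy file store (hypothetical): every write is also logged in a journal."""
    files: dict[str, int] = field(default_factory=dict)  # path -> version number
    journal: list[str] = field(default_factory=list)     # written paths, in order

    def write(self, path: str) -> None:
        self.files[path] = self.files.get(path, 0) + 1
        self.journal.append(path)


def sync_by_scan(source: Volume, replica: dict[str, int]) -> int:
    """Scan-style sync: look at every file. Returns the number of files examined."""
    examined = 0
    for path, version in source.files.items():
        examined += 1
        if replica.get(path) != version:
            replica[path] = version
    return examined


def sync_by_journal(source: Volume, replica: dict[str, int], cursor: int) -> tuple[int, int]:
    """Journal-style sync: copy only paths logged since `cursor`.

    Returns (files examined, new cursor position).
    """
    changed = set(source.journal[cursor:])
    for path in changed:
        replica[path] = source.files[path]
    return len(changed), len(source.journal)


if __name__ == "__main__":
    vol = Volume()
    for i in range(700_000):
        vol.write(f"file{i}")

    replica: dict[str, int] = {}
    print("initial scan examined:", sync_by_scan(vol, replica))

    cursor = len(vol.journal)          # remember where we are in the journal
    for name in ("file1", "file2", "file3"):
        vol.write(name)                # three edits during the day

    examined, cursor = sync_by_journal(vol, replica, cursor)
    print("journal sync examined:", examined)
```

In the demo, the scan has to look at all 700,000 entries even when nothing has changed, while the journal pass only looks at the 3 files that were written since the last sync. The cost of the scan grows with the total number of files; the cost of the journal pass grows with the number of changes.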
When we started out many years ago, we had only one server. To back up the data, we first created a database backup, which involves reading in the entire database and writing out a backup file. Then we copied all the files (or just the changes) to a backup, so at a minimum it’s scanning all files for changes, then reading all the changed files to send to the backup. That’s a very disk-intensive process, so while the backup was running, it hurt the application’s performance. While it would be nice to have frequent backups, the performance penalty meant we could only take backups during off-peak hours.
For our second generation of servers, we bought the fastest drives we could, which doubled the cost of the servers, but meant the backups didn’t hurt application performance as much. That worked OK, but we still had other problems around disaster recovery. If a server failed and took all its data with it, we’d have to recover from backup, which involved moving hundreds of GBs of data, which is slow no matter how fast the drives are.
Now we’re on the third generation of servers, and we’ve taken a different approach. Now the application servers are only responsible for creating database backups to a file server periodically. That’s much less disk-intensive, and has minimal impact on application performance. We’ve shifted all the backup processing to the file servers, and even that is spread out over a couple servers to make sure it doesn’t impact application performance. So this not only improves performance, but it gives us more redundancy, and allows us to do more frequent backups during the workday.
So does this fix all our problems? No way. But it changes things so much, I don’t know what the next problem is going to be. This week I’ll be using the excellent PAL tool from Clint Huffman to analyze the server performance, and start designing the next iteration of the server architecture. Some days it’s hard to tell I’m in the software business.
You’ve got to set expectations
Politeness and civility are the best capital ever invested in business. Large stores, gilt signs, flaming advertisements, will all prove unavailing if you or your employees treat your patrons abruptly.
–P.T. Barnum
My washing machine is leaking. It’s under warranty and I’ve left two messages and sent an email…today. All I want is a call back or some kind of response. It’s okay if you can’t come out to fix it this week and I don’t mind if there’s going to be a charge.
But, every minute I wait I get frustrated for no reason. On top of that, I’m cursing, getting worked up, and won’t recommend you to my friends.
I understand why this can happen – I’ve been guilty of not following up when I say I will, not being clear about what I’m doing, and mostly taking on more things than I should.
We try to treat our customers as we’d like to be treated. No matter how you slice it, most of the time it comes down to one thing:
Set the expectations from the beginning and keep communicating.
We keep improving how we set expectations, but it really comes down to being organized enough to tell your customers what’s going on. If you’re selling something, you need to end every conversation with a plan for what will happen next.
If you promise to make a phone call, you need to follow through – especially if it’s uncomfortable or bad news. Even if you don’t know the answers to the questions you’re being asked, just be honest rather than making up excuses. Most people are happy with the answer: “I’m really sorry. I don’t know. I’ll try to help.”
After four calls, I finally caught someone live on the phone. “Didn’t you know? Our tech is scheduled to come to your house to fix your machine at 10 tomorrow.”
What’s my first useful skedsheet?
Now that we’ve got a working skedsheet (even though we’re still hiding it for a while) I’ve spent some time playing to see how (and if) things work. I’m really excited about the way it’s shaping up – and despite the fact that it’s really ugly, it actually does a big part of what we want to do.
I’m starting to feel good because skedsheet is going to bring value to the people and companies we know well – construction subcontractors. And, I think it’s more widely useful than that.
So, I want to figure out a way to use it for myself – I guess the entrepreneurially correct phrase is “eat our own dog food”. For JobTracker, we ended up using it almost immediately as a CRM system, even though that was a bit of a stretch…it turned out that using our own software on a day-to-day basis was very useful, especially early on. What can I do?
1) Marketing calendar. I have a spreadsheet of things that need to be done marketing-wise. Trade shows, ads, articles, mailings. One of the problems I run into with my marketing spreadsheet is that it’s hard for me to follow what needs to get done today or this week. I think having a calendar would help.
2) Supporting customers. I’m wondering if there’s another level of detail that skedsheet could help with in managing some support tasks. Probably not, since we already have a system for managing this. But maybe a calendar spreadsheet would be good in helping our JobTracker customers plan their implementations: “Here’s a schedule with dates, who’s supposed to do what, and what the milestones or prerequisites are.”
3) Personal stuff. I’ve got a spreadsheet for a workout schedule, and a spreadsheet of some of my kid’s activities. I think the kid activity spreadsheet would be really cool if I can convince some friends to read, edit, and add to it, too.
Making a mock-up or a mockery?
I had lunch with a friend who’s a “product innovation” consultant, and although we try not to talk about work too much, that’s inevitably the topic that comes up.
My friend’s premise is that making and showing off paper (or PowerPoint) mock-ups is worth it. Because…before you spend a minute of development time, you’ll want to make sure that the product you’re planning is worth it.
Now some people disagree, and I’ve personally believed you need a product to show before you can talk about it – the opposite is just vaporware. But there are some good arguments for trotting out your ideas on paper before you build.
So, why should you show a paper design to a prospective customer?
- Get immediate feedback, before you spend time writing code. If an idea isn’t worth anything to a customer, you get away with a much smaller investment – drawing out a design on paper could take hours or days, but writing code could take months. Or more.
- Don’t set the expectation that it’s real, so customers don’t get hung up on faults. If you see real software, even with the disclaimer that it’s a prototype, you’re going to be drawn to the worst parts or repulsed by how ugly it is.
- Prevent yourself from falling in love with your design. There’s a tendency to value the work that you do and not be willing to throw it away. We’ve managed to steer clear of that in the past, but once you’ve got a framework built for a particular design, it’s painful to give up.
- Start getting the word out, before your product is ready to show. Through the skedsheet blog, we’ve tried to explain what we’re doing and why, but we’re not talking about the specifics or making promises about features. We could take it to the next level by showing a bit of what we have in mind…today.
I’m still torn, but I spent a few hours last night mocking up a design, purely to practice a demo. Almost everything I wanted to show could have been done in the prototype we’ve got running already, but it felt more honest doing it with just a sketch. Will I show it to anyone?
The first Skedsheet!
We reached a milestone today. We moved Skedsheet from the development servers to the production servers at http://www.skedsheet.com/
This doesn’t mean it’s done or ready for public viewing. We’re just putting everything on the public site so we can have an outside web designer help us make it look better. (At this point, it couldn’t look any worse. So I think she’s set up for success.)
After installing the software, Harry and I signed up as the first users and created a few Skedsheets. We also confirmed our email addresses (to prove we are the rightful owners of them) so we can now share these Skedsheets with each other.
If we lived in the same state, we might have raised a toast to this little milestone. Instead, I wrote a blog post.
Book report: The Big Switch
The Big Switch by Nicholas Carr tries to be a business book, a history book, and a warning about the future. Unfortunately, because it bites off so many topics, none of them really gets a chance to shine.
Punchline: Google is to computers as Edison was to electricity. There are some interesting parallels between how electricity became popular and the internet showing up everywhere. I had no idea how fast electricity took off and how influential Edison was as a businessman, founding what became GE and licensing technology (and his name) to various power companies like Consolidated Edison.
What’s good: The first half of the book gives a snapshot of the time when electricity evolved from being virtually nonexistent to becoming the primary source of power in businesses, transportation and homes. What’s hard to believe is that all of this happened in a span of 30 years (between 1880 and 1910). There’s a similar story showing the beginnings of the internet – briefly touching on the PC, ARPA, and finally Google.
There are some interesting, but unsubstantiated, claims about how the promise of the internet (democratizing information) might not be true – rather that it’s concentrating wealth and power in a few companies, more than other technologies in the past.
What’s bad: It felt like the author wasn’t sure who the audience should be, so at times it seemed overly dumbed-down. But that didn’t really bother me too much because while he’s explaining technology (both electricity and internet) the story is interesting enough to see beyond it.
The first real problem is that there’s no focus to the book – it’s like each chapter was an article written on its own without a strong overall theme. The second, and bigger, problem is that when Carr tries to speculate about the future I totally lose interest. Maybe that’s because I disagree with his conclusions that the internet will kill all kinds of culture, make us all vulnerable to terrorists, and maybe destroy our way of life.
What I learned: Our scheduling software was built around the idea of being a web service from the beginning, but I never considered the scale of this kind of technology. I live about a hundred miles from a Google data center that was one of the first huge ones; its power consumption, around 100 MW, is similar to that of a city like Tacoma, WA. This switch to “utility computing” is going to make software and hardware cheaper and more accessible for a long time.
Would I recommend?: Probably not. If you want to go to the library and only read the first third, that might be okay, but I think the payoff for the rest of the book isn’t there.
Recovering from a disaster
I have been shocked a few times when I see how some of our customers take care of their own data, when they install JobTracker on their own servers. They usually think they’re fine, until their server crashes after running 24×7 for 4 years, and they suddenly realize they don’t know if they have any way to recover.
I think I have the opposite problem. No matter how much effort we put into designing and implementing our backup process, I’m always worried that it’s not enough. I’m just a glass half-empty kind of guy.
When designing a backup process, the only thing that matters is how you’re going to recover data. So you have to think backwards and design a recovery process first, then make sure you have a backup process to support it.
We just bought a new set of servers and much of the decision for what to buy was based on our desired recovery process. Our previous server setup left us with a few scenarios where it would just take too long to recover from a disaster, because we had to move too many GB of data across the network. The new setup spreads the data out more, so there is more redundancy and recovering from any single failure will be much quicker.
The two primary measures of the recovery process are how long the system will be down and how much data will be lost. Ideally both these numbers would be zero, but you can always dream up a scenario that causes downtime or data loss, even if it’s extremely rare or violent. So the best you can do is make these numbers small, but the smaller you want them, the more it costs, and the more it can hurt performance.
Once you accept that there’s the possibility of downtime and data loss, then you can map out various scenarios, and design a process to get the right $ cost vs. performance vs. risk trade-off.
We have done just that, and below is a summary of how much downtime and data loss we expect from various disasters, under our current server configuration.
| Disaster | Recovery Action | Downtime | Max Data Loss |
|---|---|---|---|
| Human error, destroying data | Restore from backup | None | Depends on how long it takes to discover problem. Lose data entered after the backup you’re restoring was taken. |
| Any server hardware failure, leaving drives OK | Swap hardware | 30 min | None |
| Server software problem | Move data to another server, rebuild server later | 30-60 min | None |
| Single drive failure | Swap drive, reboot server, may have slow performance while RAID rebuilds | 10-30 min | None |
| Multiple drives fail on app server | Restore from backup to another server | 30-60 min | 1 hour (only for databases on failed server) |
| All production drives fail on all servers | Rebuild everything from local backup | 3 days | 1 day |
| Nuclear bomb at primary data center | Rebuild everything from remote backup | 3 days for initial recovery, 1-2 weeks to get all historical data transferred | 1 day |
| 2 nuclear bombs at primary and backup data centers | Enlist in the Army | Forever | Everything |
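Here’s a minimal sketch of how a table like this can be checked against targets for the two measures, downtime and data loss. The worst-case figures are copied from the rows above that have fixed numbers (the human-error and nuclear rows are left out); the `Scenario` class, the `exceeds_targets` function, and the target values in the demo are hypothetical and only there for illustration.

```python
from dataclasses import dataclass

HOUR = 60          # minutes
DAY = 24 * HOUR    # minutes


@dataclass(frozen=True)
class Scenario:
    name: str
    downtime_min: int    # worst-case downtime, in minutes
    data_loss_min: int   # worst-case data loss, in minutes of entered data


# Worst-case values taken from the table; rows without fixed numbers are omitted.
SCENARIOS = [
    Scenario("Any server hardware failure, leaving drives OK", 30, 0),
    Scenario("Server software problem", 60, 0),
    Scenario("Single drive failure", 30, 0),
    Scenario("Multiple drives fail on app server", 60, 1 * HOUR),
    Scenario("All production drives fail on all servers", 3 * DAY, 1 * DAY),
]


def exceeds_targets(scenarios, max_downtime_min, max_data_loss_min):
    """Return the names of scenarios that miss either target."""
    return [
        s.name
        for s in scenarios
        if s.downtime_min > max_downtime_min or s.data_loss_min > max_data_loss_min
    ]


if __name__ == "__main__":
    # Hypothetical targets: at most 1 hour down, at most 1 hour of data lost.
    for name in exceeds_targets(SCENARIOS, max_downtime_min=HOUR, max_data_loss_min=HOUR):
        print("misses target:", name)
```

Tightening the targets makes more rows fail the check, and each failing row is a place where more money or some performance would have to be spent. That is the cost vs. performance vs. risk trade-off in a form you can rerun whenever the server setup changes.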
Some of the reasons we can recover quickly are:
- Our servers are in a data center that’s staffed with experts 24x7x365 with tons of spare parts sitting around to fix any hardware problem within 30 minutes.
- We have a bunch of identically configured servers. If there’s a problem with one, we can easily move the data to another server that’s ready to go.
- We use a combination of database full & log backups, file replication, on-site backups, and off-site backups.
- Every piece of data gets stored on at least 5 servers, across 10 drives. Keeping 30 days of full backups means once it’s a month old there are over 90 copies of that data! (Luckily our new servers have 18TB of disk space to handle this.)
We’re always improving our process, so I expect we’ll get these numbers down over time. But it also depends on how much money we’re willing to spend, which is based on our revenue going forward.
Announcing intentions
A study by NYU psychologist Peter Gollwitzer recently showed that going public with your intentions to do something might be counterproductive.
A group of law students who announced that they were going to read law journals periodically actually read less than those who kept their mouths shut. This made the news…probably because it’s conventional wisdom that talking about your goals helps achieve them.
Uh oh. Isn’t that what this blog is all about? We intend to make a spreadsheet that ties into a calendar. We intend for it to look pretty and be useful. And, we intend to spread the word about skedsheet.
So far, what do we have to show for it? Bupkis.
Well, that’s not strictly true. We’re actually working hard and the guy who’s doing most of the work hasn’t announced anything publicly. The study has much more to do with individual performance than the productivity of a team.
I think the biggest difference between our scenario and the study is that there’s a different level of consequences when you renege on intentions or promises as a company versus those of a student. We’re going to keep talking about what we’re doing – including the trials and tribulations because it’s a vital part of keeping us focused, excited, and moving forward.
Thanks to Derek for the link to the study.
How I stopped worrying and learned to love the bomb
Every time we’ve done something new in the past, one of the fears I’ve had is that we’ll be swept into the tornado of too much success at once.
Here’s the actual sound of that tornado: “pfffftt.” Or, maybe just the sound of crickets chirping.
It turns out that an overnight boom of success isn’t what we need to worry about – and originally, I felt like a loser because of it. Most business books talk about the exceptions – the runaway hits and the dramatic failures. And because I’ve never imagined our business being a failure, the obvious thing to shoot for was being a blockbuster.
Well, instead, we bombed…compared to the high expectations we set. With JobTracker, that meant just a handful of customers the first year, and modest growth over time. No feature opened the floodgates, no recommendation, ad, or key event fundamentally changed our business. But the reason I stopped worrying is that the small successes added up.
So, for skedsheet, I expect the same. It’d be really cool to be mentioned on TechCrunch and have a zillion people visit our site, start using a skedsheet, share it with everyone they know, and need it enough to pay. But the reality is that most businesses aren’t the ones you read about, yet they can still bring enough value that people are willing to pay for their product or service.
So, I’m not worried about hearing silence when we release our first version, because I know it’s just the first of hundreds of small steps toward another great addition to our business.
Which questions should I ask to find the value?
I spend way too much time thinking about our sales process, mostly because I keep trying to use my engineering experience and apply it to situations that rely on people – aka, situations that are pretty unpredictable.
For JobTracker, the way we try to sell is by having conversations about “specific indicators”, which are the problems that drive our customers to buy. If we were selling shovels, the questions would go something like:
Q: How are you digging holes in the ground now?
Q: What works about that? What doesn’t?
Q: Are your fingernails getting really dirty and you can’t stand the taste of mud anymore?
Q: Do you think you could dig an extra hole every week if you had a sharp blade with more leverage?
It looks like a few of the scenarios we already use apply to skedsheet. By definition (combine a spreadsheet and a calendar), we have a decent idea of the problems our leads are running into with the two tools they have today.
If it was a conversation – and it will be as I start interviewing folks to ferret out some new markets – my dream conversation would look something like this:
Q: How are you scheduling now?
A: I’m using a spreadsheet with all of the details, and transferring some of it over to Outlook to move things around and share the calendar.
Q: What works about that?
A: I like having access to the information on the computer more than hunting through file folders and a whiteboard.
Q: What doesn’t?
A: Every time I need to change the schedule, I have to put the details on the spreadsheet and the dates in Outlook.
Q: Keeping up with documentation in 2 places seems mistake-prone. Is that a problem for you?
A: Yes.
Q: What’s the $ impact every time you forget to update the date in one place?
A: One zillion dollars.
Q: How often would you say that you make mistakes because you need to remember to put the date in 2 places? …
The important part about a conversation like this is having questions that uncover the pain (I need to update my schedule in 2 places) that we can actually solve. Then, if someone starts using skedsheet they’re happy because they’re solving a problem. Even better, we hope it’s a solution to a problem that they can quantify – either money or time is great.
And, we’re happy because someone’s getting value out of what we do… and if there’s value, they’ll be willing to pay.