BetDash.com Infrastructure

Posted: November 4th, 2012 | Filed under: posts

One of the biggest improvements at BetDash.com in October was our migration into a new production environment. We had outgrown our previous environment and took the opportunity to improve many aspects of our production setup as we built out the new systems.

The majority of our servers are now virtualized using VMWare ESX. Of the BetDash servers, only our database remains on physical hardware – this is at the recommendation of the Oracle MySQL consultant we worked with on our deployment.

This is what our infrastructure looks like in our new environment and how our stack is distributed across the different tiers:

  • Citrix Netscaler load balancers
  • Web servers running Apache, serving the contents of our Rails app’s public folder, i.e. all static assets (CSS/JS/Images). This content is also replicated into our CDN.
  • Passenger Enterprise on top of Apache for our front-end application servers, serving our primary Rails 3.2 app. These boxes presently have 8 virtual CPUs and 8 GB of RAM.
  • Resque, Resque Scheduler, and other proprietary backend applications run on our backend application servers – these have 16 virtual CPUs and 32 GB of RAM.
  • Dedicated instances for our Admin panel application, Redis, Memcached, and other internal applications utilized by our system.
  • Redis is currently deployed in a master/slave replicated setup; however, we’re keeping a close eye on Redis Sentinel’s emergence from beta and are also looking to deploy Redis Failover on top of Apache ZooKeeper. (A quick replication sanity check follows this list.)
  • MySQL is on physical hardware, configured as a RedHat Conga Cluster with the database files themselves sitting on a SAN. Each node in the cluster has 12 physical CPUs and 96 GB RAM.
  • Dedicated MySQL instances for reporting and data warehousing
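
As a quick sanity check on the Redis tier described above, replication status can be verified with redis-cli – the hostname here is purely illustrative:

# On the slave, confirm the replication link to the master is healthy
redis-cli -h redis-slave.internal info replication
# Key fields to look for in the output:
#   role:slave
#   master_link_status:up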

The instances are built with Puppet and the applications are deployed out of git using a combination of Capistrano and Paddy Power’s proprietary release tool.

One of the most common questions about our stack and environment is why we’re running on our own VMs and hardware as opposed to hosting in the AWS cloud like everyone else. The regulated nature of real-money gaming requires that our systems sit in the Isle of Man, which rules out the IaaS options. It was extremely interesting to see how many of the leading Rails consultancies we spoke with about our deployment strategy and application tuning haven’t worked with non-IaaS systems in years.

With our move to these systems, we’re well positioned as we continue to scale the BetDash platform, thanks to the hard work of a large cross-functional project team. Our next step will be to run in an active/active configuration across Paddy Power’s multiple data center sites.


Connecting a Windows VM to a Mac-based localhost Ruby on Rails site (Passenger)

Posted: January 20th, 2011 | Filed under: posts

Supporting Internet Explorer for your web app is a pain. At the very least, testing your Ruby on Rails (or other localhost-based) site in IE while developing on the Mac should be easy, given the prevalence of virtual machines. Right?? You’d think. However, getting our front-end developer set up to access a local instance of Passenger running on Mac OS X from a Windows VM proved to be more of an issue than it needed to be. Adding to this, most of the top Google results for this configuration made the process seem more difficult than it actually is.

So, to help ease your pain in testing your Mac-based Rails site from a Windows virtual machine, here is an easy configuration for the three main VMs.

We need to do two main things:

  1. Trying to serve port-based or subdomain-based sites over to the Windows VM is going to be more challenging. Let’s make things easy – we’ll set the default http://localhost site for our Mac to our Rails site.
  2. We need to get the VM’s network to communicate with the Mac’s network correctly so that we can access this site.

On my team, we’re using Passenger on the Mac (running on top of the Mac’s base install of Apache). We’re using Passenger Pane to configure Apache easily. We also have this hooked into RVM, but that won’t be relevant to what we’re doing here. For the purposes of this post, I’ll assume that you have Passenger successfully serving a site that you can access via http://somedomain.local. I’ll also assume that, for whichever VM you’re opting to use, you have Windows installed and running in it.
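
A quick way to confirm that assumption from a Mac terminal before going any further (substitute your own .local domain):

# Should return HTTP response headers from Apache/Passenger, not a connection error
curl -I http://somedomain.local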

** Do yourself a favor. Make sure that VMWare Tools, Parallels Tools, or VirtualBox Guest Additions are installed – this ensures that the network connectivity will work as expected.

 

Let’s get http://localhost serving our Rails site

At the bottom of /etc/apache2/httpd.conf when using Passenger Pane, you’ll find the following configuration:

<IfModule passenger_module>
  NameVirtualHost *:80
  <VirtualHost *:80>
    ServerName _default_
  </VirtualHost>
  Include /private/etc/apache2/passenger_pane_vhosts/*.conf
</IfModule>

 

To make our Passenger site the default localhost site, go into passenger_pane_vhosts/, view the Apache config file for the site you want to be the default, and you’ll see something along the lines of the following:

$ cat yoursite.local.vhost.conf
<VirtualHost *:80>
  ServerName yoursite.local
  DocumentRoot "/Users/username/rails/yoursite/public"
  RackEnv development
  <Directory "/Users/username/rails/yoursite/public">
    Order allow,deny
    Allow from all
  </Directory>
</VirtualHost>

 

Copy the lines from inside the VirtualHost block and paste them back into /etc/apache2/httpd.conf so that it looks like the following:

$ cd ..
$ sudo mate httpd.conf
# Added by the Passenger preference pane
# Make sure to include the Passenger configuration (the LoadModule,
# PassengerRoot, and PassengerRuby directives) before this section.
<IfModule passenger_module>
  NameVirtualHost *:80
  <VirtualHost *:80>
    ServerName _default_
	####
	DocumentRoot "/Users/username/rails/yoursite/public"
	RackEnv development
	<Directory "/Users/username/rails/yoursite/public">
	  Order allow,deny
	  Allow from all
	</Directory>
	####
  </VirtualHost>
  Include /private/etc/apache2/passenger_pane_vhosts/*.conf
</IfModule>

 

Restart Apache by going to System Preferences > Sharing and unchecking & rechecking Web Sharing.
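
Alternatively, if you’d rather skip System Preferences, restarting the built-in Apache from a terminal works just as well:

# Validate the configuration first, then restart Apache
sudo apachectl configtest
sudo apachectl restart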

Also, if you’re using VMWare Fusion or Parallels, note the IP address that’s shown here.

Configuring VirtualBox

Let’s start with VirtualBox, which was the biggest pain to configure. Not because there’s a lot of work, but because the one critical detail doesn’t seem to be well known (most of the posts you’ll read online try to guide you through using the command-line network configuration tool). I should note that this was my first real opportunity to use VirtualBox, having only used Parallels and VMWare Fusion beforehand. It’s a little rough around the edges in a few places, but overall very impressive.

1) First, ensure that the VM’s network setting in VirtualBox is set to NAT – no port forwarding is needed.

 

2) Now, here’s what took a while to dig up: VirtualBox connects http://10.0.2.2 to the Mac’s localhost (thanks to this forum post). That is, just typing that into IE within the VM should connect you to http://localhost running on the Mac.
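
If IE can’t load the page, a quick check from a Command Prompt inside the Windows VM helps narrow down whether the problem is the network path or Apache itself – 10.0.2.2 is VirtualBox’s NAT alias for the host machine:

ping 10.0.2.2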

 

Configuring Parallels

1) For Parallels, you want to make sure that your VM is running on a Shared Network.

 

2) If you’re set up with this network configuration, you should be able to directly access the local IP address of your Mac, as shown in the System Preferences > Sharing screen.
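
If you’d rather grab that address from a terminal than from System Preferences, this does the trick on the Mac (on most Macs of this era, en0 is the wired interface and en1 is AirPort – substitute whichever is active):

# Print the Mac's current IP address on the given interface
ipconfig getifaddr en0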

 

Configuring VMWare Fusion

1) VMWare Fusion is much like Parallels. Configure your VM to “Share the Mac’s network connection” via NAT.

 

2) If you’re set up with this network configuration, you should be able to directly access the local IP address of your Mac, as shown in the System Preferences > Sharing screen, just as with Parallels.

Hopefully, if you’ve made it to this point, you now have IE in your VM of choice accessing your Passenger-based site. Success!

 

At least now you can spend your time on all of the issues that poor users confined to IE are actually suffering through and get to bug fixing quickly.


The Bullshitting CTO – advice for non-technical founders

Posted: June 2nd, 2010 | Filed under: posts

I was chatting with another startup founder recently who was recounting a conversation he had with his CTO about a recent issue with their site. Some data had been lost from the site, and the CTO apparently immediately started putting the blame on some unknown “user error” as the cause of the glitch. This is despite the fact that there didn’t seem to be any relevant connection between any of the functionality on the admin backend and the data loss that had occurred. Fortunately, there were backups and the data was able to be restored, so no real harm done (aside from frustration and lost time). However, what was more interesting was that this was apparently only one instance of many similar conversations – strange technology issues occur and the CTO has no clear explanation for what happened.

Startup founders, especially those with no technology background, if this happens to you – stop. Stop letting your CTO get away with providing vague explanations for what happened. Stop letting the CTO off the hook. Everyone who uses a computer knows that technology doesn’t always work right. Startups are, more often than not, dealing with a combination of bleeding edge platforms, compressed time schedules, and lack of sleep – a cocktail that can be exciting, but results in a higher than average percentage of software bugs.

The issue is not the bugs. They are expected. Any founder who expects a system to be bug free is dreaming. The issue is a CTO who can’t explain the issues that occurred, or, more likely, doesn’t want to take responsibility for the issues. And even more importantly, the implications that this has for the technology side of your business.

Founders with no technical experience are in a difficult position in startup world. So much of a startup’s life is centered around the technology. As the company moves from customer development to product development, for someone who doesn’t understand the tech, the startup world becomes a wild roller-coaster ride with the CTO in the driver’s seat. Make no mistake – you are more or less at the mercy of your tech co-founder if you don’t understand the tech, so you had better pick a good one.

A good way to look at this is: what if your CTO walks away – do you know how to access the code, how the architecture is set up, how to get into the various administration tools, how to access the backups? Ideally, the answer is yes to all of the above, but startup world is chaotic. New systems are being added, servers are reconfigured – change is ever present. Are you up to speed on stuff?

So, what’s the point? The point is – if your CTO can’t take responsibility for a tech issue that occurs, if they won’t walk you through what caused the issue, if they don’t do a root cause assessment and explain the results – then you are living on the edge. If your CTO can’t own up to one issue, how much other stuff is going on that you have no idea about?

Let’s be very clear – if you are in this situation, then you have a relationship with your CTO where the balance of power is skewed and the wellbeing of your company is at risk. You need to get clarity into what’s going on over on the tech side and restore the balance of power, and more importantly, rebalance your relationship where you’re getting truthful explanations from your CTO. Or find a new one. Heed the warning signs and protect the company.


Another quick tip for server bottlenecks – prevent against cron job overrun

Posted: May 19th, 2010 | Filed under: posts

As I wrote the previous post on WordPress performance, I remembered another tip that we used to solve an issue on our FanGamb servers and wanted to do another technical post to cover it.

Because FanGamb is a very data-intensive site (lots of constantly updating odds, games, results, etc.), much of the software powering the site doesn’t directly tie to the web interface and instead interacts with our database. Last winter, we had an issue where, as usage increased on the site, resource usage grew faster than it should have and systems started locking up.

Digging through the processes that were running, we finally noticed a confluence of issues. First, one of our data scripts was running away, loading a bunch of duplicated games. This wasn’t good, but the vendor was able to fix it easily. However, this first issue led to a second problem. As the number of games increased in the database, the script that processed these games ran longer and longer to deal with the increasing number of dups. As game results update quite frequently, our cron jobs trigger in fairly close succession. What began to happen was that as the first cron job took longer and longer to complete, the next one would kick off before the first had finished. So, we had cron job after cron job stacking up on the server, quickly leading to issues, as you would suspect.

The fix for this was quite simple, as well, and has since become a standard practice for us. There’s a utility that EngineYard (our host) pointed us to that implements “locking”, so that one task can’t kick off while the other is already operating – it’s called Lockrun. It uses a temporary file and system ‘flock’ing to implement this, so it’s incredibly simple to install. One little utility and a big issue solved – the best kind of solution.

If this is your cron job:

/usr/bin/php script.php > log.log

Using this utility, just change to:

/usr/bin/lockrun --lockfile=/data/path/JOBNAME.lockrun -- sh -c "/usr/bin/php script.php > log.log"

Download the utility here: Lockrun
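
As an aside, if installing an extra utility isn’t an option, the flock(1) tool that ships with util-linux on most Linux distributions can accomplish much the same thing – a rough equivalent of the line above might look like this (the lock file path is just a placeholder):

# -n exits immediately if the lock is already held, so an overrunning job
# simply causes the next scheduled run to be skipped rather than stacking up
/usr/bin/flock -n /tmp/JOBNAME.lock -c "/usr/bin/php script.php > log.log"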


Three lessons learned from my early days of releasing software

Posted: May 7th, 2010 | Filed under: posts

The other day, I received a call from a user of one of the software products I developed, back when I was running my own web development consulting firm, Shedd Technologies International. In addition to doing pure client work, there were several products that I packaged up and made available for use – some as freeware, some as commercial software. It’s been many years now since the products were actively developed and I was impressed to hear that they’re still proving to be useful.

The call provoked some thought about what I learned from the process of developing and supporting a set of packaged software products. This is a fairly long post – it’s part reflection for me, though I would think it would also be potentially useful to other young entrepreneurs looking at building software products.

We all have so many different experiences in our professional and personal lives. It’s really only after these are over and after you look back that you start to see what you learned from the experiences. There’s certainly a lot of value that you can get out of taking a look back, though, and this is a process that I’m going to try to do fairly regularly going forward. Especially with startups, the day-to-day is so crazy, I think it will be useful to take a step back every 6 months or so to reflect on what we’ve learned, what worked well, and what we can improve upon. But for now, here are some reflections and lessons learned from the start of my career with technology on the web…


 

Background

First, the background:

Not long after starting consulting work, some client project work led to several pieces of software being developed which lent themselves to release as packaged products. For a site needing a photo gallery, I developed the code that became a simple script called PhotoGal. Another client needed an affiliate management system, and this was extended into Affiliate Manager, which became a commercial product that I sold under a consulting model – base license fees plus fees to install and adapt it to your individual business. There were also two other products developed which were not extensions of consulting work. Support Services Manager was created as a helpdesk tool for internal use, but was released and really only used by external users. DLMan, a digital product delivery system, was perhaps my most concerted attempt to develop a packaged product.

There are a lot of lessons that I can pull from this work. As a whole, my software products were moderately successful, depending on your metrics. While it’s hard to estimate the actual installed base, Support Services Manager (SSM) was widely downloaded with more than 20,000 downloads, in addition to being packaged into the Fantastico web-hosting control panel tool. PhotoGal probably had around 5,000 downloads. For the commercial products, it’s much easier to gauge, because of sales figures. Affiliate Manager had a small user base, seeing use with a handful of clients. DLMan was more successful, at least in terms of revenue.

DLMan took about a week of development effort to build out for the first release and went on sale for $45/license, plus additional charges for installation and extension modules. If I had billed the development time at my standard consulting rate, the product would have roughly broken even. DLMan did drum up a steady business in related consulting, modifying the product to suit a variety of purposes and industries, and this business certainly helped make the product worthwhile.

Certainly, in terms of users or sales, I didn’t have any blockbuster successes. Still, the process of developing the software, supporting it, and learning what users really needed was what mattered most, and it gave me a number of lessons that I have pulled from as my career has progressed.

At a high level, developing the software gave me a better understanding of what it takes to write a system from scratch and made me a better consultant with this ability to look at systems with a deeper perspective. Supporting the software also put me in touch with the people actually using the code and showed me a whole host of new considerations to take into account when building products, along with the value of actually speaking with customers. I also gained experience with pricing, discounts, sales, and numerous other important points. But perhaps most important was that I learned about what’s really critical in terms of releasing software.

 

Lesson Learned #1: Being Open

Being young and naïve, I released all of the products under a fairly restrictive license that I wrote myself. I wanted to keep people downloading the software through my site and was unsure what opening the code up under a full OSI-compatible license would have meant.

This is something that I still see today. There is this instinct, that because you built something, you want to hold on to it and try to find some way to benefit from it.

In retrospect, the better road would have been the open source road. While I was concerned about keeping control of the code and limiting modifications, this was counterproductive to what I should have focused on – increasing adoption and getting the code into the hands of users – building a community around the products. As mentioned above, much of the revenue generated from these businesses was actually in the consulting work and customization of the initial products. Not only would this not have been affected, it probably would have increased, because adoption would most likely have grown under a less-restrictive license. As a result, being open probably would have given a nice benefit to the bottom line, in addition to being the right way to do business.

 

Lesson Learned #2: Keep Updating

Looking back, the markets that I got into were good choices and the products were well timed. SSM came out about 6 months before PerlDesk (one of the most popular helpdesk scripts at the time), and its integration with forum packages was something that distinguished the product (and drew a userbase). DLMan didn’t really have many competitors upon release, though a SaaS-based offering was released soon after (which was before many of the digital-download-enabled shopping cart packages).

After the fun of building a product was over, though, and with lots of new ideas in hand, I was usually ready to move on to another project. I would still actively work with users needing help through the forums, but in terms of new releases, they were very rare. Other offerings coming into the market, combined with sporadic updates to my software, was not a good combination and led to a declining userbase.

 

Lesson Learned #3: Remaining Focused

Following SSM’s release, I kept looking at new competitors that were emerging and their larger feature sets. The one SSM competitor that I really took notice of was PerlDesk (as mentioned above). I’m not quite sure of the exact cause, but PerlDesk got a lot of attention following its release and gained a big following. The feature set was roughly the same at the outset, but PerlDesk kept releasing and adding new features. There was a lot of temptation on my part to match their additions at first, but then the gap got quite large and I lost my inspiration for the project.

Rather than suffering from feature-envy, I probably should have gone the other direction and kept SSM as a streamlined, optimized helpdesk tool. One aspect of 37signals’ Getting Real methodology that really resonated with me was the focus on getting something useful out to the users and keeping the feature set lean. Most of the features competitors had were nice, but not essential. Creating an effective core product that worked really well, and then working as hard as I could to listen to customers, would have been far more effective than succumbing to trying to match competing products feature-for-feature.

 

Retrospectively, there were certainly many choices that I could have made differently which would have probably resulted in a more effective product strategy and increased adoption/success. Still, the work certainly proved its value in terms of building my core consulting business and also teaching me a lot about critical success factors for software products. My overall product strategy was decent, at least in terms of market timing, since I found real need in markets that heated up soon after I entered them. But the most valuable thing I got out of the experience was learning how to take a product from concept to functioning software, satisfy customer needs, and what worked/didn’t work in terms of making that product successful. For that, the experience was highly useful and I’m glad that I had the courage and foresight at the time to take advantage of it as much as I did.


Improving WordPress Performance

Posted: May 5th, 2010 | Filed under: posts

Hopefully, this blog is now loading a bit faster for you. As I think about how an individual’s web presence is increasingly a major part of their personal brand, it’s obvious that how that web presence operates must play into how the personal brand is perceived. Would you hire a web developer if their blog falls apart in your browser? Or a designer whose site looks like it went through the Geocity-izer? Of course not. So, as a technology executive, I felt that a slow-loading blog probably wasn’t good evidence of my abilities and invested a little bit of time to cut down some of the bottlenecks… (The last string of posts has been business-focused, too – it was time for another technical post.)

So, for others who may be running into similar issues, here are the steps I took to increase the performance of WordPress for my site:

 

Understand the Problem

First, you need to understand the issues that you’re facing. How severe is the load time and response time for your blog? How does it compare with other sites you’re running? I started tracking my blog with Pingdom to monitor the average response time. This demonstrated that there was a huge opportunity to cut the response time – it was twice that of other sites that I run.

Another good way to evaluate where your site stands is to use the YSlow plugin. This will give you many useful pointers and things to look into.

As for my analysis via Pingdom, you can see from the chart below that the first half shows extremely high average response times. After the changes, that has evened out and is much improved.

 

Memory Conflicts

The first step I took was to move WordPress to its own isolated username. For ease of administration in my shared environment (with DreamHost), I had been running a couple of sites under one username. As I started digging into the bottlenecks, DreamHost pointed out that processes were getting killed under that master username due to memory limitations. Having multiple sites trying to run processes at the same time under one user was a major reason for this. So, an easy fix was to separate the resource-intensive sites onto their own usernames. Your host may not allow multiple system users under one account, but if they do, take advantage of this easy way to balance out load and avoid running into imposed resource limits. This was an easy and obvious fix, but also one that wasn’t mentioned in any of the WordPress performance guides that I read, so it may be useful to others.
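
If you want to see what’s actually eating memory under a given username on a Linux host, something along these lines can help pinpoint the heavy processes (the username is a placeholder):

# List the user's processes sorted by resident memory, heaviest first
ps -u username -o pid,rss,comm --sort=-rss | head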

 

The Database

The database is a common bottleneck. For most applications, the database is the chokepoint, especially as load increases. I’m running all of my databases at DreamHost on a “DreamHost Private Server” – this allows me a little more control over the resources consumed by the databases and allows me to adjust resources based on usage. Taking a look here, it was clear that resource usage was climbing and hitting limits as well. By checking the MySQL processlist, I could see a couple of poorly performing databases and queries from a couple of development databases we also run on the server. (Yes, it’s best to keep your dev and production dbs separate – we’re in the process of moving the dev databases off of this combined server.) Killing these queries and restructuring some of the databases on the box alleviated the issue quickly. Bottleneck mitigated – you can see the huge drop in memory usage in the chart below.

This is easy to check on your server. Just run the following command from the MySQL console:

show processlist;

Then, either take note of the queries that are running and figure out if there’s something else using the database that’s tying up the server’s resources, or kill processes that are hung / locked. (Note that if you have multiple usernames accessing the database, you may need to run the command as each of those usernames to get an accurate picture. See the MySQL docs.)
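
For reference, this is roughly what that looks like when run from a shell prompt rather than the MySQL console (the ID in the KILL statement is just a placeholder – use the Id column from the processlist output):

# Show everything the server is currently executing
mysql -u youruser -p -e "SHOW FULL PROCESSLIST;"
# Kill a hung or runaway query by its Id
mysql -u youruser -p -e "KILL 12345;"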

 

Caching

Finally, if you’re looking to increase your WordPress install’s performance, make sure you have the WP Super Cache plugin installed. This was already installed on my blog, but it takes care of a number of common tasks, including caching and gzip compression.
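
One quick way to double-check that the gzip piece is actually working once the plugin is configured (the URL is a placeholder for your own blog):

# A "Content-Encoding: gzip" header in the response means compression is on
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://yourblog.example/ | grep -i content-encoding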

So, those are a few easy steps to help increase the performance of WordPress running on your server. Unfortunately, dealing with performance bottlenecks is a very case-by-case, application-by-application specific process, so these may not be immediately applicable to your situation, but perhaps they’ll provide some starting points and ideas.

 

Resources

Here are some other good references and posts on the subject: