published: 2010/06/19. tags: web hosting, shared hosting
Economic Hosting Plan
Lunch may not be free, but it can be economical
This post is not about which hosting provider to pick. It is about whether you can get away with shared
hosting. Like most of my posts, this one was prompted by a few recent events. First, I learned about
two guys who worked on a website for two years and finally shut it down because they couldn't attract
enough users. The second was a discussion about how hard it is these days to make any money with
ordinary websites, even with AdSense.
In the first case, these guys had a grand plan. They wanted control. They dreamt big. So, they decided
to go with colocation hosting, which is very expensive. After two years and a mediocre response, the
bills were piling up but the returns simply were not there. So, they shut down their website.
In the second case, my friend actually has a shared hosting plan. However, he was hesitant to pursue
ideas for websites that require a lot of processing (a cron job and such) because that would mean
giving up shared hosting and moving to a costlier plan like a VPS (virtual private server).
In both cases they were wrong. I see many people who are just getting into the entrepreneurial spirit
of building the next big website run the risk of wasting too much money, or of not using shared hosting
effectively to build highly functional websites. So I wanted to share some of my thoughts and secrets
on this. I am going to use a specific example: one of the websites I have put together for fun,
a website for gift selection. I am yet to finish my write-up on the architecture of this website;
that's been dragging on too long, but I will get there some day. Here I am going to talk about how
I was able to develop this resource-hungry website using some simple techniques.
Before we go into the technical details of how to use shared hosting effectively, I first want to
discuss whether a more advanced hosting option is really needed. Here are some observations to answer this.
There are very few websites in history that have become very popular in less than a month. Take
any top 100 website and chances are you hadn't heard of it in its first month, and maybe not even
in its first year of existence. So, the traffic to your website is not going to shoot up crazily
the day you announce your website to the world. I am not trying to put your hopes down or shatter
your dreams, just talking about the reality for 99.999% of websites.
How many users are you realistically expecting after one year of existence? You may have used a
spreadsheet, an s-curve, and whatever other tools you know to come up with a really cool number,
either to please yourself or the venture capitalists. But honestly, it's not going to be as rosy as
you might expect. The internet is a place where people have a lot of choices, and people want new
things. Except for a few applications like email and gaming, and these days blogging and Facebook,
not many websites have the kind of loyalty that makes people want to visit them every day. So, don't
assume that your traffic is going to accumulate as more and more users visit and even register on
your website. Many are likely to forget that they registered on your site within just a few days.
With a good hosting provider, it's possible to use just a shared hosting plan and serve up a few
hundred thousand web pages a month without breaking a sweat.
The success of many websites (or anything, for that matter) depends on good marketing and not as much
on the technology and architecture (which, coming from a tech guy, should mean something). I am not
saying technology doesn't matter at all: you still need to avoid security problems and pages that take
more than a few seconds to load. But even if you spend all your energy on that, you would mostly be
sitting idle waiting for people to start using your website.
Here are some key factors that can make a website popular:
Good user interface. Mint.com is a great example of this, or so I was told, as I don't use that type
of service where I have to give out my account details for other websites.
Search engine optimization. Amazon is one of the best examples of this. They have time and again
evolved their website and its content along with Google's ever-changing algorithms.
Good marketing. These days viral marketing seems useful, but sometimes it can create an unnecessary
stream of users that wanes in a week or two. For example, you probably don't want to end up with a
Slashdot effect in the first few weeks after your website's launch.
Everything else comes next. It doesn't matter what version of database you use, whether you use Linux
or Windows for your hosting, or which CMS you have deployed. I mean, it might matter for operating
your website effectively, but end users only see your web pages and the content they serve.
In fact, initially spending resources on just a few of the operational requirements of a great website
and monitoring the traffic patterns should give you good insight into whether you should continue to
invest time and effort into that great idea of yours. Keep in mind that knowing when to stop working
on an idea and a website is also a good thing. It can help you focus on other things that will
hopefully benefit from your prior experience.
There are a few things you can do to avoid a costly hosting plan, at least during the initial stages,
until you think your website is really going to take off and make the cost affordable.
Search. Unless your website becomes popular, you probably won't have much content. With little
content, a search solution that is not very efficient may be OK initially. Searching a few hundred
pages occasionally is not going to be very expensive. If you reach a stage where you have enough
content that searching is getting slow, but you still haven't reached critical mass for your website,
you can consider one or both of the options below.
Better navigation. Rather than making people search, let them browse. It might sound like the
pre-Google days, but you need to realize that better navigation is not only user friendly, it also
lets users explore other parts of your website. Some of the best performing websites, including
craigslist, still don't offer sophisticated search capabilities, but they make it easy to navigate
through their information island. Also, when you monitor your traffic, you will notice that the top
10 pages probably account for 90% or more of your page views. So, make these most popular pages
prominently and easily accessible.
If you have designed your content to be SEO friendly, chances are that all your pages are indexed
by Google. Then you can use Google itself to provide the search capability. There are ways to limit
the search results to only the pages within your website, so you needn't worry about users leaving
your website altogether.
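As a rough sketch of the idea: a search box can simply send users to Google with the `site:` operator prepended to their query, which restricts results to one domain. The domain here is a placeholder, not the site's actual one.

```python
from urllib.parse import urlencode


def site_search_url(query, site="example.com"):
    # Build a Google search URL restricted to a single site using the
    # `site:` operator. A search form on the page could redirect here.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})
```

A hosted alternative with the same effect is a Google Custom Search box embedded on the page, which keeps visitors on your site.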
Discussion. This can be forums or blog comments. These days most people are going with blogs.
However, there are also websites that offer APIs to provide threaded discussions on your site. One
drawback with these third-party solutions is that the content resides on their servers, so Google
won't index that content against your website. But unless the discussions are a critical part of your
SEO strategy, there is no need for them to be hosted directly by your website. Note that some of these
third-party providers allow exporting all your content, so should you become popular and want to host
this user-generated content in house, you can always export it and start serving it from your own website.
Cron Jobs. You might have some heavy jobs that need to run periodically. One option to consider is
whether the work can be done remotely, probably on your home machine, with the final content uploaded
back. For example, if you need to pull in a third-party product catalog and do some heavy processing
on it, you can do that on your home machine and upload the result. More on that in my specific example
about giftpickr. Sometimes it may not be possible to do the entire task remotely. Even then, you can
split the task into what's doable remotely and what needs to run on the website's server. Any load
taken off the web server is always good.
Caching. Did you know that Linux already provides a decent amount of caching? So, if you have some
hotspots, chances are they are always in the cache, even on shared hosting. Of course, on a shared
server, the chances of the cached content getting flushed out are higher.
Near Realtime. You don't necessarily need to provide real-time results. Even a big website like
Facebook doesn't: just define a new group or change the name of an existing group and immediately
search to see if the change shows up. The point is, even if you regenerate your cache or index or
whatever supports some functionality of your website only periodically, say once an hour or once a
day depending on how much staleness your users can tolerate, you won't lose users over it. In fact,
providing real-time support might slow down your website significantly, and that can be a big
turn-off for your users.
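A minimal sketch of this periodic-regeneration idea, assuming a file-based cache and a hypothetical `rebuild` callback supplied by the caller:

```python
import os
import time


def maybe_rebuild(cache_path, ttl_seconds, rebuild):
    # Regenerate a cached artifact only when it is older than the staleness
    # the users can tolerate; otherwise keep serving the existing copy.
    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = ttl_seconds + 1  # cache missing: force a rebuild
    if age > ttl_seconds:
        rebuild(cache_path)
    return cache_path
```

The same pattern works whether the artifact is a rendered HTML fragment, a search index, or a statistics table.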
GiftPickr
Now let me talk about how GiftPickr runs comfortably on shared hosting. The website provides a product
catalog that end users can search and/or browse. The catalog is actually provided by Overstock.com,
which allows its affiliates to download it. The catalog is updated once a day. So, each day I need to
download it, do some processing, load it into the database, and run some complex queries to build
statistics that take a fair amount of time to compute. Here are the things I have done to support
all this on shared hosting.
I have enabled a cron job on my home PC so all the heavy lifting is done remotely.
First, I chose SQLite instead of MySQL. SQLite is an embedded database, not a server the program
connects to, and the entire database resides in a single file. The advantage is that I can build the
entire database on one machine, then move it to another and start using it.
At 5 am, the cron job starts and does the following:
Downloads the product catalog from an ftp server
Unzips and processes the content and builds a new SQLite database
Generates HTML fragments that require some complex processing
Zips the database and the HTML fragments
Uploads all the newly generated files
Downloads a dynamic page from my website. The content of this page doesn't matter; fetching it
triggers the subsequent processing on the server, which is to unzip the newly uploaded files.
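For illustration, here is a rough sketch of a few of the steps above in Python. The table schema, file names, and trigger URL are made up for the example; they are not the site's actual ones, and the FTP download and upload steps are omitted.

```python
import gzip
import shutil
import sqlite3
import urllib.request


def build_catalog_db(rows, db_path):
    # Rebuild the SQLite catalog from scratch. The whole database is one
    # file, so it can be built at home and shipped to the server when done.
    con = sqlite3.connect(db_path)
    con.execute("DROP TABLE IF EXISTS products")
    con.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()
    return db_path


def compress(path):
    # gzip the file before uploading to cut the bandwidth requirement.
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    return path + ".gz"


def trigger_unpack(url):
    # Fetch a dynamic page on the site; the page's server-side handler
    # (not shown) unzips the freshly uploaded files.
    return urllib.request.urlopen(url).read()
```

The heavy work (parsing, inserting, indexing) all happens on the home machine; the web server only unzips files and starts serving the new database.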
As you can see, rebuilding the database at home and uploading it removes a lot of burden from the
web server. This may not be possible in all scenarios, but these are the kinds of optimizations you
can look for.
Next, rather than uploading the files as is, I first zip them, achieving about 70% compression, which
reduces the network bandwidth requirement.
Also, rather than providing full-text search over my catalog, which could be more expensive, I opted
for a much simpler yet effective solution. As part of processing the product catalog, I extract all
the words from various fields and store them in a table specifically optimized for querying words.
Because this text index is custom built, I could choose to give more weight to a word that appears
as a prefix rather than a suffix, which I found very desirable for product catalogs.
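Here is a toy sketch of such a custom word index built with SQLite in Python. The schema, tokenizer, and weights are illustrative, not the ones the site actually uses; the idea is simply that a word at the start of a product name counts for more.

```python
import re
import sqlite3


def index_words(con, products):
    # products is a list of (sku, text). Store one (word, sku, weight)
    # row per occurrence; a word at position 0 gets a higher weight.
    con.execute("CREATE TABLE IF NOT EXISTS word_index (word TEXT, sku TEXT, weight INTEGER)")
    for sku, text in products:
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            weight = 3 if pos == 0 else 1  # prefix occurrence counts more
            con.execute("INSERT INTO word_index VALUES (?, ?, ?)", (word, sku, weight))
    con.execute("CREATE INDEX IF NOT EXISTS ix_word ON word_index(word)")


def search(con, term):
    # Rank SKUs for a single keyword by accumulated weight.
    return [row[0] for row in con.execute(
        "SELECT sku, SUM(weight) AS score FROM word_index "
        "WHERE word = ? GROUP BY sku ORDER BY score DESC", (term.lower(),))]
```

Since the index is just another table in the database file, it is built at home along with everything else and shipped to the server ready to query.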
All of the above optimizations are about remote processing. Even after offloading as many tasks as
possible from the web server, a few still need to be handled by the web server as part of each
request. Here are some optimizations I have made for those:
I stitch pages together from statically generated fragments and some dynamic parts. The static
fragments are usually expensive to compute, but they don't vary by request or user.
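A minimal sketch of the stitching idea, using a made-up `{{placeholder}}` convention; the real site's templating is not necessarily like this:

```python
def render(template, static_fragments, dynamic):
    # Substitute both the pre-generated static fragments (computed once,
    # offline) and the cheap per-request dynamic parts into the template.
    out = template
    for key, value in {**static_fragments, **dynamic}.items():
        out = out.replace("{{%s}}" % key, value)
    return out
```

Only the `dynamic` dictionary is computed per request; the static fragments are read straight from files generated by the nightly job.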
The homepage is very large. So, I keep an HTML file and also a zipped version of the same file, and
depending on what the client supports, I serve one or the other. This reduces the bandwidth requirement.
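A sketch of the selection logic, assuming the pre-gzipped copy sits next to the HTML file; in practice you would inspect the request's Accept-Encoding header and set Content-Encoding accordingly. The file names are placeholders.

```python
def pick_homepage(accept_encoding, html_path="index.html"):
    # Serve the pre-gzipped copy when the client advertises gzip support,
    # the plain file otherwise. Returns (path, extra response headers).
    if "gzip" in (accept_encoding or ""):
        return html_path + ".gz", {"Content-Encoding": "gzip"}
    return html_path, {}
```

Since the gzipped copy is generated once by the nightly job, the server never compresses on the fly, which keeps per-request CPU cost near zero.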
SQLite supports calling user-defined functions as part of aggregate SQL queries. I make use of this
to do a lot of additional processing in the same sweep that computes the main data. I want to write
a separate article on this sometime, but it helped me reduce the number of SQL queries.
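Here is a small illustration of the technique using Python's sqlite3 module, which lets you register a custom aggregate class. The statistic computed here (the price spread per category) is just an example; the point is that it rides along in the same GROUP BY sweep as the built-in AVG, so no second pass over the table is needed.

```python
import sqlite3


class PriceSpread:
    # Custom aggregate: tracks min and max in step(), returns the spread
    # in finalize(). SQLite calls step() once per row in the group.
    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        return 0.0 if self.lo is None else self.hi - self.lo


con = sqlite3.connect(":memory:")
con.create_aggregate("price_spread", 1, PriceSpread)
con.execute("CREATE TABLE products (category TEXT, price REAL)")
con.executemany("INSERT INTO products VALUES (?, ?)",
                [("mugs", 5.0), ("mugs", 12.0), ("lamps", 20.0)])
rows = con.execute(
    "SELECT category, AVG(price), price_spread(price) "
    "FROM products GROUP BY category ORDER BY category").fetchall()
```

The same registration mechanism (`create_function` for scalar functions) exists in most SQLite language bindings, not just Python's.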
Monitor your website, figure out what your users are doing most of the time, and tune your data model
to their needs. You may start off one way but end up requiring a different set of indexes, for example.
Now let me share some statistics for the GiftPickr website. There are probably 10 to 30 unique
visitors a day, and 20 to 60 page visits. Most page visits result in a query against the custom
keyword search I built, followed by some browsing of the results. One technique I use to increase
the chances of a user actually buying something from my website (which takes them to Overstock.com)
is to figure out what search query brought the user to my website from the search engine. Sometimes
it's possible to know this, and when I do, I tailor the product catalog page to that query. This
helps users find what they are looking for without much effort and increases the chances of their
clicking on results that are highly relevant.
One of the reasons I built the website was to play with building a very functional product catalog.
A side benefit is that I could generate some affiliate income from it. If I had assumed that
processing some 250 thousand products in the catalog feed would be very expensive, and that I
therefore needed a colocation server, I would have lost a lot of money. I knew that, since I wouldn't
be actively marketing the website, I would only get occasional visitors from search engines. So, I
decided to come up with an architecture that would let me build this site on shared hosting. I can
say that with the limited traffic I get, the affiliate revenue is on average enough to keep the site
running and maybe go out for dinner once a month. Colocation would probably have taken a few dinners
out of my pocket.
I have also actively monitored user visits to the website and, based on that, done some tweaking to
increase the chances of users finding what they want so they can buy it immediately. You can visit
the website and see all the functionality. It might appear slow for some searches, but after the
first search it usually becomes very fast. When I started this project, I indexed about 50 thousand
products with an average response time of 3 seconds. Since then the index has doubled, so the
performance problems are more apparent these days, though still not to the extent that the site is
unusable. I hardly need to touch the website these days; it's 100% automated and sustains itself.
One could say there is no point in building such a website since it doesn't generate much revenue,
but the experience I gained from this experiment is what I value most.