Traditionally, when people talk about Cloud computing, they talk about SaaS, PaaS & IaaS, as depicted in the following diagram.
So, traditionally, people think about building a PaaS on top of an IaaS. Samisa recently blogged about the practical problems we have been facing with IaaS, in particular Amazon Web Services (AWS). To be honest, working with AWS has been a big challenge: IO performance has been very poor, and the infrastructure itself behaves unpredictably from time to time. Every so often we would lose the network and would not even be able to connect to 127.0.0.1 from our software, forcing restarts. So much for SLAs & high availability! It is also well known that virtualization degrades IO performance, so if your PaaS or SaaS is IO intensive, you may see a considerable drop in performance.
Come to think of it, running a PaaS on top of an IaaS such as AWS could be overkill. In such a setup we have two levels of multi-tenancy: one at the IaaS layer, where the PaaS service provider is a tenant, and the other at the PaaS layer itself. Only one level of multi-tenancy, at the PaaS layer, is actually required. When it comes to elasticity in a PaaS, what we actually need is a new process (in the case of a Java PaaS, a new JVM). What we do instead, in a setup such as the one shown in the above figure, is spin up a new image instance (in the case of AWS, a new EC2 instance) and then start a process in that new instance. Spinning up a new instance can take up to 15 minutes, so by the time it has booted and is able to do some work, the need for it may have passed because the traffic has dropped back to normal levels.
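To make the timing argument concrete, here is a minimal sketch in Python. The 15-minute figure is the worst case mentioned above; the 30-second JVM start time and the 10-minute spike are numbers assumed purely for illustration, not measurements, and the function name is hypothetical.

```python
def useful_seconds(spike_duration_s: float, startup_latency_s: float) -> float:
    """Seconds of a traffic spike during which new capacity is actually serving."""
    return max(0.0, spike_duration_s - startup_latency_s)


EC2_BOOT_S = 15 * 60   # worst-case instance boot time quoted above
JVM_START_S = 30       # assumed JVM start time, for illustration only

spike_s = 10 * 60      # hypothetical 10-minute traffic spike

for label, latency in (("new EC2 instance", EC2_BOOT_S), ("new JVM", JVM_START_S)):
    served = useful_seconds(spike_s, latency)
    print(f"{label}: useful for {served:.0f}s of a {spike_s}s spike")
```

Under these assumed numbers the new EC2 instance contributes nothing to the spike, while the new JVM is serving for 570 of the 600 seconds.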
Perhaps the proper model would be to run your PaaS directly on the infrastructure (hardware + networking + OS) without virtualization, as shown in the above diagram, and keep a few cold standby EC2 instances for Cloud bursting. This is the model we will have to go with, at least until IaaSs become much more stable. Another advantage is cost: since you will be running your PaaS 24x7, owning the hardware will cost much less than the accumulated amount you would be paying the IaaS provider.
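The cost argument comes down to simple break-even arithmetic, which can be sketched as follows. Every figure in this example is a hypothetical placeholder and not a real price; the function and parameter names are also made up for illustration. Plug in your own numbers.

```python
import math


def break_even_months(hardware_cost: float,
                      own_monthly_cost: float,
                      iaas_hourly_rate: float,
                      instances: int) -> float:
    """Months after which cumulative IaaS spend exceeds owning the hardware.

    Returns math.inf if the monthly IaaS bill is not higher than the
    monthly cost of running your own hardware.
    """
    hours_per_month = 24 * 30  # running 24x7, assuming a 30-day month
    iaas_monthly = iaas_hourly_rate * hours_per_month * instances
    monthly_saving = iaas_monthly - own_monthly_cost
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving


# All figures below are hypothetical placeholders, not real prices.
months = break_even_months(hardware_cost=20_000.0,
                           own_monthly_cost=500.0,
                           iaas_hourly_rate=0.50,
                           instances=10)
print(f"Break-even after about {months:.1f} months")
```

The longer the PaaS runs around the clock, the further past the break-even point you get, which is why the 24x7 assumption matters here.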
The challenge of going for such a setup is having to implement alternatives to much of the functionality that is already provided by the IaaS. This includes geographically distributed deployments (AWS provides this through availability zones & regions), firewall functionality (AWS provides this using security groups), public IP address assignment (AWS provides this through Elastic IPs), and so on. However, implementing such functionality at the infrastructure level will yield huge benefits for large scale Platforms-as-a-Service such as StratosLive.
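As one example of what has to be rebuilt, the firewall piece could start from a default-deny allow-list loosely modelled on security groups. This is only a rough sketch: the AllowRule type, the is_allowed function and the sample rules are all hypothetical and are not part of AWS or StratosLive.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical rule type, loosely modelled on a security-group entry."""
    cidr: str  # source network, e.g. "10.0.0.0/8"
    port: int  # destination TCP port


def is_allowed(rules: Iterable[AllowRule], source_ip: str, port: int) -> bool:
    """Default deny: permit traffic only if some rule matches."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        port == rule.port and addr in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )


# Hypothetical rules: a management port for internal hosts, HTTPS for everyone.
rules = [AllowRule("10.0.0.0/8", 9443), AllowRule("0.0.0.0/0", 443)]

print(is_allowed(rules, "10.1.2.3", 9443))     # internal host, management port
print(is_allowed(rules, "203.0.113.7", 9443))  # same port from outside
```

A real replacement would enforce such rules in the network layer (for example on the hosts or the edge routers) rather than in application code; the sketch only shows the rule-matching logic.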