Fri 10 August 2018
A recap from an internal sharing session
The technology develops in two ways, improving with the existing one and taking a breaking through.
20 years ago, in the year of 1998, the servers were running on physical servers. There is very few things we can do to recover from a disaster. A common approach is to have one or more mirror servers standby regardless if they are needed. Once one of the server is down, the alternative servers can save you a little time to recover.
The downside with this server generation, one is that it is very difficult to determine how many servers need to preserve for your uncertain traffic. Another one is that you have to run a lot more servers than you need if you have multiple services deployed separately.
10 years ago, in year of 2008, virtualization technology was getting very mature for production using. Multiple virtual machine can be running simultaneously on one physical server. The virtual machine supports backup to a snapshot and restore from it. So if needed, we can easily create a copy of our server and scale up on demands.
Speaking of virtual machine, I have to mention AWS EC2 service. EC2 is short for Elastic Compute Cloud. It is an online virtual machine service that provided by Amazon. Based on different scenarios, the users can run various numbers of EC2 instances with different specs.
The users also can choose to create an Amazon Machine Image from a running EC2 instance. An AMI is a snapshot of a virtual machine. The user can reuse their own AMI or share their AMI with other users or public. With the flexibility from EC2 and AMI, the applications that deployed on AWS can be easily maintained or scaled.
One of the biggest advantage of virtual machine is isolation. The software running in the guest OS is fully isolated from the host OS and other virtual machines. The isolation makes sure the application in each VM is running safely and independently.
The flaw of VM is very obvious as well. The target application needs to be running in a guest operating system that based on a set of virtualized hardwares. The computing resources can be wasted heavily on those overheads.
A practical improvement in virtualization is to replace the host OS with hypervisor, which is a specific software that manage virtual machines on top of the hardwares. In this case, we can save some overhead from the host OS.
So is there any technology that is able to adopt the highly isolated environment as well as efficiency?
Yes, it's container.
Containerization includes two key technologies. One of them is Linux Namespace. Linux Namespace helps provide an isolated environment (including process management, users, etc.) for process to run. The other one is cgroups, which is designed to isolate and control resources that process is using. With these technologies, even your application is still running in the host OS, they (the app) will think themselves as running in a full OS.
As we can see, the environment of the process is isolated from the current operating system. In this case, we don’t need to virtualize the hardware and the guest OS anymore. The hardware resources are getting well utilized.
So far, we haven't mentioned anything about docker. Simply because the key technology of containerization has nothing to do with docker. Docker wrapped the dirty jobs for us so we can create container easier. (For anyone interested can refer this article to build a container without Docker)
1. When we run a container of 'Ubuntu', is it running a real Ubuntu in my host machine or not?
Yes it will be a ‘Ubuntu’ but share the kernel with the host. The Ubuntu image will provide everything except Linux kernel. If your application needs a different kernel version than your host (which is very rare), you’ll need a VM for your app then.
2. How do the images work? If I have many images, does it mean I also have all the copies?
The container provides OS kernel, each image provides a “mounting” file system. There is a base image “scratch” that contains almost nothing. Every docker image is built on top of another image. The ‘scratch’ image is the very root of them. Docker is adopting UnionFS to operate with image/container data. Take a look at this article to understand how docker image layers work.
3. When running container on Windows, is it running natively or still via the hyper-v?
Just had a look at this post from Microsoft. Windows container has two types of container, one is Windows Server Containers which is similar to Linux Container that sharing kernel between containers, the other is Hyper-V Isolation which is running a highly optimized Windows virtual machine.
4. What’s the differences between image for windows container and image for linux container?
The containers share kernel with host OS, so only image that support the host kernel can be created in a container. In windows, we run a Linux container in a Linux virtual machine (docker host).
5. How do the images manage their versions?
Each image name contains three parts: registry, name and tag. The default registry is
docker.io and the default tag is
latest. Different version of the same image has different tag. For instance, dotnet-frameworkhas tags ‘4.7.2’, ‘4.7’, 4.6’ to differentiate each version.
6. When I run a container without specify the image version, am I always getting the latest version?
docker run command will first try get the image from your local and then pull from server. So if you local repository happen cached the “latest” version, it won’t bother to pull another time. In this case you need to manually run
docker pull to update your local image.
7. When ship the app, which way is better? Build image with application, or provision the container with the app after it is up.
Because it is relatively cheap to build a docker image. Instead of running a lot overhead after a container started, it is better to wrap all you application into an image and deploy it directly on your server.