Article Collection
This article is part of the following series:
1. Docker
- Part 1: Introduction to Docker Containers (this article)
Introduction
Virtualization is a hardware-level feature that allows users to divide the resources of one physical computer into multiple, separate (virtual) machines.
A similar but different concept, containerization, is implemented in software, in the operating system's kernel. The kernel provides the functionality necessary for creating and running multiple, separate, user-space containers.
The physical machine is typically called the "host" (or, in virtualization terminology, "dom0"). The virtual environments running on it are called virtual machines (VMs) or guests in the case of virtualization, and containers in the case of containerization.
VMs vs Containers
Virtual machines and containers both have their optimal use cases.
Virtual machines emulate hardware and can run completely different operating systems or even code compiled for other CPU architectures. That ultimate flexibility comes at the cost of a more complicated setup and some processing overhead.
Containers run under the host's kernel. They incur less overhead and do not require a full guest operating system, so they are easier to set up. A container consisting of a single, statically-compiled program would work. However, only executables compatible with the host's CPU architecture and kernel can be run. In addition, as a side effect of the software-based implementation, containerized processes are visible to and manageable by the host OS.
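If you already have Docker installed (installation is covered below), you can observe this visibility directly. A quick sketch, where the container name "sleeper" is an arbitrary example:
# Start a throwaway container that just sleeps for 5 minutes
docker run -d --rm --name sleeper debian sleep 300
# On the host, the containerized process appears in the normal process list
ps aux | grep 'sleep 300'
# Clean up: stopping the container also removes it, due to --rm
docker stop sleeper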
Containers rely heavily on functionality provided by the Linux kernel. The primary groups of such functionality are Linux control groups (cgroups), Linux namespaces, and Linux capabilities.
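These kernel features can be inspected directly on any Linux host. A small sketch, assuming the util-linux and libcap2-bin packages (Debian package names) are installed:
# List the namespaces the current shell belongs to
lsns
# Show the control groups of the current process
cat /proc/self/cgroup
# Show the capabilities of the current process
capsh --print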
Docker
Docker is a complete containerization solution. It comes with all the tools necessary for creating software images, running them in containers, and supporting them through their lifecycle.
Images are snapshots of files packed together as a single unit. Containers are running instances of those images.
Docker is a high-level technology that builds on top of numerous computing, Unix, Linux, and networking concepts. Being a complete solution, it also supports remote storage, convenient downloading, and orchestration of containers.
A very similar system, one that is completely free and open source but less popular among companies, is Linux Containers (LXC).
Docker Components
As mentioned, Docker is a complete containerization solution. It consists of the following components:
- Dockerfile syntax, a configuration language that defines how images are built. Images are often created by referencing other, existing images and customizing them
- An engine for building images, which takes Dockerfiles as input and produces Docker images as output
- Functionality for uploading Docker images to public or private (authenticated) Docker registries
- Functionality for downloading Docker images from existing public or private (authenticated) Docker registries
- An engine for running containers, which runs Docker images in isolated environments (containers)
- An engine for orchestrating containers, which runs groups of possibly dependent and scalable containers (docker compose)
The command docker is a single front-end program used to access all of Docker's functionality. Different subcommands invoke different functions, such as docker image ..., docker container ..., and docker system ....
Previously, a separate command docker-compose was used for orchestration, but its function has since been integrated into docker compose. Compose uses a dedicated configuration file, [docker-]compose.yml.
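To give a quick taste, here is a minimal sketch of a compose file that defines a single web service; the service name and port mapping are arbitrary examples:
# compose.yml
services:
  web:
    image: httpd
    ports:
      - "8080:80"
Running docker compose up -d in the directory containing this file would start the service in the background.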
Finally, there is Docker Swarm, which can centrally manage a fleet of machines running Docker. That is a separate system and outside the scope of this article.
Why Use Docker?
There are a number of reasons why you might want to use Docker.
For example:
- If the host OS is outdated and can't be upgraded, newer software can still run in containers, as long as the kernel is satisfactory
- If you want to try out a program without risking modifications to your host OS, you can run it in a container
- Some software is complex to install, so its authors might have prepared ready-to-use Docker containers, which is convenient for beginners
- Container images can be used as basic units of software deployment, especially in the cloud, e.g. in Kubernetes
Why Not Use Docker?
This section is not intended to steer you away from using Docker, but to help you position it correctly in your mind. Containers are sometimes ideal for deploying software and for tinkering with systems and concepts, but there are concerns to keep in mind.
In a non-container scenario, when you want to use software on GNU/Linux, you install it via the host's native package management tools (such as Debian's apt), configure it if needed, and run it.
That is the default workflow. You are involved in the whole process, while also taking maximum advantage of all the effort that package maintainers have invested in:
- Reviewing the licensing and quality of the software
- Making it adhere to the distribution’s defined standards
- Integrating it into the distribution’s standard procedures and tools
- Documenting it and often providing configuration examples
- Pre-configuring it and generally making it ready and easy to use
Docker images, on the other hand, can be created by anyone. Images are not verified or tuned by the distribution’s package maintainers, and software in them is not installed and configured manually by end users. Both steps are already done in advance by image authors and to their liking.
That raises the following concerns:
- Images may contain code or behavior that you would not approve of. Since images are bigger and less transparent than packages, you might start using them without knowing what exactly they are doing, or without the determination necessary to audit them and remove the offending parts
- Software images, being pre-installed and pre-configured, can deliver functionality quickly. But if you rely only on images, you never learn how to install and configure the software yourself. That potentially makes you miss out on features of the original software that were not exposed through the image, leaves you unable to customize it, and reduces your level of skill in general
- Using software through images and containers, or through the proprietary platforms on which they may be deployed, might make you accustomed to using "software as a service", rather than demanding full control and ownership of your software, data, and devices
Installation
Docker
Docker installation is not a part of this article since it is adequately covered in numerous places elsewhere.
For Debian GNU/Linux-based systems, see for example Install Docker Engine on Debian and then return here.
Permissions
In the default scenario, Docker uses a simple permission model where all members of the group docker are able to use it.
So our first task is to add the current user to the docker group:
sudo adduser $USER docker
Adding user `user' to group `docker' ...
Done.
Note that the operating system caches user group memberships for performance, and group memberships are re-read on first user login. Thus, for the cache to be refreshed and the new group membership applied, you should completely log out of the system and then log back in. If that is inconvenient, you can temporarily force the creation of a new shell with the new group visible, using one of the following methods:
newgrp docker
newgrp $USER
su - $USER
To confirm that you have the necessary privileges to use Docker, simply run id to check that "docker" is in the list of auxiliary groups, and then run e.g. docker ps. If no error message is printed, you are OK.
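For example (the user name, IDs, and group list will differ on your system):
id
uid=1000(user) gid=1000(user) groups=1000(user),27(sudo),999(docker)
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES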
Quick Start
Starting Containers
As mentioned, at its core Docker is a system for creating software images and running them in containers.
However, we do not have to build all images ourselves. Docker maintains a public registry of available images, and as soon as we reference an image that does not exist locally, Docker will connect to its public Internet registry and try downloading it from there.
(Docker's eagerness to look up images remotely can even be inconvenient – it only takes a one-letter typo or a mismatch in the image version for Docker to not find an image locally and try to download it from the public registry!)
Let’s start using Docker by confirming that, in a fresh installation, we do not have any containers or images. The following commands should just print empty results:
# Show running containers
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# Show all containers
docker container list -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# Show all images available locally
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
Now, knowing about Docker’s automatic lookup of images in the public registry, let’s run our first container “hello-world”.
When we run the command, the first part of output will be from Docker, informing us about downloading the image. The second part will be the actual message from the container that was started, printing “Hello from Docker” and a bunch of extra text.
Here is the first part of the output in which Docker is telling us that the image is being downloaded:
docker run --rm hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
719385e32844: Pull complete
Digest: sha256:dcba6daec718f547568c562956fa47e1b03673dd010fe6ee58ca806767031d1c
Status: Downloaded newer image for hello-world:latest
And here is the second part, the output from the running container that begins with “Hello from Docker!”:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
If that is the output you see – it works!
Now we can check our list of local images again. One image will be there:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest 9c7a54a9a43c 3 months ago 13.3kB
The command docker ps, which shows running containers, will still print an empty list. That is because our container started, printed its message, and exited, so there are no running containers at the moment:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Building Images
While we have the "hello-world" image at hand, let's build our own image by customizing it.
We already mentioned that the steps for building Docker images are stored in Dockerfiles, and that new images can be built on top of existing ones.
The existing "hello-world" image contains and runs a program stored at the path /hello.
To make our image just a little different, we are going to add a new command, /hello2, which will simply print a brief Hello, World! to the screen and exit.
First, we need to create the hello2 program. If you have programmed in C, you will recognize the following snippet as a C program.
But in any case, just run the following commands – they will install the compiler, create a minimal .c file, compile it, and run it:
sudo apt install gcc
echo -e '#include <stdio.h>\n#include <stdlib.h>\n\nint main() {\n printf("Hello, World!\\n");\n exit(0);\n}' > hello2.c
cat hello2.c
gcc -o hello2 -static hello2.c
./hello2
Hello, World!
Now that we have our program hello2, let's create a Dockerfile for our new image:
# Dockerfile
FROM hello-world
COPY ./hello2 /
CMD [ "/hello2" ]
The above lines specify that we want to use the existing image hello-world as a base, copy the file hello2 from the host to /hello2 in the new image, and run /hello2 every time the container starts.
Note that Dockerfiles only define how images are built, not how they are named or which version they are; those options are passed at build time.
With hello2 and the Dockerfile in place, we can build our image with:
docker build -f Dockerfile -t hello-world2 . # (Don't forget the dot at the end)
Once the image is built, we can verify its presence in the local Docker cache:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world2 latest d97789789d8d 4 seconds ago 775kB
Note that the tag (version) “latest” is important. If a version is not specified when trying to run an image, Docker will look for an image tagged “latest”.
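For example, hello-world2 and hello-world2:latest refer to the same image here, and docker tag can record additional versions of it; the v1 tag below is an arbitrary example (the listed IDs and timestamps are illustrative):
docker tag hello-world2:latest hello-world2:v1
docker images hello-world2
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world2 latest d97789789d8d 2 minutes ago 775kB
hello-world2 v1 d97789789d8d 2 minutes ago 775kB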
And we can now start a container based on our image “hello-world2”:
docker run --rm hello-world2
Hello, World!
Explicitly specifying the command to run in the container (/hello2) was not necessary because we already configured it as the default CMD in the Dockerfile.
But since we built our image on top of the original "hello-world", and our image contains both /hello and /hello2, what if we wanted to run the original /hello?
We just specify the command to run after all other parameters. That command will override the default CMD that was defined in the image:
docker run --rm hello-world2 /hello
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
...
...
Removing Images
The images hello-world and hello-world2 are extremely simple. They consist of the programs /hello and /hello2, which print welcome messages and exit.
There is nothing else useful we can do with them, other than maybe inspecting them for the sake of practice and then removing them from the cache:
docker image inspect hello-world
{
"Id": "sha256:9c7a54a9a43cca047013b82af109fe963fde787f63f9e016fdc3384500c2823d",
"RepoTags": [
"hello-world:latest"
],
...
...
...
docker image rm hello-world
docker image rm hello-world2
It is possible that the above commands will fail, saying:
Error response from daemon: conflict: unable to remove repository reference "hello-world2" (must force) - container ... is using its reference image d97789789d8d
That simply means there are containers which still reference this image, so the image cannot be deleted. List all containers and remove them before removing the images:
docker container list -a
docker container rm ...
docker image rm hello-world
docker image rm hello-world2
Lastly, while working with Docker, you will notice that its cache can easily fill gigabytes of disk space, so we will also show a space-saving command here. The command will not make a difference with just a few images, but will come in handy in the future. Please note that it removes all stopped containers, unused networks, dangling images, and the build cache:
docker system df
docker system prune -f
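The prune command above keeps all tagged images. If you want to go further and also remove every image not referenced by at least one container, there is a stronger variant (use it with care):
docker system prune -a -f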
Interacting with Containers
By default, Docker creates one default virtual network, and all containers get an IP address from that subnet so they can talk to each other.
The containers also have access to the host’s networking, so if the host machine is connected to the Internet, containers will be able to access it as well. However, other than that default egress provision, containers run in completely separate environments, including storage.
Separate environments are great for isolation, but may be a problem for durability of data. While it is quite normal to have long-lived containers, containers are often also created temporarily, and in any case the data is lost when containers are deleted. Similarly, container isolation may be a problem if we actually want the host and containers to talk to each other.
There are a couple ways to enable that interaction:
- By exposing containers' network ports to the host OS or other containers
- By copying additional files or other required data directly into the containers at build time
- By mounting host OS directories (ad-hoc volumes) into the containers at startup time
Requesting disk volumes to be mounted inside containers is done with the option -v HOST_PATH:CONTAINER_PATH, and exposing network ports is done with the option -p HOST_PORT:CONTAINER_PORT.
Let’s show those scenarios in practice.
Exposing Network Ports in Containers
We have seen the “hello-world” image in the previous chapter. The image did not exist locally, so it was automatically pulled from Docker’s public registry when we ran it.
That container did not require much interaction. All it did was print a welcome message and exit.
But to show network interaction with containers and set things up for other examples, we are going to explicitly download and run a Docker image for Apache, an HTTP (web) server.
The image name is httpd:
docker pull httpd
To be useful, an HTTP server must be reachable by clients. We are going to run the container and set up its networking so that the host's port 8080 is routed to port 80 in the container.
Port 80 is the standard port on which web (HTTP) servers listen for unencrypted (non-SSL/TLS) connections.
It is preferable not to use port 80 directly on the host, because it may already be in use by an existing web server running on the host, or we might not have the necessary permission to bind to a port number below 1024.
docker run -ti --rm -p 8080:80 httpd
This will start the container in foreground (non-detached) mode.
We can now use a web browser to open http://0:8080/ and we will be greeted by Apache with a simple message “It works!”.
When you are done with the test, press Ctrl+c to terminate the foreground process. The container will exit when the command exits, and because of the option --rm, the container will also be removed automatically upon termination.
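As a variation, the same container could be run in the background (detached) with option -d and tested from the command line. A small sketch, assuming curl is installed; the container name is an arbitrary example:
# Start detached, verify with curl, then stop
docker run -d --rm --name test-apache -p 8080:80 httpd
curl http://localhost:8080/
docker stop test-apache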
Additional Files in Containers
But, what about a more useful website? What if we had a personal or company website, and wanted to serve it from this container?
If you are familiar with the basics of the HTTP protocol, you know the original idea was that a client would request a particular URL on the server, that URL would map to some HTML file on disk, and the server would return the file contents to the user.
From the documentation on the Docker official image 'httpd', we see that Apache's root directory for serving HTML files is set to /usr/local/apache2/htdocs/.
Therefore, the simplest thing we can do to serve our website instead of the default “It works!” is to copy our files over the default ones.
Let’s do that now and confirm that it worked by seeing the message change from “It works!” to “Hello, World!”:
First, we will locally create a directory public_html/, containing one page for our new website:
mkdir public_html
echo "<html><body>Hello, World!</body></html>" > public_html/index.html
Then, we will create a separate Dockerfile, e.g. Dockerfile.apache, for our new image:
FROM httpd
COPY ./public_html/ /usr/local/apache2/htdocs/
And finally, we will build and run the image:
docker build -f Dockerfile.apache -t hello-apache2 . # (Don't forget the dot at the end)
docker run -ti --rm --name test-website -p 8080:80 hello-apache2
Visiting http://0:8080/ will now show our website with message “Hello, World!”.
We are done with the test, so press Ctrl+c to terminate the process.
Mounted Host OS Directories
The previous example worked, but copying data into images is not very flexible. When data changes, we need to rebuild the images and also restart containers that are using them.
As mentioned earlier, the solution is to mount host OS directories (ad-hoc volumes) into the container with the option -v HOST_PATH:CONTAINER_PATH.
Since we already have our public_html/ directory, and mounting volumes does not require changing the images, we can use the original httpd image directly:
docker run -ti --rm --name test-website-volume -p 8080:80 -v ./public_html:/usr/local/apache2/htdocs/ httpd
Visiting http://0:8080/ will now show our new website and message “Hello, World!”.
But this example is not equivalent to the previous one: the data is now "live". If we modify any file in public_html/ and visit it through the browser, we will immediately see the updated contents.
(You might need to press Ctrl+r, F5, or Ctrl+Shift+r, or click Shift+Reload in the browser to force a refresh of the visible page.)
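For example, from another terminal, we can update the page and observe the change without restarting anything (using curl here instead of a browser; the new message is arbitrary):
echo "<html><body>Hello again!</body></html>" > public_html/index.html
curl http://localhost:8080/
<html><body>Hello again!</body></html>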
Furthermore, since we now have a long-running container, we can verify its presence in the output of docker ps:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
66bb93476f99 httpd "httpd-foreground" 1 hour ago Up 1 hour 0.0.0.0:8080->80/tcp, :::8080->80/tcp test-website-volume
Running Commands in Containers
In containers, just like in any environment, you can only run commands that exist there. The commands may exist because they have been included in the image, mounted from a volume, or copied into the container at runtime.
Let’s take a look at commonly used scenarios for manually or automatically running programs in containers.
At Startup
From Dockerfile
There are two Dockerfile directives that define the default program to run at startup – ENTRYPOINT and CMD.
Both are by default empty (undefined).
The full command that Docker will run is $ENTRYPOINT $CMD. (That is, any ENTRYPOINT with any CMD appended to it.)
We have seen an example of CMD in our earlier Dockerfile:
FROM hello-world
COPY ./hello2 /
CMD [ "/hello2" ]
An example of CMD with additional command-line arguments:
CMD [ "/some/program", "--with-option", "123" ]
And an example of both ENTRYPOINT and CMD, which will result in Docker starting /some/program --with-option 123:
ENTRYPOINT [ "/some/program" ]
CMD [ "--with-option", "123" ]
Note that ENTRYPOINT and CMD above use the preferred "exec" syntax, but a "shell" syntax is also available if really necessary.
See ENTRYPOINT and CMD for details.
From Command Line
It is possible to override both ENTRYPOINT and CMD on the command line, at the time of container startup.
The option --entrypoint overrides ENTRYPOINT, while CMD is overridden simply by listing the arguments after the image name. Note that --entrypoint, being an option of docker run, must come before the image name:
#                [ ENTRYPOINT ]            [ CMD ]
docker run --rm --entrypoint /some/program some-image --with-option 123
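For a concrete test, we can reuse the httpd image from earlier. It defines only a CMD, so overriding the command makes the container print Apache's version and exit, instead of starting the server (the exact version string will vary):
docker run --rm httpd httpd -v
Server version: Apache/2.4.x (Unix)
Server built: ...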
At Runtime
Oftentimes, we want to run commands in containers that are already active and running.
To show an example, let’s first start a generic container that just runs Debian GNU/Linux:
docker run --name my_debian -ti --rm debian
root@5821b3a41434:/#
Then, in another terminal, let's run docker ps to confirm our container is running:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5821b3a41434 debian "bash" 15 seconds ago Up 13 seconds my_debian
Now, with a running container, we can execute commands in it via docker container exec. Here is an example that shows disk space usage in the container:
docker container exec -ti my_debian df -h
Filesystem Size Used Avail Use% Mounted on
overlay 15G 6.0G 7.6G 44% /
tmpfs 64M 0 64M 0% /dev
tmpfs 1.2G 0 1.2G 0% /sys/fs/cgroup
shm 64M 0 64M 0% /dev/shm
/dev/xvda3 15G 6.0G 7.6G 44% /etc/hosts
tmpfs 1.2G 0 1.2G 0% /proc/asound
tmpfs 1.2G 0 1.2G 0% /proc/acpi
tmpfs 1.2G 0 1.2G 0% /proc/scsi
tmpfs 1.2G 0 1.2G 0% /sys/firmware
If a shell exists in the container – and in a Debian image it of course does – we can also run the shell itself, which will give us an interactive command line:
docker container exec -ti my_debian /bin/bash
root@5821b3a41434:/#
The shell can be exited as usual, using the commands logout or exit, or by pressing Ctrl+d.
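As another example, we could install and run a tool that the minimal Debian image lacks. A small sketch, assuming the container from above is still running (the package procps provides the ps command):
docker container exec -ti my_debian bash -c 'apt update && apt install -y procps && ps aux'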
Hybrid
It is completely fine to combine startup and runtime methods of executing commands in Docker containers.
Many container images that implement client/server applications, such as databases, customarily start the server by default; if you want to start the client instead, you override the command.
You can do this by running docker run IMAGE_NAME [CMD] twice: once without and once with the command manually specified. This will run the same image twice, in two separate containers, which you can confirm with docker ps.
Alternatively, you can run the second command with docker container exec CONTAINER CMD, which has a similar but different effect: it runs the second command in the first container, rather than starting two separate containers.
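A sketch of that pattern, assuming the official postgres image (the container name and password below are arbitrary examples):
# Terminal 1: start the server, which is the image's default command
docker run --rm --name my_postgres -e POSTGRES_PASSWORD=secret postgres
# Terminal 2: run the psql client in the same container via exec
docker container exec -ti my_postgres psql -U postgres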
Automatic Links
The following links appear in the article:
1. Install Docker Engine on Debian - https://docs.docker.com/engine/install/debian/
2. CMD - https://docs.docker.com/reference/dockerfile/#cmd
3. ENTRYPOINT - https://docs.docker.com/reference/dockerfile/#entrypoint
4. Linux Control Groups - https://en.wikipedia.org/wiki/Cgroups
5. CPU Architectures - https://en.wikipedia.org/wiki/Comparison_of_CPU_architectures
6. Kernel - https://en.wikipedia.org/wiki/Kernel_(computer_science)
7. Linux Containers (LXC) - https://en.wikipedia.org/wiki/LXC
8. Linux Kernel - https://en.wikipedia.org/wiki/Linux_kernel
9. Linux Namespaces - https://en.wikipedia.org/wiki/Linux_namespaces
10. Orchestration - https://en.wikipedia.org/wiki/Orchestration_(computing)
11. Statically-Compiled - https://en.wikipedia.org/wiki/Static_library
12. User-Space - https://en.wikipedia.org/wiki/User-space
13. Virtualization - https://en.wikipedia.org/wiki/Virtualization
14. Containerization - https://en.wikipedia.org/wiki/Virtualization#Containerization
15. Docker Official Image 'httpd' - https://hub.docker.com/_/httpd
16. Linux Capabilities - https://linux-audit.com/kernel/capabilities/linux-capabilities-101/