Article Collection
This article is part of the following series:
1. Docker
- Part 1: Introduction to Docker Containers (this article)
Introduction
Virtualization is a hardware-level feature that allows users to divide the resources of one physical computer into multiple, separate (virtual) machines.
A similar but different concept, containerization, is implemented in software, in the operating system's kernel. The kernel provides the functionality necessary for creating and running multiple, separate, user-space containers.
The physical machine is typically called the "host" (or, in virtualization terminology, "dom0"). The virtual environments running on it are called virtual machines (VMs) or guests in the case of virtualization, and containers in the case of containerization.
VMs vs Containers
Virtual machines and containers both have their optimal use cases.
Virtual machines emulate hardware and can run completely different operating systems or even code compiled for other CPU architectures. That ultimate flexibility comes at the cost of a more complicated setup and some processing overhead.
Containers run under the host's kernel. They incur less overhead and do not require a full guest operating system, so they are easier to set up. A container consisting of a single, statically-compiled program would work. However, only executables compatible with the host's CPU architecture and kernel can be run. In addition, as a side effect of the software-based implementation, containerized processes are visible to and manageable by the host OS.
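If you already have Docker installed (installation is covered below), you can observe this visibility directly. A quick sketch, where the container name "sleeper" is an arbitrary example:
# Start a throwaway container that just sleeps for 5 minutes
docker run -d --rm --name sleeper debian sleep 300
# On the host, the containerized process appears in the normal process list
ps aux | grep 'sleep 300'
# Clean up: stopping the container also removes it, due to --rm
docker stop sleeper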
Containers rely heavily on functionality provided by the Linux kernel. The primary groups of such functionality are Linux control groups (cgroups), Linux namespaces, and Linux capabilities.
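These kernel features can be inspected directly on any Linux host. A small sketch, assuming the util-linux and libcap2-bin packages (Debian package names) are installed:
# List the namespaces the current shell belongs to
lsns
# Show the control groups of the current process
cat /proc/self/cgroup
# Show the capabilities of the current process
capsh --print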
Docker
Docker is a complete containerization solution. It comes with all the tools necessary for creating software images, running them in containers, and supporting them through their lifecycle.
Images are snapshots of files packed together as a single unit. Containers are running instances of those images.
Docker is a high-level technology that builds on top of numerous computing, Unix, Linux, and networking concepts. Being a complete solution, it also supports remote storage, convenient downloading, and orchestration of containers.
A very similar system, one that is completely free and open source but less popular among companies, is Linux Containers (LXC).
Docker Components
As mentioned, Docker is a complete containerization solution. It consists of the following components:
- Dockerfile syntax, a configuration language that defines how images are built. Images are often created by referencing other, existing images and customizing them
- An engine for building images, which takes Dockerfiles as input and produces Docker images as output
- Functionality for uploading Docker images to public or private (authenticated) Docker registries
- Functionality for downloading Docker images from existing public or private (authenticated) Docker registries
- An engine for running containers, which runs Docker images in isolated environments (containers)
- An engine for orchestrating containers, which runs groups of possibly dependent and scalable containers (docker compose)
The command docker is a single front-end program used to access all of Docker's functionality. Different subcommands invoke different functions, such as docker image ..., docker container ..., and docker system ....
Previously, a separate command docker-compose was used for orchestration, but its function has since been integrated into docker compose. Compose uses a dedicated configuration file, [docker-]compose.yml.
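To give a quick taste, here is a minimal sketch of a compose file that defines a single web service; the service name and port mapping are arbitrary examples:
# compose.yml
services:
  web:
    image: httpd
    ports:
      - "8080:80"
Running docker compose up -d in the directory containing this file would start the service in the background.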
Finally, there is Docker Swarm, which can centrally manage a fleet of machines running Docker. That is a separate system and outside the scope of this article.
Why Use Docker?
There are a number of reasons why you might want to use Docker.
For example:
- If the host OS is outdated and can't be upgraded, newer software can still run in containers, as long as the kernel is satisfactory
- If you want to try out a program without risking modifications to your host OS, you can run it in a container
- Some software is complex to install, so its authors might have prepared ready-to-use Docker containers, which is convenient for beginners
- Container images can be used as basic units of software deployment, especially in the cloud, e.g. in Kubernetes
Why Not Use Docker?
This section is not intended to steer you away from using Docker, but to help you position it correctly in your mind. Containers are sometimes ideal for deploying software and for tinkering with systems and concepts, but there are concerns to keep in mind.
In a non-container scenario, when you want to use software on GNU/Linux, you install it via the host's native package management tools (such as Debian's apt), configure it if needed, and run it.
That is the default workflow. You are involved in the whole process, while also taking maximum advantage of all the effort that package maintainers have invested in:
- Reviewing the licensing and quality of the software
- Making it adhere to the distribution’s defined standards
- Integrating it into the distribution’s standard procedures and tools
- Documenting it and often providing configuration examples
- Pre-configuring it and generally making it ready and easy to use
Docker images, on the other hand, can be created by anyone. Images are not verified or tuned by the distribution’s package maintainers, and software in them is not installed and configured manually by end users. Both steps are already done in advance by image authors and to their liking.
That raises the following concerns:
- Images may contain code or behavior that you would not approve of. Since images are bigger and less transparent than packages, you might start using them without knowing what exactly they are doing, or without the determination necessary to audit them and remove the offending parts
- Software images, being pre-installed and pre-configured, can deliver functionality quickly. But if you rely only on images, you never learn how to install and configure the software yourself. That potentially makes you miss out on features of the original software that were not exposed through the image, leaves you unable to customize it, and reduces your level of skill in general
- Using software through images and containers, or through the proprietary platforms on which they may be deployed, might make you accustomed to using "software as a service", rather than demanding full control and ownership of your software, data, and devices
Installation
Docker
Docker installation is not a part of this article since it is adequately covered in numerous places elsewhere.
For Debian GNU/Linux-based systems, see for example Install Docker Engine on Debian and then return here.
Permissions
In the default scenario, Docker uses a simple permission model where all members of the group docker are able to use it.
So our first task is to add the current user to the docker group:
sudo adduser $USER docker
Adding user `user' to group `docker' ...
Done.
Note that the operating system caches user group memberships for performance, and group memberships are re-read on first user login. Thus, for the cache to be refreshed and the new group membership applied, you should completely log out of the system and then log back in. If that is inconvenient, you can temporarily force the creation of a new shell with the new group visible, using one of the following methods:
newgrp docker
newgrp $USER
su - $USER
To confirm that you have the necessary privileges to use Docker, simply run id to check that "docker" is in the list of auxiliary groups, and then run e.g. docker ps. If no error message is printed, you are OK.
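For example (the user name, IDs, and group list will differ on your system):
id
uid=1000(user) gid=1000(user) groups=1000(user),27(sudo),999(docker)
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES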
Quick Start
Starting Containers
As mentioned, at its core Docker is a system for creating software images and running them in containers.
However, we do not have to build all images ourselves. Docker maintains a public registry of available images, and as soon as we reference an image that does not exist locally, Docker will connect to its public Internet registry and try downloading it from there.
(Docker's eagerness to look up images remotely can even be inconvenient – it only takes a one-letter typo or a mismatch in the image version for Docker to not find an image locally and try to download it from the public registry!)
Let’s start using Docker by confirming that, in a fresh installation, we do not have any containers or images. The following commands should just print empty results:
# Show running containers
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# Show all containers
docker container list -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
# Show all images available locally
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
Now, knowing about Docker’s automatic lookup of images in the public registry, let’s run our first container “hello-world”.
When we run the command, the first part of output will be from Docker, informing us about downloading the image. The second part will be the actual message from the container that was started, printing “Hello from Docker” and a bunch of extra text.
Here is the first part of the output in which Docker is telling us that the image is being downloaded:
docker run --rm hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
719385e32844: Pull complete
Digest: sha256:dcba6daec718f547568c562956fa47e1b03673dd010fe6ee58ca806767031d1c
Status: Downloaded newer image for hello-world:latest
And here is the second part, the output from the running container that begins with “Hello from Docker!”:
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/
For more examples and ideas, visit:
https://docs.docker.com/get-started/
If that is the output you see – it works!
Now we can check our list of local images again. One image will be there:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest 9c7a54a9a43c 3 months ago 13.3kB
The command docker ps, which shows running containers, will still print an empty list. That is because our container started, printed its message, and exited, so there are no running containers at the moment:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Building Images
While we have the "hello-world" image at hand, let's build our own image by customizing it.
We already mentioned that the steps for building Docker images are stored in Dockerfiles, and that new images can be built on top of existing ones.
The existing "hello-world" image contains and runs a program stored at the path /hello.
To make our image just a little different, we are going to add a new command, /hello2, which will simply print a brief Hello, World! to the screen and exit.
First, we need to create the hello2 program. If you have programmed in C, you will recognize the following snippet as a C program.
But in any case, just run the following commands – they will install the compiler, create a minimal .c file, compile it, and run it:
sudo apt install gcc
echo -e '#include <stdio.h>\n#include <stdlib.h>\n\nint main() {\n printf("Hello, World!\\n");\n exit(0);\n}' > hello2.c
cat hello2.c
gcc -o hello2 -static hello2.c
./hello2
Hello, World!
Now that we have our program hello2, let's create a Dockerfile for our new image:
# Dockerfile
FROM hello-world
COPY ./hello2 /
CMD [ "/hello2" ]
The above lines specify that we want to use the existing image hello-world as a base, copy the file hello2 from the host to /hello2 in the new image, and run /hello2 every time the container starts.
Note that Dockerfiles only define how images are built, not how they are named or which version they are; those options are passed at build time.
With hello2 and the Dockerfile in place, we can build our image with:
docker build -f Dockerfile -t hello-world2 . # (Don't forget the dot at the end)
Once the image is built, we can verify its presence in the local Docker cache:
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world2 latest d97789789d8d 4 seconds ago 775kB
Note that the tag (version) “latest” is important. If a version is not specified when trying to run an image, Docker will look for an image tagged “latest”.
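For example, hello-world2 and hello-world2:latest refer to the same image here, and docker tag can record additional versions of it; the v1 tag below is an arbitrary example (the listed IDs and timestamps are illustrative):
docker tag hello-world2:latest hello-world2:v1
docker images hello-world2
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world2 latest d97789789d8d 2 minutes ago 775kB
hello-world2 v1 d97789789d8d 2 minutes ago 775kB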
And we can now start a container based on our image “hello-world2”:
docker run --rm hello-world2
Hello, World!
Explicitly specifying the command to run in the container (/hello2) was not necessary because we already configured it as the default CMD in the Dockerfile.
But since we built our image on top of the original "hello-world", and our image contains both /hello and /hello2, what if we wanted to run the original /hello?
We just specify the command to run after all other parameters. That command will override the default CMD that was defined in the image:
docker run --rm hello-world2 /hello
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
...
...
Removing Images
The images hello-world and hello-world2 are extremely simple. They consist of the programs /hello and /hello2, which print welcome messages and exit.
There is nothing else useful we can do with them, other than maybe inspecting them for the sake of practice and then removing them from the cache:
docker image inspect hello-world
{
"Id": "sha256:9c7a54a9a43cca047013b82af109fe963fde787f63f9e016fdc3384500c2823d",
"RepoTags": [
"hello-world:latest"
],
...
...
...
docker image rm hello-world
docker image rm hello-world2
It is possible that the above commands will fail, saying:
Error response from daemon: conflict: unable to remove repository reference "hello-world2" (must force) - container ... is using its reference image d97789789d8d
That simply means there are containers which still reference this image, so the image cannot be deleted. List all containers and remove them before removing the images:
docker container list -a
docker container rm ...
docker image rm hello-world
docker image rm hello-world2
Lastly, while working with Docker, you will notice that its cache can easily fill gigabytes of disk space, so we will also show a space-saving command here. The command will not make a difference with just a few images, but will come in handy in the future. Please note that it removes all stopped containers, unused networks, dangling images, and the build cache:
docker system df
docker system prune -f
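The prune command above keeps all tagged images. If you want to go further and also remove every image not referenced by at least one container, there is a stronger variant (use it with care):
docker system prune -a -f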
Interacting with Containers
By default, Docker creates one default virtual network, and all containers get an IP address from that subnet so they can talk to each other.
The containers also have access to the host’s networking, so if the host machine is connected to the Internet, containers will be able to access it as well. However, other than that default egress provision, containers run in completely separate environments, including storage.
Separate environments are great for isolation, but may be a problem for durability of data. While it is quite normal to have long-lived containers, containers are often also created temporarily, and in any case the data is lost when containers are deleted. Similarly, container isolation may be a problem if we actually want the host and containers to talk to each other.
There are a couple ways to enable that interaction:
- By exposing containers' network ports to the host OS or other containers
- By copying additional files or other required data directly into the containers at build time
- By mounting host OS directories (ad-hoc volumes) into the containers at startup time
Requesting disk volumes to be mounted inside containers is done with the option -v HOST_PATH:CONTAINER_PATH, and exposing network ports is done with the option -p HOST_PORT:CONTAINER_PORT.
Let’s show those scenarios in practice.
Exposing Network Ports in Containers
We have seen the “hello-world” image in the previous chapter. The image did not exist locally, so it was automatically pulled from Docker’s public registry when we ran it.
That container did not require much interaction. All it did was print a welcome message and exit.
But to show network interaction with containers and set things up for other examples, we are going to explicitly download and run a Docker image for Apache, an HTTP (web) server.
The image name is httpd:
docker pull httpd
To be useful, an HTTP server must be reachable by clients. We are going to run the container and set up its networking so that the host's port 8080 is routed to port 80 in the container.
Port 80 is the standard port on which web (HTTP) servers listen for unencrypted (non-SSL/TLS) connections.
It is preferable not to use port 80 directly on the host, because it may already be in use by an existing web server running on the host, or we might not have the necessary permission to bind to a port number below 1024.
docker run -ti --rm -p 8080:80 httpd
This will start the container in foreground (non-detached) mode.
We can now use a web browser to open http://0:8080/ and we will be greeted by Apache with a simple message “It works!”.
When you are done with the test, press Ctrl+c to terminate the foreground process. The container will exit when the command exits, and because of the option --rm, the container will also be removed automatically upon termination.
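As a variation, the same container could be run in the background (detached) with option -d and tested from the command line. A small sketch, assuming curl is installed; the container name is an arbitrary example:
# Start detached, verify with curl, then stop
docker run -d --rm --name test-apache -p 8080:80 httpd
curl http://localhost:8080/
docker stop test-apache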
Additional Files in Containers
But, what about a more useful website? What if we had a personal or company website, and wanted to serve it from this container?
If you are familiar with the basics of the HTTP protocol, you know the original idea was that a client would request a particular URL on the server, that URL would map to some HTML file on disk, and the server would return the file contents to the user.
From the documentation on the Docker official image 'httpd', we see that Apache's root directory for serving HTML files is set to /usr/local/apache2/htdocs/.
Therefore, the simplest thing we can do to serve our website instead of the default “It works!” is to copy our files over the default ones.
Let’s do that now and confirm that it worked by seeing the message change from “It works!” to “Hello, World!”:
First, we will locally create a directory public_html/, containing one page for our new website:
mkdir public_html
echo "<html><body>Hello, World!</body></html>" > public_html/index.html
Then, we will create a separate Dockerfile, e.g. Dockerfile.apache, for our new image:
FROM httpd
COPY ./public_html/ /usr/local/apache2/htdocs/
And finally, we will build and run the image:
docker build -f Dockerfile.apache -t hello-apache2 . # (Don't forget the dot at the end)
docker run -ti --rm --name test-website -p 8080:80 hello-apache2
Visiting http://0:8080/ will now show our website with message “Hello, World!”.
We are done with the test, so press Ctrl+c to terminate the process.
Mounted Host OS Directories
The previous example worked, but copying data into images is not very flexible. When data changes, we need to rebuild the images and also restart containers that are using them.
As mentioned earlier, the solution is to mount host OS directories (ad-hoc volumes) into the container with the option -v HOST_PATH:CONTAINER_PATH.
Since we already have our public_html/ directory, and mounting volumes does not require changing the images, we can use the original httpd image directly:
docker run -ti --rm --name test-website-volume -p 8080:80 -v ./public_html:/usr/local/apache2/htdocs/ httpd
Visiting http://0:8080/ will now show our new website and message “Hello, World!”.
But this example is not equivalent to the previous one: the data is now "live". If we modify any file in public_html/ and visit it through the browser, we will immediately see the updated contents.
(You might need to press Ctrl+r, F5, or Ctrl+Shift+r, or click Shift+Reload in the browser to force a refresh of the visible page.)
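For example, from another terminal, we can update the page and observe the change without restarting anything (using curl here instead of a browser; the new message is arbitrary):
echo "<html><body>Hello again!</body></html>" > public_html/index.html
curl http://localhost:8080/
<html><body>Hello again!</body></html>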
Furthermore, since we now have a long-running container, we can verify its presence in the output of docker ps:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
66bb93476f99 httpd "httpd-foreground" 1 hour ago Up 1 hour 0.0.0.0:8080->80/tcp, :::8080->80/tcp test-website-volume
Running Commands in Containers
In containers, just like in any environment, you can only run commands that exist there. The commands may exist because they have been included in the image, mounted from a volume, or copied into the container at runtime.
Let’s take a look at commonly used scenarios for manually or automatically running programs in containers.
At Startup
From Dockerfile
There are two Dockerfile directives that define the default program to run at startup – ENTRYPOINT and CMD.
Both are by default empty (undefined).
The full command that Docker will run is $ENTRYPOINT $CMD. (That is, any ENTRYPOINT with any CMD appended to it.)
We have seen an example of CMD in our earlier Dockerfile:
FROM hello-world
COPY ./hello2 /
CMD [ "/hello2" ]
An example of CMD with additional command-line arguments:
CMD [ "/some/program", "--with-option", "123" ]
And an example of both ENTRYPOINT and CMD, which will result in Docker starting /some/program --with-option 123:
ENTRYPOINT [ "/some/program" ]
CMD [ "--with-option", "123" ]
Note that ENTRYPOINT and CMD above use the preferred "exec" syntax, but a "shell" syntax is also available if really necessary.
See ENTRYPOINT and CMD for details.
From Command Line
It is possible to override both ENTRYPOINT and CMD on the command line, at the time of container startup.
The option --entrypoint overrides ENTRYPOINT, while CMD is overridden simply by listing the arguments after the image name. Note that --entrypoint, being an option of docker run, must come before the image name:
#                [ ENTRYPOINT ]            [ CMD ]
docker run --rm --entrypoint /some/program some-image --with-option 123
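For a concrete test, we can reuse the httpd image from earlier. It defines only a CMD, so overriding the command makes the container print Apache's version and exit, instead of starting the server (the exact version string will vary):
docker run --rm httpd httpd -v
Server version: Apache/2.4.x (Unix)
Server built: ...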
At Runtime
Oftentimes, we want to run commands in containers that are already active and running.
To show an example, let’s first start a generic container that just runs Debian GNU/Linux:
docker run --name my_debian -ti --rm debian
root@5821b3a41434:/#
Then, in another terminal, let's run docker ps to confirm our container is running:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5821b3a41434 debian "bash" 15 seconds ago Up 13 seconds my_debian
Now, with a running container, we can execute commands in it via docker container exec. Here is an example that shows disk space usage in the container:
docker container exec -ti my_debian df -h
Filesystem Size Used Avail Use% Mounted on
overlay 15G 6.0G 7.6G 44% /
tmpfs 64M 0 64M 0% /dev
tmpfs 1.2G 0 1.2G 0% /sys/fs/cgroup
shm 64M 0 64M 0% /dev/shm
/dev/xvda3 15G 6.0G 7.6G 44% /etc/hosts
tmpfs 1.2G 0 1.2G 0% /proc/asound
tmpfs 1.2G 0 1.2G 0% /proc/acpi
tmpfs 1.2G 0 1.2G 0% /proc/scsi
tmpfs 1.2G 0 1.2G 0% /sys/firmware
If a shell exists in the container – and in a Debian image it of course does – we can also run the shell itself, which will give us an interactive command line:
docker container exec -ti my_debian /bin/bash
root@5821b3a41434:/#
The shell can be exited as usual, using the commands logout or exit, or by pressing Ctrl+d.
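As another example, we could install and run a tool that the minimal Debian image lacks. A small sketch, assuming the container from above is still running (the package procps provides the ps command):
docker container exec -ti my_debian bash -c 'apt update && apt install -y procps && ps aux'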
Hybrid
It is completely fine to combine startup and runtime methods of executing commands in Docker containers.
Many container images that implement client/server applications, such as databases, customarily start the server by default; if you want to start the client instead, you override the command.
You can do this by running docker run IMAGE_NAME [CMD] twice: once without and once with the command manually specified. This will run the same image twice, in two separate containers, which you can confirm with docker ps.
Alternatively, you can run the second command with docker container exec CONTAINER CMD, which has a similar but different effect: it runs the second command in the first container, rather than starting two separate containers.
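A sketch of that pattern, assuming the official postgres image (the container name and password below are arbitrary examples):
# Terminal 1: start the server, which is the image's default command
docker run --rm --name my_postgres -e POSTGRES_PASSWORD=secret postgres
# Terminal 2: run the psql client in the same container via exec
docker container exec -ti my_postgres psql -U postgres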
Automatic Links
The following links appear in the article:
1. Install Docker Engine on Debian - https://docs.docker.com/engine/install/debian/
2. CMD - https://docs.docker.com/reference/dockerfile/#cmd
3. ENTRYPOINT - https://docs.docker.com/reference/dockerfile/#entrypoint
4. Linux Control Groups - https://en.wikipedia.org/wiki/Cgroups
5. CPU Architectures - https://en.wikipedia.org/wiki/Comparison_of_CPU_architectures
6. Kernel - https://en.wikipedia.org/wiki/Kernel_(computer_science)
7. Linux Containers (LXC) - https://en.wikipedia.org/wiki/LXC
8. Linux Kernel - https://en.wikipedia.org/wiki/Linux_kernel
9. Linux Namespaces - https://en.wikipedia.org/wiki/Linux_namespaces
10. Orchestration - https://en.wikipedia.org/wiki/Orchestration_(computing)
11. Statically-Compiled - https://en.wikipedia.org/wiki/Static_library
12. User-Space - https://en.wikipedia.org/wiki/User-space
13. Virtualization - https://en.wikipedia.org/wiki/Virtualization
14. Containerization - https://en.wikipedia.org/wiki/Virtualization#Containerization
15. Docker Official Image 'httpd' - https://hub.docker.com/_/httpd
16. Linux Capabilities - https://linux-audit.com/kernel/capabilities/linux-capabilities-101/