New file |
| | |
| | | #+TODO: TODO(t) NEXT(n) WAITING(w) SOMEDAY(s) DELEGATED(g) PROJ(p) PLANNED(l) | DONE(d) FORWARDED(f) CANCELLED(c) |
| | | #+startup: beamer |
| | | #+LaTeX_CLASS: beamer |
| | | #+LaTeX_CLASS_OPTIONS: [a4paper] |
| | | #+LaTeX_CLASS_OPTIONS: [captions=tableheading] |
| | | #+LATEX_HEADER: \usetheme{Warsaw} \usepackage{courier} |
| | | #+LATEX_HEADER: \usepackage{textpos} |
| | | #+LATEX_HEADER: \RequirePackage{fancyvrb} |
| | | #+LATEX_HEADER: \DefineVerbatimEnvironment{verbatim}{Verbatim}{fontsize=\tiny} |
| | | #+LATEX_HEADER: \setbeamercolor{title}{fg=green} |
| | | #+LATEX_HEADER: \setbeamercolor{structure}{fg=black} |
| | | #+LATEX_HEADER: \setbeamercolor{section in head/foot}{fg=green} |
| | | #+LATEX_HEADER: \setbeamercolor{subsection in head/foot}{fg=green} |
| | | #+LATEX_HEADER: \setbeamercolor{item}{fg=green} |
| | | #+LATEX_HEADER: \setbeamerfont{frametitle}{family=\ttfamily} |
| | | # logo |
| | | #+LATEX_HEADER: \addtobeamertemplate{frametitle}{}{ \begin{textblock*}{100mm}(0.85\textwidth,-0.8cm) \includegraphics[height=0.7cm,width=2cm]{niit-logo.png} \end{textblock*}} |
| | | #+OPTIONS: toc:nil title:nil ^:nil |
| | | #+LANGUAGE: en |
| | | #+TITLE: What are containers? |
| | | |
| | | * |
| | | file:~/git/olbohlen-org/presentations/praesentation-containers-intro.png |
| | | |
| | | |
| | | |
| | | * knock knock...wake up... |
| | | |
| | | You want to know what containers are? |
| | | |
| | | |
| | | |
| | | * The Spoon |
| | | ** left :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.5 |
| | | :END: |
| | | - Do not try to run containers, that's impossible. Instead, only try to realize the truth... |
| | | |
| | | - What truth? |
| | | |
| | | - There is no container... |
| | | |
| | | - There is no container? |
| | | |
| | | - Then you'll see that it is not the container which runs, it is the process itself. |
| | | |
| | | ** right :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.7 |
| | | :END: |
| | | |
| | | file:~/git/olbohlen-org/presentations/neo-spoon.jpg |
| | | |
| | | |
| | | |
| | | * What Is A Process? |
| | | |
| | | - it has its own private memory |
| | | - violations against process memory borders get a SIGSEGV(11) |
| | | - a process has a heap, a stack, code (TEXT) and data (ANON) |
| | | - the process can be observed by \textcolor{green}{ps}(1), which shows some attributes: |
| | | |
| | | #+begin_example |
| | | $ ps -fp $$ |
| | | UID PID PPID C STIME TTY TIME CMD |
| | | olbohlen 11651 10046 0 23:07:43 pts/6 0:00 ksh |
| | | #+end_example |
| | | |
| | | - we see the user id, process id, parent-pid, start time, the tty, the cpu time and command name |
| | | - in UNIX these attributes are bundled in a C structure called proc_t |
| | | - Linux uses task_struct which is a more hierarchical structure |
| | | |
| | | |
| | | |
| | | * Container Implementations |
| | | |
| | | There are various implementations: |
| | | - Linux: OpenVZ (2005), docker (2013), podman (~2018), etc... |
| | | - FreeBSD: jails (Mar 2000) |
| | | - illumos/Solaris: containers (Feb 2004) |
| | | - AIX: wpars |
| | | and various others... |
| | | |
| | | |
| | | |
| | | * The Lady In The Red Dress |
| | | |
| | | Welcome to a training program, let's start a simple container with podman... |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 ~]$ podman run -d ubi8 sleep 10000 |
| | | 6b336fb0012f6f3d8fadca333e1e2bd900b7ede9560594bb0c5acc27a3aef4ee |
| | | [olbohlen@rhel85 ~]$ podman ps |
| | | CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES |
| | | 6b336fb0012f registry.access.redhat.com/ubi8:latest sleep 10000 2 seconds ago Up 2 seconds ago nifty_hypatia |
| | | [olbohlen@rhel85 ~]$ ps -ef | grep "sleep 10000" |
| | | olbohlen 5026 5017 0 23:44 ? 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 10000 |
| | | [root@6b336fb0012f /]# ps -ef | grep sleep |
| | | root 1 0 0 22:44 ? 00:00:00 /usr/bin/coreutils --coreutils-prog-shebang=sleep /usr/bin/sleep 10000 |
| | | [olbohlen@rhel85 ~]$ ps -fZp 5017 |
| | | LABEL UID PID PPID C STIME TTY TIME CMD |
| | | unconfined_u:system_r:container_runtime_t:s0 olbohlen 5017 1 0 23:44 ? 00:00:00 /usr/bin/conmon --api-version 1 -c 6b336fb0012f6f3d8fadca333e1e2bd900b7ede95 |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * First a Few Details |
| | | |
| | | - podman uses \textcolor{green}{runc}(8) - the OCI container runtime |
| | | - containers are instantiated using different technologies |
| | | - namespaces: providing resource "visibilities" |
| | | - cgroups: limiting compute resources as cpu and memory |
| | | - chroot: creating a fake root directory |
| | | - seccomp: limiting access to systemcalls |
| | | - SELinux: proving extra layers to prevent escapes |
| | | |
| | | |
| | | |
| | | * The World You See Is Not Real |
| | | |
| | | Namespaces "scope" the visibility of various things |
| | | Linux supports different types of \textcolor{green}{namespaces}(7) like: |
| | | |
| | | - cgroup: Cgroup root directory |
| | | - ipc: System V IPC, POSIX message queues |
| | | - mnt: Mount points |
| | | - net: Network devices, stacks, ports, etc. |
| | | - pid: Process IDs |
| | | - user: User and group IDs |
| | | - uts: Hostname and NIS domain name |
| | | |
| | | Which can isolate processes in different ways |
| | | Namespaces can be created by \textcolor{green}{unshare}(1) |
| | | |
| | | |
| | | * Let's Learn Some Kung Fu |
| | | |
| | | Let's build a simple container on our own with \textcolor{green}{unshare}(1) and \textcolor{green}{chroot}(1): |
| | | |
| | | #+begin_example |
| | | $ mkdir -p ~/sysroot/{bin,lib64,proc} |
| | | $ for f in $(ldd /bin/{bash,df,ls,lsns,mount,ps,uname} | \ |
| | | > tr '[ :]' '\n' | grep /); do cp $f sysroot/$f; done |
| | | $ sudo mount --bind /home/olbohlen/sysroot/proc /home/olbohlen/sysroot/proc |
| | | $ unshare -irmnpuUCf --mount-proc=$PWD/sysroot/proc chroot $PWD/sysroot /bin/bash |
| | | bash-4.4# /bin/ps -ef |
| | | UID PID PPID C STIME TTY TIME CMD |
| | | 0 1 0 0 16:58 ? 00:00:00 /bin/bash |
| | | 0 2 1 0 16:58 ? 00:00:00 /bin/ps -ef |
| | | bash-4.4# /bin/mount |
| | | /dev/mapper/rhel_rhel85-root on /proc type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota) |
| | | proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * So, Was That Real? |
| | | |
| | | Well, we have to trust Linux here a bit... |
| | | But on other UNIX systems we can actually dig deeper: |
| | | |
| | | Our rabbit hole entry is the kernel debugger, which we can attach |
| | | to a running UNIX kernel and observe (and modify) the system live. |
| | | |
| | | Allow me to do that on illumos, as the process structures are a |
| | | bit more "organized". |
| | | |
| | | |
| | | |
| | | * Again A Bit Of Boring Info |
| | | |
| | | When we attach the kernel debugger (mdb) against a running kernel, |
| | | we have raw memory access. UNIX organizes data in C structures, |
| | | which may contain other data types such as int or char (or again |
| | | structs).\\ |
| | | \\ |
| | | A simple C structure could look like this: |
| | | #+begin_src C :exports code |
| | | struct position { |
| | | int x; |
| | | int y; |
| | | }; |
| | | #+end_src |
| | | |
| | | And if we would read the struct it may look like: |
| | | #+begin_example |
| | | position.x = 42 |
| | | position.y = 23 |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * With Annoying Details... |
| | | |
| | | (Un)fortunately the debugger does not know the format of a data |
| | | structure at a given address, so we need to validate that we |
| | | got correct data.\\ |
| | | \\ |
| | | The debugger has some commands to look at known places for certain |
| | | structures, such as the process table or in our example the list |
| | | of containers.\\ |
| | | \\ |
| | | |
| | | |
| | | |
| | | * Down The Rabbit Hole |
| | | |
| | | So let's run the debugger: |
| | | |
| | | #+begin_example |
| | | (701) x230:/root# mdb -k |
| | | Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci zfs sata sd ip hook neti sockfs arp usba i915 xhci mm smbios fctl stmf stmf_sbd lofs random idm cpc crypto fcip fcp ufs logindmux nsmb ptm smbsrv nfs sppp kvm ipc ] |
| | | > ::zone |
| | | ADDR ID STATUS NAME PATH |
| | | fffffffffbd08c20 0 running global / |
| | | fffffe16886626c0 1 running asterix /export/zones/asterix/root/ |
| | | fffffe16929ebd80 2 running obelix /export/zones/obelix/root/ |
| | | fffffe16bbc66500 4 running rhel85 /export/zones/rhel85/root/ |
| | | #+end_example |
| | | |
| | | Wait, we have a container called "rhel85", wasn't that the rhel machine from the demos before? |
| | | Yes, actually that container runs a bhyve hypervisor process which runs RHEL 8.5... |
| | | #+begin_example |
| | | (629) x230:/export/home/olbohlen$ ps -f -z rhel85 |
| | | UID PID PPID C STIME TTY TIME CMD |
| | | root 15267 5136 0 17:57:03 ? 1:40 /usr/sbin/bhyve -U 37960a3a-c5ac-6c8b-d14b-8204ca044474 -H -B 1,manufacturer=Op |
| | | root 5113 1 0 Jan 01 ? 0:00 zsched |
| | | root 5136 5113 0 Jan 01 ? 0:00 /usr/bin/python3.5 /usr/lib/brand/bhyve/init |
| | | (630) x230:/export/home/olbohlen$ |
| | | #+end_example |
| | | |
| | | * Down The Rabbit Hole |
| | | |
| | | We have the bhyve process with the PID 15267 running according to \textcolor{green}{ps}(1), let's look in mdb: |
| | | #+begin_example |
| | | > ::ps ! egrep "(PID|bhyve)" |
| | | S PID PPID PGID SID UID FLAGS ADDR NAME |
| | | R 15267 5136 5113 5113 0 0x4a004000 fffffe16b8d6d010 bhyve |
| | | #+end_example |
| | | |
| | | ADDR is the start address in RAM for the proc_t data structure |
| | | |
| | | #+begin_example |
| | | > fffffe16b8d6d010::print -a proc_t ! less |
| | | [...] |
| | | > fffffe16b8d6d010::print -a proc_t p_user.u_psargs |
| | | fffffe16b8d6d879 p_user.u_psargs = [ "/usr/sbin/bhyve -U 37960a3a-c5ac-6c8b-d14b-8204ca044474 -H -B 1,manufacturer=Op" ] |
| | | #+end_example |
| | | |
| | | The proc_t structure store all attributes to a process, so those that \textcolor{green}{ps}(1) shows and more. |
| | | Also in that proc_t we have the container id in it (p_zone, think of it as the namespace id): |
| | | |
| | | #+begin_example |
| | | > fffffe16b8d6d010::print -a proc_t p_zone |
| | | fffffe16b8d6d658 p_zone = 0xfffffe16bbc66500 |
| | | > 0xfffffe16bbc66500::zone |
| | | ADDR ID STATUS NAME PATH |
| | | fffffe16bbc66500 4 running rhel85 /export/zones/rhel85/root/ |
| | | > |
| | | #+end_example |
| | | |
| | | |
| | | * Container Images |
| | | ** left :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.5 |
| | | :END: |
| | | - We need images. |
| | | Lots of images. |
| | | ** right :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.7 |
| | | :END: |
| | | |
| | | file:~/git/olbohlen-org/presentations/matrix-storage.jpg |
| | | |
| | | |
| | | |
| | | * Storing The Data |
| | | |
| | | podman/docker use so called images to instantiate containers. |
| | | These images are made of Layers, like viewfoils on overhead projectors. |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 scratch]$ skopeo inspect docker://registry.access.redhat.com/rhscl/postgresql-10-rhel7 \ |
| | | > | jq ".Layers" |
| | | [ |
| | | "sha256:ac08ca107ad9ed699cbd28339749dd6463a84c73aa1d468a4241385fc4ec3876", |
| | | "sha256:b46ca46c303b49d886a7585735ebd1dc8651e83d0fab5823300cf3a9fd2febc1", |
| | | "sha256:cdd22b43a6f986fc909d504043ef6ad6528a6c1927f27c80eea2d19ffe5079fe", |
| | | "sha256:4c9f611df095eef49c081f758ad314b62a297172e22a8a746514d252a7a89c45" |
| | | ] |
| | | #+end_example |
| | | |
| | | This image contains four layers which itself are tar archives which you can extract. |
| | | |
| | | |
| | | |
| | | * Exploring The Image |
| | | |
| | | Let's extract an image to a local directory: |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 scratch]$ skopeo copy --remove-signatures \ |
| | | > docker://registry.access.redhat.com/rhscl/postgresql-10-rhel7 dir:///$PWD |
| | | Copying blob ac08ca107ad9 done |
| | | Copying blob b46ca46c303b done |
| | | Copying blob cdd22b43a6f9 done |
| | | Copying blob 4c9f611df095 done |
| | | Copying config 00a55534f8 done |
| | | Writing manifest to image destination |
| | | Storing signatures |
| | | [olbohlen@rhel85 scratch]$ ls |
| | | 00a55534f8db45877d6657cc9b1ba77841c49cb21cc4d7a4c9cd4e98020a4bc8 |
| | | 4c9f611df095eef49c081f758ad314b62a297172e22a8a746514d252a7a89c45 |
| | | ac08ca107ad9ed699cbd28339749dd6463a84c73aa1d468a4241385fc4ec3876 |
| | | b46ca46c303b49d886a7585735ebd1dc8651e83d0fab5823300cf3a9fd2febc1 |
| | | cdd22b43a6f986fc909d504043ef6ad6528a6c1927f27c80eea2d19ffe5079fe |
| | | manifest.json |
| | | version |
| | | #+end_example |
| | | |
| | | Also use \textcolor{green}{jq}(1) to inspect the manifest and the config. |
| | | |
| | | |
| | | |
| | | * Finding The Image Config |
| | | |
| | | There's an obvious manifest.json, so let's look into it. |
| | | #+begin_example |
| | | [olbohlen@rhel85 scratch]$ jq ".config.digest" <manifest.json |
| | | "sha256:00a55534f8db45877d6657cc9b1ba77841c49cb21cc4d7a4c9cd4e98020a4bc8" |
| | | #+end_example |
| | | |
| | | That's our image config, itself a json file: |
| | | #+begin_example |
| | | [olbohlen@rhel85 scratch]$ jq . 00a55534f8db45877d6657cc9b1ba77841c49cb21cc4d7a4c9cd4e98020a4bc8 |
| | | { |
| | | "architecture": "amd64", |
| | | [...] |
| | | #+end_example |
| | | |
| | | Looks familiar? Yes, that's more or less podman inspect. |
| | | |
| | | In the manifest.json we also see the layers: |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 scratch]$ jq ".layers[].digest" manifest.json |
| | | "sha256:ac08ca107ad9ed699cbd28339749dd6463a84c73aa1d468a4241385fc4ec3876" |
| | | "sha256:b46ca46c303b49d886a7585735ebd1dc8651e83d0fab5823300cf3a9fd2febc1" |
| | | "sha256:cdd22b43a6f986fc909d504043ef6ad6528a6c1927f27c80eea2d19ffe5079fe" |
| | | "sha256:4c9f611df095eef49c081f758ad314b62a297172e22a8a746514d252a7a89c45" |
| | | [olbohlen@rhel85 scratch]$ du -h ac08ca107ad9ed699cbd2833[...] |
| | | 73M ac08ca107ad9ed699cbd28339749dd6463a84c73aa1d468a4241385fc4ec3876 |
| | | 4.0K b46ca46c303b49d886a7585735ebd1dc8651e83d0fab5823300cf3a9fd2febc1 |
| | | 7.0M cdd22b43a6f986fc909d504043ef6ad6528a6c1927f27c80eea2d19ffe5079fe |
| | | 33M 4c9f611df095eef49c081f758ad314b62a297172e22a8a746514d252a7a89c45 |
| | | #+end_example |
| | | These are \textcolor{green}{tar}(1) archives we can extract and inspect. |
| | | When you start a container, the extracted layers will be mounted with |
| | | OverlayFS. |
| | | |
| | | |
| | | |
| | | * Can We Simulate Layering? |
| | | |
| | | podman uses \textcolor{green}{fuse-overlayfs}(1) to mount container image layers. |
| | | Since Linux 4.18 this can be done also by non-root users: |
| | | |
| | | #+begin_example |
| | | $ mkdir layer1 |
| | | $ mkdir layer2 |
| | | $ mkdir ephemeral-layer |
| | | $ mkdir mountdir |
| | | $ echo "this is file one" >layer1/f1 |
| | | $ echo "this is file two" >layer2/f2 |
| | | $ fuse-overlayfs -o lowerdir=$PWD/layer1:$PWD/layer2 -o upperdir=$PWD/ephemeral-layer \ |
| | | > -o workdir=$PWD/fuse-work $PWD/mountdir |
| | | $ ls mountdir |
| | | f1 f2 |
| | | $ echo "this is file three" >mountdir/f3 |
| | | $ fusermount -u $PWD/mountdir |
| | | $ ls */f? |
| | | ephemeral-layer/f3 layer1/f1 layer2/f2 |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * Communication System |
| | | ** left :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.5 |
| | | :END: |
| | | - We need an exit! |
| | | |
| | | ** right :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.7 |
| | | :END: |
| | | |
| | | file:~/git/olbohlen-org/presentations/matrix-communication.jpg |
| | | |
| | | * Network Access |
| | | |
| | | podman uses CNI (Container Native Interface) to provide a network |
| | | interface for a container (so, a namespaced NIC), which will be usuall |
| | | created on a bridge. This is only possible for containers started as root: |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 ~]$ sudo podman run -it registry.access.redhat.com/ubi8 \ |
| | | > bash -c "(dnf install -y iproute && ip a s)" |
| | | Updating Subscription Management repositories. |
| | | [...] |
| | | Complete! |
| | | 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 |
| | | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 |
| | | inet 127.0.0.1/8 scope host lo |
| | | valid_lft forever preferred_lft forever |
| | | inet6 ::1/128 scope host |
| | | valid_lft forever preferred_lft forever |
| | | 2: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default |
| | | link/ether ca:dc:cc:3a:c9:e5 brd ff:ff:ff:ff:ff:ff link-netnsid 0 |
| | | inet 10.88.0.3/16 brd 10.88.255.255 scope global eth0 |
| | | valid_lft forever preferred_lft forever |
| | | inet6 fe80::c8dc:ccff:fe3a:c9e5/64 scope link |
| | | valid_lft forever preferred_lft forever |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * Communication Without Privileges |
| | | |
| | | Since a normal user can't instantiate interfaces usually, rootless containers |
| | | can't use an interface on a bridge. Instead rootless containers use the userland |
| | | tap driver (known from openvpn or virtualbox for example): |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 ~]$ podman run -it registry.access.redhat.com/ubi8 \ |
| | | > bash -c "(dnf install -y iproute && ip a s)" |
| | | Updating Subscription Management repositories. |
| | | [...] |
| | | Complete! |
| | | 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 |
| | | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 |
| | | inet 127.0.0.1/8 scope host lo |
| | | valid_lft forever preferred_lft forever |
| | | inet6 ::1/128 scope host |
| | | valid_lft forever preferred_lft forever |
| | | 2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000 |
| | | link/ether 86:21:df:f9:40:43 brd ff:ff:ff:ff:ff:ff |
| | | inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0 |
| | | valid_lft forever preferred_lft forever |
| | | inet6 fe80::8421:dfff:fef9:4043/64 scope link |
| | | valid_lft forever preferred_lft forever |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * A TAP On The Net |
| | | |
| | | The tap driver is part of the universal tun/tap driver being developed since 1999 for |
| | | Linux, FreeBSD and Solaris. It allows user processes to create an interface. |
| | | Depending on your code it will create a tun or a tap interface. |
| | | |
| | | What is the difference? |
| | | |
| | | - a tun interface behaves like a Point-To-Point interface and handles IP packets |
| | | - a tap interface behaves like a Ethernet interface and handles Ethernet frames |
| | | |
| | | All packets sent to these interfaces will be received by the application which created |
| | | them. Popular examples are the \textcolor{green}{pppd}(8) or openvpn. |
| | | |
| | | podman uses \textcolor{green}{slirp4netns}(1) to create a user-mode network interface |
| | | |
| | | |
| | | |
| | | * Let's Hack The Matrix |
| | | |
| | | so, first we set up our simple container again: |
| | | |
| | | #+begin_example |
| | | $ mkdir -p ~/sysroot/{bin,lib64,proc,sbin} |
| | | $ for f in $(ldd /bin/{bash,df,ls,lsns,mount,ps,uname,ping} /sbin/{ip,ifconfig} | \ |
| | | > tr '[ :]' '\n' | grep /); do cp $f sysroot/$f; done |
| | | $ sudo mount --bind /home/olbohlen/sysroot/proc /home/olbohlen/sysroot/proc |
| | | $ unshare -irmnpuUCf --mount-proc=$PWD/sysroot/proc chroot $PWD/sysroot /bin/bash |
| | | bash-4.4# /bin/ps -ef |
| | | UID PID PPID C STIME TTY TIME CMD |
| | | 0 1 0 0 16:58 ? 00:00:00 /bin/bash |
| | | 0 2 1 0 16:58 ? 00:00:00 /bin/ps -ef |
| | | bash-4.4# /bin/mount |
| | | /dev/mapper/rhel_rhel85-root on /proc type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota) |
| | | proc on /proc type proc (rw,nosuid,nodev,noexec,relatime) |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * Let's Hack The Matrix |
| | | |
| | | On the host OS: |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 ~]$ pgrep -P $(pgrep -x unshare) bash |
| | | 2425 |
| | | [olbohlen@rhel85 ~]$ slirp4netns --configure --mtu=65520 2425 tap0 |
| | | sent tapfd=5 for tap0 |
| | | received tapfd=5 |
| | | Starting slirp |
| | | * MTU: 65520 |
| | | * Network: 10.0.2.0 |
| | | * Netmask: 255.255.255.0 |
| | | * Gateway: 10.0.2.2 |
| | | * DNS: 10.0.2.3 |
| | | * Recommended IP: 10.0.2.100 |
| | | WARNING: 127.0.0.1:* on the host is accessible as 10.0.2.2 (set --disable-host-loopback to prohibit connecting to 127.0.0.1:*) |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * Let's Hack The Matrix |
| | | |
| | | Back in the container: |
| | | #+begin_example |
| | | bash-4.4# /sbin/ip a s |
| | | 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 |
| | | link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 |
| | | inet 127.0.0.1/8 scope host lo |
| | | valid_lft forever preferred_lft forever |
| | | inet6 ::1/128 scope host |
| | | valid_lft forever preferred_lft forever |
| | | 2: tap0: <BROADCAST,UP,LOWER_UP> mtu 65520 qdisc fq_codel state UNKNOWN group default qlen 1000 |
| | | link/ether be:0c:f2:d0:28:79 brd ff:ff:ff:ff:ff:ff |
| | | inet 10.0.2.100/24 brd 10.0.2.255 scope global tap0 |
| | | valid_lft forever preferred_lft forever |
| | | inet6 fe80::bc0c:f2ff:fed0:2879/64 scope link |
| | | valid_lft forever preferred_lft forever |
| | | bash-4.4# /bin/ping 10.0.2.2 |
| | | PING 10.0.2.2 (10.0.2.2) 56(84) bytes of data. |
| | | 64 bytes from 10.0.2.2: icmp_seq=1 ttl=255 time=0.563 ms |
| | | 64 bytes from 10.0.2.2: icmp_seq=2 ttl=255 time=0.127 ms |
| | | ^C |
| | | --- 10.0.2.2 ping statistics --- |
| | | 2 packets transmitted, 2 received, 0% packet loss, time 1007ms |
| | | rtt min/avg/max/mdev = 0.127/0.345/0.563/0.218 ms |
| | | #+end_example |
| | | |
| | | |
| | | |
| | | * Other Hacks With Rootless |
| | | |
| | | A bigger issue is that a user cannot start processes with different uids. |
| | | For podman rootless containers, there is a UID mapping.\\ |
| | | The file /etc/subuid specifies a range of uids per user: |
| | | |
| | | #+begin_example |
| | | [olbohlen@rhel85 ~]$ id -a |
| | | uid=4100(olbohlen) gid=4100(olbohlen) groups=4100(olbohlen),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 |
| | | [olbohlen@rhel85 ~]$ cat /etc/subuid |
| | | olbohlen:100000:65536 |
| | | #+end_example |
| | | That means all uids from 100000 to 165535 are reserved for olbohlen. |
| | | The mapping looks like this: |
| | | |
| | | #+ATTR_LATEX: :environment longtable :align l|l |
| | | | uid in container | uid outside container | |
| | | |------------------+--------------------------| |
| | | | 0 | 4100 (users primary uid) | |
| | | | 1 | 100000 (first subuid) | |
| | | | 2 | 100001 | |
| | | | ... | ... | |
| | | |
| | | |
| | | |
| | | * And What Is A Pod? |
| | | ** left :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.5 |
| | | :END: |
| | | - a pod is a set of containers |
| | | |
| | | - usually contains side-car containers |
| | | |
| | | - these containers share certain namespaces |
| | | |
| | | ** right :BMCOL: |
| | | :PROPERTIES: |
| | | :BEAMER_col: 0.7 |
| | | :END: |
| | | |
| | | file:~/git/olbohlen-org/presentations/matrix-pod.jpg |
| | | |
| | | |
| | | |
| | | * Why Using Side-Cars? |
| | | |
| | | Kubernetes does not manage containers, it manages pods as |
| | | the most atomic item. |
| | | #+begin_src ditaa :file pods-1.png :cmdline -E -s 0.8 |
| | | Pod |
| | | +------------------------------------+ |
| | | | Container Container | |
| | | | +--------+ +------------------+ | |
| | | | | apache | | Monitoring Agent | | |
| | | | | (main) | | (side car) | | |
| | | | +--------+ +------------------+ | |
| | | | | |
| | | +------------------------------------+ |
| | | ^ ^ |
| | | | | |
| | | +------------+ +----------------+ |
| | | |apache image| |Monitoring image| |
| | | |{s} | |{s} | |
| | | +------------+ +----------------+ |
| | | #+end_src |
| | | |
| | | The idea is to seperate applications from helper |
| | | applications to provide seperate releases. |
| | | |
| | | |
| | | |
| | | * Pods And Linux Namespaces |
| | | |
| | | So, what namespaces does a pod share between containers? |
| | | |
| | | - net: They share the IP address and ports |
| | | - ipc: so you can use IPC (shared memory, semaphores, etc) |
| | | - uts: all containers share the same hostname |
| | | |
| | | You can also enable sharing the PID namespaces by setting: |
| | | |
| | | v1.pod.spec.shareProcessNamespace: true |
| | | |
| | | |
| | | |
| | | * Pods And Linux Namespaces |
| | | |
| | | |
| | | #+begin_example |
| | | (738) x230:/export/home/olbohlen/scratch$ oc logs hi-7459f5c556-qkxj4 |
| | | error: a container name must be specified for pod hi-7459f5c556-qkxj4, |
| | | choose one of: [hi sidecarone] |
| | | (741) x230:/export/home/olbohlen/scratch$ oc rsh -c sidecarone hi-7459f5c556-qkxj4 ps -ef |
| | | PID USER TIME COMMAND |
| | | 1 10006000 0:00 sleep 360000 |
| | | 9 10006000 0:00 ps -ef |
| | | (747) x230:/export/home/olbohlen/scratch$ oc rsh -c hi hi-7459f5c556-qkxj4 ps -ef |
| | | UID PID PPID C STIME TTY TIME CMD |
| | | 1000600+ 1 0 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 26 1 0 19:33 ? 00:00:00 /usr/bin/coreutils --coreuti |
| | | 1000600+ 27 1 0 19:33 ? 00:00:00 /usr/bin/coreutils --coreuti |
| | | 1000600+ 28 1 0 19:33 ? 00:00:00 /usr/bin/coreutils --coreuti |
| | | 1000600+ 29 1 0 19:33 ? 00:00:00 /usr/bin/coreutils --coreuti |
| | | 1000600+ 30 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 36 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 43 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 64 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 66 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 72 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 82 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 88 1 0 19:33 ? 00:00:00 httpd -D FOREGROUND |
| | | 1000600+ 106 0 0 19:42 pts/0 00:00:00 ps -ef |
| | | #+end_example |
| | | |
| | | * Thank You |
| | | |
| | | Thank you for your attention.\\ |
| | | \\ |
| | | Do you have any questions?\\ |
| | | \\ |
| | | Feel free to ask now or contact me later: |
| | | |
| | | [[mailto:olaf.bohlen@niit.com][olaf.bohlen@niit.com]] |