2020-04-12 Docker Engine with Proxy 2/3

Docker Engine with Proxy — 2 of 3

Goal

Set up caching for apt through a Squid proxy.

Background

Local APT usage with a Proxy

We now have a proxy that can cache, provided that any proxy-aware command such as wget is given the http_proxy value, either on the command line when calling it or by exporting it beforehand. Ubuntu is a Debian-derived distribution, and its package manager apt can use the proxy too: apt obeys the http_proxy variable and responds to it the same way wget did.

Optionally we can first install a package. I’m testing with neovim, as it pulls in some dependencies and does not require a graphical shell in our container.

Beware: know how your package management system behaves. In my case, apt install does not treat downloaded package files the same as apt-get install. apt’s default is to remove *.deb files after a successful install, whereas apt-get keeps *.deb files within the archive. I discovered that the resident /var/cache/apt/archives/*.deb files were due to previous calls to apt-get install, not calls to apt install. In my case, apt’s behavior was ideal, as I didn’t need to worry about disabling the internal caching mechanism before starting. PS: I like this article’s explanation: removing-packages-and-configurations-with-apt-get.

Detailed Learning

As a first test we can install normally, direct from the internet. I’d like to first clean the APT cache (APT includes apt, apt-get, apt-cache, apt-key etc.).

On your system you can find where APT is caching your deb’s by concatenating the output below: Dir + Dir::Cache + Dir::Cache::archives:

$ sudo apt-config dump | \
   grep '^Dir \|Dir::Cache \|Dir::Cache::archives '
Dir "/";
Dir::Cache "var/cache/apt";
Dir::Cache::archives "archives/";

Hence for my system the deb’s are saved at /var/cache/apt/archives/*.deb
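
The concatenation can also be scripted. Below is a minimal sketch, assuming the three values are relative to one another as in the output above. The function name apt_archive_dir is my own invention, and it reads the apt-config dump output on standard input rather than calling apt-config itself.

```shell
#!/usr/bin/env bash
# apt_archive_dir (hypothetical helper): reads `apt-config dump` output on
# stdin and joins Dir + Dir::Cache + Dir::Cache::archives into one path.
# Assumes each value is relative to the one before it, as in the output above.
apt_archive_dir() {
   awk '
      $1 == "Dir"                  { dir = $2 }
      $1 == "Dir::Cache"           { cache = $2 }
      $1 == "Dir::Cache::archives" { archives = $2 }
      END {
         path = dir cache "/" archives
         gsub(/[";]/, "", path)   # strip the quotes and trailing semicolons
         print path
      }'
}

# Usage (on a real system): apt-config dump | apt_archive_dir
```

If any of the three values is configured as an absolute path, this simple join would give the wrong answer, so treat it as a convenience for the default layout only.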

If you’ve previously run apt you’ll likely find some files sitting there:

$ sudo ls /var/cache/apt/archives/*.deb
...
javascript-common_11_all.deb
libluajit-5.1-common_2.1.0~beta3+dfsg-5.1_all.deb
libluajit-5.1-2_2.1.0~beta3+dfsg-5.1_amd64.deb
python-trollius_2.1~b1-5_all.deb
libtermkey1_0.20-3_amd64.deb
libvterm0_0~bzr718-1_amd64.deb
libunibilium4_2.0.0-4_amd64.deb
python-msgpack_0.5.6-1build2_amd64.deb
python3-msgpack_0.5.6-1build2_amd64.deb
libmsgpackc2_3.0.1-3_amd64.deb
...

APT provides a cache cleaner, to see what it will clean:

$ sudo apt clean --dry-run
Del /var/cache/apt/archives/* /var/cache/apt/archives/partial/*
Del /var/lib/apt/lists/partial/*
Del /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin

Running sudo apt clean will render those directories empty.

Let’s use apt-get to install neovim, and we will see the cached *.deb files…

$ sudo apt-get install --yes neovim; #lots of output...
#Notes during install: 
#“Get” =http GET request
#“Selecting...unselected” = reinstalling packages of base distro
#“Unpacking” = extracting from .deb
#“Setting up” = auto generating .conf files moving binaries around.
#“Processing triggers” = loads files into OS. Prevents restart need.
$ sudo ls /var/cache/apt/archives/*.deb
...
/var/cache/apt/archives/javascript-common_11_all.deb
/var/cache/apt/archives/libjs-jquery_3.3.1~dfsg-3_all.deb
/var/cache/apt/archives/libjs-sphinxdoc_1.8.5-3_all.deb
/var/cache/apt/archives/libjs-underscore_1.9.1~dfsg-1_all.deb
...

Now we can uninstall the neovim program. In this case apt and apt-get achieve the same result, so you can use either:

$ sudo apt remove --purge --yes neovim; #purge removes neovim config
$ sudo apt autoremove --yes; #removes no longer reqd dependencies.
$ sudo apt clean; #empties the APT archive, remove alone leaves the *.deb files
$ sudo ls /var/cache/apt/archives/*.deb; #should show no files.

Let’s now do the same but with apt install. First I want to demonstrate that we get no APT cache with a standard apt install:

$ sudo apt install --yes neovim; # you'll see lots of "Get" actions.
#meaning that apt is reaching out to the internet
$ sudo ls /var/cache/apt/archives/*.deb
ls: cannot access '/var/cache/apt/archives/*.deb': No such file or directory
$ sudo apt remove --purge --yes neovim; #dependent binaries remain
$ sudo apt install --yes neovim; # this time only one "Get"
$ sudo apt remove --purge --yes neovim;
$ sudo apt -o APT::Keep-Downloaded-Packages="true" \
    install --yes neovim; #you will see a single download
#and Download rate summary:
#Get:1 http://au.archive.ubuntu.com/ubuntu eoan/universe 
#   amd64 neovim amd64 0.3.8-1 [1,263 kB]
#Fetched 1,263 kB in 1s (1,798 kB/s)
$ ls -l /var/cache/apt/archives/neovim_*.deb;  #file in APT cache.
-rw-r--r-- 1 root root 1263436 Jul 24  2019 /var/cache/apt/archives/neovim_0.3.8-1_amd64.deb
$ sudo apt remove --purge --yes neovim; #now remove keeping cache.
$ sudo apt install --yes neovim; # this won't cache any files but 
# apt *will* use the existing neovim.deb file
# you will notice *no* "Get" statement in the install logs!
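
Rather than scanning the install log by eye, the "Get" lines can be counted. The following is a small sketch under the assumption that apt prints one line starting with Get: per downloaded file, as in the output above. The helper name count_gets is hypothetical.

```shell
#!/usr/bin/env bash
# count_gets (hypothetical helper): counts the "Get:" lines in apt output
# read from stdin. Zero means nothing was downloaded during the install.
count_gets() {
   grep -c '^Get:' || true   # grep exits non-zero when the count is 0
}

# Usage (on a real system):
#   sudo apt install --yes neovim 2>/dev/null | count_gets
```

A count of 0 after the final install above would confirm that apt reused the neovim *.deb already sitting in the archive.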

Note: apt’s option directives can be written in either of these forms:

sudo apt -o APT::Keep-Downloaded-Packages="true" \
    install --yes neovim;
sudo apt -o 'APT::Keep-Downloaded-Packages=true' \
    install --yes neovim;

You can override the default config behavior and keep the *.deb files either with an option value at install time or by adding a new entry to the persistent config:

#optional
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
| sudo tee /etc/apt/apt.conf.d/01keep-debs

Putting this all together - how this works with Squid!

Great, so now that you know exactly how to ensure you haven’t inadvertently used the APT cache, we can focus on using the Squid proxy. Important notes:

Squid can only cache APT downloads made over the http protocol. https traffic is encrypted end to end, so Squid (or any ordinary forwarding proxy) cannot inspect its contents and therefore cannot cache it. This is by design, so that clients are guaranteed secrecy. Hence apt needs the http_proxy directive, NOT the https_proxy directive. The Squid cache should also be cleaned and inspected before testing.

Cleaning the squid cache

Squid’s cache cannot be reset on a running instance, which is why Squid is often load balanced to permit high availability. You can, however, hot reload the config with squid -k reconfigure. I’ve not tried this, but it would likely be very quick, requiring a new prebuilt cache_dir before the hot swap with reconfigure. In my case I will shut down Squid first. A further note: systemctl stop is a blocking command while systemctl start is non-blocking, so polling to wait for Squid to start is required if automating this. I use:

#!/usr/bin/env bash
# Poll once per second until the given systemd unit reports "active".
function process_wait(){
   local proc=$1
   #status=$2 #active/inactive
   local status='active'
   while [[ "$(sudo systemctl is-active "${proc}")" != "${status}" ]]
   do
      sleep 1
      echo 'sleeping'
   done
}
set -x
sudo systemctl stop squid
sudo systemctl status squid
sudo rm -rf /var/spool/squid/*
sudo squid -zSF #reset the index with "z"
sudo squid -k shutdown # I prefer starting with systemctl
sudo systemctl start squid
process_wait 'squid'
sudo systemctl status squid
sudo find /var/spool/squid/ -type f -ls #you should see no files
set +x
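
The polling loop above waits forever if Squid never becomes active. As a hedged variation, here is a sketch of a generic wait with a timeout. The name wait_until and its argument order are my own choices, not part of systemd or Squid.

```shell
#!/usr/bin/env bash
# wait_until (hypothetical helper): wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds. Returns 0 on success,
# or 1 once TIMEOUT_SECONDS have been spent sleeping without a success.
wait_until() {
   wait_until_timeout=$1
   shift
   wait_until_waited=0
   until "$@" >/dev/null 2>&1; do
      if [ "$wait_until_waited" -ge "$wait_until_timeout" ]; then
         return 1
      fi
      sleep 1
      wait_until_waited=$((wait_until_waited + 1))
   done
   return 0
}

# Usage (on a real system):
#   wait_until 30 systemctl is-active --quiet squid || echo 'squid did not start'
```

Passing the check as a command keeps the helper reusable for other services, and the non-zero return lets an automated script stop instead of hanging.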

By running apt remove --purge and apt clean we can be confident that the APT cache is empty. Do this now, before we start using apt via Squid.

$ sudo apt remove --purge --yes neovim;
$ sudo apt clean; #empty the APT cache

Also, by calling apt without the cache option directive, apt won’t store a *.deb file in the archive. Note again that we use the http, not the https, protocol directive. Also, if we are to pass the proxy config to a superuser shell when we call apt, we must stop sudo from stripping the http_proxy variable from the subprocess call; hence we use sudo -E, as it pulls through the environment of the parent shell.

http_proxy=http://127.0.0.1:3128/ sudo -E apt install --yes neovim

Inspecting the tail of the access log (last line):

$ sudo tail /var/log/squid/access.log
#filenames truncated for neatness
1586606592.437    334 127.0.0.1 TCP_MISS/200 1263816 
   GET http://.../neovim_0.3.8-1_amd64.deb - 
   HIER_DIRECT/202.158.214.106 application/x-troff-man

Importantly, above you can see the TCP_MISS result code, which indicates that Squid saw the request but failed to find the file in its cache, and so retrieved it externally with an http GET request. I recommend converting the epoch timestamp prefix to a readable timestamp with perl.

$ sudo cat /var/log/squid/access.log | \
   perl -p -e 's/^([0-9]*)/"[".localtime($1)."]"/e'
#filenames truncated for neatness
[Sat Apr 11 20:03:12 2020].437    334 127.0.0.1 TCP_MISS/200 1263816 
   GET http://.../neovim_0.3.8-1_amd64.deb - 
   HIER_DIRECT/202.158.214.106 application/x-troff-man
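
To see at a glance how Squid handled each request, the result code and URL can be pulled out of every log entry. This is a sketch that assumes Squid’s default native log format, where the fourth field is code/status and the seventh is the URL, and that each entry sits on a single line in the real file (the output above is wrapped for display). The helper name squid_results is hypothetical.

```shell
#!/usr/bin/env bash
# squid_results (hypothetical helper): reads access.log lines on stdin and
# prints "<result code> <URL>" for each request.
# Assumes Squid's default native log format, where field 4 is code/status
# (e.g. TCP_MISS/200) and field 7 is the requested URL.
squid_results() {
   awk '{ split($4, result, "/"); print result[1], $7 }'
}

# Usage (on a real system):
#   sudo cat /var/log/squid/access.log | squid_results
```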

Audit the apt cache

Now let’s check for the neovim_0.3.8-1_amd64.deb file in the APT cache; it should NOT exist.

$ ls /var/cache/apt/archives/
lock  partial/

But we know the file is 1263436 bytes and that Squid made a “Get” request. Hence it should appear in the Squid cache…

$ sudo find /var/spool/squid/ -type f -ls
   541991      4 -rw-r-----   1 proxy    proxy         144 Apr 12 12:58 /var/spool/squid/swap.state
   541990   1236 -rw-r-----   1 proxy    proxy     1263880 Apr 12 12:58 /var/spool/squid/00/00/00000000
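
The cached object in the listing is 1263880 bytes, a little larger than the 1263436 byte package. One plausible explanation, which is an assumption on my part and not something verified here, is that Squid stores its own metadata and the HTTP response headers ahead of the package body. Under that assumption, the sketch below checksums only the last N bytes of a file, so the result can be compared with the md5sum of a copy of the original *.deb. The helper name tail_md5 is hypothetical.

```shell
#!/usr/bin/env bash
# tail_md5 (hypothetical helper): tail_md5 FILE NBYTES
# Prints the md5 checksum of the last NBYTES bytes of FILE.
# ASSUMPTION: the package body is stored at the end of the Squid cache
# object, after any metadata and headers.
tail_md5() {
   tail -c "$2" "$1" | md5sum | cut -d' ' -f1
}

# Usage (on a real system, given a kept copy of the package for comparison):
#   sudo bash -c "$(declare -f tail_md5); tail_md5 /var/spool/squid/00/00/00000000 1263436"
#   md5sum neovim_0.3.8-1_amd64.deb
```

If the two checksums match, the cache object really does carry the package. If they differ, the layout assumption above is probably wrong for your Squid version.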

The Linux file command cannot determine that it is in fact a Debian package file; it simply identifies it as “data”. It is, however, roughly the same size as the package (one way to confirm the match could be to use md5sum signatures). Now let’s remove the neovim application again and reinstall it through the proxy.

sudo apt remove --purge --yes neovim;
http_proxy=http://127.0.0.1:3128/ sudo -E apt install --yes neovim;

Calling an install again, APT reports during install:

...
Get:1 http://au.archive.ubuntu.com/ubuntu eoan/universe amd64 neovim amd64 0.3.8-1 [1,263 kB]
Fetched 1,263 kB in 0s (63.2 MB/s)
...

Although apt reports that it reached out externally to retrieve the package, inspecting the Squid access.log shows that Squid in fact had a cache “HIT”, finding the file locally and forwarding it to the APT application.

$ sudo tail -n1 /var/log/squid/access.log
1586669427.684      2 127.0.0.1 TCP_HIT/200 1263825 
GET http://au.archive.ubuntu.com/ubuntu/pool/universe/n/neovim/neovim_0.3.8-1_amd64.deb - 
HIER_NONE/- application/x-troff-man

Nice! We hoped for either TCP_HIT (served from Squid’s cache files) or TCP_MEM_HIT (served from Squid’s in-memory cache). So Squid has pulled the file directly from its cache, serving it to APT transparently. We can now make this setting permanent with:

cat <<EOF | sudo tee /etc/apt/apt.conf.d/50proxy
Acquire {
  HTTP::proxy "http://127.0.0.1:3128";
}
EOF
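
To confirm that APT actually picked up the new file, the effective proxy value can be read back from apt-config dump. This is a sketch assuming the dump prints the setting on one line as a key followed by a quoted value. The helper name apt_http_proxy is hypothetical, and the key is matched case-insensitively because the capitalisation in the dump may differ from the file.

```shell
#!/usr/bin/env bash
# apt_http_proxy (hypothetical helper): reads `apt-config dump` output on
# stdin and prints the configured http proxy URL, if any.
# ASSUMPTION: the setting appears on one line as
#   Acquire::<http>::<proxy> "URL";   (in any capitalisation)
apt_http_proxy() {
   awk 'tolower($1) == "acquire::http::proxy" {
      value = $2
      gsub(/[";]/, "", value)   # strip the quotes and trailing semicolon
      print value
   }'
}

# Usage (on a real system): apt-config dump | apt_http_proxy
```

An empty result would suggest the file was not read, for example because of a syntax error or a filename that APT ignores.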

Outcome

We’ve now seen how to install, test and debug Squid proxy settings with apt. Next we will introduce the settings to a Docker environment. It is a bit simpler but requires a little network knowledge.