
Docker Engine with a Caching Proxy (Squid) — 1 of 3

Goal

Install and configure a Squid proxy on my laptop to serve cached responses to my native operating system’s requests for internet resources.

Background

speed up container builds: At the start of Covid-19 I was operating on a slow, low-bandwidth internet connection. Doing repeated Docker container builds over that connection prompted a brief investigation into installing a local Squid proxy on my laptop. Unnecessary calls to external repositories from running containers were causing a needless bandwidth drain. Setting up local software repositories would solve that one use case, but would need to be repeated for every container’s package-manager variant. Additionally, I wanted direct external calls to files via wget or curl to likewise be covered by caching of repeated requests.

secure sockets causing caching issues: This is the basic initial setup for Ubuntu 20.04. I eventually discovered the version in the official repositories lacked the capability required for HTTPS content, but for basic wget requests to HTTP endpoints (increasingly difficult to find, as almost everything is HTTPS at this point) and for APT, which also uses HTTP, files can be inspected and cached by Squid without secure sockets getting in the way. APT uses cryptographic fingerprints to verify files, so plain HTTP remains in use for convenience. Redhat’s RPM/Yum/DNF and Java’s Maven packaging systems use HTTPS, so in a later post I discuss Squid’s use of packet inspection to permit caching of those.

Detailed Learning

Method

This is primarily about protecting bandwidth during container builds. However, I implemented a staged approach (sketched in code after the list) using:

  1. the host’s CLI — wget
  2. the host’s package manager — APT
  3. a container’s APT tool
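As a rough preview of how the first two stages point at the proxy (a sketch only; the APT drop-in file name 01proxy is my own choice, and the APT stage is covered properly in a later post):

```sh
# Stage 1: per-command proxy for wget (this post)
http_proxy=http://127.0.0.1:3128/ wget http://deb.debian.org/debian/dists/stable/Release

# Stage 2: point the host's APT at the proxy (preview of a later post)
echo 'Acquire::http::Proxy "http://127.0.0.1:3128/";' | sudo tee /etc/apt/apt.conf.d/01proxy
```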

Testing nomenclature

Since the later versions that speak HTTP/1.1 (taken from here), Squid’s /var/log/squid/access.log gives a more detailed description of how caching was achieved. The basic TCP_HIT is now extended to codes such as TCP_REFRESH_UNMODIFIED, which is also served from cache but verified with the origin server.

Welcome to HTTP/1.1. Those are all HTTP/1.1 revalidation requests updating the cached content before delivery to the client, while saving bandwidth in ways that plain HIT and MISS cannot.

Squid is a cache, not an archive. It self-updates the cache content as needed.

  • the UNMODIFIED codes are when the copy Squid already has cached has not changed. No payload object is fetched from the server.

  • the MODIFIED codes are when the object Squid has cached is outdated. A replacement object is delivered by the server.

  • the 304 codes are when the client’s copy has not changed, so no payload is delivered from Squid to the client.

  • the 200 codes are when the client’s copy is outdated. A replacement object is delivered by Squid.
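A quick way to see which of these result codes you are getting is to tally the result field of the access log. This is a minimal sketch assuming Squid’s default (native) log format, where the fourth field holds code/status:

```sh
# count the cache result codes (TCP_HIT, TCP_MISS, TCP_REFRESH_UNMODIFIED, ...) seen so far
sudo awk '{ split($4, a, "/"); print a[1] }' /var/log/squid/access.log | sort | uniq -c | sort -rn
```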

Initial Squid Setup

Squid 4.8 on Ubuntu 19.10 was installed with:

sudo apt install --yes squid

Configuration: Squid’s CLI seems rather basic; the complexity lies in its config, located at /etc/squid/squid.conf

Refreshing: Rather than restarting Squid with sudo systemctl restart squid, these settings can be reloaded into the daemon without a shutdown using sudo squid -k reconfigure. Modern squid.service systemd unit files offer the convenience command sudo systemctl reload squid to do the same.
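The edit, check, reload loop I settled into looks like this; squid -k parse only validates the config file syntax and does not disturb the running daemon:

```sh
# validate the edited config, then ask the running daemon to pick it up
sudo squid -k parse && sudo squid -k reconfigure
# or, where the systemd unit supports it:
sudo systemctl reload squid
```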

Initial Changes: Notable default configuration gotchas are:

  • Caching to disk is disabled by default — you need to uncomment:
cache_dir ufs /var/spool/squid 100 16 256
  • if the directory doesn’t exist you might need to:
#if you don't have the directory already
sudo mkdir /var/spool/squid
sudo chown proxy:proxy /var/spool/squid
  • Proxy usage is disabled for anything other than localhost — you need to uncomment:
http_access allow localnet

Interrogating sudo netstat -letpn will show:

 Active Internet connections (only servers)
 Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
 ...
 tcp6 0 0 :::3128 :::* LISTEN 0 1172651 12507/(squid-1)
 ...

This says it’s responding on all the host’s network interfaces at port 3128, on both IPv4 and IPv6. This is convenient because the resident containers can’t see the loopback address of the host that we would normally point to, but more on that later…

  • Initially the maximum file size to cache is 4MB — you need to uncomment and modify the following to the maximum you’re likely to need cached.
maximum_object_size 10 MB
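A quick way to confirm the three changes above are live in the config is to grep for the uncommented directives; and if Squid complains at reload that the cache directories are missing, sudo squid -z creates them:

```sh
# confirm the uncommented settings are present in the active config
grep -E '^(cache_dir|http_access allow localnet|maximum_object_size)' /etc/squid/squid.conf

# create the on-disk cache directory structure if Squid reports it missing
sudo squid -z
```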

Testing

To test caching and the effect of the file size limit:

```sh
function test_download(){
   http_proxy=http://127.0.0.1:3128/ wget \
      --show-progress \
      --directory-prefix="${HOME}/Downloads" \
      "$1"
}
test_download "$FILENAME"
# where FILENAME is one of:
# http://deb.debian.org/debian/pool/main/p/python3.7/libpython3.7-stdlib_3.7.3-2%2bdeb10u1_amd64.deb (1.7MB will get TCP_REFRESH_UNMODIFIED/200)
# http://deb.debian.org/debian/pool/main/n/neovim/neovim-runtime_0.3.4-3_all.deb (3.3MB will likely get TCP_HIT/200)
# http://ports.ubuntu.com/pool/main/l/linux-signed/linux-image-5.3.0-26-generic_5.3.0-26.28_arm64.deb (9.3MB will always TCP_MISS/200 UNTIL you update the size)
```

Check the result of each download by executing sudo tail -n1 /var/log/squid/access.log; the cache result code (TCP_MISS, TCP_HIT, TCP_REFRESH_UNMODIFIED, and so on) appears against each request.

Update Squid with sudo vim /etc/squid/squid.conf, adding maximum_object_size 10 MB just after the port number setting of 3128.
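A non-interactive alternative, as a sketch that assumes the stock http_port 3128 line is present and uncommented:

```sh
# append the size limit directly after the http_port line, then reload
sudo sed -i '/^http_port 3128/a maximum_object_size 10 MB' /etc/squid/squid.conf
sudo squid -k reconfigure
```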

Test Squid Cache Directory:

$ sudo find /var/spool/squid -type f -ls
 541990    4 -rw-r-----   1 proxy   proxy       288 Apr 11 18:05 /var/spool/squid/swap.state
 541989 1696 -rw-r-----   1 proxy   proxy   1735262 Apr 11 17:55 /var/spool/squid/00/00/00000000
 541991 9452 -rw-r-----   1 proxy   proxy   9676288 Apr 11 18:05 /var/spool/squid/00/00/00000002
 541992 3332 -rw-r-----   1 proxy   proxy   3411670 Apr 11 17:55 /var/spool/squid/00/00/00000001

You should see three numbered cache files whose sizes in bytes match the three downloaded files (plus Squid’s swap.state index).

Note that we are not using an exported environment variable; it is set only for the single wget invocation (by way of http_proxy=http://127.0.0.1:3128/ immediately before the command), so nothing is using the proxy by default yet. Next we will see it work with an APT install locally.
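If you did want the proxy used by default for the rest of a shell session, exporting the conventional variables would do it; note that at this stage HTTPS traffic is only tunnelled through the proxy, not cached:

```sh
# make every subsequent command in this shell use the proxy
export http_proxy=http://127.0.0.1:3128/
export https_proxy=http://127.0.0.1:3128/   # HTTPS is tunnelled via CONNECT, not cached yet
```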

Outcome

Here we are controlling the use of Squid’s caching manually and per command. That is not particularly useful yet, as Squid needs to become the default across a number of actions. The goal is to direct requests to the proxy-controlled “gateway” port of 3128, giving Squid the opportunity to respond directly to requests rather than going out to the internet.