Install and configure a Squid proxy on my laptop to serve cached responses to my native operating system’s calls for internet resources
#### Speed up container builds

During the start of Covid-19 I was operating off a slow, low-bandwidth internet connection. Doing repeated Docker container builds over that connection necessitated a brief investigation into installing a local Squid proxy on my laptop. Unnecessary calls to external repositories from running containers were causing an avoidable bandwidth drain. Setting up local software repositories would solve that one use case, but it would have to be solved again for every container's package-manager variant. Additionally, I wanted direct external calls from tools like wget or curl to likewise be covered by caching of repeated requests.
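To sketch where this is headed, a container build can be pointed at the proxy through Docker's predefined proxy build arguments. The host address is deliberately a placeholder here, because a container cannot reach the host via its own loopback address; picking the right address is covered later.

```sh
# Hypothetical build routed through the local Squid.
# <host-address> must be an address the build containers can actually reach
# (not 127.0.0.1, since a container's loopback is not the host's).
docker build \
  --build-arg http_proxy=http://<host-address>:3128/ \
  -t my-image .
```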
#### Secure sockets causing caching issues

This is the basic initial setup for Ubuntu 20.04. Eventually I discovered the version in the official repositories lacked the capability required for HTTPS content, but basic wget calls to HTTP endpoints (increasingly difficult to rely on, as almost everything is HTTPS at this point) and APT traffic, which also uses HTTP, can be inspected and cached by Squid without secure sockets getting in the way. APT uses cryptographic fingerprints to verify files, so HTTP remains in use there for convenience, for a number of reasons. Red Hat's RPM/Yum/DNF and Java's Maven packaging systems use HTTPS, so in a later post I discuss Squid's use of packet inspection to permit caching of those.
This is primarily for bandwidth protection during container builds. However, I implemented a staged approach: plain HTTP caching first, with HTTPS interception left for a later post.
Since later versions of HTTP/1.1 (taken from here), Squid's /var/log/squid/access.log gives a more detailed description of how caching was achieved. The basic TCP_HIT is now extended with results such as TCP_REFRESH_UNMODIFIED and so on: the object is still served from cache, but it has been verified with the origin server. Welcome to HTTP/1.1. These are all HTTP/1.1 revalidation requests that update the cached content before delivery to the client, saving bandwidth in ways that plain HIT and MISS cannot.
Squid is a cache, not an archive; it self-updates the cached content as needed.

- UNMODIFIED: the copy Squid already has cached has not changed, so no payload object is fetched from the server.
- MODIFIED: the object Squid has cached is outdated, so a replacement object is delivered by the server.
- 304: the client's copy has not changed, so no payload is delivered from Squid to the client.
- 200: the client's copy is outdated, so a replacement object is delivered by Squid.
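The revalidation mechanism itself can be seen outside Squid by issuing a conditional request by hand. This is a minimal sketch; the URL is just an example and the If-Modified-Since date is arbitrary:

```sh
# Ask the origin whether the object has changed since the given date.
# An unchanged object comes back as 304 with no payload; a changed one as 200.
curl -s -o /dev/null -w '%{http_code}\n' \
  -H 'If-Modified-Since: Sat, 11 Apr 2020 00:00:00 GMT' \
  http://deb.debian.org/debian/dists/stable/Release
```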
Squid 4.8 on Ubuntu 19.10 was installed with:
sudo apt install --yes squid
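To confirm what actually got installed (the compile-time options become relevant later for HTTPS support), a quick sanity check can be run:

```sh
squid -v                 # version and the ./configure options the package was built with
systemctl status squid   # confirm the daemon is running under systemd
```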
Configuration: Squid's CLI seems rather basic; the complexity lies in its config, located at /etc/squid/squid.conf.
Refreshing: Rather than restarting Squid with sudo systemctl restart squid, these settings can be reloaded into the daemon without a shutdown using sudo squid -k reconfigure. Modern squid.service systemd unit files offer the convenience command sudo systemctl reload squid to do the same.
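Put together, a typical edit-check-reload cycle looks something like this (squid -k parse only checks the syntax; it does not apply anything):

```sh
sudo vim /etc/squid/squid.conf   # edit the configuration
sudo squid -k parse              # sanity-check the syntax before applying it
sudo squid -k reconfigure        # reload the running daemon without a restart
# or, equivalently, via systemd:
sudo systemctl reload squid
```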
Initial Changes: Notable default configuration gotchas are:
```sh
# in /etc/squid/squid.conf: enable the on-disk cache
cache_dir ufs /var/spool/squid 100 16 256

# if you don't have the directory already
sudo mkdir /var/spool/squid
sudo chown proxy:proxy /var/spool/squid

# in /etc/squid/squid.conf: allow clients on the local networks
http_access allow localnet
```
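Depending on the packaging you may also need to initialise the cache directory hierarchy before Squid will populate it; this is a hedged extra step, as the Ubuntu package may already do it on start-up:

```sh
sudo squid -z                  # create the cache (swap) directory tree under /var/spool/squid
sudo systemctl restart squid   # pick up the new cache_dir
```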
Interrogating sudo netstat -letpn
will show:
```
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address    State    User    Inode      PID/Program name
...
tcp6       0      0 :::3128          :::*               LISTEN   0       1172651    12507/(squid-1)
...
```
This says it's responding on "all of the host's network devices" at port 3128 over both the IPv4 & IPv6 protocols. This is convenient because the resident containers can't see the loopback address of the host that we would normally point to, but more on that later…
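As a quick check that the listener is reachable on a non-loopback address (the kind a container could use), something like the following works. The address 172.17.0.1 is only the common default docker0 gateway and is an assumption here; verify yours with ip addr show docker0:

```sh
# Fetch headers for an HTTP URL through Squid via the host's docker0 bridge address
# (an assumed address; substitute whatever `ip addr show docker0` reports).
http_proxy=http://172.17.0.1:3128/ curl -sI http://deb.debian.org/ | head -n1
```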
Another default gotcha is the maximum size of object Squid will cache, which is only 4 MB out of the box; we will raise it with:
maximum_object_size 10 MB
#### Testing
To test that the file size limit is being overcome:
```sh
function test_download(){
  http_proxy=http://127.0.0.1:3128/ wget \
    --show-progress \
    --directory-prefix="${HOME}/Downloads" \
    "$1"
}

test_download "$FILENAME"
# where FILENAME is one of:
# http://deb.debian.org/debian/pool/main/p/python3.7/libpython3.7-stdlib_3.7.3-2%2bdeb10u1_amd64.deb (1.7MB, will get TCP_REFRESH_UNMODIFIED/200)
# http://deb.debian.org/debian/pool/main/n/neovim/neovim-runtime_0.3.4-3_all.deb (3.3MB, will likely get TCP_HIT/200)
# http://ports.ubuntu.com/pool/main/l/linux-signed/linux-image-5.3.0-26-generic_5.3.0-26.28_arm64.deb (9.3MB, will always get TCP_MISS/200 UNTIL you update the size)
```
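Run each download more than once: the first fetch of a new URL is necessarily a TCP_MISS, and the cached results (TCP_HIT, TCP_REFRESH_UNMODIFIED) only show up on repeat runs.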
Check the cache result of each download in Squid's access log, e.g. sudo tail -n3 /var/log/squid/access.log.
To raise the object-size limit, update Squid with sudo vim /etc/squid/squid.conf, adding just after the port number setting of 3128: maximum_object_size 10 MB.
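After the edit, that region of the config might look roughly like the excerpt below (a sketch; the surrounding lines vary between versions), and the change can be applied with a reload:

```sh
# /etc/squid/squid.conf (excerpt)
http_port 3128
maximum_object_size 10 MB

# apply the change without restarting
sudo squid -k reconfigure
```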
```
$ sudo find /var/spool/squid -type f -ls
   541990    4 -rw-r-----   1 proxy  proxy      288 Apr 11 18:05 /var/spool/squid/swap.state
   541989 1696 -rw-r-----   1 proxy  proxy  1735262 Apr 11 17:55 /var/spool/squid/00/00/00000000
   541991 9452 -rw-r-----   1 proxy  proxy  9676288 Apr 11 18:05 /var/spool/squid/00/00/00000002
   541992 3332 -rw-r-----   1 proxy  proxy  3411670 Apr 11 17:55 /var/spool/squid/00/00/00000001
```
You should see three numbered cache files whose sizes in bytes correspond to the three downloaded files.
Note that we are not using an exported environment variable; we only initialised it on the command line immediately before the wget command (by way of http_proxy=http://127.0.0.1:3128/), so nothing is using the proxy by default yet. Next we will see it work with a Yum install locally.
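For reference, the per-command form used above versus an exported variable looks like this, reusing one of the test URLs:

```sh
# per-command: only this one wget call goes through the proxy
http_proxy=http://127.0.0.1:3128/ wget http://deb.debian.org/debian/pool/main/n/neovim/neovim-runtime_0.3.4-3_all.deb

# exported: every later command in this shell that honours http_proxy will use it
export http_proxy=http://127.0.0.1:3128/
wget http://deb.debian.org/debian/pool/main/n/neovim/neovim-runtime_0.3.4-3_all.deb
```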
So far we are controlling the use of Squid's caching manually and per command. That is not particularly useful yet, as Squid needs to become the default across a number of actions. The goal is to direct requests to the proxy-controlled "gateway" port of 3128 so that Squid gets the opportunity to respond to requests directly rather than letting them go out to the internet.
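As one illustration of what making it the default can mean, APT can be pointed at the proxy permanently (a sketch; the drop-in file name below is my own choice, not a requirement):

```sh
# Route all of APT's HTTP traffic through the local Squid by default.
echo 'Acquire::http::Proxy "http://127.0.0.1:3128";' | \
  sudo tee /etc/apt/apt.conf.d/01squid-proxy
```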