Set up caching for apt through a squid proxy.
Local APT usage with a Proxy
We now have a proxy that can cache, provided that any proxy-enabled command such as wget is given the http_proxy value when it is called, or the value is exported beforehand. As a Debian-based distribution, Ubuntu uses the apt package manager, which is able to use the proxy as well. apt likewise obeys the http_proxy variable, and responds to it the same way wget did.
Optionally we can first install a package. I’m testing with neovim, as it requires some dependencies but will not require a graphical shell in our container.
Beware: know how your package management system behaves. In my case, apt install does not treat downloaded package files the same as apt-get install. apt’s default is to remove *.deb files after a successful install, whereas apt-get keeps *.deb files within the archive. I discovered that the resident /var/cache/apt/archives/*.deb files were due to previous calls to apt-get install, not calls to apt install. In my case, apt’s behavior was ideal, as I didn’t need to worry about disabling the internal caching mechanism before starting. PS: I like this article’s explanation here: removing-packages-and-configurations-with-apt-get.
For the first test we can install normally, direct from the internet. I’d like to first clean the APT cache (APT includes apt, apt-get, apt-cache, apt-key etc.):
On your system you can find where APT is caching your deb’s by concatenating the output below: Dir + Dir::Cache + Dir::Cache::archives:
$ sudo apt-config dump | \
grep '^Dir \|Dir::Cache \|Dir::Cache::archives '
Dir "/";
Dir::Cache "var/cache/apt";
Dir::Cache::archives "archives/";
Hence for my system the debs are saved at /var/cache/apt/archives/*.deb.
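The concatenation described above can also be scripted. The following is only a sketch: archive_dir_from_dump is a hypothetical helper name, and it assumes the three values appear in the `apt-config dump` output in the same `Key "value";` form shown above, with Dir::Cache and Dir::Cache::archives given as relative paths.

```shell
#!/usr/bin/env bash
# archive_dir_from_dump (hypothetical helper): reads `apt-config dump` style
# lines on stdin and prints Dir + Dir::Cache + Dir::Cache::archives.
# Assumption: both cache values are relative paths, as in the output above.
archive_dir_from_dump() {
  awk -F'"' '
    $1 == "Dir "                  { dir = $2 }
    $1 == "Dir::Cache "           { cache = $2 }
    $1 == "Dir::Cache::archives " { archives = $2 }
    END {
      path = dir "/" cache "/" archives
      gsub(/\/+/, "/", path)   # collapse any doubled slashes
      print path
    }'
}

# Usage (not run here):
#   sudo apt-config dump | archive_dir_from_dump
```

Matching on the exact key, including the trailing space, keeps neighbouring entries such as Dir::Cache::pkgcache from being picked up by mistake.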
If you’ve previously run apt-get you’ll likely find some files sitting there:
$ sudo ls /var/cache/apt/archives/*.deb
...
javascript-common_11_all.deb
libluajit-5.1-common_2.1.0~beta3+dfsg-5.1_all.deb
libluajit-5.1-2_2.1.0~beta3+dfsg-5.1_amd64.deb
python-trollius_2.1~b1-5_all.deb
libtermkey1_0.20-3_amd64.deb
libvterm0_0~bzr718-1_amd64.deb
libunibilium4_2.0.0-4_amd64.deb
python-msgpack_0.5.6-1build2_amd64.deb
python3-msgpack_0.5.6-1build2_amd64.deb
libmsgpackc2_3.0.1-3_amd64.deb
...
APT provides a cache cleaner, to see what it will clean:
$ sudo apt clean --dry-run
Del /var/cache/apt/archives/* /var/cache/apt/archives/partial/*
Del /var/lib/apt/lists/partial/*
Del /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin
Running sudo apt clean will render those directories empty.
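Because every experiment that follows depends on knowing whether the archive directory holds any *.deb files, it can help to count them rather than eyeball an ls listing. This is a minimal sketch; count_debs is a hypothetical helper name and the default path is simply the one found on my system above.

```shell
#!/usr/bin/env bash
# count_debs (hypothetical helper): prints how many *.deb files sit directly
# inside the given directory. The default path is an assumption taken from
# the apt-config output earlier; pass another directory if yours differs.
count_debs() {
  local dir=${1:-/var/cache/apt/archives}
  # -maxdepth 1 keeps find out of the partial/ subdirectory
  find "$dir" -maxdepth 1 -type f -name '*.deb' | wc -l | tr -d ' '
}

# Usage (not run here):
#   sudo apt clean
#   count_debs        # should print 0 once the cache has been cleaned
```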
Let’s use apt-get to install neovim, and we will see the cached *.deb files…
$ sudo apt-get install --yes neovim; #lots of output...
#Notes during install:
#“Get” =http GET request
#“Selecting...unselected” = reinstalling packages of base distro
#“Unpacking” = extracting from .deb
#“Setting up” = auto generating .conf files moving binaries around.
#“Processing triggers” = loads files into OS. Prevents restart need.
$ sudo ls /var/cache/apt/archives/*.deb
...
/var/cache/apt/archives/javascript-common_11_all.deb
/var/cache/apt/archives/libjs-jquery_3.3.1~dfsg-3_all.deb
/var/cache/apt/archives/libjs-sphinxdoc_1.8.5-3_all.deb
/var/cache/apt/archives/libjs-underscore_1.9.1~dfsg-1_all.deb
...
Now we can uninstall the neovim program. In this case apt and apt-get achieve the same result, so you can use either:
$ sudo apt remove --purge --yes neovim; #purge removes neovim config
$ sudo apt autoremove --yes; #removes no longer reqd dependencies.
$ sudo ls /var/cache/apt/archives/*.deb; #should show no files.
Let’s now do the same but with apt install. First I want to demonstrate that we get no apt cache with a standard apt install:
$ sudo apt install --yes neovim; # you'll see lots of "Get" actions.
#meaning that apt is reaching out to the internet
$ sudo ls /var/cache/apt/archives/*.deb
ls: cannot access '/var/cache/apt/archives/*.deb': No such file or directory
$ sudo apt remove --purge --yes neovim; #dependent binaries remain
$ sudo apt install --yes neovim; # this time only one "Get"
$ sudo apt remove --purge --yes neovim;
$ sudo apt -o APT::Keep-Downloaded-Packages="true" \
install --yes neovim; #you will see a single download
#and Download rate summary:
#Get:1 http://au.archive.ubuntu.com/ubuntu eoan/universe
# amd64 neovim amd64 0.3.8-1 [1,263 kB]
#Fetched 1,263 kB in 1s (1,798 kB/s)
$ ls -l /var/cache/apt/archives/neovim_*.deb; #file in APT cache.
-rw-r--r-- 1 root root 1263436 Jul 24 2019 /var/cache/apt/archives/neovim_0.3.8-1_amd64.deb
$ sudo apt remove --purge --yes neovim; #now remove keeping cache.
$ sudo apt install --yes neovim; # this won't cache any files but
# apt *will* use the existing neovim.deb file
# you will notice *no* "Get" statement in the install logs!
note: apt’s optional directives can look like either:
sudo apt -o APT::Keep-Downloaded-Packages="true" \
install --yes neovim;
sudo apt -o 'APT::Keep-Downloaded-Packages=true' \
install --yes neovim;
You can override the default config behavior and keep the *.deb files with an option value at install time, or add a new entry directly into the persistent config too:
#optional
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
| sudo tee /etc/apt/apt.conf.d/01keep-debs
Great, so now you know exactly how to ensure you haven’t inadvertently used the APT cache, and we can focus on using the squid proxy. Important notes:
APT caching only works with the http protocol. https is encrypted and cannot be inspected with Squid or any proxy. This is by design, so that clients can guarantee secrecy. Hence the apt application requires the http_proxy directive, NOT the https_proxy directive.
Squid cache should be cleaned and inspected. Squid’s cache cannot be reset on a running instance, hence Squid is often load balanced to permit high availability. You can however do a hot config reload with squid -k reconfigure. I’ve not tried this, but it would likely be very quick, requiring a new pre-built cache_dir before the hot swap with reconfigure. In my case I will shut down squid first. A further note: systemctl stop blocks until squid has stopped, whereas systemctl start can return before squid is fully up, so polling to wait for squid to start is required if automating this. I use:
#!/usr/bin/env bash
function process_wait(){
  local proc=$1
  local status=${2:-active}   # active/inactive, defaults to active
  # poll once a second until systemd reports the wanted state
  while [[ $(sudo systemctl is-active "${proc}") != "${status}" ]]
  do
    sleep 1
    echo 'sleeping'
  done
}
set -x
sudo systemctl stop squid
sudo systemctl status squid
sudo rm -rf /var/spool/squid/*
sudo squid -zSF #reset the index with "z"
sudo squid -k shutdown # I prefer starting with systemctl
sudo systemctl start squid
process_wait 'squid'
sudo systemctl status squid
sudo find /var/spool/squid/ -type f -ls #you should see no files
set +x
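One caveat with the polling loop above is that it waits forever if squid never reaches the requested state. A variant with a timeout might look like the sketch below; wait_until is a hypothetical helper name, and the 30 second figure in the usage comment is an arbitrary choice, not a recommendation.

```shell
#!/usr/bin/env bash
# wait_until (hypothetical helper): re-runs a command once per second until
# it succeeds, giving up after roughly <timeout> seconds.
# Usage: wait_until <timeout-seconds> <command> [args...]
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage (not run here):
#   sudo systemctl start squid
#   wait_until 30 systemctl is-active --quiet squid || exit 1
```

Passing the check as a command rather than hard-coding systemctl means the same helper can wait on other conditions too, such as a port becoming reachable.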
We are confident that by apt remove --purge and apt clean the APT cache is removed. Do this now before we start with apt via squid.
$ sudo apt remove --purge --yes neovim;
$ sudo apt clean; #empty the APT cache
Also, by calling apt without the Keep-Downloaded-Packages option directive, apt won’t store a *.deb file in the archive. Note again, as above, that we use the http, not the https, protocol directive. Also, if we are to push the proxy config to a superuser shell when we call apt, we must force sudo not to strip the http_proxy variable from the subprocess call, hence we use sudo -E as it passes through the environment of the parent shell.
$ http_proxy=http://127.0.0.1:3128/ sudo -E apt install --yes neovim
$ sudo tail /var/log/squid/access.log
#filenames truncated for neatness
1586606592.437 334 127.0.0.1 TCP_MISS/200 1263816
GET http://.../neovim_0.3.8-1_amd64.deb -
HIER_DIRECT/202.158.214.106 application/x-troff-man
Importantly, above you can see the TCP_MISS statement, which indicates Squid has seen the request but failed to find the file in the cache, and has retrieved it externally with an http GET request. I recommend converting the leading epoch timestamp to a human-readable timestamp with perl.
$ sudo cat /var/log/squid/access.log | \
perl -p -e 's/^([0-9]*)/"[".localtime($1)."]"/e'
#filenames truncated for neatness
[Sat Apr 11 20:03:12 2020].437 334 127.0.0.1 TCP_MISS/200 1263816
GET http://.../neovim_0.3.8-1_amd64.deb -
HIER_DIRECT/202.158.214.106 application/x-troff-man
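As an aside on the sudo -E invocation used earlier: another way to hand the proxy to apt is as an APT option on the command line, so that no environment variable has to survive sudo at all. The sketch below is an illustration rather than what I ran; apt_via_proxy, PROXY_URL and DRY_RUN are hypothetical names, while Acquire::http::Proxy is the standard APT option.

```shell
#!/usr/bin/env bash
# apt_via_proxy (hypothetical wrapper): runs apt under sudo with the proxy
# passed as an APT option instead of through the environment.
# PROXY_URL and DRY_RUN are illustrative variables, not part of APT.
apt_via_proxy() {
  local proxy=${PROXY_URL:-http://127.0.0.1:3128/}
  if [ -n "${DRY_RUN:-}" ]; then
    # Only print the command that would be run.
    echo sudo apt -o "Acquire::http::Proxy=${proxy}" "$@"
  else
    sudo apt -o "Acquire::http::Proxy=${proxy}" "$@"
  fi
}

# Usage (not run here):
#   DRY_RUN=1 apt_via_proxy install --yes neovim   # show the command
#   apt_via_proxy install --yes neovim             # actually install
```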
apt cache
Now let’s check for the neovim_0.3.8-1_amd64.deb file in the APT cache; it should NOT exist.
$ ls /var/cache/apt/archives/
lock partial/
But we know the file is 1263436 bytes and that Squid made a “Get” request. Hence it should appear in the Squid cache…
$ sudo find /var/spool/squid/ -type f -ls
541991 4 -rw-r----- 1 proxy proxy 144 Apr 12 12:58 /var/spool/squid/swap.state
541990 1236 -rw-r----- 1 proxy proxy 1263880 Apr 12 12:58 /var/spool/squid/00/00/00000000
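The cached object is slightly larger than the 1263436 byte package, presumably because Squid stores its own metadata and the HTTP response headers in front of the body. To go beyond comparing sizes, one hedged approach is to checksum only the part of the cached file that starts at the ar magic string "!<arch>", which begins every .deb file. This is a sketch under assumptions I have not verified against Squid’s on-disk format: that the body is stored unmodified and runs to the end of the file, and that the magic string does not appear in the headers. cached_payload_md5 is a hypothetical helper name, and GNU grep, tail and md5sum are assumed.

```shell
#!/usr/bin/env bash
# cached_payload_md5 (hypothetical helper): prints the md5 of everything from
# the first "!<arch>" marker to the end of the given file.
# Assumption: the cached body is stored verbatim after Squid's own headers.
cached_payload_md5() {
  local file=$1 offset
  # -a: treat binary as text, -b: byte offset, -o: offset of the match itself
  offset=$(LC_ALL=C grep -abo '!<arch>' "$file" | head -n 1 | cut -d: -f1)
  [ -n "$offset" ] || return 1   # no .deb magic found
  tail -c "+$((offset + 1))" "$file" | md5sum | cut -d' ' -f1
}

# Usage (not run here; reading /var/spool/squid needs root, so work on a copy):
#   sudo cp /var/spool/squid/00/00/00000000 /tmp/cached && sudo chmod a+r /tmp/cached
#   cached_payload_md5 /tmp/cached
#   md5sum neovim_0.3.8-1_amd64.deb    # compare against a known-good copy
```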
The Linux file command cannot determine that it is in fact a Debian package file; it simply identifies it as “data”, but it is roughly the same size (one way to verify this could be to use md5sum signatures). Now let’s remove the neovim application again.
sudo apt remove --purge --yes neovim;
http_proxy=http://127.0.0.1:3128/ sudo -E apt install --yes neovim;
Calling an install again, APT reports during install:
...
Get:1 http://au.archive.ubuntu.com/ubuntu eoan/universe amd64 neovim amd64 0.3.8-1 [1,263 kB]
Fetched 1,263 kB in 0s (63.2 MB/s)
...
Although apt reports that it reached out externally to retrieve the package, after inspecting the Squid access.log we can see that Squid in fact had a cache “HIT”, finding the file locally and forwarding it to the APT application.
$ sudo tail -n1 /var/log/squid/access.log
1586669427.684 2 127.0.0.1 TCP_HIT/200 1263825
GET http://au.archive.ubuntu.com/ubuntu/pool/universe/n/neovim/neovim_0.3.8-1_amd64.deb -
HIER_NONE/- application/x-troff-man
Nice! We hoped for either TCP_HIT (from squid cache files) or TCP_MEM_HIT (from squid-owned system memory). So Squid has pulled the file directly from the Squid cache, serving it to APT transparently. We can now make this setting permanent with:
cat <<EOF | sudo tee /etc/apt/apt.conf.d/50proxy
Acquire {
HTTP::proxy "http://127.0.0.1:3128";
};
EOF
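Once the proxy setting is permanent, reading single log lines becomes tedious, and a tally of result codes gives a quicker picture of how often Squid serves from cache. The sketch below assumes Squid’s native access.log layout as seen in the entries above, with the result code and HTTP status (for example TCP_HIT/200) in the fourth whitespace-separated field and one entry per line. squid_hit_summary is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# squid_hit_summary (hypothetical helper): reads access.log lines on stdin
# and prints one "<result-code> <count>" line per code, sorted by code.
# Assumption: field 4 looks like TCP_MISS/200, as in the entries shown above.
squid_hit_summary() {
  awk '{
    split($4, result, "/")      # "TCP_HIT/200" -> result[1] = "TCP_HIT"
    count[result[1]]++
  }
  END { for (code in count) print code, count[code] }' | sort
}

# Usage (not run here):
#   sudo cat /var/log/squid/access.log | squid_hit_summary
```

After a clean install followed by a reinstall through the proxy, I would expect to see both a TCP_MISS and a TCP_HIT (or TCP_MEM_HIT) row.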
We’ve now witnessed how to install, test and debug squid proxy settings with apt. Next we will introduce the settings to a Docker environment. It is a bit simpler but requires a little network knowledge.