Make sure the SECOND and subsequent maven container builds consume cached library files from the local proxy
When I first embarked on configuring Maven with Docker, I was failing to create a cached layer in the container and thought it had to do with my Docker configuration: it did not. Pulling from the Maven repositories is considered non-deterministic, so the docker build step interrogates the repositories every time it is called. The exception is the “offline” directive, which forces mvn to work with the files it already has. But if the Dockerfile is changed in any way before the mvn statement, it will pull down all the *.jar files again, which takes about 15 minutes. Normally Maven does a good job of caching JAR files locally, and the caching problem (i.e. the process of re-interrogating external Maven repositories) can be simulated by emptying the local user’s Maven repo cache, deleting the files under ~/.m2/repository/* .
Note: I’m going to refer to a helper called log_format. It’s simply an alias for perl -p -e 's/^([0-9]*)/"[".localtime($1)."]"/e'. I recommend you put the following in your ~/.bashrc and re-source the file; it prints a human-readable date in place of seconds since the epoch.
$ tail ~/.bashrc
#...
alias log_format="perl -p -e 's/^([0-9]*)/\"[\".localtime(\$1).\"]\"/e'"
#...
Let’s start from scratch with a Dockerfile using an image curated by the Apache Maven team:
$ docker run --interactive --tty --rm --name=mvn \
maven:3.6.3-jdk-11 bash
root@7a7807de0fa2:/# exit
$
Let’s now wrap this up as a Dockerfile of our own that we will add to:
$ cat ./Dockerfile
FROM maven:3.6.3-jdk-11
CMD ["bash"]
$ docker build --no-cache --tag mvn-proxy:0.1 .
$ docker run --interactive --tty --rm --name=mvn mvn-proxy:0.1 bash
root@1af26683abcb:/project# exit
$
Let’s start hooking it up to our freshly minted Squid 4.11 service on the host. First, on the host, identify the host IP that acts as the gateway for the Docker containers:
$ ip addr show dev docker0 | \
sed -n '/inet/s/\s*inet \([^ ]\+\) .*/\1/p'
172.17.0.1/16
$ sudo netstat -letpn
Proto Local-Address State PID/Program name
tcp6 :::3128 LISTEN 8532/(squid-1)
So given that our Squid instance is listening on IPv4 and IPv6 on all devices on port 3128, and that the docker gateway is 172.17.0.1, the appropriate setting for http_proxy and https_proxy will be http://172.17.0.1:3128. Note that in Docker, environment variables are not available at build time, so the value needs to be passed on the CLI as --build-arg http_proxy=http://172.17.0.1:3128 and declared in the Dockerfile as ARG http_proxy.
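The step from the docker0 address to the proxy URL can also be scripted. This is only a sketch: proxy_url_from_cidr is a hypothetical helper name, and it assumes the address arrives in the CIDR form printed by the ip command above.

```shell
#!/usr/bin/env bash
# Sketch: turn a CIDR address (as printed for docker0) into a proxy URL.
# proxy_url_from_cidr is a hypothetical helper; 3128 is the Squid port used above.
proxy_url_from_cidr() {
  local cidr="$1" port="${2:-3128}"
  # ${cidr%/*} strips the trailing /prefix-length, e.g. /16
  printf 'http://%s:%s\n' "${cidr%/*}" "$port"
}

proxy_url_from_cidr 172.17.0.1/16

# On a Docker host the result could feed the build directly, for example:
#   docker build --build-arg http_proxy="$(proxy_url_from_cidr 172.17.0.1/16)" .
```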
$ cat ./Dockerfile
FROM maven:3.6.3-jdk-11
ARG http_proxy
RUN apt-get update && \
apt-get install --yes vim && \
mkdir -p /project/api
CMD ["bash"]
$ docker build --build-arg http_proxy=http://172.17.0.1:3128 \
--no-cache --tag mvn-proxy:0.1 . # build this with proxy
$ sudo tail -n1 /opt/squid-4.11/var/log/access.log | log_format
[Wed Apr 22 14:50:17 2020].355 225 172.17.0.2
TCP_MISS/200 1281190
GET http://deb.debian.org/.../vim_8.1.0875-5_amd64.deb
- HIER_DIRECT/151.101.106.133 application/x-debian-package
Good: extracting the last line of Squid’s access.log, you can see that the vim package install is being pulled via the proxy. TCP_MISS says it didn’t come from the Squid cache and Squid needed to reach out to the Debian repos. Running the build again without the Docker cache shows a repeated TCP_MISS, so we need to start configuring squid-4.11’s squid.conf file to cache locally.
Let’s restore the capability we obtained from the changes we originally made to Squid in the previous blog post.
$ sudo vim /opt/squid-4.11/etc/squid.conf
#...
# Uncomment and adjust the following to add a disk cache directory.
cache_dir ufs /opt/squid-4.11/var/swap 100 16 256
maximum_object_size 10 MB
#...
$ sudo chgrp proxy /opt/squid-4.11/var/swap
$ sudo chmod g+w /opt/squid-4.11/var/swap
$ sudo systemctl reload squid-4.11
$ docker build --build-arg http_proxy=http://172.17.0.1:3128 \
--no-cache --tag mvn-proxy:0.1 .
First we can prove that we’re now caching files:
$ sudo find /opt/squid-4.11/var/swap/ -type f -ls
...
1844755 4 -rw-r----- 1 proxy proxy 864 Apr 22 15:29 /opt/squid-4.11/var/swap/swap.state
1844764 5644 -rw-r----- 1 proxy proxy 5775428 Apr 22 15:29 /opt/squid-4.11/var/swap/00/00/00000009
1844765 1252 -rw-r----- 1 proxy proxy 1281274 Apr 22 15:29 /opt/squid-4.11/var/swap/00/00/0000000A
...
Running the docker build again should now access the files in the cache:
$ sudo tail -n1 /opt/squid-4.11/var/log/access.log | log_format
[Wed Apr 22 15:37:39 2020].582 19 172.17.0.2
TCP_REFRESH_UNMODIFIED/200 1281228
GET http://deb.debian.org/.../vim_8.1.0875-5_amd64.deb
- HIER_DIRECT/151.101.106.133 application/x-debian-package
Good, so it’s using the file from cache, but checking the internet for confirmation that it hasn’t changed. We will come back to this REFRESH statement later.
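When the log grows beyond a few lines, a tally of the result codes is quicker to read than the raw entries. The sketch below assumes Squid’s native access.log layout, in which the fourth whitespace-separated field is RESULT_CODE/HTTP_STATUS, so it should be fed the raw log rather than log_format output. The function name and the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: count Squid result codes (TCP_MISS, TCP_REFRESH_UNMODIFIED, ...)
# in raw access.log lines. Assumes field 4 is "RESULT_CODE/HTTP_STATUS".
squid_result_counts() {
  awk '{ split($4, part, "/"); n[part[1]]++ }
       END { for (code in n) print code, n[code] }' "$@" | sort
}

# Illustrative lines only, shaped like the entries shown above.
printf '%s\n' \
  '1587538217.355 225 172.17.0.2 TCP_MISS/200 1281190 GET http://example.org/a.deb - HIER_DIRECT/192.0.2.1 application/x-debian-package' \
  '1587541059.582 19 172.17.0.2 TCP_REFRESH_UNMODIFIED/200 1281228 GET http://example.org/a.deb - HIER_DIRECT/192.0.2.1 application/x-debian-package' \
  '1587541060.100 20 172.17.0.2 TCP_MISS/200 100 GET http://example.org/b.deb - HIER_DIRECT/192.0.2.1 application/x-debian-package' \
  | squid_result_counts
```

On the host it could be run against the real log, for example sudo cat /opt/squid-4.11/var/log/access.log | squid_result_counts.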
Now this was not meant to be a lesson on APT but on Maven. Maven’s Java repositories are held at HTTPS sites. This is good because it means that artifacts are guaranteed by cryptography to have been provided by the source they claim to come from, i.e. not intercepted by a man-in-the-middle and substituted. Let’s set Maven up. We will copy in a project file called pom.xml and pull down some dependencies from external Maven repositories.
<?xml version="1.0" encoding="UTF-8"?>
<project
xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.foo</groupId>
<artifactId>bar</artifactId>
<version>0.0.1-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>com.google.collections</groupId>
<artifactId>google-collections</artifactId>
<version>1.0</version>
</dependency>
</dependencies>
<build>
<directory>lib</directory>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<outputDirectory>
${project.build.directory}
</outputDirectory>
</configuration>
</plugin>
</plugins>
</build>
</project>
In the Dockerfile we shall remove the apt statements, and zero out the access.log for clarity. Note that instead of mvn clean or mvn package I’m using mvn dependency:go-offline, which is a convenience goal that pulls down all external dependencies so that building packages can be done without internet connectivity:
$ echo '' | sudo tee /opt/squid-4.11/var/log/access.log #clear log
$ cat ./Dockerfile
FROM maven:3.6.3-jdk-11 as api-mvn-init
ARG http_proxy
#RUN apt-get update && \
# apt-get install --yes vim
RUN mkdir -p /project/api
WORKDIR /project
COPY pom.xml /project/api/
RUN mvn --file /project/api dependency:go-offline
$ docker build --no-cache --tag mvn-proxy:0.1 .
$ sudo cat /opt/squid-4.11/var/log/access.log # produces nada!
Now we will direct the Maven client to use the proxy by amending the user’s Maven settings in the settings.xml file. The image ships a ~/.m2/settings-docker.xml file whose settings only reference /usr/share/maven/ref/repository. This needs to be extended and renamed to ~/.m2/settings.xml.
<settings
xmlns="http://maven.apache.org/SETTINGS/1.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
https://maven.apache.org/xsd/settings-1.0.0.xsd">
<localRepository>/usr/share/maven/ref/repository</localRepository>
<proxies>
<proxy>
<id>example-proxy</id>
<active>true</active>
<protocol>https</protocol>
<host>172.17.0.1</host>
<port>3128</port>
</proxy>
</proxies>
</settings>
You do need one proxy entry per protocol, so if you want http as well, another entry is required. Now let’s run docker build again and check Squid’s access.log.
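Before running the build, here is what a settings file with an entry for each protocol might look like. This is a sketch only: the ids squid-http and squid-https are made-up labels, while the host, port and localRepository are the values already used above. It is written from the shell with a here-document so the result lands next to the Dockerfile.

```shell
#!/usr/bin/env bash
# Sketch: write a settings.xml with one <proxy> entry per protocol.
# The <id> values are hypothetical labels; everything else mirrors the file above.
cat > settings.xml <<'EOF'
<settings
    xmlns="http://maven.apache.org/SETTINGS/1.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0
                        https://maven.apache.org/xsd/settings-1.0.0.xsd">
  <localRepository>/usr/share/maven/ref/repository</localRepository>
  <proxies>
    <proxy>
      <id>squid-http</id>
      <active>true</active>
      <protocol>http</protocol>
      <host>172.17.0.1</host>
      <port>3128</port>
    </proxy>
    <proxy>
      <id>squid-https</id>
      <active>true</active>
      <protocol>https</protocol>
      <host>172.17.0.1</host>
      <port>3128</port>
    </proxy>
  </proxies>
</settings>
EOF

echo "wrote $(grep -c '<proxy>' settings.xml) proxy entries to settings.xml"
```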
docker build --no-cache --tag mvn-proxy:0.1 .
Checking back with the host’s logs, Squid’s access.log now shows output indicating that the Docker container is passing its mvn external requests through the host’s proxy:
sudo cat /opt/squid-4.11/var/log/access.log | log_format
...
[Thu Apr 23 08:52:25 2020].117 42385 172.17.0.2
TCP_TUNNEL/200
8504288 CONNECT repo.maven.apache.org:443 -
HIER_DIRECT/151.101.40.215 -
[Thu Apr 23 08:52:25 2020].117 42387 172.17.0.2
TCP_TUNNEL/200
738504 CONNECT repo.maven.apache.org:443 -
HIER_DIRECT/151.101.40.215 -
[Thu Apr 23 08:52:25 2020].117 42389 172.17.0.2
TCP_TUNNEL/200
555829 CONNECT repo.maven.apache.org:443 -
HIER_DIRECT/151.101.40.215 -
[Thu Apr 23 08:52:25 2020].117 42388 172.17.0.2
TCP_TUNNEL/200
1232867 CONNECT repo.maven.apache.org:443 -
HIER_DIRECT/151.101.40.215 -
[Thu Apr 23 08:52:25 2020].117 133748 172.17.0.2
TCP_TUNNEL/200
2520133 CONNECT repo.maven.apache.org:443 -
HIER_DIRECT/151.101.40.215 -
...
Note: mvn does not require docker build to pass a --build-arg https_proxy= value here, as the proxy is hard-coded into the settings file and will be used regardless of the container environment.
Note that although we have now been able to get both apt-get and mvn running through the host’s Squid proxy, the TCP_TUNNEL logged against the mvn requests states that Squid is permitting a cryptographically secure tunnel through to the target repo. This tunnel cannot be inspected or cached; Squid simply acts as a traffic director. To permit caching of HTTPS requests, the user needs to permit Squid to decrypt the request, repackage it and send it on. Leaving the user’s permission aside, this is the very definition of a man-in-the-middle attack, here carried out on yourself. From this point the Squid administrator becomes responsible for the content provided to the Docker client, because the package accessibility, timeliness and freshness guarantees of the HTTPS standard are now voided and passed to the Squid admin. This is serious business: you need to trust both the software and the admin. Compiling Squid from source obtained from the Squid website is part one of this trust assurance.
When a Squid server becomes trusted, clients will accept HTTPS responses signed by Squid. Communication to Squid is encrypted with Squid’s public key, which Squid can decrypt, inspect and send on. We need to create both keys; the public key becomes the Squid proxy’s certificate.
#on the host
VERSION='4.11'
#need to use '-E' with sudo to pass VERSION to sudo processes...
$ sudo -E mkdir /opt/squid-${VERSION}/certs
$ cd /opt/squid-${VERSION}/certs
$ sudo openssl req -new -newkey rsa:2048 -nodes -x509 -sha256 \
-extensions v3_ca -days 365 \
-keyout squid-ca-key.pem \
-out squid-ca-cert.pem \
-subj "/C=AU/ST=WA/L=Perth/O=D2I Pty Ltd/OU=Innovation/CN=squid.d2i.net.au/emailAddress=innovation@d2i.net.au"
$ sudo cat squid-ca-cert.pem squid-ca-key.pem | \
sudo tee squid-ca-cert-key.pem
$ sudo -E chown -R proxy:proxy /opt/squid-${VERSION}/certs
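Since the certificate and key end up concatenated into one file for Squid, it can be worth confirming that the two really belong together. A minimal sketch, assuming an openssl binary on the PATH; cert_matches_key is a hypothetical helper name, not something Squid or OpenSSL provides.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the certificate's public key equals the one
# derived from the private key. Both commands print the key in PEM form.
cert_matches_key() {
  local cert="$1" key="$2" cert_pub key_pub
  cert_pub="$(openssl x509 -in "$cert" -noout -pubkey 2>/dev/null)" || return 1
  key_pub="$(openssl pkey -in "$key" -pubout 2>/dev/null)" || return 1
  # Guard against two empty strings comparing equal.
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Example (run in the certs directory, with sudo if the files are not readable):
#   cert_matches_key squid-ca-cert.pem squid-ca-key.pem && echo "cert and key match"
```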
As yet we have not used this incarnation of Squid beyond the functionality we originally had from the Squid version distributed with Ubuntu 19.10. The two new compilation directives only become useful from this point on. The --enable-ssl-crtd functionality permits secure storage of SSL keys and certs; the default location is ${squid-swap-dir}/ssl_db. In our case, as we compiled with:
VERSION='4.11' --prefix=/opt/squid-${VERSION} --with-swapdir=${prefix}/var/swap
the location of the SSL database is /opt/squid-4.11/var/swap/ssl_db. Once again it’s important that /opt/squid-4.11/var/swap is writable by the proxy user. We have already done this above, but we also need to make sure that we create this database as the proxy user, because if it is created as root, Squid will fail to read or write it and will refuse to start.
sudo -Eu proxy /opt/squid-${VERSION}/lib/security_file_certgen -c \
-s /opt/squid-${VERSION}/var/swap/ssl_db -M 16MB
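Because a root-owned database stops Squid from starting, a quick ownership check after creating it can save a confusing restart. This is a sketch that assumes GNU stat (as found on Ubuntu); owned_by is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the given path is owned by the expected user.
# Relies on GNU stat's -c '%U' (owner name); BSD/macOS stat uses other flags.
owned_by() {
  local path="$1" user="$2"
  [ "$(stat -c '%U' "$path" 2>/dev/null)" = "$user" ]
}

# Example on the Squid host:
#   owned_by /opt/squid-4.11/var/swap/ssl_db proxy || echo "ssl_db is not owned by proxy"
```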
Certs are associated with the serviced port within squid.conf, against the relevant http_port or https_port directive; they are not pushed directly into this ssl_db. We now need to make the appropriate changes to the squid.conf file.
$ sudo cat /opt/squid-4.11/etc/squid.conf
#...
http_port 3128 \
ssl-bump \
generate-host-certificates=on \
dynamic_cert_mem_cache_size=4MB \
cert=/opt/squid-4.11/certs/squid-ca-cert-key.pem
sslcrtd_program /opt/squid-4.11/lib/security_file_certgen \
-s /opt/squid-4.11/var/swap/ssl_db -M 16MB
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump bump all
ssl_bump splice all
#...
$ sudo -u proxy /opt/squid-4.11/sbin/squid -k parse
$ sudo systemctl reload squid-4.11
$ sudo systemctl status squid-4.11
squid-4.11.service - Squid Web Proxy Server
Loaded: loaded
(/etc/systemd/system/squid-4.11.service;
disabled; vendor preset: enabled)
Active: active (running)
since Thu 2020-04-23 12:15:12 AWST; 6s ago
Docs: man:squid(8)
Process: 8194 ExecStartPre=/opt/squid-4.11/sbin/squid \
--foreground -z (code=exited, status=0/SUCCESS)
Process: 8210 ExecStart=/opt/squid-4.11/sbin/squid -sYC \
(code=exited, status=0/SUCCESS)
Main PID: 8211 (squid)
Tasks: 9 (limit: 4915)
Memory: 15.7M
CGroup: /system.slice/squid-4.11.service
├─8211 /opt/squid-4.11/sbin/squid -sYC
├─8213 (squid-1) --kid squid-1 -sYC
├─8221 (security_file_certgen) -s /.../ssl_db -M 16MB
├─8222 (security_file_certgen) -s /.../ssl_db -M 16MB
├─8224 (security_file_certgen) -s /.../ssl_db -M 16MB
├─8225 (security_file_certgen) -s /.../ssl_db -M 16MB
├─8228 (security_file_certgen) -s /.../ssl_db -M 16MB
├─8230 (logfile-daemon) /.../access.log
└─8231 (unlinkd)
Apr 23 12:15:13 t450 squid[8213]: 0 Objects expired.
Apr 23 12:15:13 t450 squid[8213]: 0 Objects cancelled.
Apr 23 12:15:13 t450 squid[8213]: 0 Duplicate URLs purged.
Apr 23 12:15:13 t450 squid[8213]: 0 Swapfile clashes avoided.
Apr 23 12:15:13 t450 squid[8213]: Took 0.01 sec (3279 objects/sec).
Apr 23 12:15:13 t450 squid[8213]: Beginning Validation Procedure
Apr 23 12:15:13 t450 squid[8213]: Completed Validation Procedure
Apr 23 12:15:13 t450 squid[8213]: Validated 38 Entries
Apr 23 12:15:13 t450 squid[8213]: store_swap_size = 31964.00 KB
Apr 23 12:15:14 t450 squid[8213]: storeLateRelease: 0 objects
We now require our clients to trust this new “Authority”. Hence we will load the Squid certificate into the container’s Java keystore; this tells the Maven process to trust the certificates Squid presents in its responses to Maven requests. The certificate must be passed to the container and loaded into the JRE’s cacerts file.
$ cp /opt/squid-4.11/certs/squid-ca-cert.pem .
$ cat ./Dockerfile
FROM maven:3.6.3-jdk-11 as api-mvn-init
RUN mkdir -p /project/api
WORKDIR /project
COPY pom.xml /project/api/
COPY settings.xml /root/.m2/
COPY squid-ca-cert.pem /tmp/
RUN keytool -v -alias mavensrv -import \
-file /tmp/squid-ca-cert.pem \
-storepass changeit \
-trustcacerts -noprompt -cacerts
RUN mvn --file /project/api dependency:go-offline
$ docker build --no-cache --tag mvn-proxy:0.1 .
sudo tail -n1 /opt/squid-4.11/var/log/access.log | log_format
[Thu Apr 23 12:34:14 2020].579 208 172.17.0.2
TCP_MISS/200 645
GET https://repo.maven.apache.org/.../{filename}{jar|sha1|pom}
- HIER_DIRECT/151.101.40.215 text/plain
We will now see a different advisory in the access log for Maven’s requests. Rather than a few TCP_TUNNEL events being logged, Squid can now “bump” the connection, and so inspect and cache the *.sha1, *.pom and *.jar files. We presently have a lot of TCP_MISS statements, indicating that Squid couldn’t serve the file from the cache because it wasn’t there yet, but it could inspect and identify the file. We can also prove that we are caching files by inspecting the many new files in Squid’s cache directory:
sudo find /opt/squid-4.11/var/swap/ -type f | \
wc -l # produces a line count of over 700 new files
Running this build again now produces a build time of:
time docker build --no-cache --tag mvn-proxy:0.1 .
...
real 2m21.653s
user 0m0.476s
sys 0m0.454s
Some improvement, but not a great one. The problem stems from some rather firm request headers that get spat out by Maven’s default settings, to refuse intermediate caching and to force a reload/refresh. These basically say to any intermediate proxy: under no circumstances do I want you to cache or store anything, and if you do, I consider it “stale” the second it hits your store:
Cache-control: no-cache
Cache-store: no-store
Pragma: no-cache
Expires: 0
Accept-Encoding: gzip
However, since you have told Java that you absolutely trust everything that comes from the Squid proxy, Maven can now be forced to eat it. Now Squid can step it up. You officially take responsibility for the WWW content, and now apply the nuclear settings:
offline_mode on #never revalidate from the net
ignore-reload #disregard client requests to reload from the external copies
ignore-no-store #disregard Cache-Control: no-store from the server and store anyway
ignore-private #disregard Cache-Control: private from the server and cache anyway
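Before a regular expression goes into a refresh_pattern line, it can be sanity-checked from the shell. The sketch below uses grep -E as a stand-in for Squid’s matcher, on the assumption that both treat the pattern as a POSIX extended regular expression. The helper name and the sample URLs are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: test URLs against an extended regex of the kind used with refresh_pattern.
# Three alternatives, each an escaped dot, an extension and an end-of-string anchor.
pattern='(\.jar$|\.pom$|\.sha1$)'

matches_pattern() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Illustrative URLs following the usual repository layout.
for url in \
  'https://repo.maven.apache.org/maven2/com/foo/bar/1.0/bar-1.0.jar' \
  'https://repo.maven.apache.org/maven2/com/foo/bar/1.0/bar-1.0.pom' \
  'https://repo.maven.apache.org/maven2/com/foo/bar/1.0/bar-1.0.jar.sha1' \
  'https://repo.maven.apache.org/maven2/com/foo/bar/maven-metadata.xml'
do
  if matches_pattern "$url"; then echo "match    $url"; else echo "no match $url"; fi
done
```

Note that in this syntax an escaped pipe (\|) is a literal pipe character rather than an alternation, so a stray backslash silently stops the alternatives after it from matching.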
$ cat squid.conf
#place these settings before the first refresh_pattern directive:
#...
offline_mode on
refresh_pattern (\.jar$|\.pom$|\.sha1$) 1440 20% 10080 \
ignore-reload ignore-no-store ignore-private
#...
$ sudo -Eu proxy /opt/squid-${VERSION}/sbin/squid -k parse
$ sudo systemctl reload squid-4.11.service
$ time docker build --no-cache --tag mvn-proxy:0.1 .
...
real 0m14.105s
user 0m0.391s
sys 0m0.446s
Boom! 14 Seconds for a Full Refresh and Maven Build!
This is where you want to get to. However, this is a pretty brutal caching override, and I don’t recommend these unqualified caching settings. It does achieve the important goal of allowing a full container rebuild while still getting a bandwidth reduction and speedup from an intermediate private caching proxy, in my case on my development machine. A full review of Squid’s refresh_pattern options should be done to ensure that your cache does not become horribly stale. Some packages can be cached for eternity as they are versioned. Other files, like repo metadata, might need to be refreshed each time, because otherwise you may miss important security patches or version upgrades. Consider this wisely.
Cleaning the cache
To drop the cache, Squid must be shut down, so if you have an HA requirement it is best to do a rolling refresh. We can delete the swap directory contents, but we then need to rebuild the ssl_db before starting Squid again.
$ sudo -E systemctl stop squid-${VERSION}
$ sudo -E rm -rf /opt/squid-${VERSION}/var/swap/*
$ sudo -Eu proxy /opt/squid-${VERSION}/lib/security_file_certgen -c \
-s /opt/squid-${VERSION}/var/swap/ssl_db -M 16MB
$ sudo -E systemctl start squid-${VERSION}
$ time docker build --no-cache --tag mvn-proxy:0.1 . #initial store
...
real 2m20.783s
...
$ time docker build --no-cache --tag mvn-proxy:0.1 . #no check
...
real 0m14.489s
...
Hope you enjoyed this. Squid is the most full-featured open source implementation of the HTTP caching standard. There are UIs that sit over the top and there are many other parts to it, but what you’ve seen is a subset that can help you as a developer speed up your development cycles. Happy coding!