Make SSH use gpg-agent

ssh-agent

OpenSSH is the de facto standard for SSH clients and servers on Linux and macOS. (This is apparently true for newer versions of Windows as well.)

The ssh program normally uses a program called ssh-agent to hold SSH secret keys in memory. The ssh-agent program performs the private-key (signing) operations needed to authenticate SSH connections, so ssh itself never needs to see the actual secret key.

On most systems, ssh-agent is started as part of each user's login process. When it starts, it creates a "Unix Domain Socket". The full pathname of this socket is stored in the SSH_AUTH_SOCK environment variable, which is inherited by the other processes in the user's login session.

Unix Domain Sockets work the same way as network sockets, except that ...

  • Unix sockets can only be used to communicate with other processes on the same machine.

  • Instead of each endpoint being an IP address and port number, the endpoint is a filename on the local filesystem.
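
To make the two differences above concrete, here is a minimal sketch of a Unix socket round trip. The function names and the "demo.sock" filename are made up for illustration. The point is that the server binds to a path on disk instead of an address and port.

```python
import os
import socket
import tempfile
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection and send back whatever the client sent."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def unix_socket_roundtrip(message: bytes) -> bytes:
    """Send a short message through a Unix domain socket and return the echo."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # The endpoint is a filename, not an (IP address, port) pair.
        path = os.path.join(tmpdir, "demo.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)  # this creates the socket file on disk
            server.listen(1)
            worker = threading.Thread(target=echo_once, args=(server,))
            worker.start()
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(path)
                client.sendall(message)
                reply = client.recv(1024)
            worker.join()
        return reply
```

Calling unix_socket_roundtrip(b"hello") should hand back the same bytes, and only a process on the same machine that can reach that path could have connected.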

Programs like ssh, scp, and sftp use the SSH_AUTH_SOCK environment variable to find the agent. If this variable doesn't exist, ssh will not be able to use an agent, and will only be able to authenticate using passwords or secret key files stored on the local disk.
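
As a rough illustration of that lookup, the sketch below does what a client has to do first: read SSH_AUTH_SOCK and check that the path really is a socket. The function name is hypothetical, and this is not how OpenSSH itself is written, just the same idea in a few lines.

```python
import os
import stat
from typing import Mapping, Optional


def find_agent_socket(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the path in SSH_AUTH_SOCK if it names a socket, else None."""
    path = environ.get("SSH_AUTH_SOCK")
    if not path:
        return None  # no variable, so no agent can be found
    try:
        mode = os.stat(path).st_mode  # os.stat follows symbolic links
    except OSError:
        return None  # the path is missing or unreadable
    return path if stat.S_ISSOCK(mode) else None
```

Because os.stat follows symbolic links, a symlink that points at a live socket counts as a socket here too.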

The protocol (or "language") that SSH clients use when talking to ssh-agent is fairly simple, although it doesn't seem to be widely documented. The best reference I've found every time I've looked is an IETF draft document which "expired" in 2020. That doesn't make it any less valid: IETF drafts expire automatically when they go six months without an update, which is probably a good sign, since it means the document didn't need to change.
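
To give a feel for how simple the protocol is, here is a sketch of the "list the keys you hold" exchange as I understand it from that draft. Every message is a 4-byte big-endian length followed by a 1-byte message type and a payload. The message numbers 11 and 12 are taken from the draft; the function names are my own, and this is an illustration rather than a complete client.

```python
import socket
import struct
from typing import List, Tuple

SSH_AGENTC_REQUEST_IDENTITIES = 11  # client -> agent
SSH_AGENT_IDENTITIES_ANSWER = 12    # agent -> client


def frame(msg_type: int, payload: bytes = b"") -> bytes:
    """Wrap a message as: uint32 length, then type byte, then payload."""
    body = bytes([msg_type]) + payload
    return struct.pack(">I", len(body)) + body


def read_string(buf: bytes, offset: int) -> Tuple[bytes, int]:
    """Read one length-prefixed string; return it and the next offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    return buf[start:start + length], start + length


def parse_identities(body: bytes) -> List[Tuple[bytes, str]]:
    """Decode an identities answer into (key blob, comment) pairs."""
    if not body or body[0] != SSH_AGENT_IDENTITIES_ANSWER:
        raise ValueError("not an identities answer")
    (count,) = struct.unpack_from(">I", body, 1)
    offset = 5
    keys = []
    for _ in range(count):
        blob, offset = read_string(body, offset)
        comment, offset = read_string(body, offset)
        keys.append((blob, comment.decode("utf-8", "replace")))
    return keys


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes from the socket."""
    data = bytearray()
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("agent closed the connection early")
        data.extend(chunk)
    return bytes(data)


def list_identities(sock_path: str) -> List[Tuple[bytes, str]]:
    """Ask the agent listening on sock_path which keys it holds."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(frame(SSH_AGENTC_REQUEST_IDENTITIES))
        (length,) = struct.unpack(">I", recv_exact(sock, 4))
        return parse_identities(recv_exact(sock, length))
```

With an agent running, list_identities(os.environ["SSH_AUTH_SOCK"]) (after importing os) should return roughly the same keys that "ssh-add -l" reports. Nothing in the exchange identifies which agent program is answering, so any program that speaks this protocol can stand in for ssh-agent.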

macOS

macOS 10.5 and later start an ssh-agent process as part of each user's login process. The underlying mechanics differ between macOS versions, and the filename of the Unix socket is randomly generated, but the result is that every process running as part of the user's login session inherits an SSH_AUTH_SOCK environment variable pointing to that Unix socket.

With macOS 10.15 and later, the SIP (System Integrity Protection) mechanism makes it difficult (and in later versions, impossible) to make macOS not start ssh-agent automatically.

Linux

Most Linux distributions do something similar, especially if the login session involves a GUI desktop environment. If it doesn't happen automatically, it's usually fairly simple to edit your "login scripts" (such as a .bashrc file) to either start an agent, or find an existing agent process, and export the SSH_AUTH_SOCK environment variable for you.

Note that I haven't needed to mess with this stuff in at least ten years, and I don't honestly remember any details about it.

gpg-agent

GnuPG has a program called gpg-agent which performs the same kind of in-memory caching, but for PGP keys.

The gpg-agent program can be configured to open a Unix socket and speak the ssh-agent protocol. If you do this, gpg-agent will be able to perform the same signing operations that ssh-agent does, using any of the following:

  • SSH secret key files (such as "id_rsa") from disk.
  • PGP authentication subkeys from your keyring.
  • PGP authentication subkeys stored on a smartcard, such as a YubiKey.

So what we want to do is make all SSH clients talk to gpg-agent instead of ssh-agent. SSH clients use the SSH_AUTH_SOCK environment variable to find the agent, so ...

If we make the SSH_AUTH_SOCK environment variable point to the Unix socket that gpg-agent opens, any SSH client that tries to talk to ssh-agent will actually be talking to gpg-agent.
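
A minimal sketch of that idea follows. It assumes a GnuPG 2.1 or later install where "gpgconf --list-dirs agent-ssh-socket" prints the socket path, and that gpg-agent has SSH support turned on (the "enable-ssh-support" option in gpg-agent.conf). The function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, Mapping


def gpg_agent_ssh_socket() -> str:
    """Ask gpgconf where gpg-agent's SSH socket lives (assumes GnuPG 2.1+)."""
    result = subprocess.run(
        ["gpgconf", "--list-dirs", "agent-ssh-socket"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def env_with_agent(sock_path: str,
                   environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a copy of the environment with SSH_AUTH_SOCK repointed."""
    env = dict(environ)
    env["SSH_AUTH_SOCK"] = sock_path  # SSH clients will look here for "ssh-agent"
    return env
```

For example, subprocess.run(["ssh-add", "-l"], env=env_with_agent(gpg_agent_ssh_socket())) should list the keys gpg-agent is offering. This only affects the one child process. Making it stick for a whole login session is the harder part, which is what the rest of this page is about.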

macOS

Back in 2018, I figured out how to stop macOS from starting the ssh-agent process, and how to make the login process set the SSH_AUTH_SOCK environment variable to point to the socket created by gpg-agent. This worked for a while, but then SIP came along (and later APFS with its immutable filesystems) and that approach didn't work anymore.

Then I found this article, which explains how to "do it the other way around". Instead of trying to change what macOS does, we can replace the Unix socket file with a symbolic link, pointing to the Unix socket where gpg-agent is listening for connections from SSH clients.
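
The core of the symlink trick can be sketched in a few lines. This is my own illustration of the idea, not the script from that article, and the function names are hypothetical. The link is built under a temporary name and then renamed into place, so the path is never missing while the swap happens.

```python
import os
import tempfile


def replace_with_symlink(link_path: str, target: str) -> None:
    """Make link_path a symbolic link to target, replacing what was there."""
    tmp_path = f"{link_path}.tmp.{os.getpid()}"
    os.symlink(target, tmp_path)
    os.replace(tmp_path, link_path)  # rename over the old socket file


def demo() -> str:
    """Swap a stand-in 'socket' file for a symlink inside a temp directory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        old = os.path.join(tmpdir, "ssh-agent.sock")     # stands in for $SSH_AUTH_SOCK
        new = os.path.join(tmpdir, "gpg-agent.ssh.sock")  # stands in for gpg-agent's socket
        for path in (old, new):
            open(path, "w").close()
        replace_with_symlink(old, new)
        return os.path.basename(os.readlink(old))
```

In the real setup, link_path would be the value of SSH_AUTH_SOCK that macOS handed to the login session, and target would be the Unix socket where gpg-agent is listening for SSH clients.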

This is so much simpler than what I had originally come up with.