The world of proxies is a weird one... I've spent a lot of time working on various proxies. Each protocol has its quirks, some more bewildering than others, so in this article I'll explain the HTTP and SOCKS proxy protocols from my experience (which is limited, but I've still implemented a much bigger subset of those protocols than most proxy servers do).
SOCKS is a protocol designed specifically for proxies. It's rather nice to implement because of how simple it is. But even then you quickly run into a "quirk" - it actually allows the client to listen for incoming TCP connections! Most SOCKS implementations don't offer this feature, but it's an interesting one indeed.
The first version of the protocol... no, the first version still relevant today is SOCKS4. It's very simple and only supports IPv4: you send the byte "4" for the protocol version, the command (whether to open a TCP connection, or to establish a TCP port binding, that is, listen for TCP connections from the server), the destination port (in network byte order, that is, big endian), and the destination IP. I'm not sure how you're supposed to set the port in "listen" requests - that isn't specified in the standard, so I guess you should zero it out. After that comes the "user ID", which is simply a null-terminated string specifying the... "user ID". There's also identd - a program that may run on your computer, listening on port 113, which can tell the SOCKS4 server the username associated with your open SOCKS4 connection. This may still be useful for corporate networks, but definitely not over the internet. The SOCKS4 protocol is definitely showing its age.
| | VN | CD | DSTPORT | DSTIP | USERID |
|---|---|---|---|---|---|
| BYTES | 1 | 1 | 2 | 4 | N |
| VALUE | 4 | 1=connect, 2=bind | Destination port | Destination IPv4 | Null-terminated ID |
Then, the server replies with a success (or error), and in case of a listen request it also tells the client the address it's listening on. In case of a connection, the IP/port fields are present as well, but must be ignored by the client.
| | VN | CD | DSTPORT | DSTIP |
|---|---|---|---|---|
| BYTES | 1 | 1 | 2 | 4 |
| VALUE | 0 (reply version) | 0x5A=ok, 0x5B-0x5D=errors | Listen port | Listen IP |
After that, the data is simply forwarded between the server and the client.
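To make the byte layout above concrete, here's a minimal Python sketch that builds a SOCKS4 CONNECT request and parses the 8-byte reply. It only does the packing and unpacking, no sockets, and the function names are made up for this example:
```python
import ipaddress
import struct


def socks4_connect_request(ip: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4 CONNECT request (hypothetical helper, CD=1)."""
    if b"\x00" in user_id:
        raise ValueError("user ID must not contain NUL bytes")
    # VN=4, CD=1, DSTPORT as a big-endian u16, then the 4-byte DSTIP.
    header = struct.pack("!BBH", 4, 1, port) + ipaddress.IPv4Address(ip).packed
    # The user ID is terminated by a single NUL byte.
    return header + user_id + b"\x00"


def socks4_parse_reply(reply: bytes) -> tuple[bool, str, int]:
    """Parse the 8-byte SOCKS4 reply into (granted, ip, port)."""
    if len(reply) != 8:
        raise ValueError("SOCKS4 reply must be exactly 8 bytes")
    vn, cd, port = struct.unpack("!BBH", reply[:4])
    if vn != 0:
        raise ValueError(f"unexpected reply version {vn}")
    ip = str(ipaddress.IPv4Address(reply[4:8]))
    # 0x5A means "granted"; 0x5B-0x5D are the various rejections.
    return cd == 0x5A, ip, port
```
A client would write the request to the proxy's TCP socket, read exactly 8 bytes back, and start forwarding data if the first element of the result is True (for a plain connect, the IP and port can be ignored).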
As you can probably tell, this is quite limited, despite the weird listen feature (well, this was made in a simpler time, when NAT wasn't a given, so some protocols relied on the client being able to accept incoming connections). An extension, "SOCKS4a", was proposed: by setting the destination IP to "0.0.0.x" (where x is non-zero, to distinguish it from the "unset" IP), you can append a null-terminated domain name after the user ID.
| | | DOMAIN |
|---|---|---|
| BYTES | 8+N | M |
| VALUE | SOCKS4 header with 0.0.0.x addr | Null-terminated domain |
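The SOCKS4a variant differs only in the fake 0.0.0.x address and the extra trailing string. A sketch under the same assumptions (hypothetical function name, and I picked x = 1 arbitrarily; any non-zero value should do):
```python
import struct


def socks4a_connect_request(domain: str, port: int, user_id: bytes = b"") -> bytes:
    """Build a SOCKS4a CONNECT request asking the proxy to resolve `domain`."""
    if b"\x00" in user_id:
        raise ValueError("user ID must not contain NUL bytes")
    # Assumption: the name is plain ASCII (or already punycode-encoded).
    name = domain.encode("ascii")
    return (
        struct.pack("!BBH", 4, 1, port)  # VN=4, CD=1 (connect), DSTPORT
        + bytes([0, 0, 0, 1])            # 0.0.0.x with x != 0: "a domain follows"
        + user_id + b"\x00"              # null-terminated user ID
        + name + b"\x00"                 # null-terminated domain name
    )
```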
Still, this is really hacky, and still limited. That's why SOCKS5 was introduced - a new version of the protocol.
SOCKS5 has similar semantics to SOCKS4. However, it has native authentication support. That's why the very first request from the client is the protocol version (5), and the list of "authentication methods" supported by the client. There are a lot of methods, but the most commonly used ones are "no authentication" (0) and "username/password authentication" (2).
| | VER | NMETHODS | METHODS |
|---|---|---|---|
| BYTES | 1 | 1 | NMETHODS |
| VALUE | 5 | Auth method count | Auth methods |
The server replies with its preferred authentication method out of those supported by the client:
| | VER | METHOD |
|---|---|---|
| BYTES | 1 | 1 |
| VALUE | 5 | Auth method |
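The greeting and the server's choice are easy to sketch in Python (hypothetical helper names; the 0xFF "no acceptable methods" reply comes from RFC 1928 and isn't shown in the tables above):
```python
def socks5_greeting(methods: list[int]) -> bytes:
    """VER=5, NMETHODS, then one byte per offered auth method."""
    if not 1 <= len(methods) <= 255:
        raise ValueError("need between 1 and 255 methods")
    return bytes([5, len(methods), *methods])


def socks5_parse_method_reply(reply: bytes) -> int:
    """Return the auth method the server picked from the client's list."""
    if len(reply) != 2 or reply[0] != 5:
        raise ValueError("malformed method selection reply")
    if reply[1] == 0xFF:
        # The server accepted none of the offered methods.
        raise ConnectionError("proxy accepted none of our auth methods")
    return reply[1]
```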
In case username/password authentication is required, the client then sends an "authentication header", with the auth header version (1), and the username and password themselves.
| | VER | ULEN | UNAME | PLEN | PASSWD |
|---|---|---|---|---|---|
| BYTES | 1 | 1 | ULEN | 1 | PLEN |
| VALUE | 1 | Username length | Username | Password length | Password |
Followed by the reply:
| | VER | STATUS |
|---|---|---|
| BYTES | 1 | 1 |
| VALUE | 1 | Status code (0=success, anything else=error) |
Remember, this round-trip is skipped for no-auth requests!
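The username/password exchange (specified separately, in RFC 1929) can be sketched like this. Encoding the credentials as UTF-8 is my assumption, since the format itself only deals in bytes, and the helper names are hypothetical:
```python
def socks5_userpass_request(username: str, password: str) -> bytes:
    """Build the auth sub-negotiation message: VER=1, ULEN, UNAME, PLEN, PASSWD."""
    # Assumption: credentials are sent as UTF-8 bytes.
    u = username.encode("utf-8")
    p = password.encode("utf-8")
    if not (1 <= len(u) <= 255 and 1 <= len(p) <= 255):
        raise ValueError("username and password must be 1-255 bytes each")
    return bytes([1, len(u)]) + u + bytes([len(p)]) + p


def socks5_userpass_ok(reply: bytes) -> bool:
    """True if the 2-byte auth reply reports success (STATUS == 0)."""
    if len(reply) != 2 or reply[0] != 1:
        raise ValueError("malformed auth reply")
    return reply[1] == 0
```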
Other authentication methods may be more sophisticated than just "username/password". In that case, the SOCKS5 protocol allows encrypting or otherwise encapsulating all traffic that follows authentication.
Finally, before explaining the protocol itself, I need to explain how socket addresses are encoded in the SOCKS format, so I don't have to repeat myself. The first byte is the address type, and then there are three options:
| | ATYP | IPV4 | PORT |
|---|---|---|---|
| BYTES | 1 | 4 | 2 |
| VALUE | 1 | IPv4 address | Port |

| | ATYP | DLEN | DOMAIN | PORT |
|---|---|---|---|---|
| BYTES | 1 | 1 | DLEN | 2 |
| VALUE | 3 | Domain length | Domain | Port |

| | ATYP | IPV6 | PORT |
|---|---|---|---|
| BYTES | 1 | 16 | 2 |
| VALUE | 4 | IPv6 address | Port |
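Since the remaining SOCKS5 messages all embed this encoding, here's a small Python sketch of an encoder and a decoder. The names are hypothetical, and encoding domain names with Python's "idna" codec is my assumption (plain ASCII names come through unchanged):
```python
import ipaddress
import struct


def encode_socks5_addr(host: str, port: int) -> bytes:
    """Encode host:port as ATYP + address + big-endian port."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so send it as a length-prefixed domain name.
        name = host.encode("idna")
        if not 1 <= len(name) <= 255:
            raise ValueError("domain name must be 1-255 bytes")
        return bytes([3, len(name)]) + name + struct.pack("!H", port)
    atyp = 1 if ip.version == 4 else 4
    return bytes([atyp]) + ip.packed + struct.pack("!H", port)


def decode_socks5_addr(buf: bytes) -> tuple[str, int, int]:
    """Decode an address at the start of buf; return (host, port, bytes consumed)."""
    atyp = buf[0]
    if atyp == 1:
        start, end = 1, 5
    elif atyp == 4:
        start, end = 1, 17
    elif atyp == 3:
        start, end = 2, 2 + buf[1]  # second byte is the domain length
    else:
        raise ValueError(f"unknown address type {atyp}")
    if len(buf) < end + 2:
        raise ValueError("truncated address")
    raw = buf[start:end]
    if atyp == 1:
        host = str(ipaddress.IPv4Address(raw))
    elif atyp == 4:
        host = str(ipaddress.IPv6Address(raw))
    else:
        host = raw.decode("ascii")
    (port,) = struct.unpack("!H", buf[end:end + 2])
    return host, port, end + 2
```
Returning the number of consumed bytes lets the caller continue parsing whatever follows the address, such as the payload of a UDP datagram.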
I like how domain names have their length encoded in the first byte instead of being null-terminated - it's much easier to parse than SOCKS4. Also, not all proxy servers support domain name resolution, so proxies that do are sometimes labeled "SOCKS5h" - if you want curl to pass domain names to the proxy instead of resolving them to IP addresses itself, use the socks5h:// scheme instead of socks5://.
Now that this is out of the way, the actual protocol is simple. After authentication (if it was necessary in the first place), the client sends the request itself. It's pretty similar to SOCKS4, but this time there's an additional command - not just TCP connect and TCP bind, but also UDP associate! This means you can send DNS and QUIC (HTTP/3) traffic over a fully implemented SOCKS proxy.
| | VER | CMD | RSV | DSTADDR |
|---|---|---|---|---|
| BYTES | 1 | 1 | 1 | ? |
| VALUE | 5 | 1=connect, 2=bind, 3=UDP associate | 0 | Destination address |
Then the reply is very similar to SOCKS4:
| | VER | STATUS | RSV | BNDADDR |
|---|---|---|---|---|
| BYTES | 1 | 1 | 1 | ? |
| VALUE | 5 | 0=ok, 1-8=errors | 0 | Bound address, unused for connect cmd |
And then the data gets forwarded.
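Here's a sketch of the request builder, plus the one non-obvious part of reading the reply: its length depends on ATYP. It takes DSTADDR already encoded in the ATYP/address/port format described above; the helper names are hypothetical:
```python
def socks5_request(cmd: int, dst_addr: bytes) -> bytes:
    """VER=5, CMD, RSV=0, then DSTADDR (already encoded as ATYP|addr|port)."""
    if cmd not in (1, 2, 3):
        raise ValueError("cmd must be 1 (connect), 2 (bind) or 3 (UDP associate)")
    return bytes([5, cmd, 0]) + dst_addr


def socks5_reply_length(head: bytes) -> int:
    """Given the first 5 bytes of a reply, return the reply's total length."""
    ver, _status, _rsv, atyp = head[:4]
    if ver != 5:
        raise ValueError("not a SOCKS5 reply")
    if atyp == 1:
        return 4 + 4 + 2            # header + IPv4 + port
    if atyp == 4:
        return 4 + 16 + 2           # header + IPv6 + port
    if atyp == 3:
        return 4 + 1 + head[4] + 2  # header + length byte + domain + port
    raise ValueError(f"unknown address type {atyp}")
```
Reading 5 bytes first is always safe, since the shortest possible reply (IPv4) is 10 bytes; after that you know exactly how many more bytes to read, and STATUS is simply the second byte.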
That's not all - there's also UDP! In case of UDP, the server opens a UDP socket and sends its address in BNDADDR. The client is then expected to send UDP datagrams to that address. The SOCKS proxy will transmit them to the server, and send the server's datagrams back to the client.
But rather than sending plain datagrams, the client must encapsulate them as follows:
| | RSV | FRAG | DSTADDR | DATA |
|---|---|---|---|---|
| BYTES | 2 | 1 | ? | The rest of the datagram |
| VALUE | 0 | Fragment number | Destination address | The data itself |
Did you notice anything weird? You should! Didn't the client already send DSTADDR in the first TCP request? Why does it have to send it alongside each datagram? That's because for UDP, the first DSTADDR is allowed to be zeroed out, so it's effectively just a suggestion! The DSTADDR that actually matters is the one sent in the datagrams.
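Wrapping and unwrapping a datagram can be sketched as follows. To keep it short, these hypothetical helpers only send IP literals and reject fragmented datagrams (FRAG != 0), but they do accept domain-name addresses in incoming datagrams:
```python
import ipaddress
import struct


def wrap_udp(dst_ip: str, dst_port: int, payload: bytes) -> bytes:
    """Prefix payload with RSV=0 (2 bytes), FRAG=0 and an IP-literal DSTADDR."""
    ip = ipaddress.ip_address(dst_ip)
    atyp = 1 if ip.version == 4 else 4
    return b"\x00\x00\x00" + bytes([atyp]) + ip.packed + struct.pack("!H", dst_port) + payload


def unwrap_udp(datagram: bytes) -> tuple[str, int, bytes]:
    """Split an unfragmented SOCKS5 UDP datagram into (address, port, payload)."""
    if datagram[:2] != b"\x00\x00":
        raise ValueError("RSV must be zero")
    if datagram[2] != 0:
        raise ValueError("fragmented datagrams are not supported in this sketch")
    atyp = datagram[3]
    if atyp == 1:
        addr_end = 4 + 4
        host = str(ipaddress.IPv4Address(datagram[4:addr_end]))
    elif atyp == 4:
        addr_end = 4 + 16
        host = str(ipaddress.IPv6Address(datagram[4:addr_end]))
    elif atyp == 3:
        addr_end = 5 + datagram[4]  # byte 4 holds the domain length
        host = datagram[5:addr_end].decode("ascii")
    else:
        raise ValueError(f"unknown address type {atyp}")
    (port,) = struct.unpack("!H", datagram[addr_end:addr_end + 2])
    return host, port, datagram[addr_end + 2:]
```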
Another interesting fact is that the header eats 10 to 262 bytes of every datagram, so the effective payload you can send through the SOCKS server is smaller. To allow larger datagrams, UDP fragmentation is available as an optional SOCKS5 feature (one that most SOCKS servers don't support... though most SOCKS servers don't support UDP at all). If FRAG is non-zero, its 7 lowest bits specify the fragment number (numbering doesn't necessarily start at one, but it can't wrap around within the same sequence - fragment 1 can't follow fragment 127), and the highest bit is the "end-of-fragment-sequence" flag - that is, end of "fragment sequence", not "end of fragment" sequence: when it's set, this datagram is the last one of the sequence.
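The 10-to-262-byte figure falls straight out of the header layout, and the FRAG byte splits into a position and a flag. A small sketch (hypothetical function names):
```python
def socks5_udp_overhead(atyp: int, domain_len: int = 0) -> int:
    """Bytes of SOCKS5 header in front of each UDP payload."""
    fixed = 2 + 1 + 1 + 2  # RSV + FRAG + ATYP + PORT
    if atyp == 1:
        return fixed + 4                 # IPv4 address
    if atyp == 4:
        return fixed + 16                # IPv6 address
    if atyp == 3:
        return fixed + 1 + domain_len    # length byte + domain name
    raise ValueError(f"unknown address type {atyp}")


def decode_frag(frag: int) -> tuple[int, bool]:
    """Return (position, is_last); FRAG == 0 means a standalone datagram."""
    return frag & 0x7F, bool(frag & 0x80)
```
The minimum is the IPv4 case (10 bytes) and the maximum is a 255-byte domain name (262 bytes); IPv6 lands in between at 22.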
In turn, the server uses the exact same format for sending datagrams back to the client (funnily enough, this means the client may have to handle domain-name addresses and UDP fragmentation if the server sends such packets), but instead of the destination address it writes the source address in the DSTADDR field.
But there's a problem - when should the server close the UDP socket? After all, UDP is a connectionless protocol: unlike TCP, where there is a clear indication of the stream being closed, UDP has no such thing.
In this case, the TCP stream where the UDP associate request has been sent must be kept open for as long as the UDP socket needs to stay open. When the TCP stream is closed, the server closes all related UDP sockets as well.
Evidently, the SOCKS5 protocol requires quite a few round-trips (though with a well-behaved server you can just send all the data in advance, expecting the server to either accept it or bail out). There is a SOCKS6 protocol draft that reduces the number of required round-trips, but it's just that - a draft. There are also some draft extensions for SOCKS5 - but, again, they aren't official standards, so clients rarely implement them.
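For example, a client that offers only username/password authentication (so it already knows which method the server has to pick) could concatenate all three messages and send them in a single write. This is a sketch that assumes a well-behaved server which processes buffered bytes in order; the function name is hypothetical and DSTADDR is passed pre-encoded:
```python
def socks5_pipelined_connect(username: str, password: str, dst_addr: bytes) -> bytes:
    """Greeting + username/password auth + CONNECT, ready to send in one write."""
    # Assumption: UTF-8 credentials, each 1-255 bytes long.
    u, p = username.encode("utf-8"), password.encode("utf-8")
    if not (1 <= len(u) <= 255 and 1 <= len(p) <= 255):
        raise ValueError("username and password must be 1-255 bytes each")
    greeting = bytes([5, 1, 2])  # offer exactly one method: 2 (username/password)
    auth = bytes([1, len(u)]) + u + bytes([len(p)]) + p
    connect = bytes([5, 1, 0]) + dst_addr  # CMD=1 (connect), RSV=0
    return greeting + auth + connect
```
The client then reads the replies in the same order: 2 bytes of method selection, 2 bytes of auth status, and finally the request reply. If any of them reports an error, it just closes the connection.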
HTTP is a weird proxy protocol, because it wasn't designed for this purpose, and because the combination of different layers into a single protocol is just weird (also, it doesn't support UDP). Well, you will see for yourself.
At its core, it's a very simple protocol. Instead of sending the following:
GET /index.html HTTP/1.1
Host: example.org
Accept: text/html
...you send the following:
GET https://example.org/index.html HTTP/1.1
Host: example.org
Accept: text/html
...or the following:
GET https://example.org/index.html HTTP/1.1
Accept: text/html
Proxy-Authorization: Basic <base64-encoded credentials>
...and the proxy server simply replies with the origin server's response:
HTTP/1.1 200 OK
Content-Length: 123
...content of index.html
It may also be sent over TLS, just like you can send regular HTTP requests over TLS (HTTPS) - in that case it's called an HTTPS proxy.
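Building such a request by hand is simple. Here's a minimal Python sketch (hypothetical function name) of a GET for a plain HTTP proxy, with Basic credentials in Proxy-Authorization; encoding the credentials as UTF-8 is my assumption:
```python
import base64
from urllib.parse import urlsplit


def proxied_get(url: str, username: str, password: str) -> bytes:
    """Build an absolute-form GET request for a plain HTTP proxy."""
    if ":" in username:
        raise ValueError("Basic auth usernames cannot contain ':'")
    host = urlsplit(url).netloc  # Host mirrors the authority part of the URL
    # Basic credentials: base64("username:password"), UTF-8 assumed.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    lines = [
        f"GET {url} HTTP/1.1",  # absolute-form request target
        f"Host: {host}",
        f"Proxy-Authorization: Basic {token}",
        "",  # these two empty entries produce the blank line ending the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```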
Notice anything weird?
Weird thing number one - "Proxy-Authorization". As you can see, you pass the headers intended for the proxy alongside the headers intended for the website. There are a number of such headers, such as Proxy-Authorization, Proxy-Authenticate and Proxy-Connection. Proxy-Authorization usually has the form Basic <base64 data>, where <base64 data> is the username and password joined with a colon (username:password), all encoded with base64.
Weird thing number two - notice that little "https://"? Yes - HTTP proxies must support encryption! They can't just pass the data to the server, they have to encrypt it if the client requests that. While this means they can be used to make legacy software compatible with modern TLS, it still increases the implementation complexity.
Weird thing number three - look at the Host header. It's there in the direct request, it's there in the first proxied request, but it isn't there in the second proxied request. That's right - the client may or may not send it, but the proxy server is required to ignore it and instead use the host specified in the request line (the very first line of the request).
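On the proxy side, this means the upstream host comes from the request target alone. A sketch of that step (hypothetical function name; it only handles absolute-form targets and doesn't validate much):
```python
from urllib.parse import urlsplit


def upstream_target(request_head: bytes) -> tuple[str, int, str]:
    """Return (host, port, origin-form path) from an absolute-form request line.

    Any Host header in request_head is deliberately never looked at.
    """
    request_line = request_head.split(b"\r\n", 1)[0].decode("ascii")
    _method, target, _version = request_line.split(" ")
    parts = urlsplit(target)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("expected an absolute-form request target")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    # Rebuild the origin-form target ("/path?query") for the upstream request.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, path
```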
Weird thing number four - there is no way to distinguish proxy-generated errors from origin-generated errors! If the client gets 500 Internal Server Error, it may be the origin server's response - or it may be the proxy server's response, there's no way to know!
Weird thing number five - no encryption, the proxy may snoop on the connection (although sometimes it's desired). This is actually fine because of the existence of the HTTP method CONNECT:
CONNECT example.org:443 HTTP/1.1
Proxy-Authorization: Basic ...
This method allows you to transmit arbitrary TCP data over HTTP - including HTTPS. Many HTTP proxies only allow CONNECT to port 443.
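A client-side sketch of the CONNECT handshake (hypothetical names; it assumes the host is a domain name or IPv4 address, since IPv6 literals would need brackets):
```python
from typing import Optional


def connect_request(host: str, port: int, basic_token: Optional[str] = None) -> bytes:
    """Build a CONNECT request; basic_token is base64("user:pass") if needed."""
    authority = f"{host}:{port}"  # CONNECT uses the authority-form target
    lines = [f"CONNECT {authority} HTTP/1.1", f"Host: {authority}"]
    if basic_token is not None:
        lines.append(f"Proxy-Authorization: Basic {basic_token}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def tunnel_established(response_head: bytes) -> bool:
    """True if the proxy's status line reports a 2xx status."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(" ", 2)
    return (
        len(parts) >= 2
        and parts[0].startswith("HTTP/")
        and parts[1].isdigit()
        and 200 <= int(parts[1]) <= 299
    )
```
Once the proxy answers with a 2xx status line and the blank line after its headers, everything that follows on the socket is raw tunnel data - typically the client's TLS handshake.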
Now, back to Proxy-Connection.
First, what is "Connection"? It specifies whether multiple HTTP requests can be sent over a single TCP connection (since TCP and especially TLS handshakes have quite a bit of overhead). Something like this, with > signifying a request and < signifying a response:
>GET /index.html HTTP/1.1
>Host: example.org
>Accept: text/html
>Connection: keep-alive
>
<HTTP/1.1 200 OK
<Content-Length: 123
<Connection: keep-alive
<
<...content of index.html
>GET /favicon.ico HTTP/1.1
>Host: example.org
>Accept: image/avif,image/webp,*/*
>Connection: keep-alive
>
<HTTP/1.1 200 OK
<Content-Length: 456
<Connection: keep-alive
<
<...content of favicon.ico
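What makes reusing the connection possible is that each response says how long its body is. A minimal sketch of splitting one Content-Length-framed response off a buffer (hypothetical name; it deliberately doesn't handle chunked transfer encoding):
```python
def split_response(buf: bytes) -> tuple[int, bytes, bytes]:
    """Split one Content-Length-framed response off the front of buf.

    Returns (status, body, rest), where rest belongs to the next response.
    """
    head, sep, after = buf.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    lines = head.decode("latin-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])
    length = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        # Without a length (or chunking), the body only ends when the connection closes.
        raise ValueError("no Content-Length: body runs until the connection closes")
    if len(after) < length:
        raise ValueError("incomplete body")
    return status, after[:length], after[length:]
```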
Now, how does this apply to proxies? At first you may think - isn't sending Connection to the final server in the chain enough? Does the proxy really have to support keep-alive connections? Wouldn't something like this work?
>GET http://example.org/index.html HTTP/1.1
>Accept: text/html
>Connection: keep-alive
>
<HTTP/1.1 200 OK
<Content-Length: 123
<Connection: keep-alive
<
<...content of index.html
>GET /favicon.ico HTTP/1.1
>Host: example.org
>Accept: image/avif,image/webp,*/*
>Connection: keep-alive
>
<HTTP/1.1 200 OK
<Content-Length: 456
<Connection: keep-alive
<
<...content of favicon.ico
And... yeah, this would probably be enough, especially because standards-conforming HTTP servers are required to accept http://abc/xyz as well as /xyz. However, I'm not sure this is what actually happens in real life (sorry, I haven't dug that deeply into HTTP proxy client implementations - I'm not sure just how little of HTTP you can get away with implementing to have a fully compatible HTTP proxy), and either way, keep-alive support on the proxy side is a mandatory feature - if you don't support it, some clients won't work with your proxy!
Let me explain. As you may remember, there's the header Proxy-Authorization. If it isn't provided, the proxy may reply with status code 407 (Proxy Authentication Required).
That's all fine and good - but... there is a proxy client that just refuses to send the credentials to the proxy server. That is - Mozilla Firefox (or at least the FoxyProxy extension). I was wondering why it did that.
Turns out that even if the proxy server sends 407, Firefox doesn't remember it. It will only start sending the credentials if the proxy keeps the connection alive - even if the proxy says it only supports HTTP/1.0, not HTTP/1.1. To visualize it, this is the only way the proxy server can force the client to use authorization:
>GET http://example.org/index.html HTTP/1.1
>Accept: text/html
>Proxy-Connection: keep-alive
>
<HTTP/1.1 407 Proxy Authentication Required
<Proxy-Authenticate: Basic
<Content-Length: 0
<
>GET http://example.org/index.html HTTP/1.1
>Accept: text/html
>Proxy-Connection: keep-alive
>Proxy-Authorization: Basic ...
>
<HTTP/1.1 200 OK
<Content-Length: 123
<Connection: keep-alive
<
<...content of index.html
(note the Content-Length: 0 in the 407 response - HTTP has no end-of-payload marker, so without an explicit length the client can only assume the body ends when the connection closes, which defeats the whole point of keeping the connection alive)
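On the server side, the check can be sketched like this. The headers dict with lowercased names, the hypothetical function name and the realm value are all my assumptions. Note the explicit Content-Length: 0, since HTTP has no end-of-body marker and the client needs to know where the 407 response ends to reuse the connection:
```python
import base64
import hmac
from typing import Optional


def check_proxy_auth(headers: dict[str, str], username: str, password: str) -> Optional[bytes]:
    """Return None if the credentials match, else a keep-alive 407 response.

    Assumes `headers` maps lowercased header names to values.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    expected = f"Basic {token}".encode("utf-8")
    got = headers.get("proxy-authorization", "").encode("utf-8")
    if hmac.compare_digest(got, expected):  # constant-time comparison
        return None
    return (
        b"HTTP/1.1 407 Proxy Authentication Required\r\n"
        b'Proxy-Authenticate: Basic realm="proxy"\r\n'  # realm value is hypothetical
        b"Content-Length: 0\r\n"  # lets the client find the end of this response
        b"Connection: keep-alive\r\n"
        b"\r\n"
    )
```
After sending the 407, the proxy keeps reading from the same connection and expects the client to retry with credentials.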
This is such a mess! The whole world of proxies is an underdocumented mess. Hope you enjoyed this semi-article-semi-rant, and hope this article may help other people with implementing proxy servers. This is just the protocol side of proxies - there are other important and challenging aspects in implementing them.