Experiment on TCP Hole Punching

I recently needed a way to connect to a Subversion server behind a NAT. I used to tunnel through an SSH server with a public IP. That worked perfectly, but I recently lost access to that server, so I want to try TCP hole punching.

It’s not hard to find related resources online. I followed the approach described in the paper “Peer-to-Peer Communication Across Network Address Translators”. The basic idea is to let both peers connect and listen on the same local port. Once the internet gateway sees an outgoing SYN packet to X, it allows subsequent packets from X to come in. As a result, at least one of the two SYN packets should punch through the NAT.
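
To make the filtering rule concrete, here is a toy model of the gateway's state table. It is only a sketch under simplifying assumptions: a real NAT also tracks ports, protocols, and timeouts, and the names `nat_table`, `nat_outbound`, and `nat_allows_inbound` are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NAT_MAX_PEERS 16

/* Toy model: the gateway remembers which remote addresses we sent a SYN to. */
struct nat_table {
	uint32_t peers[NAT_MAX_PEERS];
	size_t count;
};

/* An inbound packet from `remote` passes only if we contacted `remote` first. */
bool nat_allows_inbound(const struct nat_table *t, uint32_t remote)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->count; i++)
		if (t->peers[i] == remote)
			return true;
	return false;
}

/* Record an outgoing SYN to `remote`. Returns false if the table is full. */
bool nat_outbound(struct nat_table *t, uint32_t remote)
{
	assert(t != NULL);
	if (nat_allows_inbound(t, remote))
		return true;	/* already recorded */
	if (t->count == NAT_MAX_PEERS)
		return false;
	t->peers[t->count++] = remote;
	return true;
}
```

In this model a SYN from 2.2.2.2 is dropped until the local host has itself sent a SYN to 2.2.2.2, which is why both peers have to call connect().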

Before doing this, each peer needs to know the other's external IP and port. Fortunately, most NAT implementations always map the same internal IP/port to the same external IP/port, whatever the destination is. This is known as “independent mapping” (RFC 4787 calls it endpoint-independent mapping). Even better, most NATs reuse the internal port number as the external port if it is not already occupied. This is known as “port preservation”. To learn our own external IP/port, we can connect to a third server and let it tell us, just like STUN.
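
On the third server, the external address is simply the peer address that accept() reports. A minimal sketch of turning it into a reply follows. The `ip:port` text format and the name `format_endpoint` are assumptions made for this example, not part of STUN.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Write "ip:port" for `peer` into `out`. Returns 0 on success, -1 on error. */
int format_endpoint(const struct sockaddr_in *peer, char *out, size_t outlen)
{
	char ip[INET_ADDRSTRLEN];
	int n;

	assert(peer != NULL && out != NULL);
	if (inet_ntop(AF_INET, &peer->sin_addr, ip, sizeof(ip)) == NULL)
		return -1;
	n = snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(peer->sin_port));
	/* snprintf() returns the length it wanted; treat truncation as an error. */
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The server would fill `peer` with accept(lsock, (struct sockaddr *)&peer, &len), call format_endpoint(), and send the resulting string back over the accepted socket.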

So I implemented the idea in Ford’s paper.

#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>

#define DIE(format,...) do {perror(NULL); printf(format, ##__VA_ARGS__); exit(1);} while(0)

int say_something (int sock)
{
	char buff[256];
	int len, flags;

	flags = fcntl(sock, F_GETFL);
	flags = flags & (~ O_NONBLOCK);
	if (fcntl(sock, F_SETFL, flags))
		DIE("fcntl() failed\n");

	snprintf(buff, sizeof(buff), "Hello. I'm %d", getpid());
	printf("sending %s\n", buff);
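	if (send(sock, buff, strlen(buff) + 1, 0) != (ssize_t)(strlen(buff) + 1))
		DIE("send() failed\n");

	/* Leave room for a terminator: recv() does not guarantee a NUL-terminated string. */
	len = recv(sock, buff, sizeof(buff) - 1, 0);
	if (len <= 0)
		DIE("recv() failed\n");
	buff[len] = '\0';
	printf("received %s\n", buff);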

	return 0;
}

// TODO address type, length...
int getaddr (struct sockaddr *addr, const char *host, const char *port)
{
	struct addrinfo hints, *res;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_protocol = 0;
	hints.ai_flags = AI_PASSIVE;

	if (getaddrinfo(host, port, &hints, &res))
		return -1;

	if (res == NULL)
		return -1;

	memcpy(addr, res->ai_addr, res->ai_addrlen);
	freeaddrinfo(res);
	return 0;
}

int main (int argc, char *argv[])
{
	int ssock, csock;
	struct sockaddr_in local_addr, remote_addr;
	int i;
	socklen_t len;

	if (argc != 4) {
		printf("Usage: %s localport remotehost remoteport\n", argv[0]);
		exit(0);
	}

	if (getaddr((struct sockaddr *)&local_addr, NULL, argv[1]))
		DIE("getaddr() failed\n");
	if (getaddr((struct sockaddr *)&remote_addr, argv[2], argv[3]))
		DIE("getaddr() failed\n");

	if ((ssock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
		DIE("socket() failed\n");
	if ((csock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
		DIE("socket() failed\n");

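	/* SO_REUSEADDR so that both sockets can bind to the same local port
	 * (some systems need SO_REUSEPORT as well). */
	i = 1;
	if (setsockopt(ssock, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i)))
		DIE("setsockopt() failed\n");
	if (setsockopt(csock, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i)))
		DIE("setsockopt() failed\n");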

	if (bind(ssock, (const struct sockaddr *)&local_addr, sizeof(local_addr)))
		DIE("bind() failed\n");
	if (bind(csock, (const struct sockaddr *)&local_addr, sizeof(local_addr)))
		DIE("bind() failed\n");

	if (fork()) {
		close(csock);

		if (listen(ssock, 1))
			DIE("listen() failed\n");
		while (1) {
			len = sizeof(remote_addr);
			i = accept(ssock, (struct sockaddr *)&remote_addr, &len);
			if (i < 0) {
				perror("accept() failed.");
			} else {
				printf("accept() succeeded\n");
				return say_something(i);
			}
		}
	} else {
		close(ssock);
		srandom(getpid());

		for (i = 0; i < 3; i ++) {
			if (connect(csock, (const struct sockaddr *)&remote_addr, sizeof(remote_addr))) {
				int sleeptime = random() * 1000000.0 / RAND_MAX + 1000000.0;
				sleeptime = sleeptime << i;
				perror("connect() failed");
				if (i < 2) {
					printf("sleeping for %.2f sec to retry\n", sleeptime / 1000000.0);
					usleep(sleeptime);
				}
			} else {
				printf("connect() succeeded\n");
				return say_something(csock);
			}
		}
		return 1;
	}
}

It worked. host1 and host2 have external IPs 1.1.1.1 and 2.2.2.2 respectively. Both NATs preserve ports, so if host1 binds to port 30000, its external port is also 30000.

host1$ ./biconn 30000 2.2.2.2 20000
connect() failed: Connection timed out
sleeping for 1.13 sec to retry
connect() succeed: Connection timed out
sending Hello. I'm 8151
received Hello. I'm 6629
host2$ ./biconn 20000 1.1.1.1 30000
connect() failed: Connection refused
sleeping for 1.68 sec to retry
connect() succeed: Connection refused
sending Hello. I'm 6629
received Hello. I'm 8151

I noticed an unexpected behaviour: accept() never succeeded on either peer, while connect() succeeded on both.

Is it possible for two peers to symmetrically connect() to each other? This question is not related to NAT, and the answer is yes. Find any computer networks textbook and look at the TCP state diagram: a socket in the SYN_SENT state moves to the SYN_RECV state when it receives a SYN packet. This is TCP's “simultaneous open”. Someone has asked the question before.
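
The relevant corner of the state diagram can be written down as a small transition function. This is a sketch of the handshake states only, as described in the TCP specification; the enum names and `tcp_next` are made up for the illustration and are not a kernel API.

```c
#include <assert.h>

enum tcp_state { ST_CLOSED, ST_SYN_SENT, ST_SYN_RCVD, ST_ESTABLISHED };
enum tcp_event { EV_APP_CONNECT, EV_RCV_SYN, EV_RCV_SYN_ACK, EV_RCV_ACK };

/* Handshake subset of the TCP state diagram; unlisted pairs keep the state. */
enum tcp_state tcp_next(enum tcp_state s, enum tcp_event e)
{
	switch (s) {
	case ST_CLOSED:
		/* connect(): send SYN */
		return e == EV_APP_CONNECT ? ST_SYN_SENT : s;
	case ST_SYN_SENT:
		if (e == EV_RCV_SYN_ACK)
			return ST_ESTABLISHED;	/* ordinary open: reply with ACK */
		if (e == EV_RCV_SYN)
			return ST_SYN_RCVD;	/* simultaneous open: reply with SYN+ACK */
		return s;
	case ST_SYN_RCVD:
		/* Our SYN gets acknowledged; in a simultaneous open the
		 * acknowledgement arrives inside the peer's SYN+ACK. */
		return (e == EV_RCV_ACK || e == EV_RCV_SYN_ACK) ? ST_ESTABLISHED : s;
	case ST_ESTABLISHED:
		return s;
	}
	assert(!"unknown TCP state");
	return s;
}
```

When both peers go through EV_APP_CONNECT, EV_RCV_SYN, EV_RCV_SYN_ACK in that order, both end up in ST_ESTABLISHED without either of them ever having been in a listening state, which matches the observation that accept() never returned.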

So I wondered whether I could remove the listen() part of the code and use only one socket on each peer. A problem with the previous approach (as mentioned here) is that it’s not possible to bind additional sockets to the port after listen().
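
For reference, binding more than one socket to the same local port, as the first program does, depends on enabling address reuse on every socket before bind(). A minimal sketch follows. `bind_reusable` is a name made up here, and the exact semantics of SO_REUSEADDR and SO_REUSEPORT differ between operating systems, so treat the behaviour as an assumption to verify on your platform.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to `local` with address reuse enabled, so that
 * other sockets may share the port. Returns the descriptor, or -1 on error. */
int bind_reusable(const struct sockaddr_in *local)
{
	int on = 1;
	int fd;

	assert(local != NULL);
	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (fd < 0)
		return -1;
	/* The options must be set before bind(), on every socket sharing the port. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)))
		goto fail;
#ifdef SO_REUSEPORT
	/* Some systems (the BSDs, for example) require SO_REUSEPORT as well. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)))
		goto fail;
#endif
	if (bind(fd, (const struct sockaddr *)local, sizeof(*local)))
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}
```

Calling bind_reusable() twice with the same address gives two sockets on one port, as long as neither of them is listening yet.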

So I did the second experiment. It’s much cleaner.

#include <stdio.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <sys/select.h>
#include <netinet/in.h>

void die (const char *msg)
{
	perror(msg);
	exit(1);
}

int main (int argc, char *argv[])
{
	int sock;
	struct sockaddr_in addr;
	char buff[256];

	if (argc != 4) {
		printf("Usage: %s localport remotehost remoteport\n", argv[0]);
		exit(0);
	}

	sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (sock < 0)
		die("socket() failed");

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(atoi(argv[1]));
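	if (bind(sock, (const struct sockaddr *)&addr, sizeof(addr)))
		die("bind() failed");

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	/* inet_pton() rejects malformed addresses explicitly */
	if (inet_pton(AF_INET, argv[2], &addr.sin_addr) != 1) {
		fprintf(stderr, "invalid remote address: %s\n", argv[2]);
		exit(1);
	}
	addr.sin_port = htons(atoi(argv[3]));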

	while (connect(sock, (const struct sockaddr *)&addr, sizeof(addr))) {
		if (errno != ETIMEDOUT) {
			perror("connect() failed. retry in 2 sec.");
			sleep(2);
		} else {
			perror("connect() failed.");
		}
	}

	snprintf(buff, sizeof(buff), "Hi, I'm %d.", getpid());
	printf("sending \"%s\"\n", buff);
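	if (send(sock, buff, strlen(buff) + 1, 0) != (ssize_t)(strlen(buff) + 1))
		die("send() failed.");

	/* Reserve one byte so the received text can always be NUL-terminated. */
	ssize_t n = recv(sock, buff, sizeof(buff) - 1, 0);
	if (n <= 0)
		die("recv() failed.");
	buff[n] = '\0';
	printf("received \"%s\"\n", buff);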

	return 0;
}

It works. I wonder what the reason for doing listen() is. Is it related to the way connection tracking is implemented in different types of NAT? Or is it related to the way TCP is implemented in different OSes?

host1$ ./biconn1 20000 2.2.2.2 30000
connect() failed. retry in 2 sec.: Connection refused
sending "Hi, I'm 6566."
received "Hi, I'm 7600."
host2$ ./biconn1 30000 1.1.1.1 20000
connect() failed. retry in 2 sec.: Connection refused
connect() failed.: Connection timed out
connect() failed.: Connection timed out
sending "Hi, I'm 7600."
received "Hi, I'm 6566."
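
A portability note on the retry loop in the second program: POSIX leaves the state of a socket unspecified after a failed connect() and advises closing it and creating a new one before reconnecting. The loop above reuses the same socket, which evidently worked in this experiment but may not work everywhere. A sketch of a variant that opens a fresh socket for each attempt follows. `connect_with_retry` and its parameters are names invented for this example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to connect from `local` to `remote` up to `attempts` times, using a
 * fresh socket for every attempt. Returns a connected descriptor, or -1. */
int connect_with_retry(const struct sockaddr_in *local,
		       const struct sockaddr_in *remote,
		       int attempts, unsigned delay_sec)
{
	assert(local != NULL && remote != NULL);
	for (int i = 0; i < attempts; i++) {
		int on = 1;
		int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

		if (fd < 0)
			return -1;
		/* Address reuse lets us rebind the same local port right after
		 * a failed attempt. */
		if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == 0 &&
		    bind(fd, (const struct sockaddr *)local, sizeof(*local)) == 0 &&
		    connect(fd, (const struct sockaddr *)remote, sizeof(*remote)) == 0)
			return fd;
		close(fd);
		if (i + 1 < attempts)
			sleep(delay_sec);
	}
	return -1;
}
```

The local port stays the same across attempts, so the NAT mapping that the peer is aiming at does not change.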

My objective is to connect to my Subversion server behind a NAT. For now, I still need a publicly accessible server to coordinate the hole punching. It basically works like this: on the Subversion server I run a program that keeps a persistent connection to the public server. When I want to connect from outside, I contact the public server, which then notifies my program on the Subversion server. Both sides then start the TCP hole punching and get a TCP connection, which can be used to tunnel the Subversion connection.
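
The notification from the public server only has to carry the client's external IP and port. As a sketch, assuming a hypothetical one-line text message of the form `PUNCH <ipv4> <port>` (this format and the name `parse_punch` are invented for the example), the program on the Subversion server could parse it like this:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Parse a hypothetical "PUNCH <ipv4> <port>" line from the rendezvous server
 * into `out`. Returns 0 on success, -1 if the line is malformed. */
int parse_punch(const char *line, struct sockaddr_in *out)
{
	char ip[INET_ADDRSTRLEN];
	unsigned port;

	assert(line != NULL && out != NULL);
	if (sscanf(line, "PUNCH %15s %u", ip, &port) != 2)
		return -1;
	if (port == 0 || port > 65535)
		return -1;
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons((unsigned short)port);
	/* inet_pton() returns 1 only for a well-formed dotted-quad address. */
	return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

The resulting address can be passed straight to connect(), while the client does the same with the address of the Subversion side.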

Without a publicly accessible server of my own, other mechanisms can be used. I can think of the following:

  • Online forum: Post the client’s external IP/port in a forum and have a program running on the Subversion server periodically check the forum.
  • DHT, e.g. the mainline BitTorrent DHT: The server randomly generates an infohash and “announces” itself as downloading it. The server then periodically queries for peers on the infohash. To do hole punching, the client also announces itself as downloading it. When the server sees a new peer joining, both parties can do the hole punching. The limitation is that the two peers cannot exchange port information, so they need to agree on a particular port in advance.
  • IRC bot
  • Public SIP registrar: It’s a bit overkill, but closely related and well supported (plenty of public servers and libraries).

I’m not sure whether there is any existing tool for this purpose. Until IPv6 is well established, there are going to be more and more servers behind NAT, so such a tool would be handy. Please leave a comment if you know of any.

10 Responses to “Experiment on TCP Hole Punching”

  1. Ashford Nichols Says:

    In the context of your program, listen() would allow you to receive connection requests from any client, whereas connect() only allows you to receive a connection request from one specific client. Also, connect() can send a connection request to a server, while listen() cannot.

  2. oosbud Says:

    I strongly recommend having a look at this: http://doc.cacaoweb.org/misc/cacaoweb-and-nats/nat-behavioral-specifications-for-p2p-applications/
    cacaoweb implements tcp hole punching the way you describe (no listen(), only connect()). Port discovery occurs through the DHT.

    • wuyongzheng Says:

      Hi oosbud,
      Thanks for sharing the info. AFAIK, cacaoweb is a web service which implements TCP hole punching. It would be better if the hole punching component could be made standalone or at least open source.

  3. Ovian Says:

    Great example, Thanks

  4. pumbo Says:

    Thanks for the example. Never thought it was so easy.

  5. Daniel Says:

    Thanks for the example, however it’s not working for me.
    Neither between two different computers, nor locally:

    host1$ ./biconn1 20000 127.0.0.1 30000
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused

    And in parallel:
    host1$ ./biconn1 30000 127.0.0.1 20000
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused
    connect() failed. retry in 2 sec.: Connection refused

    Shouldn’t it be running fine locally?
    Thank you:
    Daniel

    • wuyongzheng Says:

      Here’s my suspicion. I might be wrong. When running locally, the OS quickly responds with DESTINATION-UNREACHABLE. The time window in which the peer must send its connection message is too short.

  6. Dinkar Bhat Says:

    Hi Wuyongzheng,
    Thanks for the code snippet. I am fairly new to network programming and I’ve been trying with TCP hole punching for some time now and couldn’t figure out what exactly has to be done programmatically in C, although there are many papers and theories on the web.

    Couple of questions.
    1. If the above (biconn) application is run a couple of times, it sometimes blocks indefinitely in the connect() call. Any clues on this? Every time the application is run, I change the source and destination ports on both hosts to avoid a bind failure.
    2. The spec/paper says that, if S is the server, when host1 tries to connect() to host2 it should use the same port that it used when it connected to server S. How can this be achieved? I have written a server application with which both hosts communicate fine. But when the socket used to connect to the peer is bound (bind() call) to the source port that was used to talk to S, it fails with “Address in use”. I have tried the socket LINGER option on the socket that binds to the source port, but it doesn’t help.

    • Dinkar Bhat Says:

      Update:
      I found the solution to my problem (2.) above. But my program is still unable to communicate across peers.

      Your program was working for the case,
      client A — NAT — internet — client B

      but is failing every time for the case,
      client A — NAT — internet — NAT — client B,

      I’m pretty sure that the NATs are not symmetric, since UDP hole punching is working fine.

  7. sree Says:

    Hi Wuyongzheng,

    Thanks for the example.

    My Windows application connects to a Torque cluster (Linux) and runs jobs on it. I want to create a socket connection (TCP hole punching) from the Windows machine to the job running on the Torque cluster. I performed the following two experiments.

    1) socket connection from the Windows machine to the Torque head node through TCP hole punching (C++)
    2) socket connection from the Torque head node to the computing node (where the job is running)

    To hole punch from the Windows machine to the job running on the Torque computing node, I need to know the public IP of the computing node. The command ‘curl icanhazip.com’ on the computing node gives an ‘Unreachable network’ error.

    Is it possible that the computing nodes are not assigned an external IP, or are blocked from the external network?

    Is it possible to use TCP hole punching without an external IP?

    Any help is appreciated.
