Dealing with Network Port Abuse in Sockets in C++
How do you close a network socket such that the cost is lowest for the server and highest for the client? Specifically when a server open to the public receives thousands of DDOS hits, how can I close the connection such that the server pays the lowest cost and the client spam script pays the highest cost?
If you’re running on Linux kernel 3.5 or newer, the answer is to misuse a new socket option called TCP_REPAIR.
Some Background
I wrote the server that runs this site and a number of my other web projects. It is open to the wild and woolly internet. This blog runs on WordPress, which my server supports.
A vulnerability was recently found in WordPress where an attacker can call into WordPress’s XMLRPC implementation in /xmlrpc.php. XMLRPC is used to implement the WordPress pingback feature, but it can be hijacked to make the WordPress server act as a proxy for a hacker’s DDOS attack on a third-party site. Since your server is likely inside your network, it can also be used to make arbitrary web requests inside your network, possibly reconfiguring web-enabled routers.
Sure enough someone was routing thousands of DDOS requests through my server. It was easy enough to block the spammer by IP and to lock down requests to xmlrpc.php until WordPress figures this out.
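Blocking by IP starts with finding out who is on the other end of an accepted socket. A minimal sketch, assuming the ban list is a plain set of address strings; `peer_address` and `is_banned` are hypothetical helper names, not part of any library:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <string>
#include <unordered_set>

// Hypothetical helper: the connected peer's IP address as text, or an
// empty string if the peer is not IPv4/IPv6 or the lookup fails.
std::string peer_address(int fd) {
    sockaddr_storage addr{};
    socklen_t len = sizeof(addr);
    if (getpeername(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0)
        return {};
    char text[INET6_ADDRSTRLEN] = {};
    const void* raw = nullptr;
    if (addr.ss_family == AF_INET)
        raw = &reinterpret_cast<sockaddr_in*>(&addr)->sin_addr;
    else if (addr.ss_family == AF_INET6)
        raw = &reinterpret_cast<sockaddr_in6*>(&addr)->sin6_addr;
    if (raw == nullptr ||
        inet_ntop(addr.ss_family, raw, text, sizeof(text)) == nullptr)
        return {};
    return text;
}

// Hypothetical ban check, meant to run right after accept().
bool is_banned(int fd, const std::unordered_set<std::string>& banned) {
    const std::string ip = peer_address(fd);
    return !ip.empty() && banned.count(ip) > 0;
}
```

The ban list here is just an in-memory `std::unordered_set`. On a dual-stack IPv6 listener, IPv4 clients show up as IPv4-mapped addresses such as `::ffff:1.2.3.4`, so those would need normalizing before the lookup.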
When I detect a bad request from a banned IP or to a banned URL, I was simply closing the socket, like this:
close( mSocket );
When I close the connection this way the operating system terminates the TCP/IP session with a FIN packet that notifies the remote party that the connection has been closed.
Once the spam script detects the closed connection, it immediately opens a new one. So even after I blocked the IP, the spammer was still making requests to my server as fast as it could, even though every request was being blocked.
In addition, each request creates a new socket that then “lingers” on my system for a while after it is closed (60 seconds on Linux), leaving a large number of closed sockets in TIME_WAIT state.
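To see how many of those lingering sockets have piled up, you can read the kernel’s own socket table. A minimal sketch, assuming the Linux /proc/net/tcp layout, where the fourth column (`st`) holds the connection state in hex and `06` means TIME_WAIT; `count_time_wait` and `count_time_wait_ipv4` are hypothetical helper names:

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Counts rows whose state column ("st") is 06, i.e. TIME_WAIT, in text
// laid out like Linux's /proc/net/tcp (or /proc/net/tcp6).
int count_time_wait(std::istream& in) {
    std::string line;
    std::getline(in, line);  // skip the header row
    int count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string slot, local, remote, state;
        if (fields >> slot >> local >> remote >> state && state == "06")
            ++count;
    }
    return count;
}

// Convenience wrapper for the live IPv4 table; returns -1 if it can't be opened.
int count_time_wait_ipv4() {
    std::ifstream proc("/proc/net/tcp");
    if (!proc) return -1;
    return count_time_wait(proc);
}
```

The same information is available from the command line with `ss -tan state time-wait`. IPv6 connections are listed separately in /proc/net/tcp6, which uses the same column layout.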
So the question is, can I do something to make every fraudulent request be as expensive as possible for the spammer at the lowest cost to me?
Closing Connections without FIN or RST
When you close a TCP/IP connection with close, it sends a FIN packet. If I close it any other normal way, for example with shutdown(), the kernel still sends a FIN or an RST packet, either of which notifies the spam script.
The FIN packet causes the remote client to have a normal “connection closed” response from the socket API. The RST packet is what causes sockets to get the “connection reset by peer” response.
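The difference between the two packets is visible directly in the client’s recv() return value. A self-contained sketch over a loopback connection, assuming Linux: after a normal close() the client’s recv() returns 0, while after an abortive close (SO_LINGER enabled with a zero timeout, which makes close() send RST) it fails with ECONNRESET. `observe_peer_close` and `PeerClose` are hypothetical names:

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

enum class PeerClose { Fin, Reset, Other };

// Builds a loopback TCP connection, closes the server side either normally
// (FIN) or abortively (RST), and reports what the client's recv() sees.
PeerClose observe_peer_close(bool abortive) {
    PeerClose result = PeerClose::Other;
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    if (listener < 0 ||
        bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(listener, 1) != 0 ||
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        if (listener >= 0) close(listener);
        return result;
    }
    int client = socket(AF_INET, SOCK_STREAM, 0);
    if (client >= 0 &&
        connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
        int server = accept(listener, nullptr, nullptr);
        if (server >= 0) {
            if (abortive) {
                linger lg{};
                lg.l_onoff = 1;   // enable lingering...
                lg.l_linger = 0;  // ...with a zero timeout: close() sends RST
                setsockopt(server, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
            }
            close(server);
            char buf[16];
            ssize_t n = recv(client, buf, sizeof(buf), 0);  // blocks until FIN or RST
            if (n == 0)
                result = PeerClose::Fin;    // "connection closed"
            else if (n < 0 && errno == ECONNRESET)
                result = PeerClose::Reset;  // "connection reset by peer"
        }
    }
    if (client >= 0) close(client);
    close(listener);
    return result;
}
```

Either way the client learns immediately that the connection is gone, which is exactly what this post is trying to avoid.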
What I want to do is simulate pulling the network cable: drop the connection without sending either packet, so they have to wait around until they realize we’re not answering… because we’re vindictive.
Using TCP_REPAIR
TCP_REPAIR is a new socket option designed to let you ‘freeze’ a socket, save its state, and restore that state in another process or even on another system. Under normal usage the client would never know its connection had been transferred elsewhere.
But we can abuse this option: we put the socket in repair mode, but never save its state and never restore it. When we close a socket that is in repair mode, it gets silently deleted.
We do it like this:
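#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // SOL_TCP, TCP_REPAIR (Linux 3.5+)
#include <sys/socket.h>   // recv, setsockopt, SO_LINGER
#include <unistd.h>       // close, usleep

if ( abuse ) {
    // read some bytes from the spammer - to establish the connection
    // (mSocket is already non-blocking, so recv() returns -1 while no data has arrived)
    int tries = 20;
    while ( tries ) {
        usleep( 100 );  // pause 100 microseconds between attempts
        char tmpBuf[32];
        ssize_t readCount = recv( mSocket, tmpBuf, sizeof( tmpBuf ), 0 );
        if ( readCount >= 0 ) break;  // got data (or EOF)
        tries--;
    }
#ifdef TCP_REPAIR
    // repair mode: the close() below sends neither FIN nor RST
    // (setting TCP_REPAIR requires the CAP_NET_ADMIN capability)
    int aux = 1;
    if ( setsockopt( mSocket, SOL_TCP, TCP_REPAIR, &aux, sizeof( aux ) ) < 0 ) {
        reportError( "could not turn on repair mode" );
    }
#else // !TCP_REPAIR
    // no TCP_REPAIR - best we can do is an abortive close:
    // enable SO_LINGER with a zero timeout so close() sends RST instead of FIN
    struct linger so_linger;
    so_linger.l_onoff = 1;
    so_linger.l_linger = 0;
    if ( setsockopt( mSocket, SOL_SOCKET, SO_LINGER, &so_linger, sizeof( so_linger ) ) < 0 ) {
        reportError( "could not enable SO_LINGER with a zero timeout" );
    }
#endif // TCP_REPAIR
}
close( mSocket );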
The code above works for abusive requests that have already begun. That is, we’ve read the spammer’s request, decided it was fraudulent, and TCP_REPAIR killed it.
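The same idea can be wrapped in a helper that decides at run time rather than at compile time. This is a variation, not the code above: because TCP_REPAIR requires the CAP_NET_ADMIN capability, an unprivileged server gets EPERM from setsockopt(), and in that case this sketch falls back to the SO_LINGER abortive close instead of an ordinary FIN close. `silent_drop` and `DropMethod` are hypothetical names:

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

enum class DropMethod { Repair, AbortiveRst, PlainClose };

// Hypothetical helper: drop a connection while telling the peer as little
// as possible. Always closes fd and reports which technique was used.
DropMethod silent_drop(int fd) {
    DropMethod method = DropMethod::PlainClose;
#ifdef TCP_REPAIR
    int on = 1;
    // Fails with EPERM unless the process has CAP_NET_ADMIN.
    if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) == 0)
        method = DropMethod::Repair;  // close() now sends neither FIN nor RST
#endif
    if (method != DropMethod::Repair) {
        linger lg{};
        lg.l_onoff = 1;
        lg.l_linger = 0;  // zero linger time: close() resets the connection
        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == 0)
            method = DropMethod::AbortiveRst;
    }
    close(fd);
    return method;
}
```

Returning the method makes it easy to log how often the server actually manages a silent drop versus falling back to RST.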
But if you block requests by IP right after accepting the connection, without first reading from the socket, the remote party somehow gets notified. Either they get an RST, or, more likely, something in the connection never quite completes and the remote system aborts the request almost immediately.
So we first read a few bytes from the hacker’s socket. In my case the socket is already in non-blocking mode. If yours isn’t, set it to non-blocking; otherwise you open yourself up to the hacker opening a connection, sending no packets, and leaving your server hanging, just as you plan to do to them. If no packet arrives after a short wait, you shut them down anyway.
But if you read a few bytes from the socket, then the spam program is left waiting for a response from you that never comes.
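The “read a few bytes first” step can be isolated into its own function. A minimal sketch, assuming the pause between attempts is meant in microseconds, as the text describes; `read_a_little` is a hypothetical name. It switches the socket to non-blocking mode itself, so a client that connects and then sends nothing cannot stall the server:

```cpp
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: put fd in non-blocking mode, then try recv() up to
// `tries` times, sleeping `pause_us` microseconds between attempts.
// Returns true as soon as any bytes (or an orderly EOF) have been read.
bool read_a_little(int fd, int tries, useconds_t pause_us) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return false;
    char buf[32];
    for (int i = 0; i < tries; ++i) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n >= 0) return true;  // data or EOF arrived
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return false;         // real error: stop waiting
        usleep(pause_us);
    }
    return false;  // nothing arrived within roughly tries * pause_us
}
```

With 20 tries and a 100-microsecond pause, a silent client costs the server only about 2 ms before it is dropped. The caller still closes the socket afterwards, with TCP_REPAIR or the SO_LINGER fallback.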
Works like a charm: point your browser at a blocked URL and Chrome, at least, will hang there for 5-10 seconds. In my logs, where I had been getting 10 hits per second, the attacker is now only able to hit me every 10 seconds or so. Load on my system from this: 0.
See ya sucker!
What if You Don’t Have TCP_REPAIR?
TCP_REPAIR is only available on Linux kernels 3.5 and above. Below that, the best you can do is a ‘dirty’ socket close, where instead of sending him a FIN you send him an RST. To him it looks like the connection was reset (“connection reset by peer”) rather than closed normally. To do this, you enable SO_LINGER with a linger time of zero, which breaks the normal connection-close handshake, and then call close.
You can also terminate a connection with shutdown(), but that sends a FIN rather than an RST, and it is not as good: this way the socket remains on your system for a while in TIME_WAIT state. Under a sustained attack your system can end up holding thousands of dead sockets.
Other Suboptimal Options
One suggested solution is to just not close the socket. This does essentially the same thing – with the socket open no FIN or RST is sent.
But this way the attacker can still saturate your system with open sockets. Each request creates a new socket that is left open, and under a DoS attack you eventually run out of sockets.
Then there is also the problem of managing the open sockets:
1. If I just don’t close them, open sockets last forever. Cost for attacker: high, since they get no FIN. Cost for me: higher. All my file descriptors eventually get used up.
2. Spawn a thread per socket that sleeps 10 minutes and then closes the socket. Cost for attacker: high, since they get no FIN. Cost for me: higher. While I do eventually close the socket, each request ties up a socket on my side for longer than on the attacker’s, and I pay the overhead of a thread.
3. Spawn a single thread that handles expiring all abused sockets. Cost for attacker: high, since they get no FIN. Cost for me: higher. As with 2, lots of sockets are held open, plus the overhead of a thread to manage them, code complexity, and annoyance.
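For comparison, option 3 boils down to a priority queue of deadlines. A minimal sketch, assuming request threads call `park()` instead of close() and one housekeeping thread calls `reap()` periodically; the class and its methods are hypothetical. Note that it does nothing about the descriptor exhaustion described above, which is why this option remains suboptimal:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <unistd.h>
#include <utility>
#include <vector>

// Option 3 sketch: park abused sockets with a deadline; a single
// housekeeping thread periodically closes the ones whose time is up.
class AbusedSocketReaper {
public:
    using Clock = std::chrono::steady_clock;

    // Called by request threads instead of close(fd).
    void park(int fd, Clock::time_point deadline) {
        std::lock_guard<std::mutex> lock(mutex_);
        heap_.push({deadline, fd});
    }

    // Called by the housekeeping thread: closes every parked socket whose
    // deadline is at or before `now` and returns how many were closed.
    int reap(Clock::time_point now) {
        std::lock_guard<std::mutex> lock(mutex_);
        int closed = 0;
        while (!heap_.empty() && heap_.top().first <= now) {
            close(heap_.top().second);
            heap_.pop();
            ++closed;
        }
        return closed;
    }

    std::size_t parked() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return heap_.size();
    }

private:
    using Entry = std::pair<Clock::time_point, int>;
    mutable std::mutex mutex_;
    // Min-heap ordered by deadline, so the earliest expiry is on top.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
};
```

Every parked socket still holds a file descriptor until its deadline, so a sustained attack can exhaust descriptors faster than the reaper frees them.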