Man-in-the-Middle Attacks with DPDK: Transparent Packet Modification

[Figure: overview of the man-in-the-middle attack]

The Data Plane Development Kit (DPDK) is an open-source software project with a powerful community of development contributors. It provides a set of libraries to accelerate packet processing workloads running on a variety of CPU architectures.

I wanted to build something exciting with DPDK. This article describes the result, my DPDK application mitm-dpdk.

1. What mitm-dpdk does

mitm-dpdk is a kind of man-in-the-middle (MITM) attack implementation that transparently modifies packet contents. Note that I created this application to learn the DPDK libraries and the structure of IP packets; it is not suitable for real-world use.

mitm-dpdk has the following features.

  • mitm-dpdk sits between two machines (a client and a server) on a wired connection, like a switch. The machine on which mitm-dpdk is installed therefore needs at least two NICs: one connected to the client and one connected to the server. mitm-dpdk inspects and modifies packets traveling between the client and the server.
  • It replaces packet contents according to user-defined regular expression patterns. Currently this works only for unencrypted TCP over IPv4.
  • Packets are modified while preserving L2 transparency: the machine in the middle behaves like a plain Ethernet cable, so the client and the server are not aware of its existence. mitm-dpdk achieves this by recalculating the TCP sequence (SEQ) and acknowledgment (ACK) numbers, the IPv4 and TCP checksums, and the IPv4 total length.
  • mitm-dpdk can handle multiple TCP connections by storing per-stream information in a hash table.

I named it a “man-in-the-middle attack”, but it could just as well be a network security enhancement system. Content modification and stream management can serve as Deep Packet Inspection (DPI), which evaluates traffic up to L7. This makes it possible to build security solutions such as detecting malicious traffic by analyzing multiple streams, or anonymizing data that contains private information. I like this usage better than “attacking”.

2. How it works

The implementation is based on L2 Forwarding, an official DPDK sample application.

Below is an excerpt from main.cpp, the core part of the application. I will use it to explain how incoming packets are processed.

if (ipv4_hdr->next_proto_id == IPPROTO_TCP) {
    tcp_hdr = (struct rte_tcp_hdr *)((unsigned char *)ipv4_hdr + sizeof(struct rte_ipv4_hdr));
    l4_len = (tcp_hdr->data_off & 0xf0) >> 2;
    content = (uint8_t *)((char *)tcp_hdr + l4_len);
    content_len = packet_len - l2_len - l3_len - l4_len;
    src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
    dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
    seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
    ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
 
    if (src_port > dst_port) { // C2S connection
      sprintf(str, "%" PRIu32, ipv4_hdr->src_addr);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu32, ipv4_hdr->dst_addr);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu16, src_port);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu16, dst_port);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu8, ipv4_hdr->next_proto_id);
      strcat(hash_value, str);
    } else { // S2C connection
      sprintf(str, "%" PRIu32, ipv4_hdr->dst_addr);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu32, ipv4_hdr->src_addr);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu16, dst_port);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu16, src_port);
      strcat(hash_value, str);
      sprintf(str, "%" PRIu8, ipv4_hdr->next_proto_id);
      strcat(hash_value, str);
    }
    printf("hash: %s\n", hash_value);
    auto itr = umap.find(string(hash_value));
    if( itr != umap.end() ) { // stream is in umap
        stream = itr->second;
    } else { // new stream
        stream = new Stream();
        umap[string(hash_value)] = stream;
    }
 
    // Modify the packet content
    if (content_len > 0 && stream) {
        // Copy the payload by length so that embedded NUL bytes are preserved
        // (this also avoids a temporary rte_calloc buffer that was never freed)
        string str_content(reinterpret_cast<const char *>(content), content_len);
        string before = str_content;
        string after = before; // stays equal to the input when there are no rules
        for (size_t i = 0; i < rules.size(); i++) {
            string exp = rules[i][0];
            string replace = rules[i][1];
 
            regex reg(exp);
            after = regex_replace(before, reg, replace);
            before = after;
        }
 
        cout << "before:'" << str_content << "'" << endl;
        cout << "after :'" << after << "'" << endl;
        diff = after.length() - str_content.length();
        cout << "diff: " << diff << endl;
 
        if (str_content.compare(after) != 0) { // the content is modified
            if (diff >= 0) { // expand the packet payload
                if (rte_pktmbuf_append(m, (uint16_t)(diff))) {
                    // overwrite the payload
                    rte_memcpy(content, &after[0], after.length());
                    // recalculate ipv4 total length
                    ipv4_hdr->total_length = (uint16_t)rte_cpu_to_be_16(rte_pktmbuf_pkt_len(m) - sizeof(struct rte_ether_hdr));
                } else {
                    cout << "No tailroom space" << endl;
                    diff = 0; // payload left unchanged, so SEQ/ACK must not shift
                }
            } else if (diff < 0) { // shrink the packet payload
                uint16_t len = -diff;
                int ret = rte_pktmbuf_trim(m, len);
                if (!ret) {
                    // overwrite the payload
                    rte_memcpy(content, &after[0], after.length());
                    // recalculate ipv4 total length
                    ipv4_hdr->total_length = (uint16_t)rte_cpu_to_be_16(rte_pktmbuf_pkt_len(m) - sizeof(struct rte_ether_hdr));
                } else {
                    cout << "rte_pktmbuf_trim error" << endl;
                    diff = 0; // payload left unchanged, so SEQ/ACK must not shift
                }
            }
            recalc_checksum = true;
        }
 
    }
 
    // Recalculate SEQ, ACK
    if (stream->C2S_modified_bytes != 0 || stream->S2C_modified_bytes != 0 ) {
        if (src_port > dst_port) { // C2S connection
            tcp_hdr->sent_seq = rte_cpu_to_be_32(seq + stream->C2S_modified_bytes);
            tcp_hdr->recv_ack = rte_cpu_to_be_32(ack - stream->S2C_modified_bytes);
        } else { // S2C connection
            tcp_hdr->sent_seq = rte_cpu_to_be_32(seq + stream->S2C_modified_bytes);
            tcp_hdr->recv_ack = rte_cpu_to_be_32(ack - stream->C2S_modified_bytes);
        }
        recalc_checksum = true;
    }
 
    // Update modified bytes
    if (diff != 0) {
        if (src_port > dst_port) { // C2S connection
            if (seq != stream->C2S_last_seq) { // if the packet is not TCP retransmission
                stream->C2S_modified_bytes += diff;
                cout << "diff so far: " << stream->C2S_modified_bytes << endl;
            }
        } else { // S2C connection
            if (seq != stream->S2C_last_seq) { // if the packet is not TCP retransmission
                stream->S2C_modified_bytes += diff;
                cout << "diff so far: " << stream->S2C_modified_bytes << endl;
            }
        }
        recalc_checksum = true;
    }
 
    // Update stream information
    if (src_port > dst_port) { // C2S connection
        stream->C2S_last_seq = seq;
    } else { // S2C connection
        stream->S2C_last_seq = seq;
    }
 
    // recalculate checksum
    if (recalc_checksum) {
        ipv4_hdr->hdr_checksum = 0;
        tcp_hdr->cksum = 0;
        ipv4_hdr->hdr_checksum = rte_ipv4_cksum(ipv4_hdr);
        tcp_hdr->cksum = rte_ipv4_udptcp_cksum(ipv4_hdr, tcp_hdr);
    }
}

line 360-393: Hash value calculation

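This part builds a key (hash_value) for each incoming TCP/IP packet so that TCP connections can be told apart. The key is the string concatenation of “client IP + server IP + client port + server port + protocol ID”, where the side with the larger port number is treated as the client. Each 5-tuple maps to exactly one key, but because the fields are concatenated without a separator, two different 5-tuples can in principle yield the same string (for example, ports 5000 and 180 and ports 50001 and 80 both produce “5000180”). Collisions are therefore unlikely rather than impossible.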
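
One way to avoid the separator problem is to key the table on a small struct instead of a string. The following is only a sketch and not part of mitm-dpdk: FlowKey, make_flow_key, and FlowKeyHash are hypothetical names, and it keeps the same port-based heuristic for deciding which side is the client.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical direction-independent key for one TCP connection.
struct FlowKey {
    uint32_t client_ip;
    uint32_t server_ip;
    uint16_t client_port;
    uint16_t server_port;
    uint8_t  proto;

    bool operator==(const FlowKey &o) const {
        return client_ip == o.client_ip && server_ip == o.server_ip &&
               client_port == o.client_port && server_port == o.server_port &&
               proto == o.proto;
    }
};

// Same heuristic as the excerpt: the side with the higher port is the client.
inline FlowKey make_flow_key(uint32_t src_ip, uint32_t dst_ip,
                             uint16_t src_port, uint16_t dst_port,
                             uint8_t proto) {
    if (src_port > dst_port) {
        return FlowKey{src_ip, dst_ip, src_port, dst_port, proto};
    }
    return FlowKey{dst_ip, src_ip, dst_port, src_port, proto};
}

// Hash functor so FlowKey can be used in std::unordered_map.
struct FlowKeyHash {
    std::size_t operator()(const FlowKey &k) const {
        // Pack the fields into two 64-bit words and mix their hashes.
        uint64_t a = (static_cast<uint64_t>(k.client_ip) << 32) | k.server_ip;
        uint64_t b = (static_cast<uint64_t>(k.client_port) << 24) |
                     (static_cast<uint64_t>(k.server_port) << 8) | k.proto;
        return static_cast<std::size_t>(
            std::hash<uint64_t>{}(a) ^
            (std::hash<uint64_t>{}(b) * 0x9e3779b97f4a7c15ULL));
    }
};
```

With this key, a packet and its reply map to the same entry, and equality is decided field by field, so two different 5-tuples can never be confused even if their hashes happen to collide.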

line 394-401: Search hash table

Look up the stream information in the hash table (umap) using the key we have just built. If it is found, we have seen this stream before. The stream information tells us how many bytes we have added to or removed from the payloads of packets belonging to this TCP connection so far, which is needed to recalculate the SEQ and ACK numbers later. If it is not found, we have encountered a new TCP connection and create a new entry.
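
The find-or-create step can be sketched on its own as follows. This is an illustration under assumptions, not the actual mitm-dpdk code: the field names follow the excerpt, but their types, the StreamTable alias, and the find_or_create helper are hypothetical, and std::unique_ptr is used here so that entries are released together with the table.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Per-connection state; field names follow the excerpt, types are assumed.
struct Stream {
    int32_t  C2S_modified_bytes = 0;
    int32_t  S2C_modified_bytes = 0;
    uint32_t C2S_last_seq = 0;
    uint32_t S2C_last_seq = 0;
};

using StreamTable = std::unordered_map<std::string, std::unique_ptr<Stream>>;

// Return the existing stream for `key`, or create a zero-initialised one.
inline Stream &find_or_create(StreamTable &table, const std::string &key) {
    auto it = table.find(key);
    if (it == table.end()) {
        it = table.emplace(key, std::make_unique<Stream>()).first;
    }
    return *it->second;
}
```

Calling find_or_create twice with the same key returns the same Stream, so counters updated for one packet are visible when the next packet of that connection arrives.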

line 402-424: Edit packet contents

Extract the payload and edit it as a string according to the rules in the regex rule file (regex.json). Also calculate by how many bytes the editing changed the payload length (diff).
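
The rule application can be isolated into a small function, roughly like the sketch below. Rule and apply_rules are hypothetical names; the sketch assumes the rules are compiled once when they are loaded rather than once per packet.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// One rule: a compiled regular expression and its replacement text.
using Rule = std::pair<std::regex, std::string>;

// Apply every rule in order and report the change in length via `diff`.
inline std::string apply_rules(const std::string &payload,
                               const std::vector<Rule> &rules, long &diff) {
    std::string out = payload;
    for (const Rule &r : rules) {
        out = std::regex_replace(out, r.first, r.second);
    }
    diff = static_cast<long>(out.size()) - static_cast<long>(payload.size());
    return out;
}
```

For example, with the rules "hello" → "goodbye" and "[0-9]+" → "#", the payload "hello 12345" becomes "goodbye #" and diff is -2. With an empty rule list the payload comes back unchanged and diff is 0.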

line 425-451: Replace packet payload

Replace the packet payload with the string edited in the previous step. This uses the DPDK API rte_pktmbuf_append() to extend the payload and rte_pktmbuf_trim() to shrink it. Then ipv4_hdr->total_length = ... recalculates the IPv4 total length.
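
To make the length bookkeeping concrete without depending on DPDK, the sketch below models the frame as a std::vector instead of an rte_mbuf. replace_payload is a hypothetical helper; in the real application the resize is done by rte_pktmbuf_append() and rte_pktmbuf_trim().

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Replace everything from `payload_off` onward with `new_payload` and refresh
// the IPv4 total length. `frame` holds a whole Ethernet frame and `l2_len` is
// the Ethernet header length, so the IPv4 header starts at frame[l2_len].
inline void replace_payload(std::vector<uint8_t> &frame, std::size_t l2_len,
                            std::size_t payload_off,
                            const std::string &new_payload) {
    frame.resize(payload_off);  // drop the old payload
    frame.insert(frame.end(), new_payload.begin(), new_payload.end());

    // Total length covers the IPv4 header and everything after it. It is a
    // big-endian 16-bit field at byte offset 2 of the IPv4 header.
    uint16_t total = static_cast<uint16_t>(frame.size() - l2_len);
    frame[l2_len + 2] = static_cast<uint8_t>(total >> 8);
    frame[l2_len + 3] = static_cast<uint8_t>(total & 0xff);
}
```

For a frame with a 14-byte Ethernet header, a 20-byte IPv4 header, and a 20-byte TCP header, replacing a 3-byte payload with "hello" grows the frame from 57 to 59 bytes and sets the total length to 45.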

line 452-463: Recompute SEQ and ACK number

Based on the number of bytes rewritten so far, we must replace SEQ and ACK with newly computed values to keep the TCP connection consistent. The calculation depends on the direction of the packet, but the basic idea is to cancel out the size differences as if the modification had never happened. Say we added 100 bytes to a message sent from the client to the server (C2S). We then have to add 100 to the SEQ of subsequent C2S packets so that the server sees a consistent byte count. We must also subtract 100 from the ACK of S2C (reply) packets so that the client does not notice that 100 bytes were added to the message it sent.
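
The 100-byte example can be written out as a small function. This is a sketch with hypothetical names (Offsets, SeqAck, adjust); it mirrors the arithmetic in the excerpt and relies on unsigned 32-bit wraparound, which matches the TCP sequence space.

```cpp
#include <cstdint>

// Cumulative bytes added (+) or removed (-) in each direction.
struct Offsets {
    int32_t c2s = 0;
    int32_t s2c = 0;
};

struct SeqAck {
    uint32_t seq;
    uint32_t ack;
};

// Rewrite the SEQ/ACK of one packet. `c2s` is true for client-to-server.
inline SeqAck adjust(SeqAck in, const Offsets &off, bool c2s) {
    int32_t own  = c2s ? off.c2s : off.s2c;  // changes made in this direction
    int32_t peer = c2s ? off.s2c : off.c2s;  // changes made in the other one
    // Unsigned arithmetic wraps modulo 2^32, like TCP sequence numbers.
    return SeqAck{in.seq + static_cast<uint32_t>(own),
                  in.ack - static_cast<uint32_t>(peer)};
}
```

With off.c2s = 100, a C2S packet with SEQ 1000 leaves with SEQ 1100, and an S2C packet carrying ACK 1200 reaches the client with ACK 1100. A SEQ close to the top of the range simply wraps around.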

line 464-493: Post processing

This part updates the Stream object to prepare for the next incoming packet. For example, it accumulates diff (stream->C2S_modified_bytes += diff;). Finally, it recalculates the checksums with the APIs rte_ipv4_cksum() and rte_ipv4_udptcp_cksum().
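
For readers curious about what such a checksum computes, here is a plain C++ sketch of the Internet checksum from RFC 1071 applied to an IPv4 header. internet_checksum is a hypothetical helper written for illustration; it is not the DPDK implementation, and the TCP checksum additionally covers a pseudo-header that is not shown here.

```cpp
#include <cstddef>
#include <cstdint>

// One's-complement checksum over 16-bit big-endian words (RFC 1071).
// The checksum field itself must be zero while computing.
inline uint16_t internet_checksum(const uint8_t *data, std::size_t len) {
    uint32_t sum = 0;
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        sum += (static_cast<uint32_t>(data[i]) << 8) | data[i + 1];
    }
    if (len & 1) {
        // Odd length: pad the last byte with a zero low byte.
        sum += static_cast<uint32_t>(data[len - 1]) << 8;
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);  // fold the carries back in
    }
    return static_cast<uint16_t>(~sum);
}
```

The returned value is written into the header high byte first. Running the function again over a header that already contains a correct checksum yields 0, which is how a receiver verifies it.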

3. Thoughts

Transparent packet modification works and it is fun, but applying it to real-world cybersecurity raises several issues, such as:

  • L7 content is usually split across multiple packets. To match a regex pattern that spans two packets, we need to concatenate their payloads.
  • mitm-dpdk needs a private key to decrypt SSL. One solution is for mitm-dpdk to request the private key from the server when it detects a TCP SYN packet. This requires modifying the server-side implementation.
  • It might be a good idea to add a router function, because we could then analyze many more TCP flows at the same time.
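
The first point, matching a pattern that is split across packets, could start from something like the sketch below. It only detects a match and does not rewrite anything, it assumes segments arrive in order, and Reassembly and feed_and_match are hypothetical names that do not exist in mitm-dpdk.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical per-direction buffer holding recent, not yet matched bytes.
struct Reassembly {
    std::string pending;
};

// Append one in-order segment and report whether `pattern` now matches.
// At most `max_keep` trailing bytes are kept for the next segment.
inline bool feed_and_match(Reassembly &r, const std::string &segment,
                           const std::regex &pattern, std::size_t max_keep) {
    r.pending += segment;
    std::smatch m;
    bool hit = std::regex_search(r.pending, m, pattern);
    if (hit) {
        // Keep only what follows the match so it is not reported twice.
        r.pending = m.suffix().str();
    }
    if (r.pending.size() > max_keep) {
        r.pending.erase(0, r.pending.size() - max_keep);
    }
    return hit;
}
```

Feeding "user=bob&pass" and then "word=123" with the pattern "password" reports no match for the first segment and a match for the second. Rewriting such a match would additionally require holding back the first packet until the second one arrives.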

This page has introduced my DPDK application, which transparently inspects and modifies L7 content in the middle of network communication. We have looked at its features and at how mitm-dpdk processes streams and packets.

I hope you enjoyed this article. See you next time.
