Analysis of the sockaddr and sockaddr_in structures in Socket programming


1. Introduction#

In socket programming, we often use the sockaddr_in structure to describe the address (IP and port) that a socket binds or connects to.

struct sockaddr_in serv_addr;
memset(&serv_addr, 0, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = inet_addr(ip);
serv_addr.sin_port = htons(port);
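
One caveat with the snippet above: inet_addr reports a malformed string by returning INADDR_NONE, which is also the value of the valid address 255.255.255.255. A minimal sketch of the same initialization using inet_pton, which reports failure separately, might look like this. The helper name make_ipv4_addr is hypothetical and used only for illustration; POSIX headers are assumed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out with an IPv4 address and port.
 * Returns 0 on success, -1 if ip is not a valid dotted-decimal address. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out) {
    memset(out, 0, sizeof *out);          /* also zeroes sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);          /* host to network byte order */
    /* inet_pton returns 1 only when the text was parsed successfully. */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

A caller would check the return value before passing the structure to bind or connect.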

Let's take a look at the source code of the sockaddr_in structure:

struct sockaddr_in {
    short sin_family; // Address Family, AF_INET
    u_short sin_port; // 16-bit TCP/UDP port number, network byte order
    struct in_addr sin_addr; // 32-bit IP address, network byte order
    char sin_zero[8]; // Not used, can be used for padding
};

Notice the fourth line of the structure: the IP address is not stored directly as an integer field such as s_addr. Instead, it is nested inside another structure, sin_addr, of type struct in_addr.

So what is the benefit of this?

2. Analysis#

On Unix platforms, the in_addr structure is defined as:

typedef uint32_t in_addr_t;
struct in_addr {
    in_addr_t s_addr; // 32-bit IPV4 address, network byte order
};

On Windows platforms, the in_addr structure is defined as:

struct in_addr {
    union {
        struct {
            u_char s_b1, s_b2, s_b3, s_b4;
        } S_un_b;
        struct {
            u_short s_w1, s_w2;
        } S_un_w;
        u_long S_addr;
    } S_un;
};

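As we can see, in_addr is laid out differently on different platforms: a single integer on Unix, a union on Windows. Because sockaddr_in only refers to struct in_addr, each platform is free to choose its own layout, and code that writes sin_addr.s_addr still compiles on both (on Windows, s_addr is a macro that expands to S_un.S_addr).
This explains why the address is wrapped in the in_addr structure inside sockaddr_in instead of being stored directly as a field.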

3. Analysis of the Union in in_addr#

On Windows platforms, the in_addr structure holds a union, so the same 4 bytes of the IPv4 address can be read as 4 single bytes (S_un_b), 2 16-bit integers (S_un_w), or 1 32-bit integer (S_addr).

So once we initialize the sin_addr field:

serv_addr.sin_addr.s_addr = inet_addr(ip);

We can read the same IPv4 address through any of the three union members above.
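
On Unix platforms in_addr has no union, but the same views can be obtained by hand. The following is a minimal sketch under POSIX headers; the helper names ipv4_octets and ipv4_host_order are hypothetical and not part of any socket API.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Byte view: copy the four address bytes out in network order,
 * similar in spirit to s_b1..s_b4 in the Windows union. */
void ipv4_octets(struct in_addr addr, unsigned char out[4]) {
    memcpy(out, &addr.s_addr, 4);
}

/* Integer view: the address as a 32-bit value in host byte order. */
uint32_t ipv4_host_order(struct in_addr addr) {
    return ntohl(addr.s_addr);
}
```

For the address 192.168.1.10, ipv4_octets yields the bytes 192, 168, 1, 10, and ipv4_host_order yields 0xC0A8010A.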

4. sockaddr Structure#

struct sockaddr{
    sa_family_t sa_family; // Address Family, the address type
    char sa_data[14]; // Address data; the layout depends on the family
};

struct sockaddr_in{
    sa_family_t sin_family; // Address Family, the address type
    uint16_t sin_port; // 16-bit port number
    struct in_addr sin_addr; // 32-bit IP address
    char sin_zero[8]; // Not used, usually filled with 0
};

struct sockaddr_in6 {
    sa_family_t sin6_family; // Address type, value is AF_INET6
    in_port_t sin6_port; // 16-bit port number
    uint32_t sin6_flowinfo; // IPv6 flow information
    struct in6_addr sin6_addr; // Specific IPv6 address
    uint32_t sin6_scope_id; // Interface scope ID
};

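It can be observed that sockaddr and sockaddr_in have the same length (16 bytes on common platforms), while sockaddr_in6 is larger (typically 28 bytes). All three begin with an address-family field. sockaddr lumps the IP address and port number together in the opaque sa_data array, while the latter two are family-specific variants with named fields.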
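
The actual sizes are easy to check on a given platform. The sketch below assumes POSIX headers; print_sockaddr_sizes is a hypothetical helper name. On Linux with glibc it typically reports 16, 16, and 28 bytes for the first three structures, but the exact numbers are platform-dependent.

```c
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Print the size of each address structure on the current platform. */
void print_sockaddr_sizes(void) {
    printf("sockaddr:         %zu\n", sizeof(struct sockaddr));
    printf("sockaddr_in:      %zu\n", sizeof(struct sockaddr_in));
    printf("sockaddr_in6:     %zu\n", sizeof(struct sockaddr_in6));
    /* sockaddr_storage is defined to be large enough for any family. */
    printf("sockaddr_storage: %zu\n", sizeof(struct sockaddr_storage));
}
```

When a buffer must hold either an IPv4 or an IPv6 address, struct sockaddr_storage is the usual choice, because struct sockaddr is too small for sockaddr_in6.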

So why don't we fill in sockaddr directly with the IP and port?
Because sa_data is just a byte array: it has no named fields for the IP address and port, so each byte would have to be placed by hand. The original sockaddr is inconvenient to use, which is why the latter two were created.

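However, the socket API itself still takes a struct sockaddr pointer, for example:

bind(serv_sock, (struct sockaddr*)&serv_addr, sizeof(serv_addr));

We cast the pointer to struct sockaddr* (a form of type punning) to call the above function, so that the same function can accept either a sockaddr_in or a sockaddr_in6. The function uses the address family and the length argument to tell which one it was given.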
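
To illustrate how a function that receives a generic struct sockaddr pointer can recover the real type, here is a minimal sketch that dispatches on the family field. The function name sockaddr_to_text is hypothetical; inet_ntop is the standard POSIX call for formatting addresses.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Format the address held in a generic sockaddr as text.
 * Returns buf on success, NULL for an unsupported family or on failure. */
const char *sockaddr_to_text(const struct sockaddr *sa, char *buf, socklen_t len) {
    switch (sa->sa_family) {
    case AF_INET: {
        /* The family says this is really a sockaddr_in. */
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        return inet_ntop(AF_INET, &in4->sin_addr, buf, len);
    }
    case AF_INET6: {
        /* The family says this is really a sockaddr_in6. */
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return inet_ntop(AF_INET6, &in6->sin6_addr, buf, len);
    }
    default:
        return NULL;
    }
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either family.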

Type punning: the technique of accessing the same block of memory through different types in C/C++, i.e., reinterpreting the same bit pattern as another type.
There are several ways to perform type punning, such as unions (allowed in C, but undefined behavior in C++) and pointer casts, as well as memcpy, which is well defined in both languages.
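
As a small illustration of the memcpy approach, the sketch below reads the bit pattern of a float as a 32-bit integer. The function name float_bits is hypothetical, and the values in the usage note assume the platform uses IEEE 754 single precision for float.

```c
#include <stdint.h>
#include <string.h>

/* The copy below is only meaningful if both types have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Return the raw bit pattern of f without violating strict aliasing:
 * memcpy copies bytes, so no object is accessed through the wrong type. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On an IEEE 754 platform, float_bits(1.0f) returns 0x3F800000. Compilers generally turn a fixed-size memcpy like this into a plain register move, so it tends to cost nothing at run time.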

However, using type punning may lead to strict aliasing issues, so it needs to be used with caution.

Strict aliasing: a rule in C/C++ that lets the compiler assume that pointers to incompatible types do not refer to the same object. Accessing an object through an lvalue of an incompatible type (character types excepted) is undefined behavior, and the optimizer may produce surprising results when the rule is broken.

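The reason it can be used here is not the length of the structures (sockaddr_in6 is in fact longer than sockaddr). The cast only changes the pointer type, and every sockaddr variant begins with the address-family field, while the separate length argument tells the function how many bytes are valid, so the implementation can recover the real type. POSIX explicitly requires applications to cast pointers to sockaddr_in and sockaddr_in6 to struct sockaddr* for use with the socket functions.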