Does #pragma pack(n) apply to all the structures in the source code, or does it have to be applied to each structure separately? It's not clear from the manual. In one place it says: "You can use #pragma pack(n) to make sure that any structures with unaligned data are packed." In another: "This pragma aligns members of a structure to the minimum of n"
The protocol layer may specify big endian. And not only that - the standard also defines the location of the individual bits.
For a typical PC-class program, that usually just means that an IPv4 address stored in a 32-bit integer must be passed through htonl() and ntohl() to make sure that the number ends up as the expected <n>.<n>.<n>.<n>.
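To make the htonl()/ntohl() point concrete, here is a minimal sketch, assuming a POSIX system where these functions come from <arpa/inet.h>. The helper name format_ipv4 is made up for illustration and is not part of any library.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX; assumed to be available) */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format an IPv4 address that is held in network
 * byte order, as it would be after being read from a packet. */
static int format_ipv4(uint32_t addr_net, char *buf, size_t len)
{
    uint32_t a = ntohl(addr_net);  /* does nothing on a big-endian host */

    return snprintf(buf, len, "%u.%u.%u.%u",
                    (unsigned)((a >> 24) & 0xFFu),
                    (unsigned)((a >> 16) & 0xFFu),
                    (unsigned)((a >> 8) & 0xFFu),
                    (unsigned)(a & 0xFFu));
}
```

Calling format_ipv4(htonl(0xC0A80001u), buf, sizeof buf) should leave "192.168.0.1" in buf on both little-endian and big-endian hosts, because the two conversions cancel out in the same way on either.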
But in the end, it's up to the driver layer and the compiler/library to make sure that you get valid structure data.
In your example, ip_hl and ip_v fit in a single byte. So if you look, everything but the 8-bit fields is already 16-bit aligned. And there are 12 bytes before you reach the two structures, so they are 32-bit aligned.
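The alignment argument above can be checked by the compiler. This is only a sketch: the struct and its field names are my own stand-ins for the header in your example, with ip_hl and ip_v merged into one plain byte because the order of bit-fields within a byte is implementation-defined. It assumes a C11 compiler and a mainstream ABI with natural alignment.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>

/* Hypothetical mirror of an IPv4 header, without any #pragma pack. */
struct ipv4_hdr_sketch {
    uint8_t  ver_ihl;              /* ip_v and ip_hl share this byte */
    uint8_t  tos;
    uint16_t tot_len;              /* offset 2: 16-bit aligned */
    uint16_t id;                   /* offset 4 */
    uint16_t frag_off;             /* offset 6 */
    uint8_t  ttl;
    uint8_t  proto;
    uint16_t checksum;             /* offset 10 */
    struct { uint32_t addr; } src; /* offset 12: 32-bit aligned */
    struct { uint32_t addr; } dst; /* offset 16 */
};

/* If the compiler had to insert padding, these would fail to compile. */
_Static_assert(offsetof(struct ipv4_hdr_sketch, tot_len) == 2, "tot_len");
_Static_assert(offsetof(struct ipv4_hdr_sketch, checksum) == 10, "checksum");
_Static_assert(offsetof(struct ipv4_hdr_sketch, src) == 12, "src");
_Static_assert(offsetof(struct ipv4_hdr_sketch, dst) == 16, "dst");
_Static_assert(sizeof(struct ipv4_hdr_sketch) == 20, "total size");
```

If the assertions hold on your target, the natural layout already matches the wire layout and packing changes nothing for this particular structure.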
The network card sends out the data byte-by-byte. So an 8-bit field isn't a problem to send and receive on the other end. It's the fields larger than a single byte that are problematic: the network card can send them out byte-by-byte, but the meaning of the high and low bytes can be swapped. That is why the standard has defined a "network byte order", and why you have functions that perform conditional byte-reversals of 16-bit and 32-bit integers. The code calls these functions without knowing whether any byte-reversal will take place or not - the runtime library will do the required work.
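As a rough illustration of what "conditional byte-reversal" means, here is one way such functions could be written. This is not how any particular runtime library implements htons()/htonl(); real ones usually decide at compile time. The names sketch_htons, sketch_htonl and host_is_little_endian are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);  /* look at the byte stored first in memory */
    return first == 1;
}

/* 16-bit host-to-network conversion: swap only if the host needs it. */
static uint16_t sketch_htons(uint16_t v)
{
    if (host_is_little_endian())
        return (uint16_t)((v << 8) | (v >> 8));
    return v;
}

/* 32-bit host-to-network conversion: swap only if the host needs it. */
static uint32_t sketch_htonl(uint32_t v)
{
    if (host_is_little_endian())
        return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
               ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
    return v;
}
```

The caller writes sketch_htons(port) the same way on every machine; whether a swap happens is decided inside the function, which is the point made above.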
But no byte swap is needed for an array of bytes - and your examples can be seen as two arrays of two bytes each.
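A small sketch of the difference, with hypothetical helper names: a two-byte array is copied into the packet as it is, while a 16-bit integer has to be written high byte first to end up in network byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a 2-byte array field into a packet buffer.
 * The bytes keep their array order on any host, so no swap is needed. */
static void put_byte_pair(uint8_t *packet, const uint8_t field[2])
{
    memcpy(packet, field, 2);
}

/* Hypothetical helper: store a 16-bit integer field in network byte
 * order by writing the bytes explicitly, independent of host order. */
static void put_u16(uint8_t *packet, uint16_t value)
{
    packet[0] = (uint8_t)(value >> 8);     /* high byte first */
    packet[1] = (uint8_t)(value & 0xFFu);  /* low byte second */
}
```

With the array {0x12, 0x34} and the integer 0x1234, both helpers put 0x12 followed by 0x34 into the buffer; only the integer version had to care about byte order.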