[pcre-dev] [Bug 1049] Add support for UTF-16

Author: Craig Silverstein
To: pcre-dev
Old-Topics: [pcre-dev] [Bug 1049] New: Add support for UTF-16
Subject: [pcre-dev] [Bug 1049] Add support for UTF-16

http://bugs.exim.org/show_bug.cgi?id=1049




--- Comment #30 from Craig Silverstein <csilvers@???> 2011-11-16 20:09:37 ---
This construct is not legal in C, as far as I understand it. It is an
aliasing violation to store data in one field of a union and read it
from another. The compiler can do crazy things that would make you
sad.

There's a series of macros that are the 'blessed' way of figuring out
endianness. Here's what I do in one project I have:

// This is all to figure out endianness and byte-swapping on various systems
#if defined(HAVE_ENDIAN_H)
#include <endian.h>           // for the __BYTE_ORDER use below
#elif defined(HAVE_SYS_ENDIAN_H)
#include <sys/endian.h>       // location on FreeBSD
#elif defined(HAVE_MACHINE_ENDIAN_H)
#include <machine/endian.h>   // location on OS X
#endif
#if defined(HAVE_SYS_BYTEORDER_H)
#include <sys/byteorder.h>    // BSWAP_32 on Solaris 10
#endif
#ifdef HAVE_SYS_ISA_DEFS_H
#include <sys/isa_defs.h>     // _BIG_ENDIAN/_LITTLE_ENDIAN on Solaris 10
#endif


On Linux (glibc), the macro __BYTE_ORDER is set to either
__LITTLE_ENDIAN or __BIG_ENDIAN. Other systems do something similar.
I can't guarantee all systems have a macro for this, though.
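
A minimal sketch of how such macros might be consumed, with a fallback for
systems that have none. It is an illustration only: the names
MY_LITTLE_ENDIAN, MY_BIG_ENDIAN, runtime_is_little_endian and
is_little_endian are hypothetical, and HAVE_ENDIAN_H is assumed to come from
a configure script as in the include block above. The __BYTE_ORDER__ branch
relies on macros that GCC and Clang predefine. The runtime fallback copies a
byte out with memcpy rather than reading a different union member.

```c
#include <stdint.h>
#include <string.h>

#if defined(HAVE_ENDIAN_H)
#include <endian.h>   /* glibc: __BYTE_ORDER, __LITTLE_ENDIAN, __BIG_ENDIAN */
#endif

/* MY_LITTLE_ENDIAN / MY_BIG_ENDIAN are hypothetical names for this sketch. */
#if defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
# if __BYTE_ORDER == __LITTLE_ENDIAN
#  define MY_LITTLE_ENDIAN 1
# elif __BYTE_ORDER == __BIG_ENDIAN
#  define MY_BIG_ENDIAN 1
# endif
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
      defined(__ORDER_BIG_ENDIAN__)
/* Predefined by GCC and Clang, no header needed. */
# if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#  define MY_LITTLE_ENDIAN 1
# elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#  define MY_BIG_ENDIAN 1
# endif
#endif

/* Runtime probe: look at the first byte in memory of a 16-bit value of 1.
   memcpy is used so no object is read through an incompatible type. */
int runtime_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Prefer the compile-time answer; fall back to the runtime probe when no
   byte-order macro was found. */
int is_little_endian(void)
{
#if defined(MY_LITTLE_ENDIAN)
    return 1;
#elif defined(MY_BIG_ENDIAN)
    return 0;
#else
    return runtime_is_little_endian();
#endif
}
```

On a system where one of the macros is available, is_little_endian() folds
to a constant, so code that byte-swaps 16-bit units can be selected at
compile time. The runtime probe is only there as a last resort.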

craig

