EBCDIC Code Pages and Square Brackets

Back in the previous millennium, I created a set of simple routines for handling initialization files (sections and parameters) that worked on all platforms, i.e. Windows, Linux, Unix, IBM i (OS/400) and z/OS (mainframe). Over the years, the routines have worked well on every platform I tested them on.
Here is a sample initialization file:

[MyTestSection]
foo=123
bar = "a test"

The problem comes when you give someone a program that has been compiled on a system using code page 37 but the customer is running their system on code page 273.

Back in the 60s/70s, when IBM was creating EBCDIC code pages, someone was smoking something because the square brackets have different HEX values, seemingly at random, on the various EBCDIC code pages. Note: On distributed platforms, this is not a problem.

For example, on code page 37, '[' has a HEX value of 0xBA and ']' has a HEX value of 0xBB whereas on code page 273, '[' has a HEX value of 0x63 and ']' has a HEX value of 0xFC. Hence, in your C code, if you have something like this:

sprintf(sectionMarker, "[%s]", sectionName);

When the compiler compiles your code, it stores the square brackets as the HEX values of the code page used at compile time. So, when the program uses those routines on a system running a different code page to find a section name (with square brackets), it will fail because the HEX values of the square brackets in the file will not match the ones in the program. It’s a real pain. Years ago, I hard-coded the HEX values for square brackets for the 3 most popular code pages.
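To make the mismatch concrete, here is a minimal sketch that builds the same section marker twice, once with the code page 37 bracket bytes and once with the code page 273 bytes quoted above, and compares them. The helper names BuildMarker and MarkersMatch are hypothetical and are not part of my routines; they only illustrate why the comparison fails.

```c
#include <string.h>

/* Bracket bytes for the two code pages quoted in the text above. */
#define CP037_LEFT   0xBA
#define CP037_RIGHT  0xBB
#define CP273_LEFT   0x63
#define CP273_RIGHT  0xFC

/*
 * Hypothetical helper: build a section marker from explicit bracket bytes.
 * Returns the marker length, or 0 if the output buffer is too small.
 */
static size_t BuildMarker(unsigned char *out, size_t outSize,
                          unsigned char left, unsigned char right,
                          const char *name)
{
   size_t len = strlen(name);

   if (outSize < len + 3)
      return 0;

   out[0] = left;
   memcpy(out + 1, name, len);
   out[len + 1] = right;
   out[len + 2] = '\0';
   return len + 2;
}

/*
 * Hypothetical helper: returns 1 when a marker built with the brackets
 * compiled into the program is byte-for-byte equal to the marker as it
 * appears in a file written with the file's brackets, otherwise 0.
 */
static int MarkersMatch(unsigned char compLeft, unsigned char compRight,
                        unsigned char fileLeft, unsigned char fileRight,
                        const char *name)
{
   unsigned char compiled[64];
   unsigned char onDisk[64];
   size_t n1 = BuildMarker(compiled, sizeof(compiled), compLeft, compRight, name);
   size_t n2 = BuildMarker(onDisk, sizeof(onDisk), fileLeft, fileRight, name);

   return n1 != 0 && n1 == n2 && memcmp(compiled, onDisk, n1) == 0;
}
```

Calling MarkersMatch(CP037_LEFT, CP037_RIGHT, CP273_LEFT, CP273_RIGHT, "MyTestSection") returns 0, which is exactly the failure a customer on code page 273 sees with a program compiled on code page 37.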

Recently, a customer was trying out a program but it was not working. It took a couple of days to realize that they were running their system on a different code page. At first, I thought I would add their code page to the hard-coded list but then I thought that solution was stupid and started hunting for a proper solution.

I decided that what I needed was a lookup table based on the code page (CCSID) and the HEX values for the square brackets. The trick was figuring out how to get the code page from the OS (operating system). After what felt like a million searches, I came across the nl_langinfo subroutine. It is available on AIX, Linux, IBM i and z/OS. If you call the nl_langinfo subroutine with the parameter CODESET, it will return the code page (aka code set, CCSID) as a string.
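Here is a minimal sketch of turning that string into a number. It rests on two assumptions of mine that you should check on your own system: that the program wants to adopt the locale from its environment (hence the setlocale call, which changes the locale for the whole program), and that the string is either a bare number such as "37" or a name with an "IBM-" prefix such as "IBM-1047". The helper names CodeSetToCodePage and GetCurrentCodePage are hypothetical.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a codeset name to a numeric code page.
 * Accepts "37" or "IBM-1047" style names (the "IBM-" prefix is an
 * assumption). Returns 0 when the name is not recognised.
 */
static int CodeSetToCodePage(const char *codeSet)
{
   char *end;
   long  value;

   if (codeSet == NULL)
      return 0;

   /* Skip the optional "IBM-" prefix. */
   if (strncmp(codeSet, "IBM-", 4) == 0)
      codeSet += 4;

   /* The remainder must start with a digit. */
   if (!isdigit((unsigned char)*codeSet))
      return 0;

   value = strtol(codeSet, &end, 10);

   /* Reject trailing characters and values outside the CCSID range. */
   if (*end != '\0' || value <= 0 || value > 65535)
      return 0;

   return (int)value;
}

/*
 * Hypothetical helper: ask the C library for the current codeset and
 * convert it. Returns 0 when no numeric code page can be derived.
 */
static int GetCurrentCodePage(void)
{
   /* Assumption: use the locale from the environment, not the "C" default. */
   setlocale(LC_ALL, "");

   return CodeSetToCodePage(nl_langinfo(CODESET));
}
```

On a distributed platform the codeset is typically a name like "UTF-8", for which this sketch returns 0, so the caller can keep its default brackets.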

Next, I found 101 EBCDIC code pages and created a lookup table of the square brackets, sorted by code page. Finally, I created a routine to call the nl_langinfo subroutine, perform a binary search of the lookup table and set the correct square brackets for the program to use.

So, in the spirit of the holiday season, I figured I would share what I created just in case other people have similar cross-platform issues.

Here is the square bracket lookup table:

#if defined(MVS) || defined(OS400)
/* Structure for table of square brackets.   */
typedef struct SQUARE_BRACKETS_S
{
   int               codePage;
   unsigned char     LeftBracket;
   unsigned char     RightBracket;

} SQUARE_BRACKETS_T;

/*
 *  EBCDIC code pages and their square bracket HEX values.
 */
static SQUARE_BRACKETS_T SQUARE_BRACKETS_TABLE[] =
{
   {    1, 0x4A, 0x5A},
   {    2, 0x4A, 0x5A},
   {    6, 0x4A, 0x5A},
   {   10, 0x4A, 0x5A},
   {   16, 0xDA, 0xEA},
   {   21, 0xDA, 0xEA},
   {   22, 0xDA, 0xEA},
   {   24, 0x4A, 0x5A},
   {   29, 0xDA, 0xEA},
   {   37, 0xBA, 0xBB},  //   37 / 1140
   {   38, 0x4A, 0x5A},
   {   39, 0x4F, 0x6A},
   {  251, 0x4A, 0x5A},
   {  256, 0x4A, 0x5A},
   {  257, 0x4A, 0x5A},
   {  258, 0x4A, 0x5A},
   {  264, 0xAD, 0xBD},
   {  273, 0x63, 0xFC},  //  273 / 1141
   {  274, 0x4A, 0x5A},
   {  275, 0x71, 0x68},
   {  277, 0x9E, 0x9F},  //  277 / 1142
   {  278, 0xB5, 0x9F},  //  278 / 1143
   {  280, 0x90, 0x51},  //  280 / 1144
   {  281, 0xB1, 0xBB},
   {  282, 0x4A, 0x5A},
   {  283, 0x4A, 0x5A},
   {  284, 0x4A, 0x5A},  //  284 / 1145
   {  285, 0xB1, 0xBB},  //  285 / 1146
   {  290, 0x70, 0x80},
   {  297, 0x90, 0xB5},  //  297 / 1147
   {  330, 0x4A, 0x5A},
   {  352, 0xAD, 0xBD},
   {  361, 0x4A, 0x5A},  //  361 / 389
   {  382, 0x63, 0xFC},
   {  383, 0x4A, 0x5A},
   {  384, 0x71, 0x68},
   {  385, 0x44, 0x79},
   {  386, 0x9E, 0x5A},
   {  387, 0xB5, 0x5A},
   {  388, 0x90, 0xB5},
   {  389, 0x90, 0x51},  //  361 / 389
   {  410, 0x4A, 0x5A},
   {  423, 0x4A, 0x5A},  //  423 /  875 /  4971 / 9067
   {  424, 0xBA, 0xBB},  //  424 / 8616 / 12712
   {  425, 0xAD, 0xBD},
   {  500, 0x4A, 0x5A},  //  500 / 1148
   {  833, 0x70, 0x80},
   {  836, 0xBA, 0xBB},
   {  838, 0x49, 0x59},  //  838 / 1160
   {  870, 0x4A, 0x5A},  //  870 / 1110 / 1153
   {  871, 0xAE, 0x9E},  //  871 / 1149
   {  875, 0x4A, 0x5A},  //  423 /  875 / 4971 / 9067
   {  880, 0x4A, 0x5A},
   {  892, 0x4A, 0x5A},
   {  893, 0x4A, 0x5A},
   {  905, 0x68, 0xB6},
   {  918, 0x4A, 0x5A},
   {  924, 0xAD, 0xBD},  // 1047 /  924
   { 1002, 0xAD, 0xBD},
   { 1025, 0x4A, 0x5A},  // 1025 / 1154
   { 1026, 0x68, 0xAC},  // 1026 / 1155
   { 1027, 0xAD, 0xBD},
   { 1047, 0xAD, 0xBD},  // 1047 /  924
   { 1069, 0x4A, 0x5A},
   { 1070, 0xBA, 0xBB},
   { 1079, 0x4A, 0x5A},
   { 1081, 0x90, 0xB5},
   { 1084, 0x4A, 0x5A},
   { 1097, 0xBA, 0xBB},
   { 1110, 0x4A, 0x5A},  //  870 / 1110 / 1153
   { 1112, 0xBA, 0xBB},  // 1112 / 1156
   { 1113, 0x4A, 0x5A},
   { 1122, 0xB5, 0x9F},  // 1122 / 1157
   { 1123, 0x4A, 0x5A},  // 1123 / 1158
   { 1130, 0x4A, 0x5A},  // 1130 / 1164
   { 1132, 0x49, 0x59},
   { 1137, 0xAD, 0xBD},
   { 1140, 0xBA, 0xBB},  //   37 / 1140
   { 1141, 0x63, 0xFC},  //  273 / 1141
   { 1142, 0x9E, 0x9F},  //  277 / 1142
   { 1143, 0xB5, 0x9F},  //  278 / 1143
   { 1144, 0x90, 0x51},  //  280 / 1144
   { 1145, 0x4A, 0x5A},  //  284 / 1145
   { 1146, 0xB1, 0xBB},  //  285 / 1146
   { 1147, 0x90, 0xB5},  //  297 / 1147
   { 1148, 0x4A, 0x5A},  //  500 / 1148
   { 1149, 0xAE, 0x9E},  //  871 / 1149
   { 1153, 0x4A, 0x5A},  //  870 / 1110 / 1153
   { 1154, 0x4A, 0x5A},  // 1025 / 1154
   { 1155, 0x68, 0xAC},  // 1026 / 1155
   { 1156, 0xBA, 0xBB},  // 1112 / 1156
   { 1157, 0xB5, 0x9F},  // 1122 / 1157
   { 1158, 0x4A, 0x5A},  // 1123 / 1158
   { 1160, 0x49, 0x59},  //  838 / 1160
   { 1164, 0x4A, 0x5A},  // 1130 / 1164
   { 1165, 0xAD, 0xBD},
   { 1166, 0x4A, 0x5A},
   { 4971, 0x4A, 0x5A},  //  423 /  875 /  4971 / 9067
   { 8616, 0xBA, 0xBB},  //  424 / 8616 / 12712
   { 9067, 0x4A, 0x5A},  //  423 /  875 /  4971 / 9067
   {12712, 0xBA, 0xBB}   //  424 / 8616 / 12712
};

static int SQUARE_BRACKETS_SIZE =
             sizeof(SQUARE_BRACKETS_TABLE) / sizeof(SQUARE_BRACKETS_T);

/* Use values from code page 37 for default values. */
static unsigned char LEFT_SQUARE_BRACKET  = 0xBA;
static unsigned char RIGHT_SQUARE_BRACKET = 0xBB;
#endif

Note #1: Most code pages have an updated code page because of the addition of the Euro currency symbol, e.g. code page 1140 is a duplicate of 37 but with the addition of the Euro currency symbol.

Note #2: The variables LEFT_SQUARE_BRACKET and RIGHT_SQUARE_BRACKET will be the variables used in the code any time the program needs to read or write information to a file.
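Because the binary search in the lookup routine only works when the table is in strictly ascending order by code page, a small sanity check is handy whenever a new row is added. The following is a minimal sketch; the function name FindUnsortedRow is hypothetical, and the typedef is repeated only so the sketch compiles on its own (drop it when pasting next to the table above).

```c
#include <stddef.h>

/* Same layout as the table structure above, repeated for a standalone build. */
typedef struct SQUARE_BRACKETS_S
{
   int               codePage;
   unsigned char     LeftBracket;
   unsigned char     RightBracket;

} SQUARE_BRACKETS_T;

/*
 * Hypothetical helper: verify that the table is strictly ascending by
 * code page (no out-of-order rows and no duplicate code pages).
 *
 * Returns -1 when the table is fine, otherwise the index of the first
 * row whose code page is not greater than the row before it.
 */
static int FindUnsortedRow(const SQUARE_BRACKETS_T *table, int size)
{
   int i;

   for (i = 1; i < size; i++)
   {
      if (table[i - 1].codePage >= table[i].codePage)
         return i;
   }

   return -1;
}
```

During development, you could call FindUnsortedRow(SQUARE_BRACKETS_TABLE, SQUARE_BRACKETS_SIZE) once at startup and print the offending index if the result is not -1.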

Here is the subroutine to look up and set the correct square brackets. Note: You should call this subroutine at the very beginning of your program’s initialization, so that it is done once before the code ever needs to use square brackets.

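#if defined(MVS) || defined(OS400)
#include <langinfo.h>   /* nl_langinfo(), CODESET */
#include <stdio.h>      /* printf() */
#include <stdlib.h>     /* atoi() */
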
/**
 * Function Name
 *  LookupAndSetSquareBrackets
 *
 * Description
 *  This function will lookup & set the left & right square brackets.
 *
 * IBM i code for nl_langinfo(CODESET) from:
 * https://www.ibm.com/docs/en/i/7.5?topic=functions-nl-langinfo-retrieve-locale-information
 *
 * z/OS code for nl_langinfo(CODESET) from:
 * https://www.ibm.com/docs/en/zos/2.5.0?topic=functions-nl-langinfo-retrieve-locale-information
 *
 * Input parameters
 *  N/A
 *
 * Output
 *  N/A
 *
 * Return Value
 *  None.
 */
void    LookupAndSetSquareBrackets()
{
   /* --------------------------------------------
    * Variable declarations.
    * --------------------------------------------
    */
   int   low = 0;
   int   mid;
   int   high = SQUARE_BRACKETS_SIZE - 1;
   int   currentCodePage = 0;
   char *pCodeSet;

   /* --------------------------------------------
    * Code section
    * --------------------------------------------
    */
   printf("LookupAndSetSquareBrackets()\n" );

   pCodeSet = nl_langinfo(CODESET);
   currentCodePage = atoi(pCodeSet);

   printf("CodeSet='%s' : currentCodePage=%d\n",
          pCodeSet,
          currentCodePage);

   /*
    * 1. currentCodePage of zero means we didn't get a valid value from
    * nl_langinfo call.
    *
    * 2. The default HEX values set for LEFT_SQUARE_BRACKET &
    * RIGHT_SQUARE_BRACKET variables are from CCSID 37. Hence,
    * no point in looking up the values for CCSID 37.
    */
   if ( (currentCodePage != 0) && (currentCodePage != 37) )
   {
      while (low <= high)
      {
         mid = (low + high) / 2;
         if (SQUARE_BRACKETS_TABLE[mid].codePage < currentCodePage)
            low = mid + 1;
         else if (SQUARE_BRACKETS_TABLE[mid].codePage > currentCodePage)
            high = mid - 1;
         else
         {
            /* Found it. */
            LEFT_SQUARE_BRACKET = SQUARE_BRACKETS_TABLE[mid].LeftBracket;
            RIGHT_SQUARE_BRACKET = SQUARE_BRACKETS_TABLE[mid].RightBracket;
            printf("Found it at index=%d\n", mid);
            break;
         }
      }
   }

   printf("Using '[' with the value of 0x'%0X' "\
          "and ']' with the value of 0x'%0X'\n",
          LEFT_SQUARE_BRACKET,
          RIGHT_SQUARE_BRACKET);
}
#endif

So, there you go. Everything you need to correctly handle reading and writing square brackets on IBM i and/or z/OS systems that use EBCDIC code pages.

Therefore, when you want to use square brackets, you simply do the following:

sprintf(sectionMarker,
        "%c%s%c",
        LEFT_SQUARE_BRACKET,
        sectionName,
        RIGHT_SQUARE_BRACKET);
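
The same idea applies when reading a file: compare against the runtime bracket values rather than the '[' and ']' literals. Here is a minimal sketch of such a check. The function name IsSectionHeader is hypothetical, and the brackets are passed as parameters only so the sketch compiles on its own; in a real program you would pass LEFT_SQUARE_BRACKET and RIGHT_SQUARE_BRACKET.

```c
#include <string.h>

/*
 * Hypothetical helper: returns 1 if 'line' starts with the section
 * header for 'sectionName', using the given bracket bytes, else 0.
 * Anything after the right bracket (e.g. a newline) is ignored.
 */
static int IsSectionHeader(const char *line,
                           const char *sectionName,
                           unsigned char left,
                           unsigned char right)
{
   size_t nameLen = strlen(sectionName);
   const unsigned char *p = (const unsigned char *)line;

   /* First byte must be the left bracket for the current code page. */
   if (p[0] != left)
      return 0;

   /* The section name must follow immediately. strncmp stops at the
    * end of a short line, so nothing past the terminator is read. */
   if (strncmp(line + 1, sectionName, nameLen) != 0)
      return 0;

   /* The byte after the name must be the right bracket. */
   return p[nameLen + 1] == right;
}
```

With the code page 37 values, a line whose bytes are 0xBA, "MyTestSection", 0xBB matches, while the same line checked with the code page 273 values (0x63 and 0xFC) does not.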

I realize that the number of people requiring this type of code is pretty small but hopefully, in the future, it will save someone a few headaches!

Note: If I missed an EBCDIC code page please let me know and I will add it to the lookup table.

Regards,
Roger Lacroix
Capitalware Inc.
