Wide Character/Unicode Character

Wide characters, often loosely called Unicode characters, play a crucial role in programming by enabling the representation of a vast range of characters from various writing systems and languages. In contrast to narrow characters, which use a single byte per code unit, wide characters use a larger fixed-size unit, typically two bytes (as on Windows) or four bytes (as on most Unix-like systems).

Unicode is an industry standard that assigns unique numeric codes to almost every character in existence, including those from different scripts like Latin, Cyrillic, Chinese, Arabic, and many more. This allows programmers to develop software that can handle multilingual text and provide a consistent user experience across different languages.

Wide characters are particularly useful when dealing with non-ASCII characters, such as accented letters, symbols, or characters from non-Latin scripts. By adopting Unicode, programming languages and frameworks have become more inclusive and capable of supporting globalized applications.

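The most commonly used Unicode encoding for files and network data is UTF-8 (Unicode Transformation Format 8-bit), which encodes each character in one to four bytes. UTF-8 is backward compatible with ASCII, as the first 128 characters in Unicode correspond to the ASCII character set. Note that UTF-8 is a multibyte narrow-character encoding, not a wide-character one. Wide strings on Windows, including the strings used by COM, are encoded as UTF-16, where most characters take one 16-bit unit and the rest take two.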
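The variable-length behaviour of UTF-8 can be sketched in a few lines. The helper below is a minimal illustration, not a library routine: the name EncodeUtf8 is an assumption made for this example, and it returns an empty string for values that are not valid Unicode scalar values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // ASCII range: one byte, identical to the ASCII value.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;  // surrogate code points are not encodable
        }
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    assert(out.size() <= 4);  // UTF-8 never needs more than four bytes
    return out;
}
```

An ASCII letter such as U+0041 occupies one byte, while the euro sign U+20AC occupies three and characters outside the Basic Multilingual Plane occupy four.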

In programming, wide characters are employed in a variety of applications. They are essential for handling text input and output, processing files, and manipulating strings in multiple languages. They allow developers to build internationalized software, enabling users to interact with programs in their preferred language and correctly display diverse characters.

However, it is important to note that handling wide characters can introduce challenges related to memory usage, string manipulation, and input validation. Care must be taken to properly handle conversions between narrow and wide characters, as well as ensure proper handling of string lengths and boundary conditions.
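One length-related pitfall is that, in UTF-16, the number of 16-bit units in a string is not always the number of characters. The sketch below illustrates this with two helpers whose names (AppendUtf16, CountCodePoints) are assumptions made for this example; it uses the standard char16_t type rather than any Windows type so that it compiles anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-16 string.
void AppendUtf16(std::u16string& out, std::uint32_t cp)
{
    // Precondition: cp is a valid Unicode scalar value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);  // one 16-bit unit
    } else {
        cp -= 0x10000;                     // two units: a surrogate pair
        out += static_cast<char16_t>(0xD800 + (cp >> 10));
        out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
    }
}

// Hypothetical helper: count code points in well-formed UTF-16
// by skipping the second (low) unit of each surrogate pair.
std::size_t CountCodePoints(const std::u16string& s)
{
    std::size_t n = 0;
    for (char16_t u : s) {
        if (u < 0xDC00 || u > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

A string holding the letter A followed by U+1F600 has three 16-bit units but only two characters, so buffer sizes should be computed from units while user-visible counts should be computed from code points.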

In summary, wide characters or Unicode characters are fundamental to modern programming, enabling software to handle diverse languages and character sets. They facilitate internationalization and localization efforts, ensuring that software can be used by people worldwide, regardless of their native language or writing system.

WCHAR typedef

In COM/OLE on Windows, wchar_t is a 16-bit character type, also known as a wide character, that holds one UTF-16 code unit. Wide character arrays and strings provide multi-language support.

typedef char CHAR;
typedef wchar_t WCHAR;

typedef CHAR* STRING;
typedef WCHAR* WSTRING;
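The size of wchar_t depends on the platform, which matters when computing buffer sizes for wide strings. The following is a small portable sketch; the function names WideCharBits and WideStringBytes are assumptions made for this example.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Width of wchar_t in bits: 16 with Windows compilers,
// commonly 32 on Linux and macOS.
std::size_t WideCharBits()
{
    return sizeof(wchar_t) * CHAR_BIT;
}

// Bytes needed to store a wide string including its null terminator.
std::size_t WideStringBytes(const wchar_t* s)
{
    assert(s != nullptr);
    return (std::wcslen(s) + 1) * sizeof(wchar_t);
}
```

For the three-character string L"COM", WideStringBytes returns four times sizeof(wchar_t), that is 8 bytes where wchar_t is 16 bits wide.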

Basic String in COM

BSTR, which stands for Basic String, is a fundamental data type used in Component Object Model (COM) programming. COM is a binary interface standard that enables software components to interoperate across different programming languages and platforms. BSTR is specifically designed to handle string data in a COM environment, providing a convenient and efficient way to exchange string information between COM components.

In COM programming, strings are typically represented using the Unicode character encoding scheme, which allows for the representation of a wide range of characters from different languages. BSTR is a Unicode string type that stores string data in a specific memory layout, making it compatible with COM's string handling mechanisms.

BSTR is a pointer-based data type, meaning that it stores the memory address of the actual string data rather than the data itself. This allows for efficient memory management and reduces the overhead associated with string manipulation. BSTR strings are allocated and deallocated using COM memory management functions, such as SysAllocString and SysFreeString, ensuring proper memory cleanup and preventing memory leaks.

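BSTR memory is not managed automatically. The system allocator behind SysAllocString supplies the memory, but the programmer decides when to release it with SysFreeString, following COM's ownership rules: the caller frees the strings it passes as input parameters and the strings it receives through output parameters. In C++, wrapper classes such as _bstr_t and CComBSTR can take over this bookkeeping and reduce the risk of memory-related bugs.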

BSTR itself has no built-in operations for concatenation, comparison, or substring extraction. The Automation API provides SysStringLen and SysStringByteLen to query the length of a BSTR in characters and in bytes, and functions such as VarBstrCat and VarBstrCmp to concatenate and compare BSTR strings. Other manipulation is done with ordinary wide-character string functions or with a wrapper class.

Although BSTR is primarily used in COM programming, it also appears in other Windows-based technologies, such as ActiveX controls and Automation interfaces. Its wide adoption is due to its length-prefixed layout, its support for Unicode strings, and its compatibility with various programming languages.

In conclusion, BSTR is a specialized string data type used in COM programming to handle Unicode strings efficiently. It offers a well-defined memory layout, a dedicated set of allocation functions, and interoperability across different languages. BSTR plays a crucial role in communication between COM components and has become an integral part of Windows development.

BSTR type

A BSTR is an array of wide characters with a length prefix. A 4-byte integer in front of the string data holds the length of the string in bytes, not counting the terminator. The wide characters follow the prefix, and a two-byte null terminator ends the block. The BSTR pointer itself points at the first character, not at the prefix, so a BSTR can be passed wherever a read-only wide string is expected. Out-of-process COM/DCOM relies on BSTR because, when the string has to be transported through RPC, the length prefix indicates how many bytes of string data follow it.

BSTR memory layout
Length prefix: 4 bytes (byte count) | Body: length * sizeof(wchar_t) bytes | Terminator: 2 bytes (null)
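The layout can be imitated in portable C++ to make it concrete. The code below is only a sketch of the idea: MockChar, MockBstr, MockAllocString, MockStringByteLen, MockStringLen and MockFreeString are names invented for this example, and the real SysAllocString family uses the system's own allocator rather than malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

typedef char16_t MockChar;    // stand-in for the 16-bit OLECHAR
typedef MockChar* MockBstr;   // stand-in for BSTR

// Allocate [4-byte byte count][characters][2-byte terminator] and
// return a pointer to the first character, as a BSTR pointer does.
MockBstr MockAllocString(const MockChar* src)
{
    if (!src) return nullptr;
    std::size_t len = 0;
    while (src[len] != 0) ++len;
    assert(len <= UINT32_MAX / sizeof(MockChar));

    const std::uint32_t byteLen =
        static_cast<std::uint32_t>(len * sizeof(MockChar));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof byteLen + byteLen + sizeof(MockChar)));
    if (!block) return nullptr;

    std::memcpy(block, &byteLen, sizeof byteLen);                    // prefix
    std::memcpy(block + sizeof byteLen, src, byteLen);               // body
    std::memset(block + sizeof byteLen + byteLen, 0, sizeof(MockChar)); // terminator
    return reinterpret_cast<MockBstr>(block + sizeof byteLen);
}

// Read the byte count stored just before the first character.
std::uint32_t MockStringByteLen(MockBstr s)
{
    if (!s) return 0;
    std::uint32_t byteLen;
    std::memcpy(&byteLen,
                reinterpret_cast<unsigned char*>(s) - sizeof byteLen,
                sizeof byteLen);
    return byteLen;
}

// Length in characters, derived from the stored byte count.
std::uint32_t MockStringLen(MockBstr s)
{
    return MockStringByteLen(s) / sizeof(MockChar);
}

// Free the whole block, which starts at the prefix, not at the characters.
void MockFreeString(MockBstr s)
{
    if (s) std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

Because the returned pointer is offset from the start of the allocation, passing it to free directly would corrupt the heap, which is the reason a BSTR must always be released with its matching free function.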

UNICODE vs BSTR

BSTR and Unicode are both used in Component Object Model (COM) and Distributed Component Object Model (DCOM) to handle string data. However, there are important differences between the two.

BSTR (Basic String) is a string data type specific to COM and DCOM. It is a binary data structure that contains a length prefix followed by the string data in a wide-character format (UTF-16). BSTRs are used to represent strings in COM interfaces and can be passed between components. Their memory is allocated and released through dedicated system functions (SysAllocString and SysFreeString) rather than through malloc and free.

Unicode, on the other hand, is a character encoding standard that represents characters from various writing systems using a universal character set. It encompasses a wide range of characters and supports multiple languages. Unicode can be encoded using different encoding schemes like UTF-8, UTF-16, etc. In COM and DCOM, Unicode is typically used as the underlying character encoding for BSTRs.

The key difference between BSTR and Unicode lies in their nature. BSTR is a specific data type that represents strings in COM and DCOM, whereas Unicode is a character encoding standard used to represent characters universally. BSTRs add a length prefix and a dedicated allocator, while Unicode is a broader concept that encompasses different encoding schemes.

In summary, BSTR is a length-prefixed string type used in COM and DCOM that stores its text as UTF-16. Unicode, on the other hand, is a character encoding standard that allows representation of characters from various writing systems and supplies the underlying encoding for BSTRs in COM and DCOM.

Allocate BSTR

A BSTR can be allocated with the SysAllocString() API, declared in the header file oleauto.h. It takes a null-terminated wide (Unicode) string, copies it into a newly allocated BSTR, and returns that BSTR. On failure, for example when memory is exhausted, it returns NULL to the caller.

BSTR SysAllocString(
  const OLECHAR *psz
);

Free BSTR

A BSTR allocated with SysAllocString() should be deallocated with SysFreeString(). All the BSTR APIs are declared in the oleauto.h header file.

void SysFreeString(
  BSTR bstrString
);

Note: Do not use malloc or free to allocate or deallocate BSTRs.

Allocate & free BSTR example

Here is an example that uses a BSTR with the web browser Navigate method. We allocate the URL BSTR from a Unicode string and pass it in the IWebBrowser.Navigate(URL) call. Once the call has been made, we deallocate it.

  DISPPARAMS dp;
  UINT nErr = 0;
  VARIANT *args;
  args = new VARIANT[1];
  VariantInit(&args[0]);
  args[0].vt = VT_BSTR;   /* mark the variant as holding a BSTR */
  args[0].bstrVal = SysAllocString(L"http://www.google.com/");
  dp.rgvarg = args;
  dp.cArgs = 1;
  dp.rgdispidNamedArgs = NULL;
  dp.cNamedArgs = 0;
  
  /* IWebBrowser.Navigate(URL) using Invoke */
  pBrowser->Invoke(IeMethodNavigate,
                   IID_NULL,
                   LOCALE_SYSTEM_DEFAULT,
                   DISPATCH_METHOD,
                   &dp,NULL,NULL,&nErr);

  ..
  /* The caller allocated the BSTR, so the caller frees it. */
  SysFreeString(args[0].bstrVal);
  delete[] args;

Function returning BSTR example

In the example above, the caller allocates the string, passes it to an external method as an input parameter, and frees it after the call. The reverse case is also possible: the interface method allocates the string and returns it to the caller. Here the caller passes the address of a BSTR variable to the method, and the method allocates the string and stores it through that pointer. In IDL, such a parameter is marked [out, retval]. The caller is responsible for deallocating this BSTR buffer, and a memory leak occurs if the caller forgets to call SysFreeString.

/* Allocation of BSTR in external DCOM */
HRESULT CExternalInterface::get_SearchUrl(BSTR* pbstr)
{
  if(!pbstr)
  {
    return E_INVALIDARG;
  }
  
  /* The client is now responsible for freeing pbstr. */
  *pbstr = SysAllocString(L"http://www.google.com/");
  if (*pbstr)
  {
    return(S_OK);
  }
  else
  {
    return E_OUTOFMEMORY;
  } 
}

/* DCOM client */
void GetSearchUrl()
{
  BSTR bstrUrl = NULL;

  if (SUCCEEDED(pExternalInterface->get_SearchUrl(&bstrUrl)))
  {
    .....
    .....
    /* Use bstrUrl then free it */
    SysFreeString(bstrUrl);
   }
}
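One way to guard against the leak described above is to let a small C++ object own the returned string and free it in its destructor. The sketch below shows the pattern with portable stand-ins so that it compiles outside Windows: MockBstr, MockFreeString, ScopedBstr, GetSearchUrlMock and Demo are all names invented for this example. In real Windows code, the existing wrapper classes _bstr_t and CComBSTR serve the same purpose.

```cpp
#include <cassert>
#include <cstdlib>

typedef char16_t* MockBstr;    // stand-in for BSTR
static int g_freeCalls = 0;    // counts frees, for demonstration only

// Stand-in for SysFreeString: ignores null, frees otherwise.
void MockFreeString(MockBstr s)
{
    if (s) {
        ++g_freeCalls;
        std::free(s);
    }
}

// Owns one string received through an out parameter.
class ScopedBstr
{
public:
    ScopedBstr() : m_str(nullptr) {}
    ~ScopedBstr() { MockFreeString(m_str); }
    ScopedBstr(const ScopedBstr&) = delete;
    ScopedBstr& operator=(const ScopedBstr&) = delete;

    // Address to hand to the callee; releases any string already held.
    MockBstr* Receive()
    {
        MockFreeString(m_str);
        m_str = nullptr;
        return &m_str;
    }
    MockBstr Get() const { return m_str; }

private:
    MockBstr m_str;
};

// Hypothetical callee that allocates a string for its caller,
// playing the role of get_SearchUrl.
bool GetSearchUrlMock(MockBstr* out)
{
    if (!out) return false;
    *out = static_cast<MockBstr>(std::calloc(8, sizeof(char16_t)));
    return *out != nullptr;
}

// Usage: the string is released when url goes out of scope,
// even if the code in between returns early.
int Demo()
{
    {
        ScopedBstr url;
        if (!GetSearchUrlMock(url.Receive())) return -1;
        assert(url.Get() != nullptr);
    }
    return g_freeCalls;
}
```

The design choice here is to disable copying, so that exactly one object owns each string and it is freed exactly once.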

About our authors: Team EQA
