Chapter 4. An In-Depth Look at Imported Assemblies

In This Chapter

Converting the Library

Converting COM Data Types

Converting Methods and Properties

Converting Interfaces

Converting Classes

Converting Modules

Converting Structures

Converting Unions

Converting Enumerations

Converting Typedefs

Converting ActiveX Controls

Chapter 3, “The Essentials for Using COM in Managed Code,” demonstrated how using a COM component in a .NET application can be just like using a .NET component after you’ve generated an Interop Assembly. Similarly, it showed how using an ActiveX control can be just like using a Windows Forms control after you’ve generated an ActiveX Assembly. With the help of the component’s documentation and Visual Studio .NET’s IntelliSense feature, perhaps many COM components and ActiveX controls can be used without an in-depth understanding of the contents of these assemblies and their relation to the original type libraries. However, it is often necessary to gain a deeper understanding of the type library importer and the ActiveX importer. After all, most COM components are documented from the perspective of an unmanaged C++ or Visual Basic 6 (VB6) user!

This chapter describes in depth the behavior of both the type library importer and the ActiveX importer. Each element that can be found in a type library (classes, interfaces, methods, properties, and so on) is discussed, one at a time. Although using types in an Interop Assembly often feels like using the same types in Visual Basic 6, metadata in an assembly and type information in a type library are quite different. Therefore, it’s best to think of the generated metadata as a transformation of the type library, containing .NET definitions of the original COM types. These .NET definitions serve as type information for the Runtime-Callable Wrappers (RCWs) used at run time.

Rather than containing large examples, this chapter contains several short code listings that demonstrate each transformation. Because both metadata and type library information are stored as binary data, examples need to be shown in representative languages. Most COM examples are shown in Visual Basic 6 and Interface Definition Language (IDL) syntax, and most .NET examples are shown in C# and Visual Basic .NET syntax.

As mentioned in the previous chapter, the type information in an Interop Assembly tells the Interop Marshaler how to marshal data to and from the corresponding COM component. The metadata that the type library importer produces is not much different from the metadata found in any .NET component. It is simply decorated with custom attributes and pseudo-custom attributes that signal that special treatment from the CLR is required. The importer knows how to produce metadata type definitions that correspond to the Interop Marshaler’s rules for converting unmanaged types to managed types. The Interop Marshaler is not general-purpose, but it is somewhat flexible. Thus, the designers of the type library importer have made some choices about how the metadata should look in situations with more than one possibility. In other words, the type definitions can be manually altered and still work without problems. The process of customizing the Interop Assembly is covered in Chapter 7, “Modifying Interop Assemblies.”

Converting the Library

There is a one-to-one mapping between a type library and an Interop Assembly. The Interop Assembly always contains metadata for all the types in one type library. If the input type library references and uses types from additional type libraries, the importer automatically generates an additional assembly for each referenced type library.

Caution

In order for the type library importer to automatically generate additional Interop Assemblies for dependent type libraries, the dependent type libraries must be registered. This is necessary because type libraries are referenced via Library IDs (LIBIDs), and the Windows Registry is the only mechanism capable of locating them. If Primary Interop Assemblies are registered for dependent type libraries, they will be referenced automatically by the imported assembly.

The generated Interop Assembly is always a single-file assembly. When created by TLBIMP.EXE, the default assembly name is equal to the library name, and the default filename is equal to the library name plus .dll. When created inside Visual Studio .NET, the assembly name and filename are Interop.LibraryName and Interop.LibraryName.dll, respectively. In either case, all the types inside the assembly reside in a single namespace equal to the library name.

The library name is not necessarily the name of the file containing the type library. In Visual Basic 6 it’s known as the project name, and in IDL it’s the name of the library statement (SpeechLib in the following IDL code):

[
  uuid(C866CA3A-32F7-11D2-9602-00C04F8EE628),
  version(5.0),
  helpstring("Microsoft Speech Object Library")
]
library SpeechLib
{
  importlib("stdole2.tlb");
  ...
  coclass SpVoice {
    [default] interface ISpeechVoice;
    interface ISpVoice;
    [default, source] dispinterface _ISpeechVoiceEvents;
  };
  ...
};

As you saw in Chapter 3, an Interop Assembly called SpeechLib.dll is generated with types such as SpeechLib.SpVoice for a type library containing this information. Using the library name as a namespace should feel natural to VB6 programmers because these same names serve a similar purpose in VB6 code.

Caution

Sometimes COM components have a library name that is the same as the name of the file containing the type library. This is quite common for components authored in Visual Basic 6, because the default behavior of the IDE makes them the same (for example, Project1). An example of this can be seen by running TLBIMP.EXE on the Microsoft XML type library (MSXML.DLL) from the same directory in which it resides on your computer:

C:\windows\system32>TlbImp MSXML.dll

For this situation, TLBIMP.EXE gives the following message:

TlbImp error: Output file would overwrite input file

The solution is to have TLBIMP.EXE place the output file in a different directory and/or choose a different name for the output file using the /out option:

TlbImp MSXML.dll /out:Interop.MSXML.dll

TlbImp MSXML.dll /out:C:\MyApplication\MSXML.dll

Be extremely careful with the /out option, however, because the filename you choose (minus the .dll extension, of course) also becomes the namespace associated with all the type definitions contained within. The drawback is that whatever case you use (msxml, MsXmL, MSXML, and so on) becomes the case of the namespace. To avoid this confusion, you should set the namespace independently of the output filename by using the /namespace option.

The output assembly, like all assemblies, has a four-part version number (Major.Minor.Build.Revision). Type libraries have a two-part version number (Major.Minor), so the assembly produced has the version Major.Minor.0.0. The culture of the assembly is always marked as neutral, and it doesn’t have a strong name unless it was produced using the TLBIMP.EXE strong-naming options described in Chapter 15, “Creating and Deploying Useful Primary Interop Assemblies.” In Visual Studio .NET, the General property page for Visual C# projects enables users to select a key file or key container used to give Interop Assemblies a strong name. These options are labeled Wrapper Assembly Key File and Wrapper Assembly Key Name. If you decide to use one of these options, be sure to do it before referencing the COM components! These options are not available to Visual Basic .NET or Visual C++ .NET projects. The reasoning for this is that you should attempt to find a Primary Interop Assembly for a COM component you wish to use (which is already strong-named) rather than giving it your own strong name.
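The version rule can be illustrated with a few lines of C#. This is only a sketch of the Major.Minor to Major.Minor.0.0 mapping described above; the helper name ToAssemblyVersion is hypothetical and is not part of TLBIMP.EXE or the .NET Framework.

```csharp
using System;

public static class VersionMappingSketch
{
    // Hypothetical helper: a two-part type library version (Major.Minor)
    // becomes the four-part assembly version Major.Minor.0.0.
    public static Version ToAssemblyVersion(int typeLibMajor, int typeLibMinor)
    {
        return new Version(typeLibMajor, typeLibMinor, 0, 0);
    }

    public static void Main()
    {
        // The SpeechLib type library shown earlier is marked version(5.0).
        Console.WriteLine(ToAssemblyVersion(5, 0)); // prints 5.0.0.0
    }
}
```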

Now that we’ve covered the properties of the output assembly, we’ll look in depth at how the contents of the assembly are produced, based on the input type library. Because everything in a type library is implicitly public (otherwise it would simply be omitted from the type information), every type and member is public in the output assembly.

Converting COM Data Types

A type library is typically filled with data types used as method or property parameters, fields of structures, and so on. Every data type occurrence in a type library is converted to an equivalent data type in the Interop Assembly’s metadata. Table 4.1 lists the transformations done by the type library importer. Each COM data type is shown in two representations: IDL and Visual Basic 6. The .NET type is shown as the language-neutral system type. Keep in mind what the system type means in your language of choice, shown in Chapter 1, “Introduction to the .NET Framework.” For example, although System.Int16 can be used in any .NET language, it is typically referred to by the alias short. (When compiled, short becomes System.Int16 in MSIL.)
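The relationship between a language alias and its system type can be checked directly. The following short C# sketch (the class and method names are illustrative only) shows that the keyword and the system type are one and the same type.

```csharp
using System;

public static class TypeAliasSketch
{
    // Returns the language-neutral system name of a type.
    public static string SystemName(Type t)
    {
        return t.FullName;
    }

    public static void Main()
    {
        // C# keywords are aliases for the system types.
        Console.WriteLine(typeof(short) == typeof(Int16)); // True
        Console.WriteLine(SystemName(typeof(short)));      // System.Int16
        Console.WriteLine(SystemName(typeof(int)));        // System.Int32
        Console.WriteLine(SystemName(typeof(string)));     // System.String
    }
}
```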

Caution

When browsing type libraries, OLEVIEW.EXE doesn’t display the COM DECIMAL type correctly, showing wchar_t instead.

Table 4.1. COM Data Types are Converted to .NET Data Types with Similar Semantics

Image

This table is accurate for data types used as parameters, but the conversions are sometimes slightly different for data types used as fields of structures. This is discussed in the “Converting Structures” section later in the chapter.

The IDL data types in Table 4.1 represent the OLEVIEW.EXE IDL representation of data types contained in a type library (with the exception of DECIMAL). However, IDL can contain a variety of distinct types for the same type library type. Thus, some information can be lost when creating a type library from IDL.

Tip

Because IDL files or documentation for COM components might refer to some data types not mentioned in Table 4.1, it’s helpful to know how these additional types look inside a type library. For example:

boolean and small become char

wchar_t becomes unsigned short

hyper and __int64 become int64

[string] char* becomes LPSTR

[string] wchar_t* becomes LPWSTR

byte becomes unsigned char

unsigned hyper and unsigned __int64 become uint64

Combining Several Types in One

In Table 4.1 you can see that, as in the conversion from IDL to a type library, distinct type library types can become a single data type in .NET. This continual loss of information is pictured in Figure 4.1.

Figure 4.1. Information about data types can be lost when moving from an IDL definition to a type library definition, and when moving from a type library definition to a metadata definition.

Image

As mentioned in the previous chapter, some common types used in COM no longer exist in the .NET Framework: CURRENCY, VARIANT, IUnknown, IDispatch, SCODE, and HRESULT. With the exception of HRESULT versus SCODE, the original type library type isn’t lost when converted to metadata, just hidden in a pseudo-custom attribute called MarshalAsAttribute (commonly abbreviated as MarshalAs). MarshalAsAttribute (described further in Chapter 12, “Customizing COM’s View of .NET Components”) has a constructor with an UnmanagedType parameter. UnmanagedType is an enumeration in System.Runtime.InteropServices that defines the range of types found in a type library.

Each potentially ambiguous .NET type that the importer produces (Int32, Decimal, String, and Object) is marked with MarshalAsAttribute. The following sections look more closely at the COM types involved, explaining why they have no direct counterpart in the .NET Framework and why they are mapped to .NET types the way they are.

VARIANT, IUnknown*, and IDispatch*

The root of all COM interfaces is IUnknown. For any class to be considered a COM object, it must implement at least IUnknown. (All COM objects written in Visual Basic 6 implement IUnknown; it’s just hidden from the programmer.) For these reasons, an occurrence of IUnknown* in COM has a role like the System.Object type in the .NET Framework, which can be used to represent a reference to any type. It’s not quite that simple in COM, however, because base types (like integers and strings) are not COM objects and can’t implement interfaces. So, IUnknown* really means a pointer to any interface.

COM also has a VARIANT type that can contain anything: interface pointers, base types (such as integers), or even user-defined structures. Because a System.Object instance can be any type, and a VARIANT instance can contain any type, a mapping between these two is natural.

IDispatch is an interface that is implemented by most COM objects to support late binding, although it’s not required the way IUnknown is. IDispatch doesn’t exist in the managed world, because every .NET object automatically has the late binding functionality of IDispatch through reflection mechanisms. Because every .NET object supports late binding, there is also a pretty strong relationship between IDispatch* (a pointer to an interface that supports late binding) and System.Object (a reference to an object that supports late binding). Note that Visual Basic 6 has an Object type that really means IDispatch*.

Rather than carrying forward a distinction between these generic types in .NET, they have all been merged into the single System.Object type. By default, a System.Object in a managed signature is assumed to be a VARIANT in COM. If it’s supposed to represent IUnknown* or IDispatch*, the type in the signature must be marked with MarshalAsAttribute and the appropriate UnmanagedType values. Listing 4.1 demonstrates this by showing the transformation of three method signatures from VB6/IDL to metadata, as viewed by C# and Visual Basic .NET. The type library importer would actually mark the parameter of the VariantParameter method with MarshalAs(UnmanagedType.Struct) (which means VARIANT) but this is omitted from the listing because it’s the default behavior for Object parameters.

Listing 4.1. The COM View and the .NET View of the Same Methods with VARIANT, IUnknown*, and IDispatch* Parameters

Image
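As a rough C# sketch of how such signatures might look in metadata, consider the following. Only VariantParameter is named in the text; the interface name IObjectParameters and the other two method names are assumptions. The reflection helper is there only to make the attribute markings visible at run time.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical interface used for illustration only.
public interface IObjectParameters
{
    // No attribute needed here: Object is treated as VARIANT by default.
    void VariantParameter(object v);
    void IUnknownParameter([MarshalAs(UnmanagedType.IUnknown)] object u);
    void IDispatchParameter([MarshalAs(UnmanagedType.IDispatch)] object d);
}

public static class ObjectMarshalingSketch
{
    // Returns the UnmanagedType recorded on a method's first parameter,
    // or null when the parameter carries no MarshalAsAttribute.
    public static UnmanagedType? MarshalTypeOf(string methodName)
    {
        ParameterInfo p =
            typeof(IObjectParameters).GetMethod(methodName).GetParameters()[0];
        MarshalAsAttribute m = p.GetCustomAttribute<MarshalAsAttribute>();
        return m == null ? (UnmanagedType?)null : m.Value;
    }

    public static void Main()
    {
        foreach (MethodInfo method in typeof(IObjectParameters).GetMethods())
        {
            UnmanagedType? t = MarshalTypeOf(method.Name);
            Console.WriteLine(method.Name + ": " +
                (t.HasValue ? t.Value.ToString() : "(default)"));
        }
    }
}
```

All three parameters have the same managed type, Object. Only the attribute distinguishes what the Interop Marshaler passes to COM.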

Tip

MarshalAsAttribute is always marked on signatures, not on the use of such signatures. Therefore, you never need to use or be aware of the existence of MarshalAsAttribute when calling members of COM objects in managed code. The type library importer takes care of the necessary custom attribute markings for you.

The void* type is commonly used in C++ to represent a pointer to anything, so you might expect this to also be converted to System.Object. As you can see in Table 4.1, it’s converted to System.IntPtr, an integer with the size of a pointer. This conversion must be done because the memory pointed to by the void* can be absolutely anything. So, there can be no standard mechanism to transform it into a System.Object. Fortunately, VARIANT or IUnknown* are typically used in COM methods instead of void*. So, what can you do with a System.IntPtr type in managed code? This is covered in Chapter 6, “Advanced Topics for Using COM Components.”

Tip

The .NET type System.IntPtr is not a pointer to an integer, but a pointer-sized integer, just as System.Int16 is a 16-bit integer and System.Int32 is a 32-bit integer. This is like the Win32 INT_PTR type. It is typically used to hold a pointer to an object, because the size of an object’s address changes depending on the underlying platform (for example, 32 or 64 bits).
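The platform dependence is easy to observe. The following minimal C# sketch (class and method names are illustrative) prints the width of each integer type; the IntPtr line reports 32 or 64 depending on the process it runs in.

```csharp
using System;

public static class IntPtrSizeSketch
{
    // Width of System.IntPtr in bits for the current process.
    public static int PointerBits()
    {
        return IntPtr.Size * 8;
    }

    public static void Main()
    {
        Console.WriteLine("Int16:  " + sizeof(short) * 8 + " bits"); // 16
        Console.WriteLine("Int32:  " + sizeof(int) * 8 + " bits");   // 32
        Console.WriteLine("IntPtr: " + PointerBits() + " bits");     // 32 or 64
    }
}
```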

CURRENCY and DECIMAL

COM has separate types for CURRENCY and DECIMAL, but the .NET Framework has no public System.Currency type, only System.Decimal. A separate currency type in .NET was eliminated because the Decimal type can represent anything that COM’s CURRENCY type can represent. Thus, .NET applications use the System.Decimal type for storing currency values. Because of this, it’s natural for DECIMAL in COM to become System.Decimal, and for CURRENCY in COM to become System.Decimal marked with the MarshalAsAttribute pseudo-custom attribute. Listing 4.2 shows the transformation.

Listing 4.2. The COM View and the .NET View of the Same Methods with DECIMAL and CURRENCY Parameters

Image
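To see why System.Decimal can stand in for CURRENCY, it helps to know that an OLE Automation CURRENCY value is a 64-bit integer scaled by 10,000. The following C# sketch uses the Decimal.ToOACurrency and Decimal.FromOACurrency methods to convert between the two forms; the wrapper class and method names are illustrative only.

```csharp
using System;
using System.Globalization;

public static class CurrencySketch
{
    // Converts a Decimal to the raw 64-bit CURRENCY representation.
    public static long ToCurrencyUnits(decimal value)
    {
        return decimal.ToOACurrency(value);
    }

    // Converts a raw 64-bit CURRENCY value back to a Decimal.
    public static decimal FromCurrencyUnits(long raw)
    {
        return decimal.FromOACurrency(raw);
    }

    public static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;

        long raw = ToCurrencyUnits(12.3456m);
        Console.WriteLine(raw);                                   // 123456
        Console.WriteLine(FromCurrencyUnits(raw).ToString(inv));  // 12.3456

        // Even the largest CURRENCY value fits in a Decimal.
        Console.WriteLine(FromCurrencyUnits(long.MaxValue).ToString(inv));
        // 922337203685477.5807
    }
}
```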

BSTR, LPSTR, and LPWSTR

BSTR (basic string), LPSTR (pointer to string), and LPWSTR (pointer to wide-character string) are three distinct types used to represent a string in unmanaged code, so they all become System.String in metadata. This is consistent with Visual Basic 6, in which all three of these type library types are treated as a String. So, what’s the difference between them?

BSTR, which stands for either basic string or binary string, is the most commonly used string type in COM. It contains Unicode characters and is prefixed with its length. Because the length of the string is always known, a BSTR can contain embedded null characters. String types authored in Visual Basic 6 are BSTRs.

LPSTR is a pointer to an array of ANSI characters. The end of the string is marked with a null character. Although less commonly used in COM than a BSTR, an LPSTR is more convenient to use in unmanaged C++ code.

LPWSTR is a pointer to an array of Unicode characters, with the end marked by a null character. The W stands for wide, because each Unicode character consumes twice as much memory as an ANSI character (two bytes instead of one).

Listing 4.3 shows the transformation with these three types of strings. The type library importer would actually mark the parameter of the BStrParameter method with MarshalAs(UnmanagedType.BStr) but this is omitted from the listing because it’s the default behavior for String parameters.

Listing 4.3. The COM View and the .NET View of the Same Methods with a Variety of String Parameters

Image
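The practical difference between a length-prefixed BSTR and the null-terminated LPSTR and LPWSTR can be demonstrated from managed code with the Marshal class. The following C# sketch is an illustration only (the class and method names are assumptions): it round-trips a string containing an embedded null character through each unmanaged representation.

```csharp
using System;
using System.Runtime.InteropServices;

public static class StringKindsSketch
{
    // Round-trips a string through a BSTR, which stores its own length.
    public static string ViaBstr(string s)
    {
        IntPtr p = Marshal.StringToBSTR(s);
        try { return Marshal.PtrToStringBSTR(p); }
        finally { Marshal.FreeBSTR(p); }
    }

    // Round-trips through a null-terminated wide string (LPWSTR).
    public static string ViaLpwstr(string s)
    {
        IntPtr p = Marshal.StringToHGlobalUni(s);
        try { return Marshal.PtrToStringUni(p); }
        finally { Marshal.FreeHGlobal(p); }
    }

    // Round-trips through a null-terminated narrow string (LPSTR).
    public static string ViaLpstr(string s)
    {
        IntPtr p = Marshal.StringToHGlobalAnsi(s);
        try { return Marshal.PtrToStringAnsi(p); }
        finally { Marshal.FreeHGlobal(p); }
    }

    public static void Main()
    {
        string s = "ab\0cd"; // five characters, one of them a null

        Console.WriteLine(ViaBstr(s).Length);   // 5: the length prefix wins
        Console.WriteLine(ViaLpwstr(s).Length); // 2: reading stops at the null
        Console.WriteLine(ViaLpstr(s).Length);  // 2: reading stops at the null
    }
}
```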

If a COM method has an LPSTR or LPWSTR parameter treated as an [in, out] buffer, the importer still represents it as a by-value System.String, which is not appropriate because a .NET string is immutable. Using the techniques described in Chapters 6 and 7, you could change the parameter to a System.IntPtr type instead and use it in an appropriate way.
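The following C# sketch suggests what working with such a buffer through System.IntPtr might look like. It is not the technique from Chapters 6 and 7 itself: FillBuffer is a hypothetical managed stand-in for the unmanaged callee, and the class and method names are assumptions.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BufferSketch
{
    // Hypothetical stand-in for an unmanaged method that writes a
    // null-terminated narrow string into a caller-supplied buffer.
    private static void FillBuffer(IntPtr buffer, int capacity)
    {
        byte[] text = { (byte)'O', (byte)'K', 0 };
        Marshal.Copy(text, 0, buffer, Math.Min(text.Length, capacity));
    }

    public static string CallWithBuffer()
    {
        const int capacity = 16;
        IntPtr buffer = Marshal.AllocHGlobal(capacity);
        try
        {
            Marshal.WriteByte(buffer, 0);   // start with an empty string
            FillBuffer(buffer, capacity);   // the callee modifies the buffer
            return Marshal.PtrToStringAnsi(buffer);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);    // the caller owns the memory
        }
    }

    public static void Main()
    {
        Console.WriteLine(CallWithBuffer()); // OK
    }
}
```

Because the caller allocates and frees the memory explicitly, the callee can change its contents, which is not possible with an immutable System.String passed by value.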

SCODE and HRESULT

The SCODE and HRESULT types don’t exist in the .NET Framework, so they become 32-bit integers in managed code (again, with the MarshalAsAttribute pseudo-custom attribute).

You saw in the preceding chapter that the HRESULT return values are hidden in managed signatures, just as in Visual Basic 6. Thus, this transformation mainly comes into play with HRESULT and SCODE parameters. It’s rare to see such a thing, but Listing 4.4 provides an example.

Listing 4.4. The COM View and the .NET View of the Same Methods with SCODE and HRESULT Parameters

Image

Unlike with the previously examined types, MarshalAsAttribute is mainly for informational purposes when used with UnmanagedType.Error, as the data is still just marshaled as a 32-bit number.
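A C# sketch of such signatures follows. The interface and method names are assumptions; the point is that both parameters are ordinary 32-bit integers in managed code, and can be inspected like any other Int32.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical interface used for illustration only.
public interface IErrorParameters
{
    void HResultParameter([MarshalAs(UnmanagedType.Error)] int hr);
    void SCodeParameter([MarshalAs(UnmanagedType.Error)] int sc);
}

public static class HResultSketch
{
    // The high bit of the 32-bit value indicates failure,
    // so a failure HRESULT is a negative Int32.
    public static bool Failed(int hr)
    {
        return hr < 0;
    }

    public static void Main()
    {
        int ok = 0;                                  // S_OK
        int invalidArg = unchecked((int)0x80070057); // E_INVALIDARG

        Console.WriteLine(Failed(ok));                       // False
        Console.WriteLine(Failed(invalidArg));               // True
        Console.WriteLine("0x" + invalidArg.ToString("X8")); // 0x80070057
    }
}
```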

Complex Types

Data types in a type library can be more complex than the basic ones listed in Table 4.1, such as user-defined classes, interfaces, structs, or an array of any other type. Each complex type is converted to a corresponding .NET complex type, and these are covered throughout the rest of this chapter. The transformation for the use of these types as parameters or fields is usually simple: With two exceptions, an occurrence of any unmanaged type X, Y, or Z as a parameter or field is transformed into the managed type X, Y, or Z. One exception is the set of types listed in Table 4.1. The other exception is for default interfaces that are only implemented by one class inside a type library. More about this is discussed in the “Converting Classes” section.

In this section, we’re first going to take a look at what happens to three special interface types (listed in Table 4.1) that don’t abide by the typical rules. After this, we examine array types.

IDispatchEx, IEnumVARIANT, and ITypeInfo Interface Pointers

Occurrences of IDispatchEx*, IEnumVARIANT*, and ITypeInfo*, which represent pointers to standard COM interfaces (like IUnknown* and IDispatch*), are converted to different .NET types with similar semantics in the managed world. These COM interfaces have the following semantics:

IDispatchEx. This obscure and poorly documented interface is an extension to IDispatch that, besides enabling dynamic invocation, enables dynamic addition and removal of members. The .NET equivalent interface is IExpando. IDispatchEx is mainly used by unmanaged scripting languages, such as JScript, and IExpando is mainly used by managed scripting languages, such as JScript .NET.

IEnumVARIANT. This interface provides the means to enumerate over a collection of VARIANTs. It is almost identical to IEnumerator, the .NET interface that provides the means to enumerate over a collection of Objects. Thus, IEnumVARIANT* parameters and fields are transformed into IEnumerator types.

ITypeInfo. Using this interface is the standard COM way to programmatically view an object’s type information. The managed analog to this interface is the System.Type class, the gateway to reflection.

The transformation of IDispatchEx*, IEnumVARIANT*, and ITypeInfo* parameters and fields to IExpando, IEnumerator, and Type parameters and fields, respectively, is essential in making COM objects seamlessly usable in .NET. That’s because .NET applications are designed to use these new types for the same common tasks. If using a COM collection in managed code required calling methods on IEnumVARIANT, many .NET clients would need to have separate code to deal with .NET collections and COM collections (not to mention an extra step to determine whether it’s dealing with a pure .NET object or a COM object)!

The IEnumVARIANT to IEnumerator transformation is part of what enables foreach in C# and For Each in VB .NET to work on a COM object (as demonstrated in the preceding chapter), because these language constructs call the methods of IEnumerator. The other part that makes this work is covered later in this chapter, in the “Special DISPIDs” section.
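To make the connection between foreach and IEnumerator concrete, the following C# sketch sums a collection twice: once by calling the IEnumerator methods by hand, and once with foreach. It uses an ordinary .NET collection rather than a COM object, and the class and method names are illustrative only.

```csharp
using System;
using System.Collections;

public static class EnumeratorSketch
{
    // Sums a collection by calling the enumerator directly:
    // GetEnumerator, then MoveNext and Current until exhausted.
    public static int SumManually(IEnumerable items)
    {
        int total = 0;
        IEnumerator e = items.GetEnumerator();
        while (e.MoveNext())
        {
            total += (int)e.Current;
        }
        return total;
    }

    // The same loop expressed with the language construct.
    public static int SumWithForeach(IEnumerable items)
    {
        int total = 0;
        foreach (object item in items)
        {
            total += (int)item;
        }
        return total;
    }

    public static void Main()
    {
        ArrayList list = new ArrayList { 1, 2, 3 };
        Console.WriteLine(SumManually(list));    // 6
        Console.WriteLine(SumWithForeach(list)); // 6
    }
}
```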

Listing 4.5 demonstrates the transformation for an IEnumVARIANT* parameter. The listing assumes that the System.Runtime.InteropServices.CustomMarshalers namespace is being used/imported in addition to System.Runtime.InteropServices, and also that the CustomMarshalers assembly that ships with the .NET Framework is being referenced.

Listing 4.5. IEnumVARIANT* Parameters are Transformed to IEnumerator Types Providing the Same Enumeration Semantics in a .NET-Centric Manner

Image

Arrays

The transformation of COM arrays from a type library to .NET metadata is an interesting topic because it’s filled with limitations and gotchas. First, we’ll take a look at the most common type of array, a SAFEARRAY. Then we’ll look at a handful of array representations that are lumped into a category known as C-style arrays. A C-style array is simply a pointer to a type, and that type happens to be the first element of an array (a contiguous sequence of types in memory). To differentiate a pointer to a single instance from a pointer to the beginning of an array, IDL contains a plethora of attributes that describe the pointer: length_is, size_is, first_is, last_is, and max_is. (There’s even a min_is attribute, but this is never used on Microsoft platforms!) Different combinations of these attributes affect the way the array is classified, as described in the following sections.

SAFEARRAYs

In COM, an array is typically represented as a SAFEARRAY type. A SAFEARRAY is a self-describing array that can contain any type capable of being placed in a VARIANT. It can have any number of dimensions—each with distinct upper and lower bounds. A SAFEARRAY is the same as Visual Basic 6’s array type.

The default importer behavior when encountering SAFEARRAYs differs depending on whether you use TLBIMP.EXE or Visual Studio .NET. When referencing a type library in Visual Studio .NET, the IDE effectively runs TLBIMP.EXE with its /sysarray option. This means that every occurrence of a SAFEARRAY is converted to a System.Array type. System.Array is the base class of all .NET arrays, and is flexible enough to represent anything a SAFEARRAY can represent (multiple dimensions and custom bounds for each dimension).

The problem with System.Array is that it’s more cumbersome to use in the current .NET languages than more specific array types (reminiscent of using SAFEARRAYs in unmanaged C++). Elements of a System.Array must be accessed via methods such as GetValue and SetValue. Furthermore, the type of a System.Array’s elements is not known at compile time, so you lose a degree of strong typing when using these generic arrays.

Tip

If you’re interacting with a COM object whose imported members use System.Array types, the easiest way to get and set array elements is to first cast the System.Array type to a more specific array, such as int[,] in C#. Casting a System.Array works for arrays of any type or dimension (as long as you’re casting it to the same kind of array) but does not work for an array with any non-zero lower bounds in C#, Visual Basic .NET, or C++. This limitation exists simply because these languages don’t natively support arrays with non-zero lower bounds, so there’s no matching array type to which you can cast the System.Array!

Suppose that a Visual Basic 6 COM object has a method called ReturnArray that returns a two-dimensional SAFEARRAY. In C#, it could be used as a generic System.Array as follows:

Array a = comObj.ReturnArray();

// Don't assume anything about the lower bounds, but assume
// we're dealing with a 2-D array.
for (int i = a.GetLowerBound(0); i <= a.GetUpperBound(0); i++)
{
  for (int j = a.GetLowerBound(1); j <= a.GetUpperBound(1); j++)
  {
    a.SetValue((int)a.GetValue(i, j) * 2, i, j);
  }
}

For a simple case such as reading every element, there are better alternatives, such as using foreach, which works regardless of the number of dimensions. However, the methods of System.Array are used here just to demonstrate what dealing with the generic Array type might look like.

If we know that both dimensions have a zero lower bound, however, we could cast it to a more specific array type and interact with that instead:

int [,] a = (int [,]) comObj.ReturnArray();

// The loops below assume that both lower bounds are zero.
for (int i = 0; i < a.GetLength(0); i++)
{
  for (int j = 0; j < a.GetLength(1); j++)
  {
    a[i,j] = a[i,j] * 2;
  }
}
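The same ideas can be tried without a COM object by creating a System.Array with a non-zero lower bound directly. The following C# sketch is an illustration under that assumption (the class and method names are hypothetical). It uses a one-dimensional array, for which the C# array type int[] always implies a zero lower bound.

```csharp
using System;

public static class SystemArraySketch
{
    // Doubles every element of a 1-D System.Array of ints
    // without assuming anything about its lower bound.
    public static void DoubleAll(Array a)
    {
        for (int i = a.GetLowerBound(0); i <= a.GetUpperBound(0); i++)
        {
            a.SetValue((int)a.GetValue(i) * 2, i);
        }
    }

    // Reports whether the array can be treated as a plain C# int[].
    public static bool IsPlainIntArray(Array a)
    {
        return a is int[];
    }

    public static void Main()
    {
        // Three ints indexed 1 to 3, like VB6's Dim a(1 To 3).
        Array oneBased = Array.CreateInstance(typeof(int), new[] { 3 }, new[] { 1 });
        oneBased.SetValue(10, 1);

        DoubleAll(oneBased);
        Console.WriteLine(oneBased.GetValue(1));        // 20

        Console.WriteLine(IsPlainIntArray(oneBased));   // False
        Console.WriteLine(IsPlainIntArray(new int[3])); // True
    }
}
```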

Because System.Array types can be a hassle to use, TLBIMP.EXE provides a choice for how SAFEARRAYs should be imported into an Interop Assembly. If you use the /sysarray option from a command prompt, the behavior matches Visual Studio .NET. If you don’t use the /sysarray option and accept the default command-line behavior, all SAFEARRAYs are imported as one-dimensional arrays with a zero lower bound. The importer can’t do anything more appropriate because SAFEARRAYs don’t describe their bounds or rank in a type library—only the type of their elements.

If a COM component only uses SAFEARRAYs that are single-dimensional and have a lower bound of zero, the default TLBIMP.EXE behavior works great. For such components that don’t already have a Primary Interop Assembly, it’s a good idea to use TLBIMP.EXE from a command prompt and reference the output assembly in Visual Studio .NET rather than referencing the type library directly in Visual Studio .NET. This way, you can eliminate the use of System.Array in the imported signatures.

Of course, if a COM component intends to communicate with either multi-dimensional SAFEARRAYs or SAFEARRAYs with non-zero lower bounds, the default TLBIMP.EXE transformation would not be acceptable. .NET clients would be prevented from passing anything other than a 1-D zero-lower-bound array by the compiler, and COM clients would be prevented from passing anything other than a 1-D zero-lower-bound array by an exception thrown by the Interop Marshaler.

The /sysarray option in TLBIMP.EXE is a crude switch that affects how every array in a type library is presented to the .NET world. If you desire more fine-grained control, or want certain arrays to be imported as multi-dimensional arrays with a fixed rank, see Chapter 7.

Caution

Although version 1.0 of the Interop Marshaler does not support marshaling VARIANTs containing structures (UDTs), marshaling SAFEARRAYs of structures is supported when passed as a parameter. There’s an important limitation to this support, however: Any structure used as SAFEARRAY elements must be described in a registered type library, and must be marked with a GUID in the original type library! All UDTs defined in Visual Basic 6 are marked with GUIDs, but several COM components do not mark their structs with GUIDs.

A quick way to see whether a struct can be marshaled in a SAFEARRAY is to call Marshal.GetITypeInfoForType, passing the type of the imported value type inside an Interop Assembly. If this succeeds, then it can be marshaled inside a SAFEARRAY successfully. If not, then one workaround is to modify the type library containing the structure definition using OLEVIEW.EXE to get an IDL representation, and using MIDL.EXE to compile an updated IDL file to a new type library.

Fixed-Length Arrays

Fixed-length arrays (or more simply, fixed arrays) don’t use any of the aforementioned IDL attributes that are typically used with C-style arrays. Instead, a constant array capacity is simply specified in its declaration (shown here in IDL):

HRESULT FixedArrayParameter1D([in] double arr[10]);

HRESULT FixedArrayParameter2D([in] double arr[2][5]);

Method signatures with fixed-length arrays cannot be authored in Visual Basic 6, but can be called correctly from a Visual Basic 6 program. The type library importer preserves fixed-length arrays in metadata by using a variation of MarshalAsAttribute with an additional SizeConst property set to the constant value. For parameters, the importer marks fixed arrays with UnmanagedType.LPArray (pointer to an array) and for fields of structures, the importer marks fixed arrays with UnmanagedType.ByValArray (all the elements are embedded in the structure).

The .NET array that corresponds to a fixed array is a flattened, one-dimensional version of the original array in row-major order. Unlike the case for SAFEARRAYs, this flattening transformation is done to accommodate an Interop Marshaler limitation rather than a limitation in type library expressiveness. The Interop Marshaler simply doesn’t support fixed-length arrays with more than one dimension in version 1.0. Listing 4.6 demonstrates the transformation of fixed-length arrays for both parameters and fields.

Listing 4.6. The COM View and the .NET View of the Same Fixed-Length Arrays

Image

Image
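The flattening can be sketched in C# as follows. The struct, class, and method names are assumptions chosen for illustration. The struct models a field declared in COM as double arr[2][5], and FlatIndex shows where a given element lands in the flattened, row-major managed array.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical struct: a COM field double arr[2][5] becomes
// a one-dimensional managed array of 2 * 5 = 10 elements.
[StructLayout(LayoutKind.Sequential)]
public struct FixedArrayStruct
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 10)]
    public double[] arr;
}

public static class FixedArraySketch
{
    // Row-major position of element [row][col] in the flattened array.
    public static int FlatIndex(int row, int col, int columns)
    {
        return row * columns + col;
    }

    public static void Main()
    {
        // Element [1][3] of a 2 x 5 array is element 8 of the flat array.
        Console.WriteLine(FlatIndex(1, 3, 5)); // 8

        // All ten doubles are embedded in the unmanaged structure.
        Console.WriteLine(Marshal.SizeOf<FixedArrayStruct>()); // 80
    }
}
```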

Varying Arrays

Varying arrays look similar to fixed-length arrays, but enable you to pass only a contiguous slice of the array. This is sometimes a handy optimization in COM, especially when calls must be marshaled across process or computer boundaries. The size and location of the slice is specified at run time with separate parameters. These parameters contain the number of elements in the transmitted slice and the index of the first element. The relationship between the array parameter and these special parameters is indicated with the IDL length_is and first_is attributes, demonstrated in the following IDL signatures:

HRESULT VaryingArrayParameter1(
  [in, length_is(length)] double arr[256], [in] long length);

HRESULT VaryingArrayParameter2(
  [in, length_is(length), first_is(start)] double arr[256],
  [in] long length, [in] long start);

HRESULT VaryingArrayParameter3(
  [in, length_is(10), first_is(5)] double arr[256]);

The expression inside length_is indicates the number of elements in the array slice, and the expression inside first_is indicates the index of the first element in the slice. If first_is is not specified, the first element of the slice is the first element of the array (index 0). The expressions inside the length_is and first_is attribute statements could be constants, parameters, or mathematical expressions. As with fixed-length arrays, however, the array’s capacity must be specified at compile time.
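As a loose managed analogy (not a description of how COM marshaling is implemented), the slice selected by length_is and first_is can be pictured with the following C# sketch. The class and method names are hypothetical.

```csharp
using System;

public static class VaryingArraySketch
{
    // Returns the slice that length_is and first_is would describe:
    // 'length' elements starting at index 'first'.
    public static double[] TransmittedSlice(double[] arr, int first, int length)
    {
        double[] slice = new double[length];
        Array.Copy(arr, first, slice, 0, length);
        return slice;
    }

    public static void Main()
    {
        double[] arr = new double[256];
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = i;
        }

        // Mirrors VaryingArrayParameter3: length_is(10), first_is(5).
        double[] slice = TransmittedSlice(arr, 5, 10);
        Console.WriteLine(slice.Length); // 10
        Console.WriteLine(slice[0]);     // 5
        Console.WriteLine(slice[9]);     // 14
    }
}
```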

Hopefully the discussion of all these IDL attributes was enlightening, but the bad news is that none of these attributes are present in a type library. Type libraries weren’t designed to hold this information. If you create a type library from an IDL file containing varying arrays, they end up looking just like plain old fixed-length arrays. This can be verified by opening the type library using OLEVIEW.EXE, where the previous signatures would look like the following:

HRESULT VaryingArrayParameter1([in] double arr[256], [in] long length);

HRESULT VaryingArrayParameter2(
  [in] double arr[256], [in] long length, [in] long start);

HRESULT VaryingArrayParameter3([in] double arr[256]);

Because the type library importer uses a type library as input, and not an IDL file, it has no idea that these arrays were meant to be associated with special semantics. The length parameter in this example is a good hint for a human observer, but it’s meaningless to the importer. Thus, such parameters are imported as regular fixed-length array parameters representing the entire array.

Conformant Arrays

The conformance of an array is its capacity. A conformant array is therefore an array with a dynamic capacity. This is in contrast to a varying array, which is a dynamic slice of a fixed-capacity array. Conformant arrays can have any capacity specified at run time in a separate size parameter. The relationship between the array parameter and size parameter is indicated by the IDL size_is attribute. (The expression inside size_is contains the number of elements in the array.)

As with the other IDL array-related attributes, size_is cannot appear in a type library. The result of this is that the type library importer sees such a parameter as a pointer to a single instance, not an array of any kind. The conversion for conformant arrays is demonstrated in Listing 4.7. The ConformantArray2D IDL signature uses size_is to mark the capacity of each dimension in the two-dimensional array. There are many permutations of size_is with multidimensional arrays, but the example shown is a common usage.

Listing 4.7. The .NET View of Conformant Arrays is Quite Different from the COM View Due to Limitations in the Expressiveness of Type Libraries

Image

The imported ConformantArray1D method is only usable if you plan to always pass one element. Obviously, this almost never suffices. The workaround is to change the method signature manually, as demonstrated in Chapter 7. You might be wondering what happened to the array parameter of ConformantArray2D. It is covered in the “Methods” section of this chapter, but the quick answer is that multiple levels of indirection cause it to be transformed as a raw pointer (System.IntPtr).

Conformant Varying Arrays

As the name suggests, conformant varying arrays combine the properties of varying arrays with the properties of conformant arrays, enabling you to pass a dynamically sized slice of a dynamic-capacity array! In other words, you can use size_is (or max_is), length_is (or last_is), and first_is all at the same time. In COM documentation, conformant varying arrays are often called open arrays. Conformant varying arrays often look like the following in IDL:

HRESULT ConformantVaryingArrayParameter(
  [in, size_is(capacity), length_is(filled)] double *arr,
  [in] long capacity, [in] long filled);

Although all the array examples have been marked [in] only, conformant varying arrays can be useful when the array and length_is parameter are both marked [in, out]. This enables a caller to allocate an array and communicate to the method how large it is. Then, the method can return to the caller how many elements of the array it filled in with data.

This should no longer be surprising to you, but because none of the relevant IDL attributes appear in a type library, the type library importer sees conformant varying arrays as simple pointers (just as it does for any conformant array). Chapter 7 demonstrates how to make conformant varying arrays look like conformant arrays. Unfortunately, the “varying” functionality is not supported by the Interop Marshaler, so the entire array must always be passed.

Converting Methods and Properties

COM interfaces have two types of members: methods and properties. These members contain parameters and return types that are transformed into managed data types (according to the list presented in Table 4.1), but the members themselves also go through some transformations.

Methods

A managed signature produced for a type library’s method looks very similar to the same method’s signature in Visual Basic 6. Much of this is due to the conversion that hides each method’s HRESULT. In this section we’re going to look at the handful of transformations done for a method. There are four areas of focus:

• Hiding the HRESULT

• By-Value versus By-Reference

• [in] versus [out] versus [in, out]

• Parameter Arrays

An important thing to notice is that a variety of COM method oddities that aren’t usable from Visual Basic 6 can now be used from Visual Basic .NET (just as they can be from any .NET language).

Hiding the HRESULT

COM methods typically return an HRESULT and can have a special [out, retval] parameter that represents the real return value. As in VB6, an HRESULT return value is hidden and an [out, retval] parameter becomes the return value (if it exists) in a .NET signature. This is demonstrated in Listing 4.8.

Listing 4.8. The COM View and the .NET View of the Same Simple Methods—One with [out, retval] and One  Without

Image

Because the HRESULT (used to communicate failure) is removed from the signature, exceptions are thrown by the RCW wrapping the COM object to indicate failure.
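To get a feel for the HRESULT-to-exception mapping without needing a COM component, the following C# sketch asks the runtime which exception it associates with a few well-known HRESULT values. The HResultDemo class and its Translate helper are hypothetical names used only for illustration; Marshal.GetExceptionForHR is the real API doing the work, and the constants are the standard values from the Windows SDK headers.

```csharp
using System;
using System.Runtime.InteropServices;

static class HResultDemo
{
    // Standard HRESULT values (names and values from the Windows SDK headers).
    public const int S_OK = 0;
    public const int E_NOTIMPL = unchecked((int)0x80004001);
    public const int E_INVALIDARG = unchecked((int)0x80070057);

    // Returns the exception the runtime associates with an HRESULT,
    // or null when the HRESULT indicates success.
    public static Exception Translate(int hresult)
    {
        return Marshal.GetExceptionForHR(hresult);
    }

    static void Main()
    {
        // A success code produces no exception at all.
        Console.WriteLine(Translate(S_OK) == null);

        // Failure codes map to .NET exception types.
        Console.WriteLine(Translate(E_NOTIMPL).GetType().Name);
        Console.WriteLine(Translate(E_INVALIDARG).GetType().Name);
    }
}
```

A managed client of an imported method never sees these numbers directly; it simply catches the resulting exception, whose HResult property still carries the original value.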

Although it’s rare, sometimes COM methods don’t return HRESULTs. For such methods, the type library importer doesn’t do any kind of return value transformation. Thus, the signature is preserved in managed code and marked with a special pseudo-custom attribute: PreserveSigAttribute. Such methods can’t be defined in Visual Basic 6, but an example defined in IDL is shown in Listing 4.9.

Listing 4.9. The COM View and the .NET View of a Method That Doesn’t Return an HRESULT

Image
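Because PreserveSigAttribute is a pseudo-custom attribute, it is stored as a method implementation flag rather than as an ordinary custom attribute. The following C# sketch shows one way to observe this with reflection. The IRawSignatures interface and its two methods are hypothetical and are not taken from any real type library; an interface in a real Interop Assembly would also carry ComImport, Guid, and InterfaceType attributes.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical interface used only to compare the two kinds of signatures.
interface IRawSignatures
{
    // Typical imported method: the HRESULT is hidden and failure
    // would surface as an exception.
    void Transformed();

    // Signature preserved exactly: the caller sees the raw return value.
    [PreserveSig]
    int Untransformed();
}

static class PreserveSigDemo
{
    // Reports whether the named method carries the preservesig flag.
    public static bool IsPreserveSig(string methodName)
    {
        MethodInfo method = typeof(IRawSignatures).GetMethod(methodName);
        MethodImplAttributes flags = method.GetMethodImplementationFlags();
        return (flags & MethodImplAttributes.PreserveSig) != 0;
    }

    static void Main()
    {
        Console.WriteLine(IsPreserveSig("Transformed"));
        Console.WriteLine(IsPreserveSig("Untransformed"));
    }
}
```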

By-Value Versus By-Reference

In Visual Basic 6, parameters can be marked by-reference (ByRef) or by-value (ByVal). This designation is preserved in the imported .NET signature. In IDL terms, parameters with an extra level of indirection (an extra *) become by-reference parameters; otherwise they’re passed by value. This is shown in Listing 4.10.

Listing 4.10. The COM View and the .NET View of the Same Methods with By-Value and By-Reference Parameters

Image

Tip

Parameters of interface types are always interface pointers, so they always have at least one level of indirection. Thus, for parameters of interface types, * means ByVal and ** means ByRef.

Parameters with more than one extra level of indirection (***, or more, for interface types and **, or more, for other types) cannot be converted to their corresponding managed types because .NET doesn’t have the notion of a reference to a by-reference parameter. Therefore, any such types are converted to System.IntPtr types containing the raw pointer values. Chapter 6 demonstrates what useful tasks you can accomplish with IntPtr types in managed code. Visual Basic 6 can’t define such parameters, but Listing 4.11 has an example originating in IDL.

Listing 4.11. The COM View and the .NET View of Parameters With More Than One Extra Level of Indirection

Image

TLBIMP.EXE emits a warning such as the following whenever encountering a type that it converts to System.IntPtr:

TlbImp warning: At least one of the arguments for '...' can not be marshaled
by the runtime marshaler.  Such arguments will therefore be passed as a pointer
and may require unsafe code to manipulate.

Furthermore, the importer sometimes marks such members with the ComConversionLossAttribute custom attribute. This custom attribute is for informational purposes only, and is meant to indicate imported entities whose description loses fidelity compared to the information in the original type library.

These warnings and custom attributes can be safely ignored. Although they may shake your confidence in the ability of COM Interoperability to handle the COM component, using IntPtr parameters in managed code is not very difficult. One misleading aspect of these warnings is that TLBIMP.EXE even emits them when converting void* to IntPtr, which is the best thing it could possibly emit (that is also CLS compliant)!
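As a rough illustration of why IntPtr parameters are manageable, the following C# sketch walks through two levels of indirection using only the Marshal class, with no unsafe code. The IntPtrDemo class and its method names are hypothetical. The unmanaged memory is allocated by the sketch itself so that it can run on its own; with a real COM method, the pointer would come from the IntPtr parameter instead.

```csharp
using System;
using System.Runtime.InteropServices;

static class IntPtrDemo
{
    // Follows a pointer to a pointer to a 32-bit integer (long** in IDL terms)
    // and returns the integer at the end of the chain.
    public static int ReadThroughTwoLevels(IntPtr pointerToPointer)
    {
        IntPtr inner = Marshal.ReadIntPtr(pointerToPointer);
        return Marshal.ReadInt32(inner);
    }

    // Builds the two-level structure in unmanaged memory, reads the value
    // back through both levels, and frees the memory.
    public static int RoundTrip(int value)
    {
        IntPtr inner = Marshal.AllocHGlobal(sizeof(int));
        IntPtr outer = Marshal.AllocHGlobal(IntPtr.Size);
        try
        {
            Marshal.WriteInt32(inner, value);
            Marshal.WriteIntPtr(outer, inner);
            return ReadThroughTwoLevels(outer);
        }
        finally
        {
            Marshal.FreeHGlobal(outer);
            Marshal.FreeHGlobal(inner);
        }
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip(42));
    }
}
```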

[in] Versus [out] Versus [in, out]

In IDL syntax, a method’s parameters are almost always marked with an attribute that indicates the direction of data flow: [in], [out], or [in, out]. (The Microsoft IDL compiler, MIDL, treats parameters with no directional attribute as [in].) These attributes have the following meaning:

[in]—The data is marshaled from caller to callee.

[out]—The data is marshaled from callee to caller.

[in, out]—The data is marshaled in both directions.

Although a parameter with an extra level of indirection is often marked [in, out] in IDL, it could also be marked just [in] or just [out]. Parameters without a level of indirection cannot be marked as [out] or [in, out] because the callee would have no means of allocating its memory. These attributes don’t affect whether a parameter is by-reference or by-value in the corresponding managed signature, so the IDL signatures in Listing 4.12 produce almost the same results as the Strings method in Listing 4.10.

Caution

The distinction between in-out-ness of a parameter and by-ref-ness of a parameter is often a source of confusion, especially for a common usage of in-only VARIANT pointers. This was seen in the preceding chapter, in which parameters of the CheckSpelling method in the Microsoft Word type library became by-reference Objects. However, using [in] VARIANT* instead of [in] VARIANT in COM is typically done as a performance optimization, not as an indication of by-reference behavior (otherwise it would be marked [in, out]).

Another reason that the VARIANT/Object transformation is special is that VARIANT is a structure (a value type), but System.Object is a reference type. VARIANT is the only type for which the type library importer performs this value/reference transformation.

Listing 4.12. The COM View and the .NET View of Methods with In-Only and Out-Only By-Reference Parameters

Image

Notice the difference in the C# signatures from Listing 4.10. C# treats out-only parameters separately from regular by-reference parameters with its out keyword. With out, C# doesn’t require you to initialize the variable passed to the method before calling it, because any incoming value is ignored and the callee must assign the parameter before returning. Marking a by-reference parameter as in-only doesn’t visibly affect the C# signature, and neither permutation visibly affects the Visual Basic .NET signature.
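The difference between ref and out can be seen in plain C#, with no COM involved. In the following sketch, the DirectionDemo class and its methods are hypothetical stand-ins for imported methods with [in, out] and [out] by-reference parameters.

```csharp
using System;

static class DirectionDemo
{
    // Like an [in, out] parameter: the callee can read the incoming
    // value and replace it.
    public static void AppendSuffix(ref string s)
    {
        s = s + "!";
    }

    // Like an [out] parameter: the callee must assign a value before
    // returning and cannot rely on any incoming value.
    public static void Produce(out string s)
    {
        s = "from callee";
    }

    static void Main()
    {
        string a = "hello";   // A ref argument must be assigned before the call.
        AppendSuffix(ref a);

        string b;             // An out argument needs no initial value.
        Produce(out b);

        Console.WriteLine(a);
        Console.WriteLine(b);
    }
}
```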

Parameter Arrays

A parameter array is a special type of array parameter. Such an array parameter doesn’t look like an array to the caller of a method that uses one. Instead, the method appears to take an arbitrary number of arguments that don’t need to be stuffed into an array. Visual Basic’s ParamArray keyword enables this special calling syntax. It looks like the following in Visual Basic 6:

' Method with a ParamArray
Public Sub ParamArrayParameter(ParamArray arr() As Variant)
  ...
End Sub

Public Sub Main
  ' Invoking the method with any number of arguments
  ParamArrayParameter 1
  ParamArrayParameter 1, 2, 3
  ParamArrayParameter "a", "b", 3, "d", 7.2
End Sub

From the callee’s point of view, the caller has passed an array; and from the caller’s point of view, the method accepts any number of arguments. In the first call of the preceding example, the method sees an array with one element containing the number 1. In the last case, the method sees an array with five elements containing strings and numbers. A parameter array must be a one-dimensional array and must be the last parameter listed.

In IDL, parameter arrays are indicated with a [vararg] attribute, which stands for “variable number of arguments.” Parameter arrays also exist in the .NET Framework, so the type library importer is able to preserve this feature in Interop Assemblies. The importer accomplishes this by placing a System.ParamArrayAttribute custom attribute on the parameter in question. .NET compilers can then choose to look for this attribute to interpret the array as a parameter array. Parameter arrays are not in the CLS, so any .NET language that doesn’t support them simply sees them as a regular array parameter. Parameter arrays are marked with ParamArray in VB .NET (as in Visual Basic 6) and with params in C#. The parameter array transformation is shown in Listing 4.13.

Caution

Parameter arrays in Visual Basic 6 methods are not useful as parameter arrays in .NET because Visual Basic 6 enforces that arrays marked with ParamArray are passed by-reference. VB .NET and C#, on the other hand, only recognize an array as a parameter array if it’s passed by value! Still, the importer marks all parameter arrays with the ParamArrayAttribute in case any .NET language comes along that supports by-reference parameter arrays.

Listing 4.13. The COM View and the .NET View of a Method with a Parameter Array

Image
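The following C# sketch shows the params calling syntax and confirms, through reflection, that the compiler records a parameter array with System.ParamArrayAttribute. The ParamArrayDemo class and its Count method are hypothetical, and the array is passed by value, which is the only form C# recognizes as a parameter array.

```csharp
using System;
using System.Reflection;

static class ParamArrayDemo
{
    // A by-value parameter array: callers can pass any number of arguments.
    public static int Count(params object[] arr)
    {
        return arr.Length;
    }

    // Checks whether the compiler marked the parameter with ParamArrayAttribute.
    public static bool HasParamArrayAttribute()
    {
        ParameterInfo parameter =
            typeof(ParamArrayDemo).GetMethod("Count").GetParameters()[0];
        return parameter.IsDefined(typeof(ParamArrayAttribute), false);
    }

    static void Main()
    {
        // The callee sees a one-element array, then a five-element array.
        Console.WriteLine(Count(1));
        Console.WriteLine(Count("a", "b", 3, "d", 7.2));
        Console.WriteLine(HasParamArrayAttribute());
    }
}
```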

Properties

Properties in COM consist of one, two, or three methods grouped together. In IDL, a property with all three methods looks like this:

[id(0x68030000), propget]
HRESULT Data([out, retval] VARIANT* pRetVal);
[id(0x68030000), propput]
HRESULT Data([in] VARIANT var);
[id(0x68030000), propputref]
HRESULT Data([in] VARIANT var);

What distinguishes these methods from regular methods are the propget, propput, and propputref markings. In unmanaged C++ or Visual Basic 6, implementing a property is simply a matter of implementing these methods, which are commonly referred to as accessor methods. In Visual Basic 6, the Data property shown previously might be implemented as follows:

Private varData As Variant

' The "getter" is the [propget] method
Public Property Get Data() As Variant
  If IsObject(varData) Then
    Set Data = varData
  Else
    Let Data = varData
  End If
End Property

' The "letter" is the [propput] method
Public Property Let Data(ByVal v As Variant)
  Let varData = v
End Property

' The "setter" is the [propputref] method
Public Property Set Data(ByVal v As Variant)
  Set varData = v
End Property

So, what is the purpose of properties? Although implementing properties feels just like implementing methods, properties enable clients to use a simpler syntax than method calls, which provides a nice abstraction. In Visual Basic 6, these methods can’t be called directly. Instead, the propget method is called when the client gets the value, as follows:

d = chart.Data

The propput is called when the client puts (or sets) the value, as follows:

chart.Data = 5

or

Let chart.Data = 5

The propputref is called when the client puts an object reference, as follows:

Set chart.Data = MyObject

A COM property can implement any subset of these three methods, and usually doesn’t implement all three.

A .NET property typically has one or two accessor methods: a getter and a setter, used like propget and propput, respectively. Although C#, Visual Basic .NET, and managed C++ don’t support the creation of properties with more accessor methods, .NET metadata supports an arbitrary number of them (known as other accessors). Whereas a COM property is just a group of methods with extra attributes, metadata has a notion of a property as a separate element distinct from the accessor methods. A property lists its accessor methods, seen here for the Data property in raw IL Assembler syntax:

.property object Data()
{
  .get instance object TypeName::get_Data()
  .set instance void TypeName::set_Data(object)
  .other instance void TypeName::let_Data(object)
}

This notation means that the property’s get accessor is implemented by a method called get_Data, the property’s set accessor is implemented by a method called set_Data, and it has an other accessor with the name let_Data.

The type library importer performs the following steps to transform a COM property to a .NET property:

1. A .NET property is created with the same name as the COM property.

2. A propget method, if it exists, is converted to a get accessor method with the name get_PropertyName.

3. A propputref method, if it exists, is converted to a set accessor method with the name set_PropertyName.

4. A propput method, if it exists, is converted to a set accessor method with the name set_PropertyName, as long as the property doesn’t also have a propputref method.

5. If a property has both propput and propputref methods, the propput becomes an other accessor with the name let_PropertyName.

No current .NET languages have special syntax to call additional accessors, so clients can just call the let_PropertyName accessor explicitly as a regular method.
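The naming rules in the preceding steps can be restated as a small C# function. This is only a sketch of the rules as described here, not the importer’s actual implementation; the AccessorMapping class, its Map method, and the dictionary keys are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

static class AccessorMapping
{
    // Maps each COM accessor kind present on a property to the name of
    // the .NET accessor method that the steps above describe.
    public static Dictionary<string, string> Map(
        string propertyName, bool hasPropGet, bool hasPropPut, bool hasPropPutRef)
    {
        var result = new Dictionary<string, string>();

        if (hasPropGet)
            result["propget"] = "get_" + propertyName;

        if (hasPropPutRef)
            result["propputref"] = "set_" + propertyName;

        // propput is the set accessor unless propputref already claimed
        // that role, in which case it becomes an "other" accessor.
        if (hasPropPut)
            result["propput"] = (hasPropPutRef ? "let_" : "set_") + propertyName;

        return result;
    }

    static void Main()
    {
        foreach (KeyValuePair<string, string> pair in Map("Data", true, true, true))
            Console.WriteLine(pair.Key + " -> " + pair.Value);
    }
}
```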

Caution

A common mistake is made when porting code that calls a let accessor from Visual Basic 6 to Visual Basic .NET. The source of the problem is that the typical VB6 syntax for invoking a let accessor is identical to the VB .NET syntax for invoking a set accessor! Fortunately, the problem only arises in the rare cases when properties implement all three accessors: Get, Set, and Let. To illustrate, the following VB6 code uses the ADO (ActiveX Data Objects) Recordset.ActiveConnection property:

var = recset.ActiveConnection      ' Get
recset.ActiveConnection = "..."    ' Let
Set recset.ActiveConnection = conn ' Set

This translates into the following Visual Basic .NET code:

' Get: the same familiar syntax.
var = recset.ActiveConnection
' Set: the same as the old Let syntax!
recset.ActiveConnection = conn
' Let: much different syntax than before!
recset.let_ActiveConnection("...")

If you accidentally call a set accessor when you mean to call a let accessor, unexpected behavior can occur at run time. The error usually manifests as an exception, but it depends on the property’s implementation.

Special DISPIDs

A DISPID, short for dispatch identifier, is a number assigned to methods and properties of an interface derived from IDispatch, which is used to identify them during late binding. DISPIDs are listed in IDL using the id attribute, which we saw in the IDL snippet from the last section:

[id(0x68030000), propget]
HRESULT Data([out, retval] VARIANT* pRetVal);
[id(0x68030000), propput]
HRESULT Data([in] VARIANT var);
[id(0x68030000), propputref]
HRESULT Data([in] VARIANT var);

Notice that property accessors must share the same DISPID value, but each method and group of property accessors must have distinct numbers. DISPIDs on members in a type library are preserved in metadata with the DispIdAttribute, so the CLR can use them when late binding to a COM object. In addition, two special DISPID values that have significance in the world of COM cause the type library importer to make additional transformations. These values are 0 and –4, and are commonly called DISPID_VALUE and DISPID_NEWENUM, respectively.

DISPID_VALUE (0)

A member with a DISPID equal to 0 is considered a default member. This is the mechanism that enables Visual Basic 6 to support default properties. For instance, in the case of an object obj with a default property Text, the following two lines of VB6 code are equivalent:

obj.Text = "My Text"

obj = "My Text"

By checking for a DISPID equal to 0, the importer can transform default members in COM to default members in .NET. The .NET way of indicating a default member is to place a System.Reflection.DefaultMemberAttribute on the class or interface containing the member, so that’s what the importer does. (Rather than an attribute on the member itself, this attribute contains the name of the default member in a string property.) .NET languages look for this attribute to determine which member, if any, can be treated specially as the default.

Because the syntax for using default properties can be confusing (as in the last example, where it looks as if obj itself is being set to My Text), Visual Basic .NET and C# only support special syntax for default properties that have one or more parameters. These are known as parameterized properties, and C# calls them indexers. Thus, a default property Text that has an integer parameter can be accessed as follows:

Visual Basic .NET:

obj.Text(2) = "My Text"

obj(2) = "My Text"

C#:

obj[2] = "My Text";

C# won’t even permit you to call an indexer by name. It forces you to use the shorter syntax, which can cause some confusion if you don’t realize that the COM property you’re trying to call is a default member. .NET languages are free to ignore default members and treat them just like regular members, which is what C# and VB .NET do if the default member is either a property with no parameters, or a method.
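The relationship between an indexer and DefaultMemberAttribute can be observed in ordinary C#. In the following sketch, the Labels class is a hypothetical managed stand-in for an imported type whose default property is named Text. IndexerNameAttribute gives the indexer that name in metadata, and the C# compiler emits DefaultMemberAttribute on the containing class.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for an imported type with a default property "Text".
class Labels
{
    private readonly string[] items = new string[4];

    // The indexer is named Text in metadata, but C# clients can only
    // reach it with the square-bracket syntax.
    [IndexerName("Text")]
    public string this[int index]
    {
        get { return items[index]; }
        set { items[index] = value; }
    }
}

static class DefaultMemberDemo
{
    // Returns the default member name recorded on a type, or null if none.
    public static string DefaultMemberName(Type type)
    {
        object[] attributes =
            type.GetCustomAttributes(typeof(DefaultMemberAttribute), false);
        return attributes.Length == 0
            ? null
            : ((DefaultMemberAttribute)attributes[0]).MemberName;
    }

    static void Main()
    {
        Labels obj = new Labels();
        obj[2] = "My Text";
        Console.WriteLine(obj[2]);
        Console.WriteLine(DefaultMemberName(typeof(Labels)));
    }
}
```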

DISPID_NEWENUM (–4)

A member with a DISPID equal to –4 is found on a collection interface. This special member, often called NewEnum, returns an interface that enables clients to enumerate objects in a collection (for example, with the For Each statement in Visual Basic 6). When you add a collection class to a VB6 project, it contains a hidden interface with the DISPID_NEWENUM member, as follows:

[id(-4), propget, hidden]
HRESULT NewEnum([out, retval] IUnknown** pRetVal);

IUnknown is used so that a variety of enumeration interfaces can be accommodated. However, earlier in this chapter we saw that the most commonly used COM enumeration interface is called IEnumVARIANT. Thus, the following signature is also valid for DISPID_NEWENUM:

HRESULT NewEnum([out, retval] IEnumVARIANT** pRetVal);

Whereas VB6 looks for a member with a DISPID of –4 to enable For Each, VB .NET and C# look for a method called GetEnumerator. Thus, the conversion of IEnumVARIANT types to IEnumerator isn’t enough to make foreach/For Each work on a COM object in .NET. This is because the .NET interface would have a member with the original name (such as NewEnum) rather than the required GetEnumerator method. Therefore, the importer transforms appropriate members with a DISPID of –4 to methods called GetEnumerator. An appropriate member is either a method or a propget accessor method that returns an HRESULT and has a single IUnknown** or IEnumVARIANT** [out, retval] parameter. The GetEnumerator method produced looks like the following:

C#:

public IEnumerator GetEnumerator();

Visual Basic .NET:

Public Function GetEnumerator() As IEnumerator

(The importer also marks the IEnumerator types with MarshalAsAttribute to indicate custom marshaling to an IEnumVARIANT interface, but this isn’t required because IEnumerator always marshals to IEnumVARIANT.) These transformations enable .NET clients to enumerate over COM objects in a natural way, just as the author had intended for COM clients.

The type library importer does one more thing: It indicates that the managed class containing GetEnumerator implements System.Collections.IEnumerable, an interface whose only method is GetEnumerator. This is done because, although VB .NET and C# check for enumeration capability by looking for the GetEnumerator method name, the official way for a class to expose an enumerator (according to the CLS) is to implement IEnumerable. Thus, by marking the class as implementing this interface, the type library importer ensures that any CLS-compliant .NET language can still enumerate over COM collections in a natural way.
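The two ways of exposing an enumerator can be compared in plain C#. In the following sketch, PatternOnly and Compliant are hypothetical managed classes, not imported types. The first works with foreach only because the C# compiler looks for a GetEnumerator method by name, whereas the second can also be handed to any code that expects IEnumerable.

```csharp
using System;
using System.Collections;

// Exposes GetEnumerator by name only; does not implement IEnumerable.
class PatternOnly
{
    private readonly ArrayList items = new ArrayList { 1, 2, 3 };

    public IEnumerator GetEnumerator()
    {
        return items.GetEnumerator();
    }
}

// Exposes GetEnumerator and also implements IEnumerable.
class Compliant : IEnumerable
{
    private readonly ArrayList items = new ArrayList { 1, 2, 3 };

    public IEnumerator GetEnumerator()
    {
        return items.GetEnumerator();
    }
}

static class EnumerationDemo
{
    // Compiles because C# recognizes the GetEnumerator method name.
    public static int SumPattern()
    {
        int total = 0;
        foreach (int value in new PatternOnly())
            total += value;
        return total;
    }

    // Accepts anything that implements the interface.
    public static int SumViaInterface(IEnumerable source)
    {
        int total = 0;
        foreach (object value in source)
            total += (int)value;
        return total;
    }

    static void Main()
    {
        Console.WriteLine(SumPattern());
        Console.WriteLine(SumViaInterface(new Compliant()));
    }
}
```

Passing a PatternOnly instance to SumViaInterface would not compile, which is the gap the importer closes by marking the class as implementing IEnumerable.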

Converting Interfaces

When you understand the transformations done for data types, methods, and properties, there’s not much more to know about an interface. Each COM interface becomes a .NET interface containing the transformed methods. The imported interface is marked with several custom attributes, depending on characteristics of the interface, but the two required custom attributes are GuidAttribute and ComImportAttribute. GuidAttribute contains the interface’s IID (interface identifier) listed in the type library, and ComImportAttribute indicates that the interface was originally defined as a COM interface that’s been imported into metadata.

One subtle aspect of imported interfaces involves inheritance. Both COM and .NET have interface inheritance, and the inheritance relationships are preserved in metadata, excluding IUnknown and IDispatch. Any COM interface that directly derives from IUnknown or IDispatch becomes a .NET interface that doesn’t derive from any interface. Instead, information about whether the original COM interface derives from IUnknown or IDispatch is captured in a custom attribute (InterfaceTypeAttribute). A COM interface that directly derives from another interface (besides IUnknown and IDispatch) becomes a .NET interface that continues to derive from that other interface. This is pictured in Figure 4.2.

Figure 4.2. Interface inheritance hierarchies are preserved in .NET metadata up to, but not including, the IDispatch and/or IUnknown interfaces.

Image

Listing 4.14 demonstrates interface inheritance and its transformation to .NET interfaces, using two famous COM interfaces: IProvideClassInfo and IProvideClassInfo2.

Listing 4.14. Two COM Interfaces and Their Transformation to .NET Interfaces

Image

Image

The surprising part about the importer’s transformation is that any base interface methods (GetClassInfo in Listing 4.14) are duplicated on the .NET definition of the derived interface. The type library importer always adds methods from all base interfaces (except IUnknown and IDispatch) to a derived interface in order to preserve the COM interface’s v-table layout. In this case, when a .NET client calls the GetGUID method on the IProvideClassInfo2 interface, it is simply calling slot 5 on the interface v-table of the COM object (the second method of the interface plus the three IUnknown methods). The InterfaceTypeAttribute shown in the listing tells the CLR what the first methods of the v-table are (for example, just three slots for IUnknown, or seven slots for IUnknown plus IDispatch). However, all other methods in the v-table must be present in the interface definition.
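The slot arithmetic described here is simple enough to write down. The following C# sketch is only a model of the counting, not something the CLR exposes; the VTableDemo class and its Slot helper are hypothetical, and slots are numbered starting at 1 to match the text.

```csharp
using System;
using System.Runtime.InteropServices;

static class VTableDemo
{
    // Computes the 1-based v-table slot of a method, given the kind of
    // interface and the method's 1-based position in the flattened .NET
    // interface definition (base interface methods included).
    public static int Slot(ComInterfaceType interfaceType, int positionInInterface)
    {
        int reservedSlots;
        switch (interfaceType)
        {
            case ComInterfaceType.InterfaceIsIUnknown:
                reservedSlots = 3;   // QueryInterface, AddRef, Release
                break;
            case ComInterfaceType.InterfaceIsDual:
                reservedSlots = 7;   // IUnknown methods plus four IDispatch methods
                break;
            default:
                // This sketch models only v-table-bound interfaces.
                throw new ArgumentException(
                    "Only IUnknown-based and dual interfaces are modeled.",
                    "interfaceType");
        }
        return reservedSlots + positionInInterface;
    }

    static void Main()
    {
        // GetGUID is the second method on the flattened IProvideClassInfo2.
        Console.WriteLine(Slot(ComInterfaceType.InterfaceIsIUnknown, 2));
    }
}
```

If GetClassInfo were omitted from the derived interface’s definition, GetGUID would be counted as the first method and the call would land on the wrong slot, which is why the importer duplicates the base methods.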

This is a subtle difference for .NET clients and doesn’t usually affect the programmer’s use of the interface. For example, whether or not IProvideClassInfo2 directly defines a method called GetClassInfo, managed code could still call it on an IProvideClassInfo2 variable due to the inheritance relationship.

It’s rare, but depending on the layout of the COM interface, the .NET definition of an interface might contain extra funny-looking methods, with names beginning with _VtblGap. These are used to fill what are known as v-table gaps—extra spaces between methods in the v-table. You can simply ignore these methods.

Converting Classes

The transformations done by the type library importer for COM classes are the source of the most confusion for users of Interop Assemblies. Most of the transformations were not done in beta versions of the CLR, but were added due to customer feedback that using COM components in .NET languages was not as easy as it was in Visual Basic 6. Now, the behavior of the type library importer makes the straightforward use of COM components very simple in C# or Visual Basic .NET. However, if you start to dig into the metadata produced by the type library importer, things can get confusing really quickly.

Classes in COM (called coclasses in IDL and class modules in VB6) do not have members that can be called directly. They just have a list of interfaces implemented, which is shown here in IDL for a class in the Microsoft Speech API:

[
  uuid(5FB7EF7D-DFF4-468A-B6B7-2FCBD188F994),
  helpstring("SpMemoryStream Class")
]
coclass SpMemoryStream {
  [default] interface ISpeechMemoryStream;
  interface ISpStream;
};

An instance of a class is obtained from a separate class, known as a class factory. After a client has an instance of the class, a client’s only communication is through interfaces that the object implements. Of course, Visual Basic 6 hides all these details from the programmer, so COM objects can be created with New and it appears as if members of the default interface (ISpeechMemoryStream in this example) can be invoked directly on class types.

To provide a similar abstraction to what Visual Basic 6 provides, the type library importer performs the following steps when encountering a coclass:

1. Create a .NET interface with the same name as the coclass. This interface, known as a coclass interface, has no members itself but derives from the coclass’s default interface. This interface is marked with the same IID as the default interface.

2. Create a .NET class with the same name as the coclass plus a Class suffix. This class represents the RCW, and is described in “The RCW Class.”

3. If the coclass’s default interface is defined in the same type library as the coclass, and if it’s not listed as being implemented by any other coclass in the type library, replace any parameters and fields of the default interface type with the coclass interface type created in step 1.

The importer does additional work when a coclass lists source interfaces (used in event handling) but this is discussed in Chapter 5, “Responding to COM Events.”

Coclass Interfaces and Parameter/Field Replacement

If it weren’t for additional support from the C# and VB .NET compilers, the previously-listed transformations would mean that whenever you wanted to instantiate a coclass called XYZ, you’d need to instantiate a .NET class called XYZClass! And this is exactly what happens in Visual C++ .NET, seen in Chapter 3’s Hello, World example.

To avoid the confusion of dealing with renamed classes, C# and VB .NET enable you to write code that appears to instantiate a coclass interface! That’s right, you can do the following in C#, for example, even though the .NET SpMemoryStream type is really an interface:

SpMemoryStream ms = new SpMemoryStream();

During compilation, the C# compiler pretends that you instead wrote the following line of code:

SpMemoryStream ms = new SpMemoryStreamClass();

In order to make this work, the type library importer marks all coclass interfaces with the CoClassAttribute custom attribute from the System.Runtime.InteropServices namespace. Therefore, the imported SpMemoryStream interface from the preceding example looks like the following in C# syntax:

[
  ComImport,
  Guid("EEB14B68-808B-4ABE-A5EA-B51DA7588008"),
  CoClass(typeof(SpMemoryStreamClass))
]
public interface SpMemoryStream : ISpeechMemoryStream
{
}

The C# and VB .NET compilers accept code that appears to instantiate an interface as long as the interface is marked with CoClassAttribute. The compilers simply swap the interface with the type stored in the interface’s CoClassAttribute. You can call members of the default interface directly on the instantiated type (since the coclass interface derives from the default interface), or you can cast to non-default interfaces in order to call their members.
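The compiler substitution can be tried without any COM component, because the C# compiler looks only at the attributes. The following sketch uses hypothetical types: the Greeter interface plays the role of a coclass interface, GreeterClass is an ordinary managed class standing in for the RCW class, and the GUID is made up. It illustrates the compiler behavior only and says nothing about how a real RCW is created at run time.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical "coclass interface"; the GUID is arbitrary.
[ComImport]
[Guid("0F8FAD5B-D9CB-469F-A165-70867728950E")]
[CoClass(typeof(GreeterClass))]
interface Greeter
{
    string Greet();
}

// Ordinary managed class standing in for the generated RCW class.
class GreeterClass : Greeter
{
    public string Greet()
    {
        return "hello";
    }
}

static class CoClassDemo
{
    // Looks like instantiating an interface; the compiler emits
    // "new GreeterClass()" because of the CoClassAttribute.
    public static Greeter Create()
    {
        return new Greeter();
    }

    // Reads the class type stored in an interface's CoClassAttribute,
    // or returns null if the attribute is absent.
    public static Type CoClassOf(Type interfaceType)
    {
        object[] attributes =
            interfaceType.GetCustomAttributes(typeof(CoClassAttribute), false);
        return attributes.Length == 0
            ? null
            : ((CoClassAttribute)attributes[0]).CoClass;
    }

    static void Main()
    {
        Greeter g = Create();
        Console.WriteLine(g.GetType().Name);
        Console.WriteLine(g.Greet());
        Console.WriteLine(CoClassOf(typeof(Greeter)).Name);
        Console.WriteLine(typeof(Greeter).IsInterface);
    }
}
```

Reflection still reports Greeter as an interface, which is why mechanisms other than the new operator need the name with the Class suffix.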

So what do all these strange importer transformations and compiler magic buy us? By effectively duplicating a coclass’s default interface with the name of the class, .NET clients are led to believe that they are invoking members on the class directly, as in Visual Basic 6. COM classes authored in Visual Basic 6 have hidden default interfaces beginning with an underscore that many users aren’t even aware of. Therefore, these transformations shield users from dealing with these previously hidden interfaces in C# or VB .NET code.

An equally important part of this transformation is step 3, in which default interface parameters and fields are replaced with coclass interfaces to present .NET clients with effectively the same interface but with a nicer name. Because of the confusion this replacement can cause when it’s not desired, this is only done if there’s a high degree of certainty that the default interface is deeply tied to the class that implements it. Therefore, if the default interface is not in the same type library as the coclass implementing it, or if it’s listed as being implemented by more than one coclass in the same type library, any parameters and fields of the default interface type are left as the raw interface type in .NET signatures.

To understand the benefit of this parameter/field replacement, consider ActiveX Data Objects (ADO) and its widely-used Recordset coclass. The result of step 3 means that a VB6 COM object that returns a Recordset instance still appears to return a Recordset coclass interface in managed code. If you opened the COM component’s type library in OLEVIEW.EXE, however, you’d see that it really returns a _Recordset interface pointer—the hidden default interface for the Recordset coclass. Interacting with this previously-hidden interface would be confusing for programmers used to Visual Basic 6.

Another nice side effect of these transformations is that programmers often have the urge to cast a returned COM object to its class type, but this cannot succeed in general. RCWs can only reliably be cast to interfaces that they implement. When a .NET programmer casts an RCW to a coclass interface, however, he may think he’s casting to a class type but in fact he’s simply casting to the object’s default interface, which is the desired behavior!

Finally, these transformations make event handling on returned COM objects work seamlessly. This mechanism is described in the following chapter.

Caution

Whenever you need to interact with an RCW’s class in any way other than using the new operator in C# or VB .NET, you must remember to include the Class suffix! Nothing else recognizes the CoClassAttribute custom attribute. Besides other .NET languages, this includes language-neutral mechanisms such as reflection. Anywhere you need to provide a string with a class’s name, as in calling Type.GetType or when using ASP.NET, omitting the Class suffix means that the type you specify will be treated as a regular .NET interface.
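To see why the Class suffix matters to reflection, consider the following sketch. It assumes a hypothetical Widget coclass interface and WidgetClass class defined in managed code (not taken from a real Interop Assembly), and a hypothetical helper, ResolveCreatableType, that does by hand what the C# and VB .NET compilers do for the new operator:

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical coclass interface; the GUID is arbitrary.
[ComImport, Guid("5D1F6B3A-2C47-4E8B-9A10-7F3E2B6C8D21"), CoClass(typeof(WidgetClass))]
public interface Widget
{
    string Describe();
}

// Hypothetical class that the coclass interface points to.
public class WidgetClass : Widget
{
    public string Describe() { return "widget"; }
}

public static class CoClassLookup
{
    // Returns the class stored in CoClassAttribute, or the type itself
    // when the attribute is absent. Reflection never does this for you.
    public static Type ResolveCreatableType(Type t)
    {
        CoClassAttribute attr = t.GetCustomAttribute<CoClassAttribute>();
        return attr != null ? attr.CoClass : t;
    }

    public static void Main()
    {
        Type byInterfaceName = typeof(Widget);
        Console.WriteLine(byInterfaceName.IsInterface);   // a plain interface to reflection

        Type creatable = ResolveCreatableType(byInterfaceName);
        Widget w = (Widget)Activator.CreateInstance(creatable);
        Console.WriteLine(creatable.Name + " -> " + w.Describe());
    }
}
```

Passing the interface type directly to Activator.CreateInstance would fail, because reflection sees only an interface.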

The RCW Class

The .NET classes generated by the type library importer accurately represent the RCWs fabricated by the CLR. These are rarely used directly in C# or Visual Basic .NET programs, but it can be helpful to understand their characteristics. The type library importer does three things when creating these classes:

• Adds a constructor

• Adds members of all implemented interfaces

• Adds events for all source interfaces

Besides these transformations, the .NET class might have a System.Reflection.DefaultMemberAttribute custom attribute or implement System.Collections.IEnumerable depending on whether it has methods with special DISPIDs, as described earlier in this chapter. These two actions happen based only on members of the default interface. Similarly, DispIdAttributes are only added to members of the class that belong to the default interface. To take advantage of default members or enumerators on non-default interfaces, clients must cast the class type to the specific interface type.

Adding a Constructor

The .NET class is given a public default constructor (that is, a constructor with no parameters), unless the coclass is marked as [noncreatable] in the type library. The constructor enables the creation of the COM object by using the built-in syntax of the managed language, and ends up being called when a C# or VB .NET programmer writes code that appears to instantiate a coclass interface. The constructor call results in a CoCreateInstance call at run time.

If a class is marked [noncreatable], it cannot be instantiated. Instead, an internal (Friend in VB .NET) default constructor is generated. This means that the constructor can only be called from within the Interop Assembly, making it off-limits to any code outside that assembly. Instances of classes that cannot be created must be obtained by calling APIs that return them.
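The effect of an internal constructor can be observed with reflection. This sketch uses a hypothetical SessionClass and SessionFactory written in plain C# (they stand in for an imported noncreatable class and an API that returns it):

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for an imported [noncreatable] class.
public sealed class SessionClass
{
    internal SessionClass() { }   // visible only inside the defining assembly
}

// Hypothetical API that hands out instances, as a COM method would.
public static class SessionFactory
{
    public static SessionClass Open() { return new SessionClass(); }
}

public static class NonCreatableDemo
{
    // Number of constructors that code in other assemblies could call.
    public static int PublicCtorCount(Type t)
    {
        return t.GetConstructors().Length;
    }

    // True if the type has a constructor with assembly (internal) visibility.
    public static bool HasInternalCtor(Type t)
    {
        foreach (ConstructorInfo c in
                 t.GetConstructors(BindingFlags.Instance | BindingFlags.NonPublic))
        {
            if (c.IsAssembly) return true;
        }
        return false;
    }

    public static void Main()
    {
        Console.WriteLine(PublicCtorCount(typeof(SessionClass)));
        Console.WriteLine(HasInternalCtor(typeof(SessionClass)));
        Console.WriteLine(SessionFactory.Open() != null);
    }
}
```

In this single-file sketch the factory lives in the same assembly as the class, so it can call the constructor; client code in a different assembly could only go through the factory.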

Adding Members of Implemented Interfaces

The importer creates the appropriate metadata that enables all .NET languages to invoke members of all implemented interfaces (not just the default interface) directly on the class. Of course, this excludes any interfaces that the coclass doesn’t list as being implemented in the original type library. This way, using RCWs for coclasses that implement multiple interfaces feels no different from using any other .NET class.

This action of adding multiple interface members to a single class isn’t perfect due to conflicts that might arise in member names. If a class implements multiple interfaces that have a member with the same name and the same parameters, the importer renames the conflicting members on the class. If members with conflicting names have different parameters, they are simply treated as overloaded methods and their names are left alone.

Name conflicts are resolved by prefixing member names with the interface name and an underscore. This is done to all duplicated names, except for one “winning” interface’s member that gets to keep its original name. Members on the default interface always win, followed by the order in which the interfaces appear in the type library’s coclass statement. Listing 4.15 illustrates this process.

Listing 4.15. A COM Class That Implements Several Interfaces, and the .NET Class Produced by the Type Library Importer

The C# representation is approximate, as the exact metadata produced by the importer for a class can’t be produced in C# source code. The members on RobotClass in Listing 4.15 are necessary to distinguish, for example, which Play action you want the robot to perform (playing like a baby, a video recorder, or a musician). Notice that the Play method corresponding to the IVideoRecorder interface is not renamed because it has a signature that’s distinct from the other two Play methods.

The name changing process should look familiar to Visual Basic 6 programmers because prefixing member names with InterfaceName_ is required when implementing an interface member. As all these name conversions only occur on a class, you can still cast an RCW class to an interface to call any of the methods with their original names.
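The shape of a class with renamed members can be approximated in ordinary C#. The following sketch is not the importer's actual output; it assumes two hypothetical interfaces, IPainter and IGunslinger, that both define a parameterless Draw method, with IPainter playing the role of the default interface:

```csharp
using System;

// Two hypothetical interfaces with an identical member signature.
public interface IPainter { string Draw(); }
public interface IGunslinger { string Draw(); }

// Approximates the class the importer would produce: the default
// interface's member keeps its name, the other gets an InterfaceName_ prefix.
public class PerformerClass : IPainter, IGunslinger
{
    // "Winning" member from the default interface (IPainter).
    public string Draw() { return "paints a picture"; }

    // Renamed member exposed on the class for IGunslinger.Draw.
    public string IGunslinger_Draw() { return "draws a pistol"; }

    // Casting to IGunslinger still reaches the method by its original name.
    string IGunslinger.Draw() { return IGunslinger_Draw(); }
}

public static class RenameDemo
{
    public static void Main()
    {
        PerformerClass p = new PerformerClass();
        Console.WriteLine(p.Draw());
        Console.WriteLine(p.IGunslinger_Draw());
        Console.WriteLine(((IGunslinger)p).Draw());
    }
}
```

The explicit interface implementation is only a C# device for this illustration; the metadata generated by the importer maps the renamed class member to the interface member directly.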

The “Converting Interfaces” section explained that the importer adds base interface members to derived interfaces. This normally doesn’t matter to .NET clients, except that it has an unfortunate side effect whenever a coclass implements both a base and a derived interface. This is somewhat common, and can be seen in the WebBrowser coclass from the Microsoft Internet Controls type library (SHDOCVW.DLL in your system32 directory):

[
  uuid(8856F961-340A-11D0-A96B-00C04FD705A2),
  helpstring("WebBrowser Control"),
  control
]
coclass WebBrowser {
    [default] interface IWebBrowser2;
    interface IWebBrowser;
    [default, source] dispinterface DWebBrowserEvents2;
    [source] dispinterface DWebBrowserEvents;
};

The WebBrowser coclass implements IWebBrowser and IWebBrowser2, an interface that ultimately derives from IWebBrowser, as shown in the following IDL:

[...]
interface IWebBrowser : IDispatch {
    [id(0x00000064),
      helpstring("Navigates to the previous item in the history list.")]
    HRESULT GoBack();
    [id(0x00000065),
      helpstring("Navigates to the next item in the history list.")]
    HRESULT GoForward();
    ...plus many more members
};


[...]
interface IWebBrowserApp : IWebBrowser {
    ...
};

[...]
interface IWebBrowser2 : IWebBrowserApp {

    ...
};

This pattern of implementing a base and derived interface is typically done so that older clients (who only know about the original IWebBrowser interface) can still query for this interface, and newer clients (who are aware of IWebBrowser2) can take advantage of its extra functionality.

The imported WebBrowserClass class contains the methods of each of the three imported interfaces that it implements. Because the managed definition of IWebBrowser2 contains all of IWebBrowser’s methods (GoBack, GoForward, and so forth), the class ends up with two copies of each of these methods! Each method of IWebBrowser gets a modified name (IWebBrowser_GoBack, IWebBrowser_GoForward, and so forth) because each method of the default IWebBrowser2 interface is added to WebBrowserClass first. WebBrowserClass doesn’t end up with any duplicated members belonging to IWebBrowserApp because the coclass doesn’t list it as an implemented interface. All these extra members may be annoying when using object browsers or IntelliSense, but they can be ignored.

Adding Events for Source Interfaces

The type library importer does its most complicated transformation when it encounters source interfaces listed in a coclass statement. Source interfaces are not implemented by the class, but can be invoked by the class to report events. A source interface is marked [source] in IDL, as seen in the WebBrowser coclass presented earlier:

[
  uuid(8856F961-340A-11D0-A96B-00C04FD705A2),
  helpstring("WebBrowser Control"),
  control
]
coclass WebBrowser {
    [default] interface IWebBrowser2;
    interface IWebBrowser;
    [default, source] dispinterface DWebBrowserEvents2;
    [source] dispinterface DWebBrowserEvents;
};

The transformation for source interfaces is covered in Chapter 5.

Converting Modules

Type libraries can have modules that contain global methods and/or constants. These are equivalent to what Visual Basic 6 calls standard modules in referenced COM components.

Because .NET doesn’t intrinsically have a notion of modules, the type library importer turns modules into sealed .NET classes. (In VB .NET terms, this is equivalent to a NotInheritable class.) For each type library module, a .NET class is created that contains the constants as public static fields. Unfortunately, methods in type library modules are omitted in the .NET classes that are produced. Listing 4.16 demonstrates the conversion for two modules, which are defined in the DirectX 8 for Visual Basic type library.

Tip

A module’s methods can be added to the imported .NET class manually using PInvoke. For more information, see Chapter 7 and Part VI, “Platform Invocation Services.”

Listing 4.16. IDL Representation of Two Modules, and Their Transformation Shown in C#

Notice that the D3DCOLORAUX module in Listing 4.16 isn’t even imported, because it only has methods.

Converting Structures

A struct in a type library, known as a user-defined type (UDT) in Visual Basic 6, becomes a value type in metadata. Value types are known as structs in C# and structures in VB .NET. Most of the .NET Framework’s primitive types are value types (such as the integral types System.Int... and System.UInt..., along with System.Decimal, System.DateTime, System.Double, and so on). Besides the value types, two reference types that are often considered primitive types are System.Object and System.String.

Value types are used to efficiently represent small instances whose contents represent their identity. (For example, two integer objects containing the value 5 should be treated as identical objects.) In .NET, value types differ from all other types (known as reference types) in the following ways:

• Value types are typically allocated on the stack, or inline within the object that contains them, rather than individually on the garbage-collected heap.

• Value types can never be null.

• Value types are always sealed.

• Changes that a method makes to a value type parameter are not visible to the caller unless the parameter is passed by reference.
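The last point is the one that most often surprises programmers. A minimal sketch, using a hypothetical Size2 struct, shows the difference between passing a value type by value and by reference:

```csharp
using System;

// Hypothetical value type with public fields, like an imported struct.
public struct Size2
{
    public int Width;
    public int Height;
}

public static class ValueTypeDemo
{
    // Receives a copy; the caller's instance is untouched.
    static void GrowByValue(Size2 s) { s.Width += 10; }

    // Receives a reference to the caller's instance.
    static void GrowByRef(ref Size2 s) { s.Width += 10; }

    public static int WidthAfterByValue()
    {
        Size2 s = new Size2();
        s.Width = 1;
        GrowByValue(s);
        return s.Width;
    }

    public static int WidthAfterByRef()
    {
        Size2 s = new Size2();
        s.Width = 1;
        GrowByRef(ref s);
        return s.Width;
    }

    public static void Main()
    {
        Console.WriteLine(WidthAfterByValue());   // prints 1
        Console.WriteLine(WidthAfterByRef());     // prints 11
    }
}
```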

Value types are more expressive than COM structs because they can contain methods. The importer has no use for this extra expressiveness, however, so every value type it creates is relatively simple. Each field in the unmanaged struct is transformed to a public field in the managed value type, as shown in Listing 4.17.

Listing 4.17. A Simple Struct Shown in Visual Basic 6 and IDL, and Then in C# and VB .NET

The data types making up the fields of a struct are converted to .NET data types, just as they are when appearing as method parameters. The main difference is that value type fields with an extra level of indirection are converted to a System.IntPtr. Thus, the fields lose their type identity one level of indirection sooner because there’s no such thing as a by-reference field.

Another noticeable difference is that VARIANT_BOOL fields (Boolean in VB6) are imported as 16-bit integer fields rather than the expected Boolean fields. This is done for historical reasons, because beta versions of the Interop Marshaler did not support marshaling Boolean fields. This marshaling is now supported, but you would need to change an Interop Assembly using the technique described in Chapter 7 in order to take advantage of such marshaling. If you encounter an Int16 field that represents a VARIANT_BOOL, you must be familiar with VARIANT_BOOL’s internal representation to be able to use it correctly. In other words, you must treat -1 as true and 0 as false.
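One way to avoid scattering -1 and 0 literals through your code is to wrap the conversion in a pair of helper methods. The following sketch uses a hypothetical VariantBool helper class; treating any nonzero value as true when reading is a defensive assumption, not something the importer requires:

```csharp
using System;

// Hypothetical helpers for Int16 fields that really hold a VARIANT_BOOL.
public static class VariantBool
{
    public const short True = -1;   // VARIANT_TRUE
    public const short False = 0;   // VARIANT_FALSE

    // Writes the canonical COM representation.
    public static short FromBoolean(bool value)
    {
        return value ? True : False;
    }

    // Reads leniently: anything other than zero counts as true.
    public static bool ToBoolean(short value)
    {
        return value != False;
    }

    public static void Main()
    {
        Console.WriteLine(FromBoolean(true));    // prints -1
        Console.WriteLine(FromBoolean(false));   // prints 0
        Console.WriteLine(ToBoolean(-1));        // prints True
    }
}
```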

Converting Unions

A union is a special kind of struct containing several fields at the same offset in memory. IDL makes a distinction between two types of unions: encapsulated and nonencapsulated. These use IDL attributes (such as switch_type and switch_is), but none of this information, nor the distinction between the two types of unions, is captured inside a type library. In a type library, a union looks almost identical to a struct.

Because fields of a union overlap, unions are useful in COM for providing different views of the exact same data. Unions don’t exist in the pure .NET world, so unmanaged unions are imported as regular value types. However, the type library importer places a few custom attributes on the imported value type to preserve the union’s memory layout in the .NET definition. This is demonstrated in Listing 4.18 for a union defined in the Microsoft HTML Object Library (MSHTML.TLB in your system32 directory).

Listing 4.18. An Unmanaged Union Is Converted to a .NET Value Type with Custom Attributes to Indicate the Unique Memory Layout

The importer uses a pair of pseudo-custom attributes, defined in System.Runtime.InteropServices, to preserve the memory layout of the union in the value type definition: StructLayoutAttribute and FieldOffsetAttribute. StructLayoutAttribute uses a value of the LayoutKind enumeration to specify what kind of memory layout the value type uses. The importer marks unions with LayoutKind.Explicit, indicating that every field of the struct is marked with a byte offset. This byte offset, specified with FieldOffsetAttribute, is the number of bytes between the beginning of the structure in memory and the beginning of the field. In Listing 4.18, you can see that each field of a union is marked with the same zero offset, meaning that the fields all point to the same location in memory. This is the only way to define a union in .NET.
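The same two attributes can be applied by hand. The following sketch defines a hypothetical FloatOrBits value type (not the MSHTML union) whose two fields overlap at offset zero, so writing one field changes what the other reads:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union: a 32-bit float and its raw bits share the same bytes.
[StructLayout(LayoutKind.Explicit)]
public struct FloatOrBits
{
    [FieldOffset(0)] public float AsFloat;
    [FieldOffset(0)] public uint AsBits;
}

public static class UnionDemo
{
    // Stores a float and reads the overlapping integer view of the same memory.
    public static uint BitsOf(float value)
    {
        FloatOrBits u = new FloatOrBits();
        u.AsFloat = value;
        return u.AsBits;
    }

    public static void Main()
    {
        // 1.0f is 0x3F800000 in IEEE 754 single precision.
        Console.WriteLine("0x{0:X8}", BitsOf(1.0f));

        // Both fields occupy the same four bytes.
        Console.WriteLine(Marshal.SizeOf(typeof(FloatOrBits)));
    }
}
```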

Caution

If a union contains a field that is a pointer to a type, the imported value type is given no fields whatsoever. Using such value types in managed code requires advanced techniques, such as C# unsafe code, demonstrated in Chapter 6.

The importer marks all value types that don’t represent unions with StructLayoutAttribute and the LayoutKind.Sequential value, but Listing 4.17 omitted this attribute for simplicity (and because it’s the default behavior in C# and VB .NET). See Chapter 19, “Deeper Into PInvoke and Useful Examples,” for more information about structure layout.

Converting Enumerations

An enumeration (commonly called an enum, and not to be confused with enumerating over a collection) is a set of constants of the same integral type, grouped under a common name. There’s no significant difference between managed and unmanaged enums, so one is easily converted to the other, as shown in Listing 4.19.

Listing 4.19. An Unmanaged Enum and a Managed Enum Are Both a Simple Set of Constants

The main difference between managed and unmanaged enums is their use, not their definitions. In most .NET languages, you must qualify managed enum members with the name of the enum (for example, typing Day.Sunday instead of just Sunday in the previous listing).

Converting Typedefs

IDL can have type definitions (typedefs) that give an alternative name, or alias, to a type. The OLE Automation type library (STDOLE2.TLB in your system32 directory) contains many widely used typedefs, such as the following:

typedef [uuid(66504301-BE0F-101A-8BBB-00AA00300CAB), public]
unsigned long OLE_COLOR;

This means that OLE_COLOR is really nothing more than an unsigned long, but methods and properties can use OLE_COLOR parameters to represent colors because the alias makes the intent more obvious. For example, the Visual Basic 6 property browser presents a color chooser for properties of type OLE_COLOR, so users aren’t directly exposed to the numeric representation.

There are no typedefs in the managed world, so the importer does not create a managed definition of typedefs such as OLE_COLOR. It does, however, do something special when encountering a signature that uses a typedef, such as:

HRESULT SetColor([in] OLE_COLOR color);

which becomes the following signature in C#:

void SetColor ([ComAliasName("stdole.OLE_COLOR")] uint color);

Because there is no managed definition of OLE_COLOR, the parameter is left as the real type. Rather than losing the information that the COM object treats the number as an OLE_COLOR, however, the importer marks the parameter with a custom attribute, called ComAliasNameAttribute. It’s not quite as convenient (and not quite as obvious that it should be a color in object browsers, unless they show custom attributes), but you can still check for aliases and take action.
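Checking for an alias comes down to reading the custom attribute with reflection. This sketch assumes a hypothetical IPaintable interface declared in C# with the same SetColor signature, plus a hypothetical GetAlias helper; a real program would reflect over the type in the Interop Assembly instead:

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical managed interface carrying the imported signature.
public interface IPaintable
{
    void SetColor([ComAliasName("stdole.OLE_COLOR")] uint color);
}

public static class AliasDemo
{
    // Returns the COM alias recorded on a parameter, or null if there is none.
    public static string GetAlias(ParameterInfo parameter)
    {
        ComAliasNameAttribute attr =
            parameter.GetCustomAttribute<ComAliasNameAttribute>();
        return attr != null ? attr.Value : null;
    }

    // Looks up the alias on the first parameter of IPaintable.SetColor.
    public static string SetColorAlias()
    {
        MethodInfo method = typeof(IPaintable).GetMethod("SetColor");
        return GetAlias(method.GetParameters()[0]);
    }

    public static void Main()
    {
        Console.WriteLine(SetColorAlias());
    }
}
```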

You might be wondering why the type library importer doesn’t convert OLE_COLOR types to System.Drawing.Color types because the conversion seems as natural as converting DATE to System.DateTime. That would be nice, but the type library importer restricts itself to converting base types to the core types defined in the mscorlib assembly. However, the type library exporter, described in detail in Chapter 9, “An In-Depth Look at Exported Type Libraries,” does the reverse conversion from System.Drawing.Color to OLE_COLOR.

Tip

If you’re faced with calling COM methods in managed code that use OLE_COLOR parameters, note that there are two sets of methods in the .NET Framework that convert between a System.Drawing.Color instance and an OLE_COLOR:

System.Drawing.ColorTranslator.ToOle and System.Drawing.ColorTranslator.FromOle.

System.Windows.Forms.AxHost.GetOleColorFromColor and System.Windows.Forms.AxHost.GetColorFromOleColor.

Both sets of methods do the same thing, although I prefer the latter methods because they represent OLE_COLOR as an unsigned integer, which matches the types produced by the type library importer. Therefore, the SetColor method shown previously could be called as follows in C#:

comObj.SetColor(AxHost.GetOleColorFromColor(Color.BurlyWood));
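If you want to see what the numeric value contains, the conversion for ordinary RGB colors can be sketched by hand. This assumes the common case in which an OLE_COLOR holds the red component in its lowest byte, then green, then blue (0x00BBGGRR); it deliberately ignores the system-color and palette forms that the framework methods handle. The OleColor helper class is hypothetical:

```csharp
using System;

// Hypothetical helper covering only plain RGB OLE_COLOR values (0x00BBGGRR).
public static class OleColor
{
    // Packs red, green, and blue components into an OLE_COLOR.
    public static uint FromRgb(byte red, byte green, byte blue)
    {
        return (uint)(red | (green << 8) | (blue << 16));
    }

    // Unpacks the three components from an RGB OLE_COLOR.
    public static void ToRgb(uint oleColor, out byte red, out byte green, out byte blue)
    {
        red = (byte)(oleColor & 0xFF);
        green = (byte)((oleColor >> 8) & 0xFF);
        blue = (byte)((oleColor >> 16) & 0xFF);
    }

    public static void Main()
    {
        // BurlyWood is RGB (222, 184, 135).
        uint ole = FromRgb(222, 184, 135);
        Console.WriteLine("0x{0:X8}", ole);   // prints 0x0087B8DE

        byte r, g, b;
        ToRgb(ole, out r, out g, out b);
        Console.WriteLine(r + "," + g + "," + b);
    }
}
```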

For typedefs of structures or enumerations, the type library importer creates two versions of the same type but with two different names. For example, the Microsoft Smart Tag SDK defines the following typedef and enum pair in its type library (shown in IDL):

typedef [public]
__MIDL___MIDL_itf_mstag_0000_0001 IF_TYPE;

typedef enum {
  IF_TYPE_CHAR = 1,
  IF_TYPE_SINGLE_WD = 2,

  IF_TYPE_REGEXP = 4,
  IF_TYPE_PARA = 8,
  IF_TYPE_CELL = 16
} __MIDL___MIDL_itf_mstag_0000_0001;

The Interop Assembly corresponding to this type library contains an enumeration named IF_TYPE and another enumeration named __MIDL___MIDL_itf_mstag_0000_0001. Fortunately, any signatures that use the typedef (IF_TYPE) are imported as signatures that use the value type with the matching name.

Converting ActiveX Controls

ActiveX controls are coclasses that are typically marked with the IDL control attribute (as seen in the familiar WebBrowser coclass from the Microsoft Internet Controls type library):

[
  uuid(8856F961-340A-11D0-A96B-00C04FD705A2),
  helpstring("WebBrowser Control"),
  control
]
coclass WebBrowser {
    [default] interface IWebBrowser2;
    interface IWebBrowser;
    [default, source] dispinterface DWebBrowserEvents2;
    [source] dispinterface DWebBrowserEvents;
};

The control attribute is optional. As discussed in Chapter 3, the official indicator that a COM class is an ActiveX control is the presence of the following registry value:

HKEY_CLASSES_ROOT\CLSID\{clsid}\Control

Chapter 3 also introduced a separate mechanism, specific to Windows Forms, that creates an ActiveX Assembly. This enables the control to be treated just like a Windows Forms control. Because this is separate from the type library importer, the importer treats a coclass, such as WebBrowser, no differently than any other coclass. The corresponding .NET class in the Interop Assembly can be used as a regular class, and should be used if you don’t require extra functionality specific to ActiveX controls.

Thus, each ActiveX control definition can appear in three forms:

1. The coclass refers to the original definition of the class in a type library.

2. The raw class refers to the .NET class definition generated by the type library importer.

3. The control class refers to the .NET class definition generated by the ActiveX importer with special behavior specific to ActiveX and Windows Forms controls.

So, what exactly does the ActiveX importer produce in the ActiveX Assembly, and how does it relate to the Interop Assembly produced by the type library importer? The steps of the ActiveX importer can be summarized as follows:

1. If a Primary Interop Assembly isn’t registered for the input type library, invoke the type library importer on the input type library to generate the Interop Assembly and any dependent Interop Assemblies. These assemblies are no different from the ones that would have been produced by any other means, such as running TLBIMP.EXE.

2. For each coclass registered as an ActiveX control, create a control class with the name AxClassName that derives from System.Windows.Forms.AxHost, and place it in an assembly called AxLibraryName.

3. Add a public default constructor to each control class that constructs the base AxHost class by passing the CLSID of the corresponding coclass.

4. Add members of each coclass’s default interface to each control class definition. This is in contrast to the type library importer, which adds the methods of all implemented interfaces to the raw class definition. Any methods or properties that conflict with members on base classes (such as AxHost) are given a Ctl prefix. Any events that conflict with members on base classes are given an Event suffix. Be aware that some properties might be added to the control class as raw accessor methods that can’t be used with property syntax without modifications.

5. For any properties or methods that return a type marked with ComAliasName("stdole.OLE_COLOR"), convert the type to a System.Drawing.Color type. This enables COM color properties to be treated as .NET color properties (in the Visual Studio .NET property browser, for example). This transformation is not done for method parameters.

6. For any properties or methods that return an IFont or IFontDisp interface, convert the type to System.Drawing.Font. Similarly, convert IPicture to System.Drawing.Image. This enables COM fonts and COM images to be treated as .NET fonts and .NET images. As with System.Drawing.Color, these transformations are not done for method parameters.

7. Make any control classes implement System.Collections.IEnumerable if the corresponding raw class implements it. Other than this interface, the only interfaces listed as being implemented by the control class are those implemented by the AxHost base class. (Not even the default COM interface that the control class effectively implements is listed as an implemented interface, due to potential changes in member signatures.)

8. Add some members to each control class to handle events and underlying plumbing.

9. Add additional types to the ActiveX Assembly to enable event handling in the Windows Forms style.
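The naming rules in steps 2 and 4 can be summarized in a few lines of code. The following is only a sketch of the rules as described above, not the ActiveX importer's implementation; the AxNaming class is hypothetical, and the set of base class member names is a small illustrative sample rather than the full list of AxHost members:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical summary of the naming rules described in the steps above.
public static class AxNaming
{
    // Step 2: control classes and the ActiveX Assembly get an Ax prefix.
    public static string ControlClassName(string coclassName)
    {
        return "Ax" + coclassName;
    }

    public static string AssemblyName(string libraryName)
    {
        return "Ax" + libraryName;
    }

    // Step 4: methods and properties that clash with base members get a Ctl prefix.
    public static string MemberName(string name, ICollection<string> baseMembers)
    {
        return baseMembers.Contains(name) ? "Ctl" + name : name;
    }

    // Step 4: events that clash with base members get an Event suffix.
    public static string EventName(string name, ICollection<string> baseMembers)
    {
        return baseMembers.Contains(name) ? name + "Event" : name;
    }

    public static void Main()
    {
        // Illustrative sample of names assumed to exist on the base classes.
        HashSet<string> baseMembers = new HashSet<string> { "Refresh", "Click" };

        Console.WriteLine(ControlClassName("WebBrowser"));          // AxWebBrowser
        Console.WriteLine(AssemblyName("SHDocVw"));                 // AxSHDocVw
        Console.WriteLine(MemberName("Refresh", baseMembers));      // CtlRefresh
        Console.WriteLine(MemberName("GoBack", baseMembers));       // GoBack
        Console.WriteLine(EventName("Click", baseMembers));         // ClickEvent
    }
}
```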

Tip

Because the AXIMP.EXE utility has a /source option to generate C# source code, you can clearly see what the contents of an ActiveX Assembly are (the result of the previous nine steps), and you can customize it for your purposes.

Tip

A .NET control class does not directly wrap a COM object as an RCW does. If you’re using a control class in managed code but want to interact with the corresponding RCW instead, call the control’s GetOcx method to obtain the desired object. GetOcx is defined on the System.Windows.Forms.AxHost class and inherited by all control classes. This needs to be done if you require casting the RCW to a non-default interface.

Steps 8 and 9 are covered in Chapter 5. The members added in step 4 are similar to the members that appear on the raw class, except no pseudo-custom attributes and almost no regular custom attributes are placed on the control class. The rationale is that the ActiveX importer does only what’s necessary to make ActiveX controls appear as Windows Forms controls—no more and no less.

Conclusion

This chapter covered all the details of how .NET type information is generated from COM type information. These transformations are the key to COM objects being easily usable in .NET applications without requiring extensive modifications. Of course, the transformations work best for components that follow COM guidelines and conventions. For example, components that use a custom enumeration scheme, or custom data types that serve the same purpose as standard ones, cannot be transformed as nicely. (Such components would be hard to use from Visual Basic 6, so they should be quite rare.)

As you read the rest of this book, you may find yourself referring back to this chapter for a variety of reasons. For example, the importer’s behavior is important if you ever want to

• Modify the type information produced by the type library importer (Chapter 7, “Modifying Interop Assemblies”).

• Write your own type information manually instead of using the type library importer (Chapter 21, “Manually Defining COM Types in Source Code”).

• Design COM components that may be used in .NET applications (Part V, “Designing Great COM Components for .NET Clients”).

In this chapter, you also saw some of the ways in which a type library is less descriptive than IDL files. Chapter 7 demonstrates how to “add back” the missing information. I often get asked about why the type library importer omits a certain type or certain information from an Interop Assembly, and the reason is almost always that the information is not in the type library to begin with. For example, types defined in an IDL file outside of a library block do not get placed in a type library unless used by other types inside the library block. Viewing an input type library with OLEVIEW.EXE is the best way to understand its contents.
