| FullerData.com - News |
HerbSutter: (With apologies to Billy Joel.) This is it: Hopefully the last time I move this blog for a while. I've moved over to a planned-to-be long-term blog home over at PluralSight. Thanks to Don for prodding, Fritz for hosting, and retroactive thanks to all the folks at MSDN and GDN for hosting my initial blog home here. I'll probably move my existing blog articles there. No ETA on when, though. Things are crazy, and I'm just about done the indexing on the C++ Coding Standards book. When that's done, and the C++ standards meeting a couple of weeks from now is over, and some upcoming talks including this one at OOPSLA are behind me, there'll be more time. Probably. (I always think there'll be more time someday. I guess it's good to be optimistic.) See you on PluralSight. Dan wrote:
First, note that this code as written doesn't do "finally," because the idea behind a finally block is that it gets executed whether an exception is thrown or not. But it's close: You can indeed get a similar effect if you add a throw statement (e.g., throw 1;) at the end of your try block, so that it always exits via an exception, and then catch(...). Even after applying the quick fix to make this do what was intended, this (ab)use of the try/catch mechanism is inferior to having finally (or better still, in most cases, a destructor) because it interacts poorly with other exceptions. For example, what if other parts of the body could throw specific exceptions we want to catch? It would be repetitive and fragile to duplicate the delete statement in every catch block. Also, it's going to be more expensive in runtime cost because an exception will always be thrown, and actually throwing an exception has nonzero cost on every implementation I've heard of; having finally just has the compiler ensure the given code runs in all cases without having to throw an extra exception if none was otherwise being thrown. Mark wrote:
That reminds me to announce (belatedly): Try out the free Express compiler and play around with the new syntax. Make sure you also get the tool refresh so that you get all the latest parts of the syntax now implemented (e.g., putting .NET Framework types on the stack and having them automatically Disposed for you). Cool stuff. Check it out. Ioannis Vranos wrote:
Just a note, we now expect to complete work on the C++/CLI standard in March and have it approved by Ecma in June. We wanted the extra time to complete and polish the C++/CLI spec, but another reason for the slip was that the CLI standard (undergoing its second round of work in parallel with the first round of C++/CLI) also slipped. In case you're wondering "why 6 months," the answer is that on Ecma's schedule you get two shots a year to submit a document for approval (June and December), so any slip is 6 months (or a multiple thereof).
BTW, another way of saying this that people seem to find useful is that, as of our Whidbey (VC++ 2005) release, C++ is the systems programming language for .NET. On comp.lang.c++.moderated, "Howard" <alicebt@hotmail.com> wrote:
Actually, that's valid C++/CLI code -- C++/CLI also supports finally with exactly that syntax. This is described in the current draft spec available via: http://www.msdn.microsoft.com/visualc/homepageheadlines/ecma/default.aspx . See section 16.4, and search for "finally" throughout (there are other spots that specify how it interacts with other language features like goto). This is in addition to destructors -- as another responder, Mike Wahler, noted, you can also get a similar effect with destructors:
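As a minimal ISO C++ sketch of the two approaches discussed here (this is an illustration, not the code from the newsgroup thread), the following contrasts the always-throw catch(...) workaround with destructor-based cleanup. The Resource type and both function names are hypothetical, and std::unique_ptr stands in for the auto_ptr mentioned in the text because auto_ptr was removed in C++17.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical resource that records its lifetime events in a log.
class Resource {
public:
    explicit Resource(std::vector<std::string>& log) : log_(log) {
        log_.push_back("acquire");
    }
    ~Resource() { log_.push_back("release"); }

    void use(bool fail) {
        log_.push_back("use");
        if (fail) throw std::runtime_error("use failed");
    }

private:
    std::vector<std::string>& log_;
};

// The always-throw workaround: cleanup lives in catch(...), so the try
// block has to exit via an exception even when nothing went wrong.
void with_catch_all(std::vector<std::string>& log, bool fail) {
    Resource* r = new Resource(log);
    try {
        r->use(fail);
        throw 1;       // forced exit through the handler on the success path
    } catch (...) {
        delete r;      // runs on both paths, but any real exception is swallowed
    }
}

// Destructor-based cleanup: the smart pointer owns the resource and
// releases it on every exit path, with no extra throw.
void with_destructor(std::vector<std::string>& log, bool fail) {
    std::unique_ptr<Resource> r = std::make_unique<Resource>(log);
    r->use(fail);      // if this throws, ~Resource still runs during unwinding
}
```

Note that with_catch_all loses the original exception unless every handler repeats the cleanup, whereas with_destructor lets it propagate to the caller after the resource has been released.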
One advantage of finally is visual locality of reference -- the code is right there in the function instead of off in a different place (a destructor definition). One advantage of destructors is that RAII objects encapsulate and automatically manage ownership. In your example above, I would prefer to use a destructor (by writing auto_ptr<Resource> instead of Resource*, you don't need a finally at all). You can use either facility, of course. To me, this is yet another case where C++ gives you the best of both worlds. Having both is better than having either alone (in languages that have only finally or only destructors). Note this is very closely related to the Dispose pattern I mentioned in a recent post, because the Dispose pattern relies on finally for things that destructors do better, so it's good to have destructors. For other things, finally is better, so it's good to have finally. Herb An interim updated snapshot of the C++/CLI draft standard is now available: C++/CLI Language Specification (Working Draft 1.5, June 2004) Related links: Kenny Kerr announced a cool article he wrote about C++/CLI on MSDN: C++/CLI: The Most Powerful Language for .NET Programming Earlier today I wrote that: “The C++ destructor model is exactly the same as the Dispose and using patterns, except that it is far easier to use and a direct language feature and correct by default, instead of a coding pattern that is off by default and causing correctness or performance problems when it is forgotten.“ Niclas Lindgren agreed with the post but added:
I was referring to the Dispose coding pattern. For the benefit of those who aren't familiar with the Java Dispose pattern, let me give two examples: first a simple one with just one resource, then a more complex one with three resources. The examples use .NET Framework types. Consider this C++/CLI code:
The minimal C# equivalent is the "using" pattern, which semiautomates the Java Dispose pattern (I'm showing the {} around the block to be explicit that there's a block, although since there's only one statement you don't really need them):
The minimal Java equivalent is the Dispose pattern (this is what C#'s “using“ generates under the covers):
I called this Dispose pattern "a coding pattern that is off by default and causing correctness or performance problems when it is forgotten." That is all true. You have to remember to write it, and you have to write it correctly. In contrast, in C++ you just put stuff in scopes or delete it when you're done, whether said stuff has a nontrivial destructor or not. But that example is still flattering to the Dispose pattern, because there's only one resource. Consider as a second example this C++ code that opens a message queue and echoes one message to two target queues (a main queue and a backup queue) -- and automatically correctly closes the queues:
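A rough ISO C++ sketch of the shape of such code follows. It is not the original listing: the Queue class is a hypothetical stand-in that only logs what it does (it is not a real messaging API), and echo_once is an assumed name. The point it illustrates is that three scoped objects are closed automatically, in reverse order of construction, without any explicit cleanup code.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a message queue; it only records events.
class Queue {
public:
    Queue(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("open " + name_);
    }
    ~Queue() { log_.push_back("close " + name_); }

    Queue(const Queue&) = delete;
    Queue& operator=(const Queue&) = delete;

    std::string receive() { return "msg from " + name_; }
    void send(const std::string& m) { log_.push_back(name_ + " <- " + m); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Echo one message from a source queue to a main and a backup queue.
void echo_once(std::vector<std::string>& log) {
    Queue source("source", log);
    Queue main_q("main", log);
    Queue backup("backup", log);

    const std::string m = source.receive();
    main_q.send(m);
    backup.send(m);
}   // backup, main_q, and source are closed here, also if send() throws
```

With a Dispose-style pattern, each of the three resources would need its own nested try/finally to get the same guarantee.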
The minimal correct Java equivalent is:
In this case, the queues will be freed up eventually when the garbage collector runs (which it probably will). In the meantime, we've tied up scarce resources. Worse, we've tied up resources on other machines. So the C++ destructor model is exactly the same as the Dispose and using patterns, except that it is far easier to use and a direct language feature and correct by default, instead of a coding pattern that is off by default and causing correctness or performance problems when it is forgotten. Keith Duggar asked: Ok, I'm missing a basic point here. So, out of curiousity:
No.
Yes: A % can only exist on the stack (a CLI limitation). Cases where a C++ program puts a & on the heap are relatively rare -- you'd have to have a reference member of an object, which is problematic in general -- but % can't be a drop-in replacement for that use of &. This post brings together three related questions and answers posted recently on comp.lang.c++.moderated. They all have to do with the relationship between % (tracking reference) and & (unmanaged reference). First, Dietmar Kuehl asked about this C++/CLI code:
The solution is to change f's parameter from T& to T% . When referring to an object on the GC heap, just as you need a handle ^ (which tracks during GC) instead of an unmanaged pointer, you also need a tracking reference % (which tracks during GC) instead of an unmanaged reference. Trying to pass a deref'd ^ (which tracks) to a & (which doesn't track) is the same kind of error as trying to pass a const object (which is const) to a non-const & (which isn't const):
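The const half of that analogy can be checked in plain ISO C++. The sketch below is only an illustration of the analogy, not the original C++/CLI listing; the type T and the function names are hypothetical, and it assumes C++17 for std::is_invocable_v. Widening the parameter to const T& accepts both kinds of argument, loosely mirroring how widening & to % accepts both tracked and untracked objects.

```cpp
#include <type_traits>

struct T { int value = 0; };

void takes_ref(T&) {}              // accepts only non-const objects
void takes_const_ref(const T&) {}  // accepts const and non-const objects

// A const T cannot bind to a T& parameter, so such a call is ill-formed.
constexpr bool const_binds_to_ref =
    std::is_invocable_v<decltype(&takes_ref), const T&>;

// A non-const T binds to T& as usual.
constexpr bool nonconst_binds_to_ref =
    std::is_invocable_v<decltype(&takes_ref), T&>;

// The wider parameter type accepts both.
constexpr bool const_binds_to_const_ref =
    std::is_invocable_v<decltype(&takes_const_ref), const T&>;
constexpr bool nonconst_binds_to_const_ref =
    std::is_invocable_v<decltype(&takes_const_ref), T&>;

static_assert(!const_binds_to_ref, "const T must not bind to T&");
static_assert(nonconst_binds_to_ref, "T must bind to T&");
static_assert(const_binds_to_const_ref && nonconst_binds_to_const_ref,
              "const T& accepts both");
```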
So in general, to be able to bind also to objects on the GC heap, a function taking a & parameter only needs to change the & to % instead (the body generally won't care). In our STL.NET project we're providing the standard algorithms with % instead of & parameter types, which doesn't affect the meaning but allows them to be used with all the types they work with already plus also managed types. If you can't change the function and must call a function with a plain & (including if you just want to call the standard algorithms directly for some reason), you can still do it by using the RAII pin_ptr so that tracking isn't needed for the duration:
Note that normally you don't need that extra &* -- here it's needed because we're dealing with a boxed value type and want to reach into the box. Separately, Sebastian Kaliszewski asked about this code:
This code does work, except that you dereference a ^ with * (just like you dereference all pointerlike abstractions), so change the call to f(*ptr); and it works just fine. Finally, Thorsten Ottosen asked whether the following couldn't be possible:
The answer is that all of those work as written, except for #1 which only needs a pin_ptr. A pin_ptr is an RAII helper that pins an object on the GC heap for the lifetime of the pin_ptr. Pinning ensures that the GC doesn't move the object around, during which time it's safe to take a plain old unmanaged pointer to it:

foo( *pin_ptr<int>(&*gc) ); // 1 (fixed)

Or, a little more clearly:

pin_ptr<int> pp = &*gc; // take a pin

The % can bind to any object whether the object moves (i.e., is on the gc heap) or not -- thus % can bind to a strict superset of what a & can bind to. (The & can bind directly to any object not on the gc heap, but requires a pin to bind to one on the gc heap.) Thorsten also asked:
In short, to provide pointerlike and referencelike abstractions that work for the general case of garbage-collected memory, which includes compacting GCs that move objects around (so the new pointer/reference abstractions have to be able to track), as close to unmanaged pointers/references in function and syntax as possible. I’m going to echo various interesting pieces of newsgroup discussion here. Check out the newsgroups for the full threads. On comp.lang.c++.moderated, Dietmar Kuehl <dietmar_kuehl@yahoo.com> wrote: I have thought about integrating GC into C++ for quite some time. My personal conclusion was, however, that deterministic destruction and garbage collection don't really mix. The major issue is that in a truly garbage collected system you could simple use whatever reference you get hold of. This promise does not hold if deterministic destructors are involved. I would dispute that characterization of a "truly garbage-collected system" -- it certainly doesn't describe either Java or .NET. In Java and C#, you routinely have objects to which you still have references but whose lifetimes have ended via the Dispose and using patterns, and you have to know (or the member functions have to check) that you're not using an already-disposed object. The C++ destructor model is exactly the same as the Dispose and using patterns, except that it is far easier to use and a direct language feature and correct by default, instead of a coding pattern that is off by default and causing correctness or performance problems when it is forgotten. So yes, in ISO C++ or Java or C# or C++ with C++/CLI, if you use a * or ^ or object reference to an object which has been Disposed/destroyed, you have exactly the same issue in all of those languages and environments: You need to be aware when pointers/references are invalidated. It's just the same as C++'s plain old pointer invalidation problem. 
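To make the "member functions have to check" point concrete, here is a small ISO C++ sketch of a Dispose-style object. The Connection class and its members are hypothetical names chosen for illustration. Unlike an object whose destructor has already run (where any further use is undefined behavior in ISO C++), an explicitly disposed object still exists, so it can detect misuse itself.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical type with an explicit Dispose-style operation.
class Connection {
public:
    // Ends the useful lifetime of the object; references to it remain valid.
    void dispose() { disposed_ = true; }

    bool disposed() const { return disposed_; }

    // Every operation has to guard against use after dispose().
    std::string read() const {
        if (disposed_) {
            throw std::logic_error("Connection used after dispose");
        }
        return "data";
    }

private:
    bool disposed_ = false;
};
```

The check has to be repeated in every member function, which is part of the cost of the pattern that the text describes.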
The short answer to the question in the title is: In Whidbey, ^ and gcnew are mostly all you're likely to use, if all you're doing with .NET is consuming (using) .NET Frameworks types and services. Post-Whidbey, you could even dispense with those and use * and new, because we intend to support allocating CLI objects on the native heap with new and pointing at them with *'s. Alisdair Meredith recently asked me:
This is a very good point, and I'll use it as a hook to talk about the two major categories of use I see for the C++/CLI extensions. I do think the current design serves both markets well. In particular, note that nearly all of the extensions will only be used by people authoring CLI types, so:
What's more, in the post-Whidbey timeframe when we complete the support for allocating any object (including CLI objects) on the native heap using new, people could actually consume CLI / .NET Frameworks types seamlessly in their application without any new syntax. Of course, under the covers we'll be creating a proxy object each time, so there's a probably-slight performance impact to doing it this way, but it's worth noting that you'll be able to do this eventually. Finally, I should also add that we intend to emit warnings by default when any of the extensions are used with native (i.e., normal C++) types, so that people will know when they are using a nonstandard (well, non-ISO-C++-standard at any rate) extension in a class that would otherwise be portable. Edward Diener made the observation (emphasis is mine):
In my previous blog entry, I listed two reasons not to conflate the type category with where objects of that type can be allocated, namely to let us in the future support allocating native types on the GC heap, and CLI types on the native heap. The decision to garbage-collect or not should not be per-type (as it was in MC++), but per-object. To that, I should also add that "CLI types" already does not mean "garbage-collected types." In particular, the CLI knows about two kinds of types: reference types, and value types. First, value types are never garbage-collected in their unboxed form. Second, nearly all programs have objects of reference and boxed value type that are intentionally never collected because they are used right through to the termination of the program; they are one reason why the CLI GC is generational, and why the CLI has features for controlling whether or not such objects ever get the finalizers run, for example.
That's possible, but in our experience so far (having tried to do it that way for two releases now) we've found that users are far more apt to be confused by failing to surface an important distinction. This is the balancing act of language design: it is just as bad to fail to surface an important distinction (which creates a trap and a usability barrier) as it is to needlessly surface a distinction that could have been hidden from the programmer (which creates clutter and a different kind of usability barrier). Edward Diener asked:
That's one way to design it. Interestingly, the existing "Managed C++" does use just plain old new for everything, with pretty much the semantics described above: In MC++, the expression new T allocates a T object on the CLI gc heap if T is a CLI type, and on the native heap if T is a native type. Briefly, this model was limiting, and it was confusing to users. Just to give a taste for what happens when you go down this path, the next question you hit is "what is a T*?" In particular, consider:

T* t = new T;

Is t a pointer to an object in gc'd memory, which means that the object can move? Or is it a pointer to an object in native memory that doesn't move? "That's easy!" one might say. "We could deduce that t points into gc'd memory if and only if T is a CLI type." In the above code statement, that's true, and Managed C++ used "defaulting rules" to make it mean exactly that. In particular, in MC++ the T* above means the same as the following longer version if T is a CLI type:

T __gc * t = __gc new T;

But it turns out that when going along this path you do need that __gc pointer decoration (or its moral equivalent) sometimes, even if much of the time you can make it optional by defaulting it based on the pointed-at type. In particular, you need it for combinations of pointers, including the simplest case:

int** ppi; // what is this?

Consider: Where is the int* that the int** is pointing to? The answer cannot be deduced from the type, because an int* can exist on the gc'd heap or on the native heap. Therefore both int* __gc * (a pointer to an int* on the gc heap) and int** (a pointer to an int* on the native heap) are valid pointer types, and therefore you need to be able to distinguish between them. In practice, this has been a great source of user confusion because programmers are never really sure where to put the __gc.
So what we have observed many users do, time and again, is simply add __gc's until the code compiles -- whether the __gc is needed (or correct) or not. So you can use defaulting rules to hide the __gc some of the time, but not all of the time. And the "some of the time" itself is at a cost, namely that it arbitrarily restricts native types from ever being allocated on the gc heap (i.e., from being garbage collected), and restricts CLI types from ever being allocated on the native heap. He continued:
Bingo. In addition to the clarity and usability issue above, you hit on a primary reason for not just using unadorned new: It conflates two ideas that ought to be independent, namely the idea of what kind of type T is and the idea of where T objects are allocated. In particular, we will definitely in the future allow allocating an object of any type on the gc heap or on the native heap. Why do we feel the need to support allocating an object of native type on the GC heap? Because customers ask for it. "You've got a great garbage collector in there," they say, "so why can't I use it to garbage-collect the objects I already have today?" That's a reasonable question. Why do we feel the need to support allocating an object of CLI type on the native heap? Because there are lots of native templated libraries out there today, many of which internally allocate objects on the native heap (because, after all, they know nothing about the GC heap). We ought to be able to leverage all those existing libraries and use them unmodified also with the CLI types. True, for some such libraries it may be enough simply to give a different meaning to new depending on the type of T as originally proposed above. But note that such libraries might not only allocate objects using new, but may rely on the resulting pointers to support things like pointer arithmetic that MC++'s __gc*'s and C++/CLI's ^'s cannot support, and so they would still be broken for CLI types. Of the two, I see gc'ing native objects as the more compelling and mainstream use case.

C++/CLI specifies several keywords as extensions to ISO C++. The way they are handled falls into five major categories, where only the first impacts the meaning of existing ISO C++ programs.

1. Outright reserved words

As of this writing (November 22, 2003, the day after we released the candidate base document), C++/CLI is down to only three reserved words:

gcnew generic nullptr

An existing program that uses these words as identifiers and wants to use C++/CLI would have to rename the identifiers. I'll return to these three again at the end. All the other keywords, below, are contextual keywords that do not conflict with identifiers. Any legal ISO C++ program that already uses the names below as identifiers will continue to work as before; these keywords are not reserved words.

2. Spaced keywords

One implementation technique we are using is to specify some keywords that include embedded whitespace. These are safe: They can't possibly conflict with any user identifiers because no C++ program can create an identifier that contains whitespace characters. [I'll omit the obligatory reference to Bjarne's classic April Fool's joke article on the whitespace operator. :-) But what I'm saying here is true, not a joke.] Currently these are:

for each

For example, "ref class" is a single token in the lexer, and programs that have a type or variable or namespace named ref are entirely unaffected. (Somewhat amazingly, even most macros named ref are unaffected and don't affect C++/CLI, unless coincidentally the next token in the macro's definition line happens to be class or struct; more on this near the end.)

3. Contextual keywords that can never appear where an identifier could appear

Another technique we used was to define some keywords that can only appear in positions in the language grammar where today nothing may appear. These too are safe: They can't conflict with any user identifiers because no identifiers could appear where the keyword appears, and vice versa. Currently these are:

abstract finally in override sealed where

For example, abstract as a C++/CLI keyword can only appear in a class definition after the class name and before the base class list, where nothing can appear today:

ref class X abstract : B1, B2 { // ok, can only be the keyword

4. Contextual keywords that can appear where an identifier could appear

Some keywords can appear in a grammar position where an identifier could also appear, and this is the case that needs some extra attention. There are currently five keywords in this category:

delegate event initonly literal property

In such grammar positions, when the compiler encounters a token that is spelled the same as one of these keywords, the compiler can't know whether the token means the keyword or whether it means an identifier until it first does some further lookahead to consider later tokens. For example, consider the following inside a class scope:
property int x; // ok, here property is the contextual keyword

Now imagine you're a compiler: What do you do when you hit the token property as the first token of the next class member declaration? There's not enough information to decide for sure whether it's an identifier or a keyword without looking further ahead, and C++/CLI has to specify the decision procedure -- the rules for deciding whether it's a keyword or an identifier. As long as the user doesn't make a mistake (i.e., as long as it's a legal program with or without C++/CLI) the answer is clear, because there's no ambiguity. But now the "quality of diagnostics" issue rears its head, in this category of contextual keywords and this category only: What if the user makes a mistake? For example:

property x; // error, if no type "property" exists

Let's say that we set up a disambiguation rule with the following general structure (I'll get specific in just a moment):

1. Assume one case and try to parse what comes next that way.
2. If that fails, then assume the other case and try again.
3. If that fails, then issue a diagnostic.

In the case of property x; when there's no type in scope named property, both #1 and #2 will fail and the question is: When we get to the diagnostic in case #3, what error message is the user likely to see? The answer almost certainly is, a message that applies to the second "other" case. Why? Because the compiler already tried the first case, failed, backed up and tried the second "other" case -- and it's still in that latter mode with all that context when it finally realizes that didn't work either and now it has to issue the diagnostic. So by default, absent some (often prodigious) amount of extra work inside the compiler, the diagnostic that you'll get is the one that's easiest to give, namely the one for the case the compiler was most recently pursuing, namely the "other" case mentioned in #2 -- because the compiler already gave up on the first case, and went down the other path instead. So let's get specific. Let's say that the rule we picked was:

1. Assume that it's an identifier and try to parse it that way.
2. If that fails, then assume that it's the keyword and try to parse it that way.
3. If that fails, then issue a diagnostic.

Under that rule, what's the diagnostic the user gets on an illegal declaration of property x;? One that's in the context of #2 (keyword), something like "illegal property declaration," perhaps with a "the type 'x' was not defined" or a "you forgot to specify the type for property 'x'" in there somewhere. On the other hand, let's say that the rule we picked was:

1. Assume that it's the keyword and try to parse it that way.
2. If that fails, then assume that it's an identifier and try to parse it that way.
3. If that fails, then issue a diagnostic.

Under this rule, the diagnostic that's easy to give is something like "the type 'property' was not defined." Which is better? This illustrates why it's very important to consider common mistakes and whether the diagnostic the user will get really applies to what he was probably trying to do. In this case, it's probably better to emit something like "no type named 'property' exists" than "you forgot to specify a type for your property named 'x'" -- the former is more likely to address what the user was trying to do, and it also happens to preserve the diagnostics for ISO C++ programs. More broadly, of course, there are other rules you can use than the two "try one way then try the other" variants shown above. But I hope this helps to give the flavor for the 'quality of diagnostics' problem.
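The effect of the ordering can be seen in a deliberately tiny ISO C++ model. This is a toy under stated assumptions, not how any real compiler (VC++ or otherwise) is implemented: the Rule enum, the two parses_as_* helpers, the token representation (a declaration without its semicolon), and the message texts are all hypothetical. It only demonstrates that when both attempts fail, the easy diagnostic is the one belonging to the attempt tried last.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Rule { IdentifierThenKeyword, KeywordThenIdentifier };

// "T x" parses as an ordinary declaration if the first token names a type.
bool parses_as_identifier(const std::vector<std::string>& toks,
                          const std::set<std::string>& types) {
    return toks.size() == 2 && types.count(toks[0]) == 1;
}

// "property T x" parses as a property declaration if T names a type.
bool parses_as_keyword(const std::vector<std::string>& toks,
                       const std::set<std::string>& types) {
    return toks.size() == 3 && toks[0] == "property" &&
           types.count(toks[1]) == 1;
}

// Returns "ok" or the diagnostic that falls out of the chosen ordering.
std::string diagnose(Rule rule, const std::vector<std::string>& toks,
                     const std::set<std::string>& types) {
    if (toks.empty()) return "empty declaration";
    if (parses_as_identifier(toks, types) || parses_as_keyword(toks, types)) {
        return "ok";
    }
    // Both attempts failed: report in the context of the attempt tried last.
    if (rule == Rule::IdentifierThenKeyword) {
        return "illegal property declaration: no type specified for '" +
               toks.back() + "'";
    }
    return "no type named '" + toks.front() + "' exists";
}
```

For the tokens of property x with no type named property in scope, the identifier-first rule yields the property-flavored message and the keyword-first rule yields the missing-type message, which matches the comparison made in the text.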
I feel compelled to add that the collaboration and input over the past year-plus from Bjarne Stroustrup and the folks at EDG (Steve Adamczyk, John Spicer, and Daveed Vandevoorde) has been wonderful and invaluable in this regard specifically. It has really helped to have input from other experienced compiler writers, including in Bjarne's case the creator of the first C++ compiler and in EDG's case the folks who have one of the world's strongest current C++ compilers. On several occasions all of their input has helped get rid of inadvertent assumptions about "what's implementable" and "what's diagnosable" based on just VC++'s own compiler implementation and its source base. What's easy for one compiler implementation is not necessarily so for another, and it's been extremely useful to draw on the experience of comparing notes from two current popular ones to make sure that features can be implemented readily on various compiler architectures and source bases (not just VC++'s) and with quality user diagnostics.

5. Not keywords, but in a namespace scope

Finally, there are a few "namespaced" keywords. These make the most sense for pseudo-library features (ones that look and feel like library types/functions but really are special names known to the compiler because the compiler does special things when handling them). They appear in the stdcli namespace and are:

array interior_ptr pin_ptr safe_cast

That's it.
Is it worth it to push all the way down to zero reserved words in C++/CLI? There are pros and cons to doing so, but I've certainly always been sympathetic to the goal of zero reserved words; Brandon and others will surely tell you of my stubborn campaigning to kill off reserved words (I think I've killed off over a half dozen already since I took the reins of this effort in January, but I haven't kept an exact body count). I think the right time to decide whether to push for zero reserved words is probably near the end of the C++/CLI standards process (summer-ish 2004). At that point, when all other changes and refinements have been made and everything else is in its final form, we will have a complete (and I hope still very short) list of places where C++/CLI could change the meaning of an existing C++ program, and that will be the best time to consider them as a package and to make a decision whether to eliminate some or all of them in a drive-it-to-zero cleanup push. I am looking forward to seeing what the other participants in all C++ standards arenas, and the broader community, think is the right thing to do as we get there. Putting it all together, what's the impact on a legal ISO C++ program? Only:
Let me illustrate the macro cases with two main examples that affect the spaced keywords:

// Example 1: this has a different meaning in ISO C++ and C++/CLI
#define interface struct

In ISO C++, this means change every instance of interface to struct. In C++/CLI, because "interface struct" is a single token, the macro means instead to change every instance of "interface struct" to nothing. Here's the simplest workaround:

// Workaround 1: this has the same meaning in both

Here's another example of a macro that can change the meaning of a program in ISO C++ and C++/CLI:

// Example 2: this has a different meaning in ISO C++ and C++/CLI
#define ref const
ref class C { } c;

In ISO C++, ref goes to const and the last line defines a class C and simultaneously declares a const object of that type named c. This is legal code, albeit uncommon. In C++/CLI, the macro has no effect on the class declaration because "ref class" is a single token (whereas the macro is looking for the token ref alone, not "ref class") and so the last line defines a ref class C and simultaneously declares a (non-const) object of that type named c. Here's the simplest workaround:

// Workaround 2: this has the same meaning in both

But hey, macro names are supposed to be uppercase anyway. :-) I hope these cases are somewhere between obscure and pathological. At any rate, macros with short and common names are generally unusual in the wild because they just break so much stuff. I would rate example 1 above as fairly obscure (although windows.h has exactly that line in it, alas) and example 2 as probably outright pathological (as I would rate all macros with short and common names). Whew. That's all for tonight.

On comp.lang.c++.moderated, Peter Lundblad wrote:
There are several reasons why we didn't use a form of placement new. One reason is that we wanted to leave a door open in case in the future we wanted to allow placement and class-specific forms of gcnew. Having a parallel gcnew expression and operator best serves leaving that door open. Another reason is that existing libraries, including GC libraries, already use placement forms of new, and so many of the possible placement names are taken. In particular, new (gc) X; is already taken by the Boehm collector. Yes, I know you suggested CLI::gc instead of plain gc, but in practice I'm still concerned that enough people are liable to frequently write using namespace CLI; (actually stdcli) to make this problematic. Still another is one you cite: It's easier to teach that "the type of a new-expression (and operator new) is a *" today, and that "the type of a gcnew-expression is a ^". Finally, a minor reason is that gcnew is slightly less typing than new (gc) or new (cli), and moderately less typing than new (stdcli::gc). Today the C++/CLI candidate base document was posted, and it's freely available for download. This is the spec that Microsoft is contributing to the newly-formed ECMA TC39/TG5 standards committee for consideration for the C++/CLI standards process. It covers all the main proposed features, and it gives a pretty thorough look at the scope and shape of what's being contemplated. There are still places that need to be filled in, though, as well as some technical decisions that TG5 will need to decide (in addition to any existing decisions that they may decide to review or change). Note that this is the last version of the document that will bear a Microsoft copyright, so we've taken this opportunity to make it publicly available while we still own it. If ECMA TC39/TG5 adopts this as their base document, it will henceforth be an ECMA document maintained by that ECMA group. 
That means it will be up to TG5 to decide what changes to make and when to make future drafts publicly available. (From my informal conversations, I wouldn't be surprised if interim drafts were published every three months or so, but that's just my personal best guess right now. We'll have to wait and see when the whole group feels the spec is in shape for TG5 to distribute its own first updated snapshot.)

Whew! It's been a long year, a long month, and a long week. Enjoy! And please let us know what you think of this. Comments are welcome, and those of us on the team who are blogging (see my Links) will be answering as many as we can get to while we spend our days continuing to work on the Whidbey product.

I'll probably blog fairly lightly over the next two weeks. Next week is a short week of course, with the U.S. Thanksgiving holidays closing most offices on Thursday and Friday. The following week, on Dec 4-5, is the first ECMA TC39/TG5 meeting already, down in College Station, Texas -- it sure has come up fast. I'll have more to report after that.

Nicola Musatti asked the following excellent question:
I agree that those are alternatives. Everyone, including me, pushes hard for a library-only (or at least library-like) solution when they first start out on this problem. I think an argument can be made for it, and at one time I did so too.

To me, the killer argument in favor of a new declarator with usage R^ instead of a library-like cli::handle<R> is its pervasiveness: It will be by far the most widely used part of all these extensions, as it's the common use case the vast majority of the time for CLI types (as objects, as parameters, etc.). This extremely wide use amplifies two particular negative consequences we'd like to avoid: First, the long spelling (here "handle") could in practice effectively become a reserved word just because people are liable to widely apply using to avoid being forced to write the qualification every time (this is worse if the name chosen is a common name likely to be used for other identifiers or even macros, and "handle" is a very common name). Second, and worse, the long spelling would also make the language several times more verbose in a very common case than even the Managed Extensions syntax was, and that in turn was already verbose compared to other CLI languages. Compare five alternatives side by side:

    cli::handle<R> r = cli::gcnew<R>(); // 1: above suggestion
    handle<R> r = gcnew<R>();           // 2: ditto, with "using"s
    R __gc* r = new R;                  // 3: original MC++ syntax
    R^ r = gcnew R;                     // 4: C++/CLI syntax
    R r = new R();                      // 5: C#/Java syntax

I think you could make a case for any one of these, depending on your tradeoffs. But I think a tradeoff that favors usability will favor the last few options. There are also other issues where having ^ and % declarators/operators that roughly correspond to * and & enables a more elegant type calculus.
I (or someone on the team) will have to write those up someday, but consider at some future time when we have full mixed types too: When we can have a type that inherits from both native and CLR base classes/interfaces, we will want to be able to pass a pointer to such an object to existing ISO C++ APIs that take a Base1*, and a handle to the same object to existing CLI APIs that take a Base2^. Both will be common operations and therefore both should be distinctly expressible with a terse syntax:

class NativeBase { };
// a mixed type
void NativeFunc( NativeBase* );
R r;
// object on the stack

In this way, % is to ^ pretty much just as & is to *. If R^ were instead spelled using a templatelike syntax, what would be the corresponding code to get at it? Finally, consider the agnostic template case:
template<typename T>

I'll write more about the full pointer system in the future. For other design considerations about handles I'll point to Brandon's Behind the Design: Handles blog entry again, and to my own entry earlier this week on why pointers aren't enough by themselves.

On comp.lang.c++.moderated, Andrew Browne wrote:
That's one of the alternatives I attempted, and I wasn't the first. I think almost everyone starts here, and I held on for a while before I became convinced I had to let go because it wasn't leading to the right places. Let me share some of the problems and objections that crop up when you work your way down this path:

1. (Minor) Verbose

The above alternative is a lot of typing compared to any of the others (Managed C++ syntax, proposed C++/CLI syntax, and other CLI languages). There's a pretty easy solution for this one, using keyword shortcuts:
class R : ref {/*...*/}; // CLR reference type

An inconvenience with this is that there could already be a class named ref, and so the syntax would have to be embroidered somehow to disambiguate the two; this is unfortunate but surmountable. But, more importantly, this shorthand still doesn't address the other drawbacks, below, of this general approach.

2. Forward declarations

Consider:

    class X;

Is this a ref class, value class, interface class, or native class? There are a few cases where this needs to be known from the forward declaration.

3. Indirect: The header hunt

Consider:

    class X : public Y { };

Is this a ref class, value class, interface class, or native class? Under the alternative, the only way to know would be to inspect Y and all base classes until you can determine whether any of them directly or indirectly inherit from Object or ValueType (or not). There are shortcuts (e.g., it's simpler for value types because they're always sealed and so the inheritance has to be direct), but the hunt remains. That may not seem like a huge issue, except that the types really are behaviorally different in small but important ways; for example, in one case a virtual call in a ctor or dtor will be deep, in the other it will be shallow. What metadata will eventually be emitted, if any?

4. Closes doors

Speaking specifically to the last part of the example:
Unfortunately, this conflates the ideas of the type category (ref/value/native) with the form of genericity (generic/template). It says that CLI types can only be genericized, and native types can only be templated, leaving no way to express the other two useful concepts:
Templated CLI types in particular are very useful and are supported in C++/CLI, which lets the template/generic choice and the class category choice vary independently.

5. Other closed doors: Distinguishing mixed types (Future)

In the future, C++/CLI is intended to eventually allow for full mixing and cross-inheritance of arbitrary types. Using the alternative inheritance-based syntax alone does not allow the programmer to distinguish between the following two distinct things that the proposed C++/CLI design lets the programmer express as follows:

    ref class Ref : public ANative { int x; };
    class Native : public ARef { int x; };

This distinction can't be expressed using the proposed alternative above. Both types have System::Object as a base class, but one is a reference class that other CLI languages could use directly and where virtual calls during construction are deep, and one is a native class that other CLI languages can only use via a handle or reference to the ARef base class and where virtual calls during construction are shallow.

Last week on comp.lang.c++.moderated, Nicola Musatti wondered why C++/CLI would use keywords that don't follow the __keyword naming convention for conforming extensions:
Right, and that's what Managed C++ used, for just that reason: to respect compatibility. Unfortunately, there was a lot of resistance and it is considered a failure. For one thing, programmers have complained loudly that all the underscores are not only ugly, but a real pain because they're much more common throughout the code than other extensions such as __declspec have been. In particular, __gc gets littered throughout the programmer's code. At least as importantly, the __keywords littered throughout the code can make the language feel second-class, particularly when people look at equivalent C++ and C# or VB source code side-by-side. This comparative ugliness has been a contributing, if not essential, factor why some programmers have left C++ for other languages. Consider:
//-------------------------------------------------------
R r = new R;
//-------------------------------------------------------
R __gc * r = new R;

Oddly, numerous programmers find the former more attractive. Particularly after the 2,000th time they type __gc. But now we can do better:
//-------------------------------------------------------
R^ r = gcnew R;

I should note there's actually also a shorter form for this common case, to have the compiler automatically generate the property's getter, setter, and backing store. While I'm at it, I'll also put the R instance on the stack, which is also a new feature of the revised syntax:
//-------------------------------------------------------
R r;

C# is adding something similar as a property shorthand. But C# doesn't have stack-based semantics for reference types and is unlikely to ever have them, though using is a partial automation of the stack-based lifetime control that C++ programmers take for granted. I'll have more to say about using another time.

A few days ago on news:comp.lang.c++.moderated, Nicola Musatti wrote:
Not for a pure definition of "pure," they don't. :-) To explain why C++ pointers are insufficient (unless their semantics were to be changed at least a little, which would mean breaking existing code), consider two counterexamples:

1. Not for a compacting GC.

Certainly a bald pointer can't point directly to an object that moves around in memory, because C++ pointers are required to be stable, to always have the same value while pointing to the same object. Changing the semantics of a pointer to make it track will break lots of code, starting with set<T*>, because such tracking pointers cannot be ordered (their values will after all be changed arbitrarily at unpredictable times by the GC). There are also other restrictions, but that's one of the most noticeable. [Aside: Such a tracking pointerlike abstraction is needed, and is provided in C++/CLI. It just can't be spelled * without fundamentally scuttling ISO C++ conformance, is all.]

2. Not for a non-compacting GC, either.

This case comes a lot closer, but even Great Circle / Boehm style collectors impose restrictions that break some conforming C++ programs. In particular, they restrict, if only slightly, the operations that Standard C++ allows on pointers. Consider the following well-formed ISO C++ program with well-defined semantics:
int* pi = new int(42); // line 1
// ... do other work ...
pi = (int*)((int)pi ^ 0xaaaaaaaa);

Add-on GCs can't see such disguised pointers, and are liable to reclaim the memory allocated in line 1 before its later use, resulting in an attempt to access freed memory. Boom.

This isn't perverse or theoretical, by the way. Consider "two-way pointers" as one example of a well-known implementation technique where two pointers are XOR'd together like this for a perfectly reasonable and legal use. In particular, a motivation behind two-way pointers is that you can have a more space-efficient doubly linked list if you store only one (not two) pointer's worth of storage in each node. But how can the list still be traversable in both directions? The idea is that each node stores, not a pointer to one other node, but a pointer to the previous node XOR'd with a pointer to the next node. To traverse the list in either direction, at each node you get a pointer to the next node by simply XORing the current node's two-way pointer value with the address of the last node you visited, which yields the address of the next node you want to visit. For more details, see:
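Before following that reference, here is a minimal sketch of the two-way pointer idea described above. It is not the code from the cited article: XorNode, XorList, addr, and node_at are hypothetical names made up for illustration. It also assumes the platform provides std::uintptr_t and that a pointer survives a round trip through it, which holds on mainstream implementations but is implementation-defined.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node: 'link' stores (address of previous) XOR (address of next).
struct XorNode {
    int value;
    std::uintptr_t link;
};

// Pointer -> integer; a null pointer is mapped to 0 explicitly.
inline std::uintptr_t addr(const XorNode* p) {
    return p ? reinterpret_cast<std::uintptr_t>(p) : 0;
}

// Integer -> pointer; 0 is mapped back to a null pointer explicitly.
inline XorNode* node_at(std::uintptr_t a) {
    return a ? reinterpret_cast<XorNode*>(a) : nullptr;
}

class XorList {
public:
    XorList() : head_(nullptr), tail_(nullptr) {}
    XorList(const XorList&) = delete;
    XorList& operator=(const XorList&) = delete;

    ~XorList() {
        XorNode* prev = nullptr;
        XorNode* cur = head_;
        while (cur) {
            // Recover the next node before releasing the one behind us.
            XorNode* next = node_at(cur->link ^ addr(prev));
            delete prev;
            prev = cur;
            cur = next;
        }
        delete prev;
    }

    void push_back(int v) {
        // New tail: previous is the old tail, next is null (0).
        XorNode* n = new XorNode{v, addr(tail_)};
        if (tail_) {
            // Old tail's next changes from null (0) to n.
            tail_->link ^= addr(n);
        } else {
            head_ = n;
        }
        tail_ = n;
    }

    // The same walk works from either end; only the starting node differs.
    std::vector<int> forward() const { return walk(head_); }
    std::vector<int> backward() const { return walk(tail_); }

private:
    static std::vector<int> walk(XorNode* start) {
        std::vector<int> out;
        XorNode* prev = nullptr;
        XorNode* cur = start;
        while (cur) {
            out.push_back(cur->value);
            // XOR with the node we came from yields the node ahead.
            XorNode* next = node_at(cur->link ^ addr(prev));
            prev = cur;
            cur = next;
        }
        return out;
    }

    XorNode* head_;
    XorNode* tail_;
};
```

Note that no node other than the head and tail is referenced by an ordinary pointer anywhere in the program; interior nodes are reachable only through the XOR'd link values. That is exactly the kind of disguised pointer an add-on collector cannot see.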
"Running Circles Round You, Logically" I don't think the article is available online, alas, but Steve's website has some source code demonstrating the technique. This perfectly standards-conforming and useful technique won't work correctly with any GC implementation I know of that does not extend the language so that pointers can retain their full standard meaning. Steve's technique works perfectly fine and unbroken, however, under C++/CLI. It works because C++/CLI preserves exactly the full semantics of * pointers without any limitations. To do so, C++/CLI needed to add a new abstraction for GC semantics instead of pretending that raw pointers are by themselves a complete solution for safe use in a GC environment (they aren't, only because they were never designed to be). For more about the design motivations behind the ^ declarator (aka a "handle"), see also Brandon Bray's excellent blog entry Behind the Design: Handles posted earlier today. A few days ago on news:comp.lang.c++.moderated, "Chris" asked:
No. Doing that would mean throwing away all the ISO conformance work that Visual C++ just spent nearly the whole last release cycle adding to the product. VC++ is now 98%-ish conformant to C++03 (the 1998 ISO C++ standard + its first technical corrigendum) and VC++ will continue to work on the remaining 2%, plus track the coming C++0x additions as they are created by the ISO and ANSI committees.

Of course, the CLI extensions will be needed where programs specifically take advantage of CLI (i.e., .NET) data types and features, such as the types in the .NET Framework libraries, and garbage collection and reflection. But programs that don't need those can ignore the extensions and compile just fine to either native binaries or to .NET IL. Note that last bit, because it seems to be not widely known: C++ code can still be compiled to IL and run in the .NET virtual machine (Common Language Runtime, or CLR) without using any extensions; the extensions are needed only for additionally using CLI data types and features like garbage collection. So there are three major scenarios:
Unless you're actually authoring your own new CLI types, you're unlikely to directly use much more than gcnew and ^, plus maybe an occasional sprinkling of nullptr or %.

Welcome! My primary day job these days is that I'm an Architect on the Visual C++ team at Microsoft, currently responsible for leading the redesign of the C++ Managed Extensions for .NET (aka "Managed C++"). I also do a fair amount of other C++ writing and speaking (including right now busily writing two new books due out in the spring), and I chair the ISO C++ standards committee. You can find out more about me on my website.

At first, I'll mostly use this blog to begin answering frequently asked questions about the language extensions redesign. The VC++ team has learned a lot about what worked and what didn't work with the current Managed Extensions for C++ (aka "Managed C++"). The redesign is an evolution of those extensions but it isn't being called "Managed C++" any more. The new syntax is about to undergo standardization in the ECMA and ISO worlds under the name "C++/CLI," a binding from C++ to the CLI, so I'll often refer to the extensions by that name. I get questions about this every day or two, and I'll primarily answer them here. In the meantime, you can find a general overview blurb about this work on my website's Microsoft page. |