
Don't Repeat Yourself.
That's a great rule for writing code. If you find that you repeat yourself in code, then you're probably doing something wrong. Writing code should be about expressing what's unique about something, not filling out standard forms of data. (Copy-and-paste coding is the worst version of this.)
Unfortunately, when it comes to providing serialization and editing information about classes in C++, the language falls down. In C#, and other .NET languages, and even in Java, reflection is rich and allows you to build nice, automated editors and serializers using minimal mark-up. Also, if you need mark-up, that mark-up can be done in-place where members are defined, typically using custom attributes.
The C++ compiler actually builds all the information needed to do reflection internally. However, while the language designers provided a type_info object to represent the runtime type of a class, they stopped short of actually making that type useful. The only thing you can use a type_info for is to find the name of a type as a string, and to test equality of two types. Would it have been so hard to define some set of structures/tables to describe the layout of an object, one wonders? (multiple virtual inheritance, pointer-to-member-function members, and other such features would complicate the necessary structures, but only marginally -- it's totally doable in a portable way)
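To make the limitation concrete, here is a minimal sketch of what `type_info` offers in practice. The `Point` struct and the helper names `sameType` and `typeName` are made up for illustration; only `typeid`, `type_info::name()` and `operator==` come from the standard library.

```cpp
#include <string>
#include <typeinfo>

// Illustrative struct; there is no standard way to ask type_info for x or y.
struct Point { int x, y; };

// Compares the static types of the two arguments.
template<typename A, typename B>
bool sameType(A const &, B const &) {
    return typeid(A) == typeid(B);
}

// Returns the implementation-defined (often mangled) name of T.
template<typename T>
std::string typeName() {
    return typeid(T).name();
}
```

Nothing here tells you that `Point` has two `int` members or where they live, which is the gap the rest of this post works around.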
So, everyone who works on networking code, or on high-performance or compact file loading and saving, ends up writing their own metadata package. Typically, you declare the structure and the members of the structure in some set of macros. Often, those macros have to live in some C++ file that is separate from the header that declares the structure, which leads to version mismatch problems -- add a field in a struct, forget to add it to the C++, spend hours looking for a bug. Other times, the mark-up is too verbose and complex to get right, which introduces subtle bugs that are hard to track down, because everything is a set of macros, and you can't easily see what the compiler is doing behind your back.
So, here's a package I developed to do C++ metadata for serialization. It can be expanded to support things like STL strings, pointer-to-types, etc., if you wish; perhaps I will return with an updated version later.
You declare your struct types like so:

    struct foo {
        int x, y, z;
        RTTI(foo, MEMBER(x) MEMBER(y) MEMBER(z))
    };
And you can use them like so:

    int main() {
        printf("type: %s\n", foo::info().name());
        // size_t values are cast so the format specifier matches on every platform
        printf("size: %lu\n", (unsigned long)foo::info().size());
        for (size_t i = 0; i != foo::info().memberCount(); ++i) {
            printf("  %s: offset %lu size %lu type %s\n",
                   foo::info().members()[i].name,
                   (unsigned long)foo::info().members()[i].offset,
                   (unsigned long)foo::info().members()[i].type->size(),
                   foo::info().members()[i].type->name());
        }
        return 0;
    }
Clearly, the strength of this approach comes when you use template functions, or functions taking type information, within the implementation of your code. For example, you can write generic "marshal" and "demarshal" functions that use the information about each struct to properly pack and unpack to a stream. The current implementation just uses binary copy, but you can easily add support for specific types by using template specialization of the Type template type:

    template<typename T> struct Type : TypeBase {
        static Type<T> instance;
        // custom marshaling is handled by template specialization
        void Marshal(void *dst, void const *src) const { memcpy(dst, src, sizeof(T)); }
        void Demarshal(void const *src, void *dst) const { memcpy(dst, src, sizeof(T)); }
        char const *name() const { return typeid(T).name(); }
        size_t size() const { return sizeof(T); }
    };
Note that this is for the base member types -- int, float and such -- you don't need to write this for the higher-level structures that you use RTTI for. You can also use type tags for the structs to provide recursive marshaling of structs-as-members-in-structs, but I'm leaving it at this for now.
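As an illustration of such a specialization, the sketch below marshals `uint32_t` in big-endian byte order while every other type keeps the plain binary copy. The cut-down `TypeBase` is an assumption made so the example stands alone; the real base class in the archive may look different.

```cpp
#include <cstddef>
#include <cstring>
#include <stdint.h>

// Assumed minimal interface, reduced to what this example needs.
struct TypeBase {
    virtual ~TypeBase() {}
    virtual void Marshal(void *dst, void const *src) const = 0;
    virtual void Demarshal(void const *src, void *dst) const = 0;
    virtual size_t size() const = 0;
};

// Generic case: plain binary copy, as in the post.
template<typename T>
struct Type : TypeBase {
    void Marshal(void *dst, void const *src) const { memcpy(dst, src, sizeof(T)); }
    void Demarshal(void const *src, void *dst) const { memcpy(dst, src, sizeof(T)); }
    size_t size() const { return sizeof(T); }
};

// Specialization: always write uint32_t most-significant byte first,
// regardless of the host byte order.
template<>
struct Type<uint32_t> : TypeBase {
    void Marshal(void *dst, void const *src) const {
        uint32_t v;
        memcpy(&v, src, sizeof(v));
        unsigned char *out = static_cast<unsigned char *>(dst);
        out[0] = static_cast<unsigned char>((v >> 24) & 0xff);
        out[1] = static_cast<unsigned char>((v >> 16) & 0xff);
        out[2] = static_cast<unsigned char>((v >> 8) & 0xff);
        out[3] = static_cast<unsigned char>(v & 0xff);
    }
    void Demarshal(void const *src, void *dst) const {
        unsigned char const *in = static_cast<unsigned char const *>(src);
        uint32_t v = (static_cast<uint32_t>(in[0]) << 24) |
                     (static_cast<uint32_t>(in[1]) << 16) |
                     (static_cast<uint32_t>(in[2]) << 8) |
                     static_cast<uint32_t>(in[3]);
        memcpy(dst, &v, sizeof(v));
    }
    size_t size() const { return 4; }
};
```

Because the member table stores a pointer to the type object, code that walks the members picks up the specialized behavior without any change.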
The rest of the code, in a form that compiles and runs using gcc 4.2.4, can be downloaded in the archive attached to this post. Please let me know what you think of it in the comments section.
Happy hacking!
| Attachment | Size |
|---|---|
| reflect.zip | 1.38 KB |
Comments
Compile Error
I can't compile the code. I get the following error using Visual Studio 2002 in Windows XP:
error C2143: syntax error : missing ';' before '<'
Could you help?
Thanks
I recommend upgrading to
I recommend upgrading to Visual Studio 2008 or later. (2010 is out now!)
POD
Hey Jon - were you able to get your solution to work with non-POD types? I noticed even a simple struct that has a single non-POD member will warn under gcc with "warning: invalid access to non-static data member of NULL object". This makes sense, as computing the offset is undefined in this circumstance... any thoughts?
One option would be to make
One option would be to make the struct contain only the data, and then declare a subtype that adds the polymorphic member. The RTTI could work on the base (POD) struct.
If you really want to do it with arbitrary class types, then you have to be more dynamic about what you're doing, which may impact performance and network behavior, but can also give you benefits in other areas. In this case, you simply turn RTTI(typename, members) into something like:
At this point, each class implementing RTTI gets a "visit()" function which will call a visitor object with each member, passing a reference to the member as well as the name of it. This can be used to marshal in and out, to build hash tables to get members by name, etc. You can actually still do this with a NULL value if you want:
This will define the static registration for each type (assuming you can arrange for the static function to force linkage), in turn assuming RttiType is a class that can register data for a class instance based on the members passed to its operator(), and put it into a registry of some sort (depending on what you need to do with the type). And, yes, this "visits" a NULL object, but that will end up working with all current compilers, as long as you don't actually read the value of the member; only store the offset and the name.
There's obviously all kinds of directions you can take this; the reason I posted this particular direction was that I wanted to show that a lightweight, intrusive marshaling solution is often actually better for some particular problems than a generic, non-intrusive system like that used by boost. Pick the right tool for the job!
get the value
Is there a lean way to get each attribute's value within the for loop?
I tried to modify the macro but could not get it to work.
e.g. f.info().members()[i].value,
The problem is that the value
The problem is that the value of each member may be of a different type. Thus, you must vector through the type-specific handler for each member. This is unavoidable because of the static typing of C++.
An alternative approach to serialization and meta-data is to create a template visitor worker function that visits each member in some order; you could then re-write your for loop as a simple visitor functor, and it would be invoked with the right type for each member. However, the up-front macro declarations to set that up are less elegant than the macros in this code, because they require separating declaration from definition.
Your code can then call Visit(somePdu, MyVisitor) and MyVisitor will be invoked with the correct name, argument value/type, and offset within the PDU. However, you'd have to solve the registry of types separately, and the flavor of working with this kind of serialization is very different from the flavor I present in the article. There's also, as pointed out, the option of going with code parsing, or code generation from IDL. There are times when each kind will work just fine; you have to figure out the specific requirements of your project, and pick the alternative that matches your requirements the best.
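A minimal sketch of that visitor style, assuming made-up macro names (`BEGIN_VISIT`, `VISIT_MEMBER`, `END_VISIT`) and a made-up `LoginPdu` struct; the original snippet from this comment is not available. The member offset could be passed to the visitor in the same way and is left out here for brevity.

```cpp
#include <sstream>
#include <string>

// The struct is declared on its own, with no mark-up inside it.
struct LoginPdu {
    int userId;
    float version;
    std::string name;
};

// The visit description lives separately from the declaration.
#define BEGIN_VISIT(T_) \
    template<typename V> void Visit(T_ &obj, V &visitor) {
#define VISIT_MEMBER(x) visitor(#x, obj.x);
#define END_VISIT() }

BEGIN_VISIT(LoginPdu)
    VISIT_MEMBER(userId)
    VISIT_MEMBER(version)
    VISIT_MEMBER(name)
END_VISIT()

// Replaces the for loop: operator() is instantiated once per member type,
// so each call sees the member's real static type and value.
struct PrintVisitor {
    std::ostringstream out;

    template<typename M>
    void operator()(char const *name, M const &value) {
        out << name << "=" << value << "\n";
    }
};
```

Calling `Visit(pdu, printer)` with a `PrintVisitor printer` then collects one `name=value` line per member in `printer.out`.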
Thank you for your
Thank you for your hints,
unfortunately I did not get them to run.
Could you perhaps help and enhance your first example with
the for loop like
I think when I have something like that my feature will run like I want it to run.
Thank you.
The dynamic argument passing
The dynamic argument passing you want to do cannot be done in C++, because C++ is not a dynamically typed language.
If you want to add the capability "format as string" to the type info class, that might be possible by extending the marshaling support with more functions. That is very different from "get the value of a field that is arbitrarily selected at runtime."
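One way the "format as string" capability might look is sketched below. The `format` function, the hand-written `fooMembers` table and the `formatMember` helper are all assumptions for illustration; in the article's code the table would come from the RTTI macros instead.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Assumed extension of the type interface: turn a raw member into text.
struct TypeBase {
    virtual ~TypeBase() {}
    virtual std::string format(void const *p) const = 0;
};

template<typename T>
struct Type : TypeBase {
    static Type<T> instance;
    // p must point at a live object of type T.
    std::string format(void const *p) const {
        std::ostringstream os;
        os << *static_cast<T const *>(p);
        return os.str();
    }
};
template<typename T> Type<T> Type<T>::instance;

struct Member {
    char const *name;
    size_t offset;
    TypeBase const *type;
};

struct foo {
    int x, y;
    float z;
};

// Hand-written table standing in for what the macros would generate.
static Member const fooMembers[] = {
    { "x", offsetof(foo, x), &Type<int>::instance },
    { "y", offsetof(foo, y), &Type<int>::instance },
    { "z", offsetof(foo, z), &Type<float>::instance },
};
static size_t const fooMemberCount = sizeof(fooMembers) / sizeof(fooMembers[0]);

// The loop body asks the member's type object to do the type-specific work.
std::string formatMember(foo const &f, size_t i) {
    Member const &m = fooMembers[i];
    char const *base = reinterpret_cast<char const *>(&f);
    return m.type->format(base + m.offset);
}
```

The loop never sees a typed value; it only ever gets a string back, which is what keeps this compatible with static typing.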
Preprocessor...
I wrote a similar tool a few years ago, based on the boost::wave preprocessing library.
Instead of requiring markup in a struct or class, my tool would preprocess the source files, generating template instantiations of a special template that I used for reflection, among other things.
For instance:
For each of the run-time functions (e.g. slots()) there was an equivalent compile-time type that could be iterated using boost::mpl. For instance, footype::_slots would be an mpl::vector of the slot type and name information.
To use the tool, it is only necessary to adjust the compile rules for your build system to use a custom preprocessor, and to enable preprocessing prior to compilation. Or, from the command-line:
If you want pre-processing,
If you want pre-processing, then there are many, many options open to you. You can even integrate it with Visual Studio, so it's not all that cumbersome to change your structures. However, you do end up with a more "static" solution, that in some sense separates declaration and behavior more than I intended with this illustrated solution.
For one of many ways to do marshaling and reflection using pre-processing, you might also want to compare my IDL-like solution that generates reflection visitors from source files as a pre-process step.
The separation of declaration and behavior is ideal.
If all the preprocessor is adding is static reflection information, then it is still up to the user to decide how to use this information. If the user never uses any of the reflection API, then the overhead is optimized away by the compiler, since it emits code that is pure template. Since the preprocessor adds reflection information to every type, the user never has to modify types to take advantage of it.
The solution is automatic. If structures change, a user does not need to adjust macros. Inheritance and containment relationships work automatically, since the preprocessor emits C++ code. Since the preprocessor extends C++, there is no intermediate language to learn or use, and the illusion of reflection being a language feature is preserved. In many ways, the solution looks a lot like what one would expect C++ reflection to look like, if it were supported by the compiler. It integrates well with STL and MPL. It is atomic, as a complete feature should be.
Marshaling is handled by a pure C++ library, using the reflection structures defined by the preprocessor. The advantage of this approach is that there is a clear separation of concerns between the reflection API and the serialization library. In fact, one could roll their own serialization library fairly quickly using just the reflection API. The reflection API even provides enough information in its base implementation to build a unique hash for a particular data model that changes the moment a new field is added to the type. In fact, the serialization API that I wrote in conjunction with this preprocessor provides all of the features that Java and .NET serialization provide... without the runtime overhead of reflection on these platforms. Since Boost MPL can be used to iterate over types exposed through the reflection API, it is possible to build serialization functions that are optimized at compile time to work with the structure.
Runtime reflection allows users to build functionality that can work with types unknown at compile time, meaning that it is possible to implement advanced patterns found in Java and .NET, such as dependency injection or auto discovery. Granted, there is overhead involved with using runtime reflection, but it is a far more flexible feature than the abstract factory registration pattern typically used in C++.
I really should polish the implementation and open source it.
Excellent work
Nice snippet; however, under OS X the default code will not compile, because g++ is a bit pedantic.
Once this is defined, ENDRTTI(foo) can be put after the "struct foo" declaration and will solve the _type::_theInfo undefined symbol link error.
That's right, I noticed that
That's right, I noticed that too.
An alternative is to turn the _theInfo member into something you get with instance<> instead of a member. Then you have to make sure that the getter function is referenced explicitly somewhere in your program (not just through static construction registration). In general, I find that that's not hard to do, because you do create instances of the structs to send them at the very least.
The other benefit of using the getter function approach is no additional macro needed.