Performance of more advanced features

Aug 24, 2012 at 10:43 PM

I am aware of the benchmarks for Simple Injector, which are very favorable, but were those done with simple registrations only? How much does batch registration (registrations where you use reflection to get a list of types) influence performance, registration time, and application startup?


Aug 25, 2012 at 6:51 PM
Edited Dec 29, 2012 at 3:25 PM

To be honest, most of the DI container benchmarks you'll find on the internet are very naïve. They often test one or two basic ways of registering services, don’t test batch registration or open generic type mapping, and resolve unrealistically shallow object graphs. Often there are hidden costs in the containers that are hard for a normal user to determine.

Simple Injector has been designed carefully with performance in mind. It tries to prevent hidden costs such as reflection, unneeded GC allocations, and even method calls during the happy path of resolving instances. This holds for both the core library and the extensions project. This doesn’t mean there is no reflection going on. On the contrary. However, after a type is requested for the first time, a delegate is compiled that will handle the creation of that type (and its dependencies) in the most efficient way. This delegate is cached within the container, and any following request for that type will use that cached delegate. In most cases, you will see performance that is roughly the same as newing up a complete type hierarchy manually (in practice, however, you would never create complete type hierarchies manually, because this would lead to a maintenance nightmare in your Composition Root).
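
To make the compile-once-then-cache idea concrete, here is a minimal sketch that uses only the .NET base class library. TinyResolver, OrderService and Logger are hypothetical names made up for illustration; this is not Simple Injector's actual code, just the general technique of building an expression tree for a whole object graph and caching the compiled delegate:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Logger { }

public class OrderService
{
    public readonly Logger Logger;
    public OrderService(Logger logger) { Logger = logger; }
}

// Hypothetical miniature resolver (assumes one public constructor per type).
public class TinyResolver
{
    private readonly Dictionary<Type, Func<object>> cache =
        new Dictionary<Type, Func<object>>();

    public int CompileCount { get; private set; }

    public object GetInstance(Type type)
    {
        Func<object> factory;
        if (!cache.TryGetValue(type, out factory))
        {
            // Slow path, taken once per requested type: reflection plus compilation.
            Expression body = Expression.Convert(BuildExpression(type), typeof(object));
            factory = Expression.Lambda<Func<object>>(body).Compile();
            cache[type] = factory;
            CompileCount++;
        }
        // Happy path: a dictionary lookup and one delegate call.
        return factory();
    }

    // Builds "new T(new Dep1(...), ...)" as one expression tree.
    private static Expression BuildExpression(Type type)
    {
        var ctor = type.GetConstructors().Single();
        var args = ctor.GetParameters().Select(p => BuildExpression(p.ParameterType));
        return Expression.New(ctor, args);
    }
}

public static class Demo
{
    public static void Run()
    {
        var resolver = new TinyResolver();
        var first = (OrderService)resolver.GetInstance(typeof(OrderService));
        var second = (OrderService)resolver.GetInstance(typeof(OrderService));
        Console.WriteLine(ReferenceEquals(first, second)); // False (transient)
        Console.WriteLine(resolver.CompileCount);          // 1
    }
}
```

The second request never touches reflection; it only invokes the cached delegate, which is why resolving gets close to the cost of newing up the graph by hand.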

This doesn’t mean, however, that everything is equally fast. The most obvious example is the difference between resolving a transient (Register) and a singleton (RegisterSingle). Since a singleton will be created just once, it can be returned directly, while a transient object must always be created. When compiling a delegate for a transient type with a singleton dependency, that dependency will be stored as a constant in the delegate, to prevent it from being requested from the container needlessly. This improves performance even further when building up deep object graphs.
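
As a rough illustration of storing a singleton as a constant, the sketch below bakes an existing instance into the compiled delegate with Expression.Constant. The Clock and ReportJob types and the SingletonInlining helper are hypothetical; the real container's internals may well differ:

```csharp
using System;
using System.Linq.Expressions;

public class Clock { }

public class ReportJob
{
    public readonly Clock Clock;
    public ReportJob(Clock clock) { Clock = clock; }
}

public static class SingletonInlining
{
    // Compiles "() => new ReportJob(<singleton>)". The singleton is embedded
    // in the delegate as a constant, so resolving a ReportJob needs no lookup.
    public static Func<ReportJob> Compile(Clock singleton)
    {
        var ctor = typeof(ReportJob).GetConstructor(new[] { typeof(Clock) });
        var body = Expression.New(ctor, Expression.Constant(singleton, typeof(Clock)));
        return Expression.Lambda<Func<ReportJob>>(body).Compile();
    }
}

public static class Demo
{
    public static void Run()
    {
        Func<ReportJob> factory = SingletonInlining.Compile(new Clock());
        ReportJob a = factory();
        ReportJob b = factory();
        Console.WriteLine(ReferenceEquals(a, b));             // False (transient job)
        Console.WriteLine(ReferenceEquals(a.Clock, b.Clock)); // True (shared singleton)
    }
}
```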

Less obvious is the performance difference between registering a transient object using the generic parameterless Register<TService, TImplementation>() method and the overload that takes a Func<T> delegate: Register<TService>(Func<TService>). The former is faster, especially when it is part of an object graph, since the container has all the information it needs to optimize performance. If you write all registrations using Register<TService>(Func<TService>), you’ll see a slight drop in performance. In that case it will be about as fast as working with the Funq DI container, since Funq only allows registering delegates. This, however, will still be pretty fast, especially compared to the performance of most other containers.

An even less obvious difference is that of the registration of initializers using the RegisterInitializer<TService>(Action<TService>) method. Simple Injector allows you to register an Action<T> delegate for a certain type. After the container has created an instance of that type, the instance is passed to the supplied delegate(s) before it is returned. This allows you to do some custom initialization, such as property injection. Simple Injector will find the registered Action delegates that can be applied to a certain type, and all of those actions are called. Take for instance the registration of an Action<DbConnection> and an Action<SqlConnection> delegate. Since SqlConnection inherits from DbConnection, Simple Injector will apply both delegates when the container creates a new SqlConnection instance. It determines which delegates to apply once, using the information that was available during registration. When one or more delegates apply to a type, Simple Injector will wrap the creation of the instance and the calls to those Action<T> delegates in a Func<T> delegate. This delegate is cached, but because Simple Injector now uses a delegate, the performance of this registration is about the same as that of the Register<TService>(Func<TService>) registrations (without taking the cost of calling the delegates into consideration). You could amplify this behavior in a benchmark by registering an Action<object> as follows:

container.RegisterInitializer<object>(instance => { });

Since object is the base type in the CLR type hierarchy, this delegate will be hooked to all registrations. You will see the performance go down in a benchmark, but for most applications even this would probably not cause a delay of any significance.

This design, which tries to prevent hidden costs, does have its consequences, however. Take the initializers, for instance: Simple Injector will not use runtime information to determine which delegates should be applied to a type. Consider the following registration:

container.Register<IService>(() => condition ? new Service1() : new Service2());

Simple Injector could in theory apply an Action<Service2> initializer delegate when Service2 is returned. However, this will not happen, since it would mean that Simple Injector has to check the returned type on each call, which would incur hidden costs. Instead, only Action<IService> initializers (and initializers for base types of IService) will be applied in this case, since this is the only information that is available at registration time (Simple Injector knows nothing about Service1 and Service2, since they are newed up manually). Most containers take a different stand and use more runtime information, which allows more flexibility, but with the consequence of being slower.
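
The selection rule for initializers can be sketched as follows. InitializerRegistry, Connection and SqlLikeConnection are hypothetical stand-ins (the latter two play the role of DbConnection and SqlConnection), and the code only mimics the described behavior: applicable initializers are chosen once from the statically known type and then wrapped, together with the creation, in a single Func<T>:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Connection { public List<string> Log = new List<string>(); }
public class SqlLikeConnection : Connection { }

// Hypothetical registry; only illustrates the selection rule.
public class InitializerRegistry
{
    private readonly List<KeyValuePair<Type, Action<object>>> initializers =
        new List<KeyValuePair<Type, Action<object>>>();

    public void RegisterInitializer<T>(Action<T> initializer)
    {
        initializers.Add(new KeyValuePair<Type, Action<object>>(
            typeof(T), instance => initializer((T)instance)));
    }

    // Decides once, from the static type T, which initializers apply.
    // The runtime type of the created instance is never inspected.
    public Func<T> Wrap<T>(Func<T> creator)
    {
        Action<object>[] applicable = initializers
            .Where(pair => pair.Key.IsAssignableFrom(typeof(T)))
            .Select(pair => pair.Value)
            .ToArray();

        if (applicable.Length == 0) return creator; // nothing to wrap

        return () =>
        {
            T instance = creator();
            foreach (var initialize in applicable) initialize(instance);
            return instance;
        };
    }
}

public static class Demo
{
    public static void Run()
    {
        var registry = new InitializerRegistry();
        registry.RegisterInitializer<Connection>(c => c.Log.Add("base"));
        registry.RegisterInitializer<SqlLikeConnection>(c => c.Log.Add("sql"));

        Func<SqlLikeConnection> exact = registry.Wrap(() => new SqlLikeConnection());
        Func<Connection> viaBase = registry.Wrap<Connection>(() => new SqlLikeConnection());

        Console.WriteLine(string.Join(",", exact().Log));   // base,sql
        Console.WriteLine(string.Join(",", viaBase().Log)); // base
    }
}
```

The second factory returns a SqlLikeConnection at runtime, yet only the initializer for the registered type runs, which is the trade-off between flexibility and a predictable cost.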

This doesn’t mean, however, that there aren’t any hidden costs in Simple Injector. GetCurrentRegistrations() and GetInitializer<TService>() are two examples of methods whose performance is not constant. Their cost is O(n), which means that they become slower linearly with the number of registrations and initializers in the container. They are, however, not expected to be called in the happy path, and that’s why they aren’t optimized.

A significant but one-time cost is that of registration and compilation, which is what your question is about. Obviously, when you need to register many types, registration takes longer. Likewise, when you have many types, building and compiling delegates for all of them takes time, and this could even impact the start-up time of your application. A call to container.Verify() in particular can take a considerable amount of time, since Verify usually triggers the compilation of all delegates. For service applications, this will probably not be a problem, since they tend to be long-running processes, but for desktop (or Silverlight) applications, this could cause a delay that is too long. It’s outside the scope of this answer to talk about this, but there are (simple) ways around it if you are experiencing slow app startups. And remember, these are one-time costs. Once a type is requested, the delegate is cached and reused for the lifetime of that container instance.

Since delegates are compiled and cached inside the container, creating new container instances can be very bad for performance. For every new container instance, the delegates need to be built and compiled again, since, for obvious reasons, these delegates cannot be cached at the App Domain level. When you, for instance, create a new container for each web request, you might experience a significant performance drop (especially when you have a big application). You should typically create one Container instance per application / app domain.

Your question was particularly about batch registration. The most prominent way to do batch registration is using the RegisterManyForOpenGeneric extension method, which allows you to register many implementations of a single open generic interface. This method will search through the given set of assemblies, looking for concrete types that implement the given interface. This of course uses reflection, but after that, it just calls back into the container and calls container.Register<TService, TImplementation>() for each type it found. The performance of resolving these types is therefore exactly the same as when registering them manually. There will be some extra overhead at registration time compared to manual registration, since the metadata of the supplied assemblies is queried. This overhead will be fairly small. Besides, there is often a good reason for doing batch registration: registering all types manually simply gives too much maintenance overhead. So there often is no good alternative to batch registration.
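
The reflection part of such a batch registration boils down to a one-time scan over assembly metadata. The sketch below is an assumption about the general shape of such a scan, not the library's implementation; BatchScanner and the validator types are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public interface IValidator<T> { }
public class Person { }
public class Customer { }
public class PersonValidator : IValidator<Person> { }
public class CustomerValidator : IValidator<Customer> { }

public static class BatchScanner
{
    // Returns (closed service type, implementation type) pairs for every
    // concrete, non-generic class that implements the open generic interface.
    public static List<KeyValuePair<Type, Type>> FindImplementations(
        Type openGeneric, params Assembly[] assemblies)
    {
        var query =
            from assembly in assemblies
            from type in assembly.GetTypes()
            where type.IsClass && !type.IsAbstract && !type.IsGenericTypeDefinition
            from service in type.GetInterfaces()
            where service.IsGenericType
                && service.GetGenericTypeDefinition() == openGeneric
            select new KeyValuePair<Type, Type>(service, type);

        return query.ToList();
    }
}

public static class Demo
{
    public static void Run()
    {
        var pairs = BatchScanner.FindImplementations(
            typeof(IValidator<>), typeof(Demo).Assembly);

        // Each pair would be handed to an ordinary registration call.
        foreach (var pair in pairs)
            Console.WriteLine(pair.Key + " -> " + pair.Value);
    }
}
```

Once the pairs are found, each one is registered just like a hand-written registration, so the scan only adds to registration time and never to resolve time.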

A more interesting method is the RegisterOpenGeneric extension method, which allows you to use a single open generic implementation when closed generic versions of a given interface are requested. As an example, take a look at the following registration:

container.RegisterOpenGeneric(typeof(IValidator<>), typeof(EmptyValidator<>));

This registration will ensure that when an IValidator<Person> is requested, an EmptyValidator<Person> is returned. The method does this by hooking onto the container’s ResolveUnregisteredType event. Every time the container gets a request for a type that isn’t registered explicitly, this event is raised. RegisterOpenGeneric will see if this is a request for an IValidator<T>, and if so, it will create an expression tree for the creation of the EmptyValidator<T>. This expression is passed to the container, and this specific type is then registered in the container. Again, every following call for this same IValidator<T> will hit the cache first, which ensures the performance hit is a one-time cost. When you resolve many different IValidator<T> versions (such as IValidator<Person>, IValidator<Customer>, etc.), you will see a one-time performance hit for each specific type. But again, these registrations have performance equal to manually registering using:

container.Register<IValidator<Person>, EmptyValidator<Person>>();
container.Register<IValidator<Customer>, EmptyValidator<Customer>>();
// etc
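
The fallback for unregistered closed generic types can be approximated with Type.MakeGenericType plus the same compile-and-cache step. OpenGenericResolver below is a hypothetical sketch and does not use the container's ResolveUnregisteredType event; it only shows why the cost is paid once per closed type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public interface IValidator<T> { }
public class EmptyValidator<T> : IValidator<T> { }
public class Person { }

// Hypothetical resolver (assumes the implementation has a default constructor).
public class OpenGenericResolver
{
    private readonly Type openService;
    private readonly Type openImplementation;
    private readonly Dictionary<Type, Func<object>> cache =
        new Dictionary<Type, Func<object>>();

    public OpenGenericResolver(Type openService, Type openImplementation)
    {
        this.openService = openService;
        this.openImplementation = openImplementation;
    }

    public object GetInstance(Type serviceType)
    {
        Func<object> factory;
        if (!cache.TryGetValue(serviceType, out factory))
        {
            if (!serviceType.IsGenericType ||
                serviceType.GetGenericTypeDefinition() != openService)
            {
                throw new InvalidOperationException("No registration for " + serviceType);
            }

            // One-time work per closed type: close the implementation and compile.
            Type closed = openImplementation.MakeGenericType(
                serviceType.GetGenericArguments());
            Expression body = Expression.Convert(Expression.New(closed), typeof(object));
            factory = Expression.Lambda<Func<object>>(body).Compile();
            cache[serviceType] = factory;
        }
        return factory();
    }
}

public static class Demo
{
    public static void Run()
    {
        var resolver = new OpenGenericResolver(typeof(IValidator<>), typeof(EmptyValidator<>));
        object validator = resolver.GetInstance(typeof(IValidator<Person>));
        Console.WriteLine(validator is EmptyValidator<Person>); // True
    }
}
```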

Even a feature such as the RegisterDecorator extension method, which wraps instances, has no significant performance cost, except of course the cost of creating the decorator (and its dependencies). Take a look at this registration:

container.RegisterDecorator(typeof(ICommandHandler<>), typeof(ValidationCommandHandlerDecorator<>));

This registration will ensure that every time a (closed) ICommandHandler<T> type (such as ICommandHandler<CreateUserCommand>) is requested, it is wrapped with a new ValidationCommandHandlerDecorator<T>. Again, this is done by generating expression trees and compiling delegates out of those expressions. A container like Microsoft’s Unity takes a different approach, and allows you to apply decorators using interception. Although some generation and compilation of IL is going on under the covers there, there is always some reflection happening on each call.
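
A decorator applied through expression trees amounts to nesting one constructor call inside another before compiling. The following is a minimal sketch under that assumption; DecoratorCompiler and CreateUserHandler are hypothetical, and the interface and decorator are given a made-up shape for illustration:

```csharp
using System;
using System.Linq.Expressions;

public interface ICommandHandler<T> { void Handle(T command); }

public class CreateUserCommand { public string Name; }

public class CreateUserHandler : ICommandHandler<CreateUserCommand>
{
    public void Handle(CreateUserCommand command) { }
}

public class ValidationCommandHandlerDecorator<T> : ICommandHandler<T>
{
    public readonly ICommandHandler<T> Decorated;

    public ValidationCommandHandlerDecorator(ICommandHandler<T> decorated)
    {
        Decorated = decorated;
    }

    public void Handle(T command)
    {
        if (command == null) throw new ArgumentNullException("command");
        Decorated.Handle(command);
    }
}

public static class DecoratorCompiler
{
    // Compiles "() => new ValidationCommandHandlerDecorator<T>(new Handler())".
    // Assumes handlerType has a public default constructor.
    public static Func<ICommandHandler<T>> Compile<T>(Type handlerType)
    {
        Type decoratorType =
            typeof(ValidationCommandHandlerDecorator<>).MakeGenericType(typeof(T));
        var ctor = decoratorType.GetConstructor(new[] { typeof(ICommandHandler<T>) });

        Expression inner = Expression.Convert(
            Expression.New(handlerType), typeof(ICommandHandler<T>));
        Expression body = Expression.New(ctor, inner);

        return Expression.Lambda<Func<ICommandHandler<T>>>(body).Compile();
    }
}

public static class Demo
{
    public static void Run()
    {
        var factory = DecoratorCompiler.Compile<CreateUserCommand>(typeof(CreateUserHandler));
        ICommandHandler<CreateUserCommand> handler = factory();
        Console.WriteLine(handler is ValidationCommandHandlerDecorator<CreateUserCommand>); // True
    }
}
```

After compilation, resolving a decorated handler costs two constructor calls and no reflection, in contrast to interception approaches that do work on each call.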

Does this mean that there are no ways to shoot yourself in the foot while using Simple Injector? Of course there are, but hopefully you are much less likely to run into performance problems while using Simple Injector.

Marked as answer by dot_NET_Junkie on 11/4/2013 at 2:00 AM