Thursday, January 8, 2009

Limiting disk store usage with ActiveMQ

This week I have been onsite with Gary Tully, one of my colleagues from our Dublin office. We were visiting a customer with a very simple-sounding requirement for ActiveMQ. His company contributes content to various web pages and portals all over the internet, and he has a Tomcat infrastructure in place for providing the content. The next step is to note whether a visitor to a site has recently visited other pages where the customer also provided some content.

Getting the information is quite simple: we place a cookie on the visitor's machine so that the visitor can be recognized again whenever he visits another page. Whenever the visitor hits another page and we find the cookie, we also want to link that cookie with the newly hit URL. At the end of the day we would have a cookie related to a set of web pages. To keep the performance of the backend as high as possible, the web application should only fire a notification of the correlation between cookie and web page and then deliver the content as fast as possible. The database maintenance should be done asynchronously.

The asynchronous link between the web application and the database is realized using the Fuse release of ActiveMQ 5.2. One requirement that sounded very odd in the beginning was to use persistent messages, but at the same time to allow messages to be dropped under heavy load. Translated, that basically means gathering as much statistical data as possible, while accepting that it doesn't matter if we can't keep up with the load entirely.

Now what has this to do with messaging and ActiveMQ? - Well, we have a quite typical slow consumer problem here. We are producing messages much faster than we can consume them, because creating a message is more or less a string concatenation while consuming the message involves a database insert. As a result any buffer - on disk or in memory - might be filled up during longer periods of high traffic. The normal messaging behavior - also with ActiveMQ - is to slow down the message producer so that the consumers get a chance to catch up and ensure that no message is lost. In our case that would mean that the response times for delivering the content would decline, which should not happen. The solution we are looking for is to give the message broker a well defined amount of space for buffering messages and simply throw back an exception when the buffer is full.
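
To make the difference between blocking and failing fast concrete, here is a minimal sketch in plain Java. It is not ActiveMQ code: the class FailFastBuffer and its methods are hypothetical names invented for this illustration, and a small in-memory queue stands in for the broker's buffer. The point is only that a full buffer rejects the send with an exception instead of making the sender wait.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical illustration of a bounded, fail-fast buffer; not part of ActiveMQ. */
final class FailFastBuffer {
    private final BlockingQueue<String> queue;

    FailFastBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Buffers the message, or throws IllegalStateException if the buffer is full. Never blocks. */
    void send(String message) {
        // add() throws when the queue is at capacity; put() would block the caller instead.
        queue.add(message);
    }

    /** Returns the oldest buffered message, or null if the buffer is empty. */
    String receive() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        FailFastBuffer buffer = new FailFastBuffer(2);
        int dropped = 0;
        for (int i = 0; i < 5; i++) {
            try {
                buffer.send("cookie-" + i);
            } catch (IllegalStateException full) {
                // The producer carries on and simply loses this notification.
                dropped++;
            }
        }
        System.out.println("buffered=" + buffer.size() + " dropped=" + dropped);
    }
}
```

In this sketch the producer never waits for the slow consumer. It just counts what it could not hand over, which is the trade-off our customer was asking for.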

That sounds easy enough and we were thinking along the lines of putting in place a proper system usage section in the broker configuration. We started looking at the default activemq.xml that is provided with an ActiveMQ download and in there we found the following section:



<systemUsage>
  <systemUsage>
    <memoryUsage>
      <memoryUsage limit="20 mb"/>
    </memoryUsage>
    <storeUsage>
      <storeUsage limit="1 gb" name="foo"/>
    </storeUsage>
    <tempUsage>
      <tempUsage limit="100 mb"/>
    </tempUsage>
  </systemUsage>
</systemUsage>

The memory usage is easy enough to understand, and from Gary I learned that the temporary storage is used for spooling out messages that have been sent NON_PERSISTENT, while the store usage applies to messages that have been sent PERSISTENT. As we were sending only persistent messages, to keep as much of the statistical data as possible even across broker restarts, we simply modified the storeUsage section.

To change the behavior when the buffer limits are reached, the systemUsage xbean understands a boolean property, sendFailIfNoSpace. Setting this to true should make the send call throw a JMSException instead of blocking.

So we made those changes and they were happily ignored by our ActiveMQ broker. It took us a while to understand why that was the case. The bottom line is that the system usage as seen above is not associated with a persistence adapter and therefore it is not associated with a store either. When the ActiveMQ broker was started, we saw a persistence adapter and a store being initialized, but we learned that the broker creates a default persistence adapter if none is in the configuration. This default one is not referenced by the storage settings.

As a result, it is important to explicitly specify a persistence adapter within the configuration file and reference it from the broker and the storage definition as shown in the configuration below.



<beans
  xmlns="http://www.springframework.org/schema/beans"
  xmlns:amq="http://activemq.apache.org/schema/core"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
  http://activemq.apache.org/schema/core http://activemq.apache.org/schema/core/activemq-core.xsd
  http://activemq.apache.org/camel/schema/spring http://activemq.apache.org/camel/schema/spring/camel-spring.xsd">

  <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

  <!-- The explicitly defined persistence adapter, referenced below as #store -->
  <bean id="store" class="org.apache.activemq.store.amq.AMQPersistenceAdapter" >
    <property name="directory" value="target/amqdata" />
    <property name="maxFileLength" value="1000000" />
    <property name="checkpointInterval" value="5000" />
    <property name="cleanupInterval" value="5000" />
  </bean>

  <broker xmlns="http://activemq.apache.org/schema/core"
    persistent="true"
    advisorySupport="false"
    dataDirectory="target/amqdata"
    deleteAllMessagesOnStartup="true"
    useJmx="true"
    brokerName="localhost"
    monitorConnectionSplits="false"
    splitSystemUsageForProducersConsumers="false"
    start="false"
    persistenceAdapter="#store">

    <!-- Use the following to configure how ActiveMQ is exposed in JMX -->
    <managementContext>
      <managementContext createConnector="false"/>
    </managementContext>

    <!-- The maximum amount of space the broker will use. With sendFailIfNoSpace set, sends fail instead of blocking once a limit is reached -->
    <systemUsage>
      <systemUsage sendFailIfNoSpace="true" >
        <memoryUsage>
          <memoryUsage limit="400kb" />
        </memoryUsage>
        <storeUsage>
          <storeUsage limit="10mb" store="#store" />
        </storeUsage>
        <tempUsage>
          <tempUsage limit="64mb" />
        </tempUsage>
      </systemUsage>
    </systemUsage>

    <!-- The transport connectors ActiveMQ will listen to -->
    <transportConnectors>
      <transportConnector name="openwire" uri="tcp://localhost:0" />
    </transportConnectors>
  </broker>
</beans>

One caveat in the setup is to make sure that the maxFileLength defined in the persistence adapter is less than half the maximum storage size. Otherwise only one data file is used by the underlying store and that will never get cleaned up. Having maxFileLength set correctly ensures that we will have at least 2 data files.
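
The rule of thumb above is easy to check with a little arithmetic. The following is a small sketch in plain Java using the numbers from the configuration; the class and method names are hypothetical, and it assumes that a limit of "10mb" means 10 * 1024 * 1024 bytes, which I have not verified here.

```java
/** Hypothetical helper for sanity-checking store sizing; not part of ActiveMQ. */
final class StoreSizingCheck {

    /** True if the data file length is less than half the store limit, so at least two files fit. */
    static boolean allowsTwoDataFiles(long maxFileLengthBytes, long storeLimitBytes) {
        // Multiply instead of dividing to avoid integer rounding on odd limits.
        return 2 * maxFileLengthBytes < storeLimitBytes;
    }

    public static void main(String[] args) {
        long maxFileLength = 1_000_000L;       // maxFileLength from the persistence adapter
        long storeLimit = 10L * 1024 * 1024;   // storeUsage limit, assuming "10mb" = 10 * 1024 * 1024 bytes
        System.out.println(allowsTwoDataFiles(maxFileLength, storeLimit)); // prints "true"
    }
}
```

With a data file length of roughly 1 MB and a 10 MB store limit, several data files fit, so the cleanup has older files it can remove.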

When we were finally on our way home, we decided that we needed to get our findings in writing so that we could reference them in the future. I am working with Gary to get the test cases we produced and the configuration files we created into the ActiveMQ project.

Going Live at Apache Con EU 2009

Having settled into my new role at Progress within the Open Source Center of Competence, I will "go live" in that new role at ApacheCon Europe 2009 in Amsterdam with 3 presentations.

In a session called Distributed Team Building we will have a closer look at developing a distributed application based on an Enterprise Service Bus. As the components and services are developed quite independently from each other, the specification for each component and its documentation must be fit for retrieval and reuse. New challenges for development teams include knowledge sharing and knowledge reuse as well as shorter and more agile development iterations. Teams also have to overcome their fears when it comes to changing their way of working. We will have a look at some strategies to address those challenges and overcome the fears in order to compose a team working successfully in an agile and distributed environment.

Another session called ServiceMix Topologies has been inspired by the work I have done for a large retail chain in Europe. For that company I had to deploy roughly 2000 containers collaborating with each other across eastern Europe. So in this talk we will look at some deployment topologies for ESB applications based on Apache ServiceMix. To illustrate the topologies we will deploy a sample application into various constellations and discuss the pros and cons of each approach. In particular, we will look at failover and load balancing capabilities, throughput, and deployment complexity, and discuss how to package an application for distributed deployment. Finally we will come up with a potential approach for deploying such a distributed retail application with Apache ServiceMix.

Finally we will turn to the topic of Tooling for ServiceMix where we will walk through the development and deployment of a sample application based on Apache ServiceMix 4. After a short overview of the ServiceMix 4 architecture and principles we will use the Eclipse based FUSE Integration Designer to design and implement a sample workflow. We will then package this workflow up into OSGi bundles and demonstrate how to deploy those via a central Maven repository into a given ServiceMix instance.

Hope to see you in Amsterdam.