Thursday, January 8, 2009

Limiting disk store usage with ActiveMQ

This week I have been onsite with Gary Tully, one of my colleagues from our Dublin office. We were visiting a customer with a simple-sounding requirement for ActiveMQ. His company contributes content to various web pages and portals all over the internet, served from a Tomcat infrastructure. The next step is to track whether a visitor to one site has recently visited other pages that also carry his content.

Collecting the information is simple enough: a cookie is placed on the visitor's machine so that the visitor can be recognized whenever he hits another page. Whenever we find the cookie on a page hit, we want to link that cookie with the newly visited URL, so that at the end of the day each cookie is related to a set of web pages. To keep the backend as responsive as possible, the web application should only fire a notification of the correlation between cookie and web page and then deliver the content as fast as possible; the database maintenance is done asynchronously.
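Just to make the idea concrete, here is a minimal sketch of such a fire-and-forget notification using the JMS API. The queue name, the message layout and the class name are made up for illustration - this is not the customer's actual code:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.DeliveryMode;
import javax.jms.JMSException;
import javax.jms.MapMessage;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class PageHitNotifier {

    private final ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");

    // Correlate the cookie with the visited URL by firing a single message.
    public void notifyPageHit(String cookieId, String url) throws JMSException {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer =
                    session.createProducer(session.createQueue("stats.pagehits"));
            // PERSISTENT, so the statistics survive a broker restart
            producer.setDeliveryMode(DeliveryMode.PERSISTENT);

            MapMessage message = session.createMapMessage();
            message.setString("cookie", cookieId);
            message.setString("url", url);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}

In the real application the connection would of course be cached or pooled rather than opened per page hit.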

The asynchronous link between the web application and the database is realized using the Fuse release of ActiveMQ 5.2. One requirement that sounded very odd at first was to use persistent messages, but at the same time allow messages to be dropped under heavy load. Translated, that basically means: gather as much statistical data as possible, but it doesn't matter if we can't keep up with the load entirely.

Now what has this to do with messaging and ActiveMQ? Well, we have a fairly typical slow-consumer problem here. We are producing messages much faster than we can consume them, because creating a message is more or less a string concatenation, while consuming it involves a database insert. As a result any buffer - on disk or in memory - might fill up during longer periods of high traffic. The normal messaging behavior - in ActiveMQ as well - is to slow down the message producer so that the consumers get a chance to catch up and no message is lost. In our case that would mean that the response times for delivering the content would degrade, which should not happen. The solution we are looking for is to give the message broker a well-defined amount of space for buffering messages and simply throw an exception back to the producer when the buffer is full.

That sounds easy enough, and we were thinking along the lines of putting a proper systemUsage section into the broker configuration. We started by looking at the default activemq.xml that comes with an ActiveMQ download, and in there we found the following section:

<systemUsage>
    <systemUsage>
        <memoryUsage>
            <memoryUsage limit="20 mb"/>
        </memoryUsage>
        <storeUsage>
            <storeUsage limit="1 gb" name="foo"/>
        </storeUsage>
        <tempUsage>
            <tempUsage limit="100 mb"/>
        </tempUsage>
    </systemUsage>
</systemUsage>

The memory usage is easy enough to understand, and from Gary I learned that the temp storage is used for spooling out messages that have been sent NON_PERSISTENT, while the store usage applies to messages that have been sent PERSISTENT. As we were sending only persistent messages - to keep as much of the statistical data as possible even across broker restarts - we simply modified the storeUsage section.

To change the behavior when the buffer limits are reached, the systemUsage xbean understands a boolean property sendFailIfNoSpace. Setting it to true makes the broker throw a JMSException back to the producer instead of blocking the send call.
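On the client side the send call then has to be guarded. A minimal sketch of how the web application could treat a full store - reusing the hypothetical notifier from above - might look like this:

import javax.jms.JMSException;

public class BestEffortNotifier {

    private final PageHitNotifier notifier = new PageHitNotifier();

    // Fire the statistics message, but never let a full broker
    // slow down or break content delivery.
    public void notifyPageHitBestEffort(String cookieId, String url) {
        try {
            notifier.notifyPageHit(cookieId, url);
        } catch (JMSException e) {
            // With sendFailIfNoSpace="true" the broker rejects the send once
            // the configured limits are reached. We deliberately drop the
            // message: losing some statistics is acceptable here, degrading
            // the page response times is not.
            System.err.println("Broker buffer full - dropping statistics: " + e);
        }
    }
}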

So we made those changes - and they were happily ignored by our ActiveMQ broker. It took us a while to understand why. The bottom line is that a systemUsage section like the one above is not associated with a persistence adapter, and therefore not with a store either. When the broker starts we did see a persistence adapter and a store being initialized, but it turns out the broker creates a default persistence adapter whenever none is given in the configuration - and that default one is not referenced by the storeUsage settings.

As a result, it is important to explicitly specify a persistence adapter in the configuration file and reference it from both the broker and the storeUsage definition, as shown in the configuration below.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:amq="http://activemq.apache.org/schema/core"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
       http://activemq.apache.org/schema/core http://activemq.apache.org/schema/core/activemq-core.xsd
       http://activemq.apache.org/camel/schema/spring http://activemq.apache.org/camel/schema/spring/camel-spring.xsd">

    <bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

    <!-- The explicitly defined persistence adapter; referenced both from the
         broker and from the storeUsage element below -->
    <bean id="store" class="org.apache.activemq.store.amq.AMQPersistenceAdapter">
        <property name="directory" value="target/amqdata"/>
        <property name="maxFileLength" value="1000000"/>
        <property name="checkpointInterval" value="5000"/>
        <property name="cleanupInterval" value="5000"/>
    </bean>

    <broker xmlns="http://activemq.apache.org/schema/core"
            persistent="true"
            advisorySupport="false"
            dataDirectory="target/amqdata"
            deleteAllMessagesOnStartup="true"
            useJmx="true"
            brokerName="localhost"
            monitorConnectionSplits="false"
            splitSystemUsageForProducersConsumers="false"
            start="false"
            persistenceAdapter="#store">

        <!-- Use the following to configure how ActiveMQ is exposed in JMX -->
        <managementContext>
            <managementContext createConnector="false"/>
        </managementContext>

        <!-- The maximum amount of space the broker will use; with
             sendFailIfNoSpace="true" a full buffer fails the send call
             instead of slowing down producers -->
        <systemUsage>
            <systemUsage sendFailIfNoSpace="true">
                <memoryUsage>
                    <memoryUsage limit="400kb"/>
                </memoryUsage>
                <storeUsage>
                    <storeUsage limit="10mb" store="#store"/>
                </storeUsage>
                <tempUsage>
                    <tempUsage limit="64mb"/>
                </tempUsage>
            </systemUsage>
        </systemUsage>

        <!-- The transport connectors ActiveMQ will listen on -->
        <transportConnectors>
            <transportConnector name="openwire" uri="tcp://localhost:0"/>
        </transportConnectors>
    </broker>
</beans>

One caveat in the setup is to make sure that the maxFileLength defined in the persistence adapter is less than half the maximum store size. Otherwise only one data file is used by the underlying store, and that file will never get cleaned up. Setting maxFileLength correctly ensures that there are at least two data files, so completed ones can be removed. In the configuration above, a maxFileLength of 1000000 bytes (roughly 1 MB) against the 10mb store limit leaves room for about ten data files to rotate through.

When we finally were on our way home, we decided that we needed to get our findings in writing so that we can reference them in the future. I am working with Gary to get the test cases we produced and the configuration files we created into the ActiveMQ project.

10 comments:

Torsten Mielke said...

Very interesting post Andreas!
Typically the persistenceAdapter is defined within the <broker> configuration, e.g.:

<persistenceAdapter>
    <kahaPersistenceAdapter directory="../activemq-data"
                            maxDataFileLength="33554432"/>
</persistenceAdapter>

Can this also be referenced instead of creating my own <bean> definition for the persistence adapter? Perhaps the ActiveMQ documentation needs an update, as does the default activemq.xml configuration, which should perhaps carry a comment that the system usage has no effect unless it is linked to a particular persistence adapter.

Unknown said...

Hi Andreas/Torsten,

I just had a client with a similar use case: they have backlogs of 120K messages for a particular queue. They want to find out if there's a persistence store sizing guide they can use to fill in the right values for the persistence adapter and store usage elements in the config file.

My simple calc is this:

1) data = size of msg * backlog
2) index = data * 10%

Total the two and you get the baseline value for the persistence store. The index can't be as big as the data, so I thought 10% is a good estimate.
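For example, assuming roughly 1 KB per message, the 120K backlog would give 120 MB of data plus about 12 MB of index, so roughly 132 MB as a baseline.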

Does this make sense? Thanks.

Erwin Pader
Sr. Solutions Consultant
Progress

Andreas Gies said...

Hi Erwin,

I am not aware of a sizing guide as such. However, if I understand correctly, your client wants to limit the store size for a given destination.

In that case you can configure limits using policies for individual queues. The estimates you set seem reasonable; I can't tell you the exact numbers.

Be aware that if you configure a broker-wide usage and a queue-specific usage, all queues draw on the broker-wide usage concurrently for all the space they can get, until either the broker limit or the individual queue limit is reached - whichever happens first.

This is particularly important if you have other, unlimited queues with heavy traffic. Those could eat up the system-level store before your limited queues reach their maximum, so the limited queues would never grow to their full capacity.

I hope that helps
Andreas

Unknown said...

Hi Andreas,

Each broker will only have one queue destination, so the per-destination policy may be a minor issue. Another related question I want to ask is about indexBinSize. What exactly is this attribute? Does it represent the size of the persistent index? Thanks!

Brett said...

Andreas,
Thanks for your insightful post. You said to Erwin:

"In that case you can configure limits using policies for individual queues. The estimates you set seem reasonable; I can't tell you the exact numbers."

I can't seem to figure out how to set this per queue. I'm using 5.2.0, and the only parameter I can find on a per-queue basis is the memoryLimit setting on a policyEntry. Can you provide an example or a reference on how to set store usage per queue?

Thanks a lot.

-Brett Humphreys

Andreas Gies said...

Hi Brett,

Well spotted. I wrote that comment from memory, which in this case has failed me. Destinations inherit the broker's system usage, and the disk usage cannot be overridden by a per-destination policy.

What I said in my earlier comment only applies to the memory usage.
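For the memory side, a per-queue limit can be set through a destination policy inside the broker element; a small sketch, with the queue name just an illustration:

<destinationPolicy>
    <policyMap>
        <policyEntries>
            <policyEntry queue="stats.pagehits" memoryLimit="5mb"/>
        </policyEntries>
    </policyMap>
</destinationPolicy>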

Apologies...shouldn't trust my memory too much ;)

Brett said...

Andreas,
Thanks for your input. I'm glad to hear that I wasn't just missing it, as I had scoured the javadoc and XSD documentation. I guess the only real option then is to create a separate broker (in the same process) if I want true partitioning in the way persistence is handled. Would you agree?

-Brett

Andreas Gies said...

Hi Brett,

I don't know your exact use case, but yes, you could create separate brokers and use this configuration to limit the disk usage per broker.

However, each broker introduces an overhead in threads and memory consumption, which might be OK for a small number of queues; with a growing number of queues I would rather stick with one broker.

Perhaps it makes sense to file an enhancement request with your use case at activemq.apache.org?

If you want to create a number of brokers, I would probably create the broker services using the Java API rather than Spring, so that you can reuse the same broker setup and maintain the settings in one place.
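Roughly along these lines - a sketch only, with the names and limits chosen for illustration:

import java.io.File;
import org.apache.activemq.broker.BrokerService;
import org.apache.activemq.store.amq.AMQPersistenceAdapter;

public class LimitedBrokerFactory {

    // Creates a broker with a bounded persistent store, mirroring the
    // XML configuration from the post above.
    public static BrokerService createBroker(String name, File dataDir,
                                             long storeLimit) throws Exception {
        AMQPersistenceAdapter store = new AMQPersistenceAdapter();
        store.setDirectory(dataDir);
        // keep each data file well below half the store limit
        store.setMaxFileLength(1000000);

        BrokerService broker = new BrokerService();
        broker.setBrokerName(name);
        broker.setPersistent(true);
        broker.setPersistenceAdapter(store);
        broker.getSystemUsage().setSendFailIfNoSpace(true);
        broker.getSystemUsage().getStoreUsage().setStore(store);
        broker.getSystemUsage().getStoreUsage().setLimit(storeLimit);
        return broker;
    }
}

Starting a few of these with different names and data directories would give you independently limited stores while keeping the setup in one place.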

Best regards
Andreas

Stefan Moser said...

I'm curious whether Torsten's question has been answered. Is it possible to reference a persistence adapter that is defined within the broker element rather than in a separate bean?

cmoulliard said...

Hi Andreas,

Many thanks for the tip. Can we also use it with the KahaDB store?

I see in the javadoc that the KahaPersistenceAdapter has a maxDataFileLength field. Is that the same as the maxFileLength you use with the AMQ data store? Is 1000000 equivalent to 1mb?

Kind regards,

Charles
