28 November 2009

If you've ever had to do batch processing, then you know how tedious it can be to write all the infrastructure code for retries, error recovery, long-running processing, and everything else that surrounds a typical batch application. For these types of applications, I use Spring Batch, a batch processing framework from Dave Syer and the fine people at SpringSource.

The basic idea is that you set up jobs that have steps, which in turn have tasklets. This is the normal use case, but by no means the only one. You use jobs and steps to string together sequences of processing, reading input and writing output via a reader and a writer. Spring Batch has implementations for both reading and writing that will likely meet most of your needs: XML, files, streams, databases, etc. There's so much interesting stuff here that I humbly recommend you take a crack at the documentation or read my book, Spring Enterprise Recipes.

That said, it's not obvious how to read from one input source and then write to multiple files. The use case, in my case, is Google's Sitemaps. These are XML files that describe the pages on your site. You list every URL possible. If you have more than 50,000 links, then you must split them across several files and list those files in a Sitemap index file. So, I wanted to read from a database, derive all the possible URLs for content, and then write those to sitemap XML files, where no sitemap could exceed 50,000 entries. As it turns out, Spring Batch ships with an adapter writer that serves exactly this purpose: org.springframework.batch.item.file.MultiResourceItemWriter. You define it just like you might any other writer, except that you wrap another writer with it.
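To make the rollover behavior concrete, here is a minimal sketch in plain Java. It is not Spring Batch code: RolloverSketch and itemsPerFile are hypothetical names of my own. The sketch assumes, as I understand the writer, that the item count is only checked after each chunk has been written, so a file can overshoot the limit when the chunk size doesn't divide it evenly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of Spring Batch: it only simulates the counting.
class RolloverSketch {

    // Returns how many items land in each output file, in order.
    static List<Integer> itemsPerFile(int totalItems, int chunkSize, int limitPerFile) {
        List<Integer> files = new ArrayList<>();
        int inCurrentFile = 0;
        int remaining = totalItems;
        while (remaining > 0) {
            // One write call per chunk (the commit interval).
            int chunk = Math.min(chunkSize, remaining);
            inCurrentFile += chunk;
            remaining -= chunk;
            // Assumption: the limit is checked only after the whole chunk is written.
            if (inCurrentFile >= limitPerFile) {
                files.add(inCurrentFile);
                inCurrentFile = 0;
            }
        }
        // Whatever is left over ends up in a final, partially filled file.
        if (inCurrentFile > 0) {
            files.add(inCurrentFile);
        }
        return files;
    }
}
```

For 120,000 URLs and a limit of 50,000, a chunk size of 1,000 yields files of 50,000, 50,000 and 20,000 entries, while a chunk size of 30,000 yields two files of 60,000 each. If the 50,000 cap matters, it seems safest to pick a commit interval that divides it evenly.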

Here are the salient bits from my configuration. Most of this is boilerplate. I don't include the configuration of the Spring Batch environment or of the reader, because those are pretty typical. Note that here we configure the writer for the job and in turn configure its delegate property, where we have the real writer implementation. In this case, there's no need to configure the delegate writer's resource property, because the MultiResourceItemWriter sets it on the delegate each time it moves on to a new file.

 
<beans:beans xmlns="http://www.springframework.org/schema/batch" 
             xmlns:beans="http://www.springframework.org/schema/beans" 
             xmlns:aop="http://www.springframework.org/schema/aop" 
             xmlns:tx="http://www.springframework.org/schema/tx" 
             xmlns:p="http://www.springframework.org/schema/p" 
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
             xsi:schemaLocation=" 
http://www.springframework.org/schema/beans 
http://www.springframework.org/schema/beans/spring-beans-2.0.xsd 
http://www.springframework.org/schema/batch 
http://www.springframework.org/schema/batch/spring-batch-2.0.xsd 
http://www.springframework.org/schema/aop 
http://www.springframework.org/schema/aop/spring-aop-2.0.xsd 
http://www.springframework.org/schema/tx 
http://www.springframework.org/schema/tx/spring-tx-2.0.xsd"> 
    <beans:import resource="batch.xml"/> 
    <!-- One chunk-oriented step: read URLs, write them out, commit every N items. -->
    <job id="batchForCreatingSitemaps">
        <step id="sitemap">
            <tasklet>
                <chunk reader="reader" writer="writer"
                       commit-interval="${job.commit.interval}"/>
            </tasklet>
        </step>
    </job>

    <!-- Turns each item into one line of sitemap output. -->
    <beans:bean id="siteMapLineAggregator"
                class="com...sitemapscreator.SiteMapLineAggregator">
        <beans:property name="domain" value="${sitemaps-domain}"/>
    </beans:bean>

    <!-- Supplies the suffix appended to the resource name for each new file. -->
    <beans:bean id="resourceSuffixCreator"
                class="com...sitemapscreator2.ResourceSuffixCreator"/>

    <!-- Step-scoped so the output prefix can be late-bound from the job parameters. -->
    <beans:bean id="writer" scope="step"
                class="org.springframework.batch.item.file.MultiResourceItemWriter">
        <beans:property name="resource"
                        value="file:#{jobParameters[outputResourcePrefix]}"/>
        <beans:property name="resourceSuffixCreator" ref="resourceSuffixCreator"/>
        <beans:property name="saveState" value="true"/>
        <!-- Roll over to a new file once this many items have been written. -->
        <beans:property name="itemCountLimitPerResource" value="50000"/>
        <!-- The real writer; its resource property is deliberately left unset. -->
        <beans:property name="delegate">
            <beans:bean class="org.springframework.batch.item.file.FlatFileItemWriter">
                <beans:property name="encoding" value="UTF-8"/>
                <beans:property name="shouldDeleteIfExists" value="true"/>
                <beans:property name="lineAggregator" ref="siteMapLineAggregator"/>
            </beans:bean>
        </beans:property>
    </beans:bean>

    <beans:bean id="siteMapUrlRowMapper"
                class="com...sitemapscreator.SiteMapUrlRowMapper"/>
    ... 
</beans:beans>
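The two custom beans referenced in the configuration aren't shown, so here is a rough sketch of what such classes might look like. Everything in it is an assumption for illustration: the class names SitemapSuffixCreator and SitemapUrlLineAggregator are hypothetical, the item type is assumed to be a site-relative path string, and the "-N.xml" naming and "http://" scheme are my own choices. The two interfaces are local stand-ins that mirror the single-method Spring Batch interfaces, so the sketch compiles without the framework on the classpath.

```java
// Local stand-ins so the sketch compiles alone. In a real project, import
// org.springframework.batch.item.file.ResourceSuffixCreator and
// org.springframework.batch.item.file.transform.LineAggregator instead.
interface ResourceSuffixCreator {
    String getSuffix(int index);
}

interface LineAggregator<T> {
    String aggregate(T item);
}

// Hypothetical: produces "-1.xml", "-2.xml", ... for the given file index.
class SitemapSuffixCreator implements ResourceSuffixCreator {
    @Override
    public String getSuffix(int index) {
        return "-" + index + ".xml";
    }
}

// Hypothetical: assumes each item is a site-relative path such as "/posts/42".
class SitemapUrlLineAggregator implements LineAggregator<String> {
    private String domain;

    // Matches the "domain" property set in the XML configuration.
    public void setDomain(String domain) {
        this.domain = domain;
    }

    @Override
    public String aggregate(String path) {
        // One <url> entry per line; the URL is escaped for use inside XML.
        return "<url><loc>" + escapeXml("http://" + domain + path) + "</loc></url>";
    }

    // Escapes the five characters that are special in XML; "&" must go first.
    private static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }
}
```

With a resource prefix of, say, file:/tmp/sitemap, a suffix creator like this would give file names along the lines of /tmp/sitemap-1.xml, /tmp/sitemap-2.xml, and so on, and each line in those files would be one escaped url entry.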