Right Outer Join

8 July 2014

MDM in the Cloud (on Amazon AWS Marketplace)

Semarchy MDM on AWS Marketplace

Semarchy shows off its 5 star reviews as the most popular MDM solution on Amazon’s AWS Marketplace

MDM in the Cloud

One of the biggest impediments to Master Data Management (MDM) projects is that they can be hard to get started. An enterprise has lots of people and lots of groups who all stand to benefit from improved data quality, structured data governance, and systematic master data management. But the very fact that so many people stand to gain from it is also a reason why it’s slow to start. Gathering requirements and opinions from everyone takes time.

One of the best ways to get quick agreement about the scope of the first iteration of an MDM project is to generate a quick proof-of-concept or proof-of-value prototype. And one of the quickest ways to get started on an MDM prototype is by using software that’s completely pre-installed and pre-configured. This can lead to better alignment about what will be possible in an MDM project, which makes the project more likely to succeed.

The cloud is a natural fit for this.

Amazon’s AWS Marketplace provides an environment where it’s easy to find software that’s relevant to your needs and get it launched instantly without any up-front costs. When I worked at Jaspersoft I invested quite a bit of time into getting a pre-configured JasperReports Server instance available and in making it easy for people to use Business Intelligence (BI) in the cloud. It was a natural fit especially for anyone who already had data in Amazon RDS or Redshift. The time we invested in that paid off nicely as customers flocked to it. Sales are way up; the reviews are great; and it should serve as a model and an inspiration to other vendors considering cloud offerings.

Semarchy in the Cloud

While business intelligence offerings in the cloud are legion, traditional Master Data Management vendors have been much too slow to embrace the cloud. The industry has taken baby steps. For example, Informatica purchased Data Scout and sells it as its SaaS MDM plug-in for Salesforce.com. It’s a great utility for Salesforce.com, but I don’t put it into the same class as enterprise MDM. Other SaaS MDM solutions are similar.

At Semarchy I see the cloud as an excellent vehicle for putting enterprise MDM into the hands of more users. You can have a fully functional MDM server running in an Amazon Virtual Private Cloud (VPC) in less than an hour. It’s accessible only to people from your company, and it’s ready for you to model your master data management requirements and to start fuzzy-matching and de-duplicating your data.

I expect other vendors to follow eventually. The net result will be improved solutions available to data management professionals everywhere. I’m pleased that Semarchy is leading the way.


20 April 2012

Making Micro Cloud Foundry Better

Jaspersoft on VMware Micro Cloud Foundry

Cloud Foundry is VMware’s Platform as a Service (PaaS) offering. It’s good. It’s potentially a very big deal. Infrastructure as a Service (IaaS) systems like Amazon EC2 and Eucalyptus are better known. But PaaS has the potential to be bigger in several ways. (That’s a topic for another article.)

Jaspersoft’s JasperReports Server has been available on Cloud Foundry for a while. See the webinar, read the blog, then try it out.

But as we were updating JasperReports Server Professional 4.5 for Cloud Foundry we ran into some problems. Here is a summary of the problems and solutions.

Problems on cloudfoundry.com

  1. Not enough memory
  2. Not enough files

It’s free to get started on cloudfoundry.com. It’s hard to complain about free… but it’s not impossible. Here I go. Two gigabytes is not enough. I would rather pay for four or eight than be stuck with two free gigabytes. I’m sure the Cloud Foundry team will be happy enough to accommodate this request soon. Someday they’ll offer a paid production version. But for now it’s only available for free and only with 2 GB of memory. This is not enough. Specifically, it’s not enough for JasperReports Server. It says so right in the documentation. [warning: that link only works if you have access to the premium documentation, but for $100/year it’s a fantastic deal.]

Actually, 2 GB really is enough to run JasperReports Server. It works fine for me. But the whole reason you probably want JasperReports Server on Cloud Foundry is that you already created an application there, and now you want to do some analytics or reporting. That means you have a maximum of 1 GB remaining for JasperReports Server if you allocate any memory to the other application(s). A single gigabyte is really not enough.

While everyone understands “not enough memory”, the complaint “not enough files” is a little weird. This stems from the fact that Cloud Foundry only allows a process to have 256 open files. 256 files is not enough.¹ This is a bug, and it will be solved someday, I’m sure. But for now it’s a problem.

Solutions on Micro Cloud Foundry

  1. Unlimited memory
  2. Plenty of files

Brief non-whiny version

You should probably read the full article once to understand what’s going on. But for future reference, here is the very brief version of the required changes.

  1. Edit micro.vmx. Set virtualhw.version = "6"
  2. Set memory for the Micro Cloud Foundry instance to 4.5 GB or more.
  3. Install vmc v0.3.16.beta.6 or later: gem install vmc --pre
  4. Edit /var/vcap/packages/dea/dea/lib/dea/agent.rb.
    num_fds = 1024
  5. vmc env-add 'my_application' "JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=256m"

Those steps are too brief to be meaningful the first time through… so you’re stuck reading the full article at least once. Tough luck.

Unlimited memory

Micro Cloud Foundry puts all of the power of PaaS into your own hands. In theory this means I should be able to fix anything that I don’t like about cloudfoundry.com. In practice… it’s true! (Well done, guys.) But it’s not always simple. It requires a mix of different skills. Here are the details needed to get JasperReports Server and other applications running well on Cloud Foundry.

When you download Micro Cloud Foundry you find that the default amount of memory allotted to the VM is just 1 GB. Pshaw. I laugh at your 1 GB of memory. We need at least 2 GB for our applications. But this means the VM needs at least 4 GB. That’s because Cloud Foundry will only make half of the total memory available to a standard user. Increasing this limit is not quite intuitive to new users of VMware. Here’s what is needed.

First edit the file micro.vmx in any text editor. You can read all of the details about why this is necessary in the VMware knowledge base. In short, you need to make this change:

original: virtualhw.version = "4"
modified: virtualhw.version = "6"

You need to make that change before launching the instance. If you have already launched it, then you can first confirm the piddly little 1 GB.

C:\>vmc info
VMware's Cloud Application Platform
For support visit http://support.cloudfoundry.com

Target:   http://api.jaspersoft-webinar.cloudfoundry.me (v0.999)
Client:   v0.3.16.beta.6

User:     mdahlman@jaspersoft.com
Usage:    Memory   (0B of 1.0G total)  <-- Doh! This is not enough for anything fun.
          Services (0 of 16 total)
          Apps     (0 of 16 total)

Then stop the instance and make the change.
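
If you would rather script the micro.vmx edit than do it by hand, here is a minimal Ruby sketch. Only the key name and the values "4" and "6" come from this article. The helper name set_virtual_hw_version and the sample text are made up for illustration. The match is case-insensitive on the assumption that your file may capitalize the key differently.

```ruby
# Hypothetical helper (not part of vmc or VMware tooling): returns the .vmx
# text with virtualhw.version set to the requested value.
def set_virtual_hw_version(vmx_text, version)
  # Case-insensitive match is an assumption, in case the key is capitalized
  # differently in your file.
  pattern = /^(\s*virtualhw\.version\s*=\s*)"[^"]*"/i
  raise ArgumentError, "virtualhw.version not found" unless vmx_text =~ pattern
  vmx_text.sub(pattern) { "#{$1}\"#{version}\"" }
end

# Illustrative sample only; a real micro.vmx has many more lines.
sample = <<~VMX
  some.other.setting = "x"
  virtualhw.version = "4"
VMX

puts set_virtual_hw_version(sample, 6)
```

To apply it to the real file you could read micro.vmx, pass the text through the helper, and write the result back while the instance is stopped. Keep a backup copy first.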

After editing micro.vmx, use VMware Player or VMware Workstation to allocate more memory to the instance. You need a bit more than 4 GB because not quite the entire 4 GB is made available to the instance when it’s running.
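
The sizing arithmetic can be written down as a tiny Ruby sketch. The rule that only half of the VM memory reaches a standard user is from this article. The 0.5 GB of headroom is my assumption, picked so that 2 GB of application memory lands on the 4.5 GB suggested in the brief version. The function name vm_memory_gb is hypothetical.

```ruby
# Rough sizing sketch, not an official formula.
# app_memory_gb: memory you want available to your applications.
# headroom_gb:   assumed allowance for memory the running instance keeps back.
def vm_memory_gb(app_memory_gb, headroom_gb = 0.5)
  (app_memory_gb * 2) + headroom_gb
end

puts vm_memory_gb(2)  # application memory of 2 GB
puts vm_memory_gb(4)  # application memory of 4 GB
```

Treat the result as a starting point and check what vmc info actually reports after you reconfigure memory.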

Then on launching (or re-launching) the instance you’ll see option 3 “Reconfigure Memory” clearly highlighted in orange. Choose 3. Let it use all of the memory you have just allocated, and restart. Now all is good:

C:\>vmc info
VMware's Cloud Application Platform
For support visit http://support.cloudfoundry.com

Target:   http://api.jaspersoft-webinar.cloudfoundry.me (v0.999)
Client:   v0.3.16.beta.6

User:     mdahlman@jaspersoft.com
Usage:    Memory   (0B of 5.5G total)

Well… things are not quite good enough yet. There is a bug in the current (as of 24 April 2012) stable version of vmc which causes deployment to fail when you have extra memory allocated. Never fear, this has already been fixed. But you need to upgrade to a beta release of vmc to get the fix. You need v0.3.16.beta.6 or later. Installing vmc is covered in detail in Cloud Foundry’s documentation. Updating to the pre-release version (beta version) is simple:

gem install vmc --pre
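
To check whether a given client version is new enough, you can lean on RubyGems' own version comparison, which understands pre-release suffixes like beta.6. This is a minimal sketch. The helper name vmc_has_fix? is made up, and the only threshold it knows about is the v0.3.16.beta.6 mentioned above.

```ruby
require "rubygems"

# Minimum client version named in this article.
MINIMUM_VMC = Gem::Version.new("0.3.16.beta.6")

# Hypothetical helper: true when the version string is at least the minimum.
# A leading "v" (as printed on the Client line of vmc info) is stripped first.
def vmc_has_fix?(version_string)
  Gem::Version.new(version_string.sub(/\Av/, "")) >= MINIMUM_VMC
end

puts vmc_has_fix?("0.3.15")          # the stable release at the time
puts vmc_has_fix?("v0.3.16.beta.6")  # the beta named above
```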

If you don’t upgrade to the latest and greatest vmc, then you’re destined for an error like this:

vmc push
Would you like to deploy from the current directory? [Yn]: y
Application Name: jrs-sample
Application Deployed URL [jrs-sample.jaspersoft.cloudfoundry.me]: y
can't dup NilClass
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1131:in `dup'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1131:in `block in dup'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1131:in `map'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1131:in `dup'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1367:in `initialize'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1378:in `new'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1378:in `open'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/rubyzip2-2.0.1/lib/zip/zip.rb:1400:in `foreach'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/zip_util.rb:29:in `entry_lines'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/frameworks.rb:44:in `block in detect'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/frameworks.rb:33:in `chdir'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/frameworks.rb:33:in `detect'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/commands/apps.rb:434:in `push'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/runner.rb:440:in `run'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/lib/cli/runner.rb:14:in `run'
C:/Ruby192/lib/ruby/gems/1.9.1/gems/vmc-0.3.15/bin/vmc:5:in `<top (required)>'
C:/Ruby192/bin/vmc:19:in `load'
C:/Ruby192/bin/vmc:19:in `<main>'

Plenty of files

A Java application based on Spring needs lots of files. You can SSH to your Micro Cloud Foundry instance to see how many files may be opened at once:

root@micro:~# ulimit -n

Somewhat surprisingly, the above result is totally irrelevant. Your Tomcat instance can still only open 256 files. This is built into the Cloud Foundry Ruby code. How can you prove this to yourself? Like this:

root@micro:~# ps aux | grep tomcat
22001     2993  0.8  7.9 1425380 454164 ?      SNl  18:00   0:20 /var/vcap/packages/dea_jvm/bin/java [... lots of stuff ...] org.apache.catalina.startup.Bootstrap start

root@micro:~# cat /proc/2993/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            256                  256                  files
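
If you want to check this limit from a script instead of eyeballing the output, a few lines of Ruby can pull the numbers out of the limits text. This is a sketch only. The helper name max_open_files is made up, and the sample text is copied from the output above.

```ruby
# Hypothetical helper: extracts the soft and hard "Max open files" values from
# the text of /proc/<pid>/limits. Returns [soft, hard], or nil if the line is
# missing. "unlimited" is mapped to Float::INFINITY.
def max_open_files(limits_text)
  line = limits_text.lines.find { |l| l.start_with?("Max open files") }
  return nil unless line
  soft, hard = line.sub("Max open files", "").split.first(2)
  [soft, hard].map { |v| v == "unlimited" ? Float::INFINITY : Integer(v) }
end

sample = <<~LIMITS
  Limit                     Soft Limit           Hard Limit           Units
  Max cpu time              unlimited            unlimited            seconds
  Max open files            256                  256                  files
LIMITS

p max_open_files(sample)
```

On the instance itself you would pass in the contents of /proc/<pid>/limits for the Tomcat process, using the process ID that ps reports.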

Fortunately, Cloud Foundry has a way to modify this value. You need to set the parameter DEFAULT_APP_NUM_FDS.

Unfortunately, Cloud Foundry has a bug and it ignores the value you set.

Fortunately, you can ignore this ignored parameter and just hard-code the value you want.

Edit this file: /var/vcap/packages/dea/dea/lib/dea/agent.rb

if limits = message_json['limits']
  mem = limits['mem'] if limits['mem']
  num_fds = limits['fds'] if limits['fds']
  disk = limits['disk'] if limits['disk']
end
# Jaspersoft decides to ignore the above code.
# Instead we overwrite the value:
num_fds = 1024

With this hard-coded hack in place, the file limit is increased:

root@micro:~# ps aux | grep tomcat
22001     2993  0.8  7.9 1425380 454164 ?      SNl  18:00   0:20 /var/vcap/packages/dea_jvm/bin/java [... lots of stuff ...] org.apache.catalina.startup.Bootstrap start

root@micro:~# cat /proc/2993/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 1024                 files

Surely the need for this hack will go away someday soon. I’ll update this article when it does.

But you aren’t quite done yet. The reason you need more files is that Spring-based applications need lots of files. And the reason they need lots of files is closely related to the fact that they instantiate lots of different classes. The use of many classes implies the use of a lot of PermGen space. So you need to increase this from its tiny little default value. Cloud Foundry makes this easy. (This part isn’t a hack like the last part was.)

vmc stop jrs-pro-451
vmc env-add 'jrs-pro-451' "JAVA_OPTS=$JAVA_OPTS -XX:MaxPermSize=256m"
vmc start jrs-pro-451

Without this change, JasperReports Server will boot up successfully. But it runs into problems quickly. It’s an important change. It may not be obvious why the application is having trouble. When you look at the log it’s clear:

vmc logs jrs-pro-451
INFO: Server startup in 17742 ms
Apr 19, 2012 6:32:45 PM org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler process
SEVERE: Error reading request, ignored
java.lang.OutOfMemoryError: PermGen space
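
If you save the log output to a file, spotting this error can be automated. Here is a minimal sketch. The helper name permgen_errors is made up, and the sample lines are taken from the log above.

```ruby
# Hypothetical helper: returns the log lines that report PermGen exhaustion.
def permgen_errors(log_text)
  log_text.lines.map(&:chomp).select do |line|
    line.include?("java.lang.OutOfMemoryError: PermGen space")
  end
end

sample_log = <<~LOG
  INFO: Server startup in 17742 ms
  SEVERE: Error reading request, ignored
  java.lang.OutOfMemoryError: PermGen space
LOG

hits = permgen_errors(sample_log)
puts "PermGen errors found: #{hits.size}"
```

Any hit is a strong hint that the MaxPermSize setting did not take effect for the application.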

Once you have all the memory you could want and all the files you can shake a stick at, you’re on your way to taking full advantage of Cloud Foundry PaaS.


Although cloudfoundry.com does not today (24 April 2012) accommodate full-sized applications the way it should, it’s entirely to the Cloud Foundry team’s credit that Micro Cloud Foundry gives as much power as required to do what we need.

I’m sure cloudfoundry.com will catch up someday soon. I’ll update this article when it does.

