Right Outer Join

8 July 2014

MDM in the Cloud (on Amazon AWS Marketplace)

Semarchy shows off its 5 star reviews as the most popular MDM solution on Amazon’s AWS Marketplace

MDM in the Cloud

One of the biggest impediments to Master Data Management (MDM) projects is that they can be hard to get started. An enterprise has lots of people and lots of groups who all stand to benefit from improved data quality, structured data governance, and systematic master data management. But the very fact that so many people stand to gain from it is also a reason why it’s slow to start. Gathering requirements and opinions from everyone takes time.

One of the best ways to get quick agreement about the scope of the first iteration of an MDM project is to generate a quick proof-of-concept or proof-of-value prototype. And one of the quickest ways to get started on an MDM prototype is by using software that’s completely pre-installed and pre-configured. This can lead to better alignment about what will be possible in an MDM project, which makes the project more likely to succeed.

The cloud is a natural fit for this.

Amazon’s AWS Marketplace provides an environment where it’s easy to find software that’s relevant to your needs and get it launched instantly without any up-front costs. When I worked at Jaspersoft I invested quite a bit of time into getting a pre-configured JasperReports Server instance available and in making it easy for people to use Business Intelligence (BI) in the cloud. It was a natural fit especially for anyone who already had data in Amazon RDS or Redshift. The time we invested in that paid off nicely as customers flocked to it. Sales are way up; the reviews are great; and it should serve as a model and an inspiration to other vendors considering cloud offerings.

Semarchy in the Cloud

While business intelligence offerings in the cloud are legion, traditional Master Data Management vendors have been much too slow to embrace the cloud. The industry has taken baby steps. For example, Informatica purchased Data Scout and sells this as their SaaS MDM Salesforce.com plug-in solution. It’s a great utility for salesforce.com, but I don’t put it into the same class as enterprise MDM. Other SaaS MDM solutions are similar.

At Semarchy I see the cloud as an excellent vehicle for putting enterprise MDM into the hands of more users. You can have a fully functional MDM server running in an Amazon Virtual Private Cloud (VPC) in less than an hour. It’s accessible only to people from your company, and it’s ready for you to model your master data management requirements and to start fuzzy-matching and de-duplicating your data.

I expect other vendors to follow eventually. The net result will be improved solutions available to data management professionals everywhere. I’m pleased that Semarchy is leading the way.

10 July 2013

Oracle on Ubuntu

Filed under: Oracle, Semarchy. Posted by mdahlman @ 11:52


I mostly use Mac OS X, but I needed to install Oracle to work with Semarchy MDM. I created a VM with Ubuntu 12.04 LTS to run Oracle. But installing Oracle XE was significantly harder than I had expected, so I’m documenting what I went through for both my own future reference and to help others.

Extra background details

You can skip this if you don’t care why I chose the different pieces that I chose.

  • Oracle. Semarchy requires Oracle. This is just a demo environment, so I want Oracle Express Edition (XE). I want the current version of Oracle (11g R2 as I write this). I would like Oracle to run on Mac. It did, but it doesn’t:
    “Oracle Database 10g Release 2 [… is] fully certified on Mac OS X.” –Oracle Technology Network
    “Oracle 11g Client Tools are available for Mac, but there’s no server install.” -Myself (since it’s hard to find articles or press releases announcing lack of support for something)
    Downloads for Mac OS are conspicuously absent on the Oracle XE download page. But Linux is supported: “Oracle Database Express Edition 11g Release 2 for Linux x64”.
  • Ubuntu. It’s among the most popular distributions (perhaps the most popular). I considered Ubuntu 13.04, but I settled on Ubuntu 12.04 LTS because it’s a long-term support release. I don’t have any need to be on the cutting edge with this project. I want a GUI to have flexibility to install other tools, so I chose Ubuntu Desktop over Ubuntu Server. These instructions should work equally well on Ubuntu Server. I chose 64-bit because… well, this is 2013 and 32-bit just seems wrong. Actually, Oracle XE isn’t available for 32-bit, so that wasn’t really an option. CentOS could be a reasonable choice. But in my experience they’re much more server focused. It’s great on AWS, but I don’t know how it is on the desktop. More folks in my company use Ubuntu than other releases, so I’ll benefit from their experience.
    Oracle XE does not work in Windows x64, so Windows feels like a choice destined to cause extra problems in the future.
  • VMWare Fusion. Other choices like VirtualBox would surely be fine. But I already knew Fusion, and I already had a license for it.
    Update: I exported an .ova from Fusion and imported into VirtualBox. It worked just like it’s supposed to.

Ubuntu Installation

Armed with downloads and instructions, I installed Ubuntu. Then I discovered some eccentricities of Oracle that forced me to modify my Ubuntu configuration. I ended up breaking things, so it became easier to simply re-install Ubuntu. Then I had to do it again. In the end I fully documented it:

  • The best way to install Ubuntu 12.04 LTS and configure VMware Fusion to be ready to install Oracle.
  • All or most of the article applies to all or most versions of Ubuntu… but I only tested Ubuntu 12.04.

Oracle Installation – Zoinks

This was harder than I expected. The pain of installing and configuring Oracle on Ubuntu is what motivated this article.

First, I want to give credit to this outrageously helpful article about installing Oracle on Ubuntu. I have no idea who “Dude” is, but he’s clearly The Dude. I could not possibly have installed Oracle on Ubuntu without this guide. Much of my article is based directly on that one.

But… that article is really hard to read. This is partly due to formatting. Presumably the forum changed some things after the initial post, so it wasn’t Dude’s fault. But it’s really hard to read now. It also lacks some updates for Ubuntu 12.04. Dude added the relevant details in this follow-up post. But it would be much better for Ubuntu 12.04 users to have that fix incorporated into the original code.

Also, the article contains too much detail in some places. It contains multiple ways of solving some problems. As a reference, that can be useful. But in this case I prefer just a single canonical Ubuntu solution. (Get it? Canonical. That’s a little Ubuntu joke.) Most of all, it documents how to fix your Ubuntu setup when you need to make changes to accommodate Oracle. But I was in a position to anticipate these issues and simply install Ubuntu with appropriate configuration settings. So the “Oracle Install” part of my document can focus on installing Oracle. I moved the “Configuring Ubuntu” sections into a separate Installing Ubuntu article.

So with a well-configured Ubuntu 12.04 machine as your starting point, let’s install Oracle 11g XE.

Launch Terminal

Oracle 11g Express Edition requires additional software that is not installed by default:

sudo apt-get install alien libaio1 unixodbc

If you followed my Ubuntu install you have plenty of swap space. Whether you installed like that or not, you should confirm that you actually have plenty of swap space.

cat /proc/meminfo | grep -i swap
SwapCached: 0 kB
SwapTotal: 3879932 kB
SwapFree: 3879932 kB

Yep. 3879932 kB is nearly 4 GB. Oracle XE requires 2 GB.
If you do not have at least 2 GB swap space, then fix it before proceeding. The awesome post I mentioned above is a good source to help with that.
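
If you would rather have the machine do the comparison than eyeball the number, here is a minimal sketch. The function names swap_total_kb and has_enough_swap are my own, not part of any tool, and the 2 GB threshold (2097152 kB) is simply the requirement quoted above.

```shell
# Print SwapTotal in kB from a meminfo-style file (default: /proc/meminfo).
swap_total_kb() {
  awk '/^SwapTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Succeed if swap is at least the required number of kB.
# $1: meminfo-style file (optional), $2: required kB (default 2 GB = 2097152 kB).
has_enough_swap() {
  local required_kb="${2:-2097152}"
  local total_kb
  total_kb="$(swap_total_kb "$1")"
  [ -n "$total_kb" ] && [ "$total_kb" -ge "$required_kb" ]
}

# Example use:
#   has_enough_swap && echo "swap OK" || echo "not enough swap"
```

Both functions accept a file name so that they can be tried against a saved copy of /proc/meminfo as well as the live one.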

Modify Kernel Parameters

Oracle 11gR2 Express Edition requires the following kernel parameters.

Log in as root:

sudo su -

Cut & paste the following directly into a command shell (not a text editor):

cat > /etc/sysctl.d/60-oracle.conf <<-EOF
# Oracle 11g XE kernel parameters
fs.file-max=6815744
net.ipv4.ip_local_port_range=9000 65500
kernel.sem=250 32000 100 128
# Value suggested by Oracle (4 GB minus 1 byte):
# kernel.shmmax=4294967295
kernel.shmmax=107374183
EOF

Log out from being root:

exit


Load and verify the new kernel parameters:

sudo service procps start
sudo sysctl -q fs.file-max
sudo sysctl -q kernel.shmmax
sudo sysctl -q net.ipv4.ip_local_port_range
sudo sysctl -q kernel.sem

The SHMMAX kernel parameter defines the maximum size of a single shared memory segment. It is a safeguard to stop a bad process from using all memory and causing RAM starvation. The Linux default is 32 MB. The official Oracle XE installation documentation suggests a value of 4 GB minus 1 byte (4294967295 bytes). Since Oracle 11g XE has a 1 GB memory limit, a smaller footprint will be a better safeguard for the complete system. Setting the SHMMAX parameter to 107374183 will be sufficient.
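
Byte counts this long are easy to mistype, so it can be worth letting the shell do the arithmetic. This is just a sanity-check sketch; bytes_to_mb is a helper name I made up, and it rounds down to whole megabytes.

```shell
# 4 GB minus 1 byte, computed rather than typed by hand.
four_gb_minus_one=$(( 4 * 1024 * 1024 * 1024 - 1 ))

# Convert a byte count to whole megabytes (1 MB = 1048576 bytes), rounding down.
bytes_to_mb() {
  echo $(( $1 / 1048576 ))
}

echo "4 GB - 1 byte = $four_gb_minus_one bytes"
echo "107374183 bytes is about $(bytes_to_mb 107374183) MB"
```

The second line of output shows that the SHMMAX value used here works out to roughly 102 MB.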

Oracle Home Directory

This should already be well configured. Confirm that this is really true:

df -h /u01

This df command should have a result similar to this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       3.7G   72M  3.5G   3% /u01

This is important because if /u01 isn’t there, then the Oracle Installer will have big problems. If you don’t have it, then start over and re-install Ubuntu with a better configuration or else fix it. Re-installing Ubuntu is easier. But if that’s not an option for you then the fully documented, if quite complex, process to fix the existing Ubuntu instance is available.
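
If you want a check that fails loudly instead of relying on reading the df output, something like the following sketch could work. The names avail_kb and dir_has_space are hypothetical helpers of mine, and the threshold you pass is up to you; I am not quoting an official Oracle space requirement here.

```shell
# Print the available kB on the filesystem holding a directory.
# Fails if the directory does not exist.
avail_kb() {
  [ -d "$1" ] || return 1
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if directory $1 exists and has at least $2 kB available.
dir_has_space() {
  local kb
  kb="$(avail_kb "$1")" || return 1
  [ -n "$kb" ] && [ "$kb" -ge "$2" ]
}

# Example use (the 2097152 kB threshold is only an illustration):
#   dir_has_space /u01 2097152 || echo "/u01 is missing or too small"
```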


Oracle 11gR2 XE under Ubuntu 12.04 will result in “ORA-00845: MEMORY_TARGET not supported on this system” either at Oracle database startup or during the initial installation. Ubuntu 12.04 uses a newer version of the “systemd” system and session manager and has migrated away from /dev/shm and other common directories in favor of /run.

Here’s how to avoid the problem.
Login as root:

sudo su -

Cut & paste the following directly into a command shell (not a text editor):

cat > /etc/init.d/oracle-shm <<-EOF
#! /bin/sh
# /etc/init.d/oracle-shm
case "\$1" in
  start)
    echo "Starting script /etc/init.d/oracle-shm"
    # Run only once at system startup
    if [ -e /dev/shm/.oracle-shm ]; then
      echo "/dev/shm is already mounted, nothing to do"
    else
      rm -f /dev/shm
      mkdir /dev/shm
      # Ubuntu 11 used: mount -B /run/shm /dev/shm
      # That is bad for Ubuntu 12, so use these two commands instead:
      mount --move /run/shm /dev/shm
      mount -B /dev/shm /run/shm
      touch /dev/shm/.oracle-shm
    fi
    ;;
  stop)
    echo "Stopping script /etc/init.d/oracle-shm"
    echo "Nothing to do"
    ;;
  *)
    echo "Usage: /etc/init.d/oracle-shm {start|stop}"
    exit 1
    ;;
esac
### BEGIN INIT INFO
# Provides:          oracle-shm
# Required-Start:    \$remote_fs \$syslog
# Required-Stop:     \$remote_fs \$syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Bind /run/shm to /dev/shm at system startup.
# Description:       Fix to allow Oracle 11g use AMM.
### END INIT INFO
EOF

Log out from being root:

exit


Install the oracle-shm init script that you just created:

sudo chmod 755 /etc/init.d/oracle-shm
sudo update-rc.d oracle-shm defaults 01 99

You will see a result like this:

 Adding system startup for /etc/init.d/oracle-shm ...
   /etc/rc0.d/K99oracle-shm -> ../init.d/oracle-shm
   /etc/rc1.d/K99oracle-shm -> ../init.d/oracle-shm
   /etc/rc6.d/K99oracle-shm -> ../init.d/oracle-shm
   /etc/rc2.d/S01oracle-shm -> ../init.d/oracle-shm
   /etc/rc3.d/S01oracle-shm -> ../init.d/oracle-shm
   /etc/rc4.d/S01oracle-shm -> ../init.d/oracle-shm
   /etc/rc5.d/S01oracle-shm -> ../init.d/oracle-shm


Restart the machine. Once it is back up, verify that all went well:

sudo cat /etc/mtab | grep shm

Expected (desired) result:

none /run/shm tmpfs rw,nosuid,nodev 0 0
/run/shm /dev/shm none rw,bind 0 0

If you run the command before rebooting you would see the first line with tmpfs but not the second line.
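
To turn that visual comparison into a yes/no answer, here is a small sketch. The function name shm_bind_ok is my own invention; it only looks for the two lines shown above in a mtab-style file.

```shell
# Succeed if a mtab-style file (default: /etc/mtab) contains both
# the tmpfs mount on /run/shm and the bind mount of /run/shm onto /dev/shm.
shm_bind_ok() {
  local mtab="${1:-/etc/mtab}"
  grep -Eq '^[^ ]+ /run/shm tmpfs ' "$mtab" &&
    grep -Eq '^/run/shm /dev/shm [^ ]+ [^ ]*bind' "$mtab"
}

# Example use:
#   shm_bind_ok && echo "shared memory mounts look right" || echo "bind mount missing"
```

Before the reboot this should fail, because only the tmpfs line is present.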

Verify the available shared memory:

sudo df -h /run/shm

The upper limit of shared memory under Linux is set to 50% of the installed RAM by default. So a good result on a machine with 2 GB of RAM allocated is the following:

Filesystem      Size  Used Avail Use% Mounted on
none            999M  152K  998M   1% /dev/shm

Machine configuration

There are a few configuration changes needed to accommodate what the Oracle installer expects to find.

The following needs to be set for compatibility:

sudo ln -s /usr/bin/awk /bin/awk

Ubuntu uses different tools to manage services and system startup scripts. The “chkconfig” tool required by the Oracle installer is not available in Ubuntu. The following will create a file to simulate the “chkconfig” tool.

Log in as root:

sudo su -

Cut & paste the following directly into a command shell (not a text editor):

cat > /sbin/chkconfig <<-EOF
#!/bin/bash
# Oracle 11gR2 XE installer chkconfig hack for Debian based Linux (by dude)
# Only run once.
echo "Simulating /sbin/chkconfig..."
if [[ ! \`tail -n1 /etc/init.d/oracle-xe | grep INIT\` ]]; then
cat >> /etc/init.d/oracle-xe <<-EOM
#
### BEGIN INIT INFO
# Provides:              OracleXE
# Required-Start:        \\\$remote_fs \\\$syslog
# Required-Stop:         \\\$remote_fs \\\$syslog
# Default-Start:         2 3 4 5
# Default-Stop:          0 1 6
# Short-Description:     Oracle 11g Express Edition
### END INIT INFO
EOM
update-rc.d oracle-xe defaults 80 01
fi
EOF

Log out from being root:

exit


Set execute privileges for the script that you just created:

sudo chmod 755 /sbin/chkconfig

The preparation steps are complete! It’s finally time to begin the actual installation of Oracle 11g XE.

Install Oracle

Begin by unzipping the installer zip file. Most folks will do this by right-clicking and choosing “Extract Here”. But here’s the command line version for the sake of completeness (and for folks on Ubuntu Server).

cd ~/Downloads
unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip

The Debian Linux based package management of Ubuntu is not compatible with the Red Hat package manager this installer is delivered in. The Oracle installer needs to be converted using the following commands:

cd ~/Downloads/Disk1
sudo alien --to-deb --scripts oracle-xe-11.2.0-1.0.x86_64.rpm

This alien command took 3 minutes to run on my instance. You can delete the zip and the original installer to save space:

rm ~/Downloads/oracle-xe-11.2.0-1.0.x86_64.rpm.zip
rm ~/Downloads/Disk1/oracle-xe-11.2.0-1.0.x86_64.rpm

Install Oracle 11gR2 XE:

cd ~/Downloads/Disk1
sudo dpkg --install ./oracle-xe_11.2.0-2_amd64.deb

This dpkg command took 30 seconds to run on my instance. If you look through the output you’ll see the line, “You must run ‘/etc/init.d/oracle-xe configure’ as the root user to configure the database.” Do that next:

sudo /etc/init.d/oracle-xe configure
  • HTTP port: During the interactive configuration I set the port for Oracle Application Express to 8181 instead of the default value of 8080. I plan to install Tomcat, and I prefer to use 8080 for Tomcat instead.
  • Database listener: Keep the default value of 1521
  • Password for SYS and SYSTEM: MANAGER (It’s traditional. It’s totally insecure, but it’s simple. For my demo machine it’s perfect.)
  • Start on boot: Yes

This configure procedure took 2 minutes to run on my instance.
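
To confirm that the listener and the HTTP port actually came up, one option is to filter the output of netstat. This is only a sketch: port_listening is a helper name I made up, and it assumes the column layout of `netstat -ltn`, where the fourth column is the local address.

```shell
# Read `netstat -ltn`-style text on stdin.
# Succeed if some TCP local address ends in :PORT.
port_listening() {
  awk -v port="$1" '
    $1 ~ /^tcp/ && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Example use with the ports chosen above:
#   netstat -ltn | port_listening 1521 && echo "database listener is up"
#   netstat -ltn | port_listening 8181 && echo "Application Express port is up"
```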

Set a password for the Oracle account:

sudo passwd oracle

In order to use sqlplus and other tools, the Oracle account requires specific environment variables.

Log in as oracle:

su - oracle

Copy the default account skeleton files and add the Oracle env script to .profile:

cp /etc/skel/.bash_logout ./
cp /etc/skel/.bashrc ./
cp /etc/skel/.profile ./
echo "" >>./.profile
echo '. /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh' >>./.profile

Log out from being oracle. (This is important because you need to log out and log back in before the sqlplus command below will work.)

exit
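
After logging back in as oracle, a quick way to see whether the environment script took effect is to look for the variables it should have set. This sketch assumes that oracle_env.sh exports ORACLE_HOME and ORACLE_SID; that is my assumption about the script, so adjust the list if yours differs. The function name missing_oracle_vars is made up.

```shell
# Print the names of any expected variables that are unset or empty.
# The variable list is an assumption about what oracle_env.sh sets.
missing_oracle_vars() {
  local name value
  for name in ORACLE_HOME ORACLE_SID; do
    eval "value=\${$name}"
    [ -n "$value" ] || echo "$name"
  done
}

# Example use:
#   missing_oracle_vars   # prints nothing when the environment is complete
```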


Enable remote logins to the XE GUI. Log in as oracle:

su - oracle

Login as SYSDBA then execute the relevant stored procedure:

sqlplus / as sysdba
SQL> EXEC DBMS_XDB.SETLISTENERLOCALACCESS(FALSE);
SQL> exit
[this exits from SQL*Plus]
$ exit
[this logs out from being oracle]

Lots more information is available here: Oracle® Database Express Edition Getting Started Guide

According to the Oracle documentation, the password for the INTERNAL and ADMIN Oracle Application Express user accounts is initially the same as the SYS and SYSTEM administrative user accounts. Well, I tried several times without success. [“I” refers to “Dude” in this sentence. I copied, reused, rearranged, and reformatted much of Dude’s work. But I’m not attempting to claim his work as my own. (“I” refers to “Matt” in these last two sentences.)] Reset the Apex Admin password:
Log in as oracle:

su - oracle

Log in as SYSDBA:

sqlplus / as sysdba

At the SQL prompt, execute apxxepwd.sql using the following command. You will be prompted to change the password:

SQL> @?/apex/apxxepwd.sql
SQL> exit

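Now you can log in to the Apex Admin from a remote machine by pointing a browser at the server’s hostname and the HTTP port you chose during configuration (8181 in my case).

Of course you can always log in locally by using localhost instead.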

It will prompt you to reset the password. It uses the most restrictive password rules that I have ever encountered. My first attempt at a password for ADMIN was ‘ADMIN’. It failed no fewer than six complexity rules. I eventually settled on this as the simplest acceptable password I could find:


For me, one of the most annoying things about sqlplus is that by default I cannot simply hit the up arrow to get back to the last command. Fortunately, the problem is easily solved.

First get the readline wrapper utility:

sudo apt-get install rlwrap

Then create an alias to use rlwrap with SQL*Plus:

su - oracle
cat >> ~/.bashrc <<-EOF
alias sqlplus="rlwrap sqlplus"
EOF

Log out from being oracle. (This is important because you need to log out and log back in before the new alias will take effect and sqlplus will have that wonderful up-arrow-for-retrieving-history functionality.)

exit


Your Oracle instance should now be installed and configured and ready for you to use. This post made it possible for me to write my article. I hope I have not introduced too many errors. Please let me know if you found the article helpful.

It’s time for me to do some Master Data Management.
