Right Outer Join

5 December 2013

Copy files between s3 buckets

Filed under: AWS, Linux — mdahlman @ 15:06

The problem

I needed to copy files between Amazon AWS S3 buckets. This should be easy. Right?

To be clear, I wanted the equivalent of this:

cp s3://sourceBucket/file_prefix* s3://targetBucket/

The solution (short version)

No, it’s not easy.

Or rather, in the end it turned out to be pretty easy; but it was entirely unintuitive.

s3cmd cp --recursive --exclude='*' --include='file_prefix*' s3://sourceBucket/ s3://targetBucket/

The explanation (long version)

Get s3cmd

The best command line utility for working with S3 is s3cmd. You can get it from s3tools.org. If you’re on Amazon Linux (or CentOS or RHEL, etc) then this is the easiest way to install it.

# Note the absence of s3tools.repo in your list of repositories like this:
ls /etc/yum.repos.d/
# Put s3tools.repo in your list of repositories like this:
sudo wget http://s3tools.org/repo/RHEL_6/s3tools.repo -O /etc/yum.repos.d/s3tools.repo
# Confirm that you did it correctly:
ls /etc/yum.repos.d/

# Install s3cmd:
sudo yum install s3cmd

# Configure s3cmd:
s3cmd --configure

False start 1

s3cmd has a copy command, “cp”. Try that:

# This should do the trick:
s3cmd cp s3://sourceBucket/file_prefix* s3://targetBucket/

One file copies successfully… but then it crashes:

File s3://sourceBucket/file_prefix_name1.txt copied to s3://targetBucket/file_prefix_name1.txt

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    An unexpected error has occurred.
  Please report the following lines to:
   s3tools-bugs@lists.sourceforge.net
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Problem: KeyError: 'dest_name'
S3cmd:   1.0.0

Traceback (most recent call last):
  File "/usr/bin/s3cmd", line 2006, in <module>
    main()
  File "/usr/bin/s3cmd", line 1950, in main
    cmd_func(args)
  File "/usr/bin/s3cmd", line 614, in cmd_cp
    subcmd_cp_mv(args, s3.object_copy, "copy", "File %(src)s copied to %(dst)s")
  File "/usr/bin/s3cmd", line 604, in subcmd_cp_mv
    dst_uri = S3Uri(item['dest_name'])
KeyError: 'dest_name'

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    An unexpected error has occurred.
    Please report the above lines to:
   s3tools-bugs@lists.sourceforge.net
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Argh!! This stackoverflow answer claims that s3cmd cp cannot handle this. (It is wrong, but for a long time I believed it.)

False start 2

This stackoverflow answer suggests “sync” as the command to use.

It is correct. But sync is not the same as copy, so this has bad side effects if what you really want to achieve is copying files. For example, sync can remove files in the target folder when run with --delete-removed (to keep things in sync, duh). So syncing that way from source1 and source2 into a single target will cause grief. For copying all files from one location to another it’s great. I wanted to copy files, and I did not want any of the side effects of sync.

Bad alternatives

You can write your own script using boto and python or muck around with awk and getting lists of files to copy one-by-one. In principle these will work, but yuck.
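For the record, the list-and-loop approach might look roughly like the sketch below. It assumes that `s3cmd ls` prints one object per line with the S3 URI in the fourth whitespace-separated column (date, time, size, URI) and that key names contain no spaces. The helper name `list_matching_uris` is made up for illustration, and the real copy loop is left commented out.

```shell
# Hypothetical helper: read `s3cmd ls` output on stdin and print the URIs
# whose object name starts with the given prefix.
# Assumption: each line looks like "2013-12-05 15:06  1234  s3://bucket/key".
list_matching_uris() {
    local bucket="$1" prefix="$2"
    awk -v want="s3://${bucket}/${prefix}" '
        # Column 4 holds the URI; keep it when it begins with the wanted string.
        NF >= 4 && index($4, want) == 1 { print $4 }
    '
}

# Real use would look something like this (not run here):
#   s3cmd ls s3://sourceBucket/ \
#     | list_matching_uris sourceBucket file_prefix \
#     | while read -r uri; do s3cmd cp "$uri" s3://targetBucket/; done

# Demo with canned listing output instead of a live bucket:
printf '%s\n' \
    '2013-12-05 15:06      1234   s3://sourceBucket/file_prefix_name1.txt' \
    '2013-12-05 15:06        99   s3://sourceBucket/other.txt' \
    | list_matching_uris sourceBucket file_prefix
```

It works in principle, but it is one request per file and a fair amount of plumbing for what should be a one-liner.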

You could download the files from s3 then put them back up into the intended target bucket. This is a terrible solution. It will succeed… but what a waste of time and bandwidth. What makes it so tempting is that s3cmd works exactly like you want it to work with “get” and “put”.

s3cmd put /localDirectory/file_prefix* s3://targetBucket/

If “put” is so easy, why is “cp” so hard?

Enlightenment

I studied the s3cmd options over and over. Eventually I realized “cp” had more flexibility if you look deep enough.

  • --recursive
    In my mind, my requirement is clearly not recursive. I simply want multiple files. But recursive in this context just tells s3cmd cp to handle multiple files. Great.
  • --exclude
    It’s an odd way to think of the problem. Begin by recursively selecting all files. Next, exclude all files. Wait, what?
  • --include
    Now we’re talking. Indicate the file prefix (or suffix or whatever pattern) that you want to include.
  • s3://sourceBucket/  s3://targetBucket/
    This part is intuitive enough. Though technically it seems to violate the documented example from s3cmd help which indicates that a source object must be specified:
    s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
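The way the options combine can be sketched locally. The assumption here is that an object is skipped when it matches the exclude pattern unless it also matches the include pattern. The function `filter_names` is purely illustrative (it is not part of s3cmd) and applies that rule to names read from stdin using ordinary shell patterns.

```shell
# Hypothetical illustration of the exclude-then-include rule, applied to
# names read from stdin. This mimics the behaviour locally; it is not s3cmd code.
filter_names() {
    local exclude="$1" include="$2" name
    while IFS= read -r name; do
        case "$name" in
            $include) printf '%s\n' "$name" ;;  # include wins over exclude
            $exclude) ;;                         # excluded: skip it
            *)        printf '%s\n' "$name" ;;  # never excluded: keep it
        esac
    done
}

# Exclude everything, then let the prefix back in:
printf '%s\n' file_prefix_name1.txt file_prefix_name2.txt unrelated.log \
    | filter_names '*' 'file_prefix*'
```

Seen this way, "exclude everything, then include the pattern" is just a whitelist.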

I posted a brief version of my answer to the most elegant of technical websites. You should vote it up. But that didn’t seem like the best place to elaborate on the answer as I’ve done here.

Postscript

Amazon offers a command line interface (CLI) tool to do the same thing: the AWS Command Line Interface. I swear that I looked extensively and repeatedly for exactly this, saying, “I just can’t believe that Amazon wouldn’t have this by now.” Well, they do. I have no idea why I could not find it, but I’m mentioning it here for my own future reference and for anyone else who is using s3cmd as an alternative to the Amazon utility that they couldn’t find.

I have no idea if the Amazon CLI is [ better | worse | different ] than s3cmd in any interesting way regarding S3. (It’s certainly different in the respect that it interacts with many other AWS services besides S3.) If I ever need to compare them, then I’ll write it up.
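For future reference, the AWS CLI takes the same exclude-then-include approach with `aws s3 cp`. The sketch below only prints the command rather than running it, and it adds `--dryrun` so that a first run would merely list what would be copied. The bucket names and prefix are the placeholders used earlier in this post, and `print_aws_copy` is a made-up helper.

```shell
# Hypothetical helper: print (not run) an AWS CLI command equivalent to the
# s3cmd solution. Remove --dryrun once the listed operations look right.
print_aws_copy() {
    local src="$1" dst="$2" prefix="$3"
    printf "aws s3 cp 's3://%s/' 's3://%s/' --recursive --exclude '*' --include '%s*' --dryrun\n" \
        "$src" "$dst" "$prefix"
}

print_aws_copy sourceBucket targetBucket file_prefix
```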

 

16 September 2013

Listen on port 80

Filed under: Linux — mdahlman @ 11:44

Problem

I have an application server running on port 8080. I want it to listen on port 80. In my case it was Tomcat, but this applies to any application server.

I know this is a somewhat common problem. I get lots of Google hits on it. But I have found that the answers are surprisingly non-great. They often assume a set of knowledge that doesn’t match my personal knowledge. They [probably] tell me everything I need to know, but they tell me a lot more as well. This is not better; it’s hard to find what’s really important. This iptables answer on serverfault.com was really quite good. But it offers a little too much detail without offering firm enough guidance about what the best and simplest solution is. I want just one perfect answer if I can find it.

Answer

sudo iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 8080

It’s that easy. Now your app server can continue to run on port 8080. If port 8080 is open to the outside world then you’re free to connect directly to it… but you can also connect on the traditional port 80.
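One caveat: the PREROUTING chain only sees packets arriving from other machines, so a request made on the server itself (for example curl http://localhost/) is not redirected by the rule above. A matching rule in the nat OUTPUT chain covers that case. The sketch below only prints both rules for a pair of ports, and `print_redirect_rules` is a hypothetical helper. It uses `--to-ports`, which is the documented spelling of the option.

```shell
# Hypothetical helper: print the redirect rules for a given pair of ports.
# It only prints; run the printed lines with sudo once you have read them.
print_redirect_rules() {
    local from="$1" to="$2" p
    for p in "$from" "$to"; do
        case "$p" in
            ''|*[!0-9]*) echo "not a port number: $p" >&2; return 1 ;;
        esac
        if [ "$p" -lt 1 ] || [ "$p" -gt 65535 ]; then
            echo "port out of range: $p" >&2; return 1
        fi
    done
    # Traffic arriving from other machines:
    echo "iptables -t nat -A PREROUTING -p tcp --dport $from -j REDIRECT --to-ports $to"
    # Traffic generated on the server itself (e.g. curl http://localhost/):
    echo "iptables -t nat -A OUTPUT -o lo -p tcp --dport $from -j REDIRECT --to-ports $to"
}

print_redirect_rules 80 8080

# To review what is currently installed:
#   sudo iptables -t nat -L -n --line-numbers
```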

But… you’ll lose the change if your machine reboots. So there’s one more step. On Amazon Linux I used the following. It should be fine on CentOS and RHEL etc.

sudo service iptables save

On Ubuntu I found it easiest to persist the change like this:

sudo apt-get install iptables-persistent
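Before rebooting, it may be worth confirming that the rule really landed in the saved rules file. The sketch below assumes the file uses iptables-save style lines and takes the path as an argument. On Amazon Linux, CentOS and RHEL that is /etc/sysconfig/iptables; the Ubuntu location depends on the iptables-persistent version, so treat any path as an assumption. `has_saved_redirect` is a hypothetical helper.

```shell
# Hypothetical check: does a saved rules file contain a REDIRECT for this port?
# Assumes iptables-save style lines such as
#   -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8080
has_saved_redirect() {
    local file="$1" from="$2" to="$3"
    grep -Eq -- "^-A PREROUTING .*--dport ${from} .*-j REDIRECT --to-ports ${to}( |\$)" "$file"
}

# Demo against a canned file rather than the real (root-only) one:
rules=$(mktemp)
printf '%s\n' '*nat' \
    '-A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8080' \
    'COMMIT' > "$rules"
has_saved_redirect "$rules" 80 8080 && echo "redirect 80 -> 8080 is saved"
rm -f "$rules"
```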

Alternative Answers

There are, of course, an infinite number of alternatives. I’m more interested in having one easy-to-understand solution than having lots of alternatives. But sometimes it’s useful to consider the alternatives explicitly… even if it’s only to mock and ridicule them afterwards.

Run your app server on port 80. I declare this to be a bad solution. But hey, maybe you’ve got a valid use case for this. We tracked down how to do it in the past. I found it to be difficult (grabbing those ports below 1024 is intended to be tough), and I found it to have bad side effects (some things broke on upgrades). The side effects were surely our own fault… but the ‘iptables’ solution above is much less prone to side effects. And running your application server as root in order to access port 80 opens security issues as well.

Run a web server on port 80 in front of the application server and route requests to the application server as appropriate. This is a fine solution. In fact, it’s vastly better in a bunch of ways. I have used it myself several times. It’s just overkill for many needs. Administering httpd isn’t so difficult… but it’s harder than not administering httpd.

Edit the file /etc/sysconfig/iptables manually. Yuck. Sure… you could… but why? The command ‘iptables’ exists to make your life easier. Let it.

6 August 2013

Oracle SQL*Loader on Amazon Linux

Filed under: Linux, Oracle — mdahlman @ 14:22

SQL*Loader on Amazon Linux

I’m using Oracle on Amazon RDS. I want to load some data into it from an EC2 instance. SQL*Loader (sqlldr) is a reasonable way to load data into Oracle. Amazon agrees. But for someone who’s a little rusty with Oracle installation procedures, it’s a bit harder to get the SQL*Loader client installed than I had hoped.

Find the Oracle Client

I didn’t think this section would need to exist. But it was harder to find than one might expect.

First, don’t be fooled into thinking Oracle Database Instant Client will be instantly useful. Or even eventually useful. It doesn’t have SQL*Loader.

11g is listed under 12c

Second, don’t be fooled into thinking Oracle’s download page for Oracle 12c includes downloads for Oracle 12c. Well… it does… but it also includes the downloads for Oracle 11g. Go figure.

Third, don’t be fooled into thinking the lack of links to anything labeled “client” is a problem. Just follow the link “See All” to get to the client downloads. Of course. It’s even explained in the improbably punctuated note below the links, “- See All, page contains unzip instructions plus Database Client, Gateways, Grid Infrastructure, more”.

The “See All” link corresponding to “Oracle Database 11g Release 2 Client (11.2.0.1.0) for Linux x86-64” got me to the correct spot:

Get the Oracle Client

With the link in hand it’s trivially easy to download the installer. Not so fast. It’s possible to download the installer to my laptop and then upload it to my EC2 instance. But that’s slow, and it’s a terrible waste of bandwidth. I want to download it directly onto the EC2 instance.

The problem is that the download page requires me to accept the license terms before the download link will work, but the EC2 instance has no GUI in which to easily do this.

A naive attempt like this will fail:

wget http://www.oracle.com/correct/download/link.zip

The issue is addressed in a blog on My Oracle Support. I’m optimistic that it ought to work as indicated. But the solution was old, seemingly brittle (failed based on locale), and strangely unofficial feeling. I don’t need to automate the process, so it’s much easier to understand with a quick manual process.

  1. Login. Accept the license agreement. (This is done while browsing from your local machine. (This step is genuinely easy! Hooray!))
  2. Get the relevant cookie. This was harder than I expected. Chrome and Firefox store their cookies in a SQLite database. Various browser extensions and database clients allow you to get at them. But I found the Chrome extension Cookie.txt export to be the simplest way to get the info. Just click the button that the extension creates and copy the complete contents of the popup.
  3. Save the cookie information. On the Amazon EC2 instance create a new file called cookies.txt. Paste in the text copied in the previous step. (Details are left as an exercise for the reader. Use vi or cat or whatever. If you get stuck here feel free to post a comment.)
  4. Run wget using the new cookie file:
wget -x --load-cookies cookies.txt -O linux.x64_11gR2_client.zip http://download.oracle.com/otn/linux/oracle11g/R2/linux.x64_11gR2_client.zip
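Two quick sanity checks can save a wasted download. The sketch below assumes the exported file is in the Netscape cookies.txt format, which has seven tab-separated fields per cookie line (pasting through a terminal sometimes turns the tabs into spaces). It also assumes that a failed download leaves something other than a zip archive under the .zip name, such as an HTML page. Both function names are hypothetical.

```shell
# Hypothetical sanity checks around the cookie-based download.

# A Netscape-format cookies.txt has 7 tab-separated fields on each cookie
# line; blank lines and lines starting with "#" are skipped. Returns non-zero
# when any other line has a different field count.
cookies_file_ok() {
    awk -F '\t' '
        /^[[:space:]]*$/ { next }
        /^#/             { next }
        NF != 7          { bad = 1 }
        END              { exit bad + 0 }
    ' "$1"
}

# A zip archive starts with the two bytes "PK".
looks_like_zip() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Demo with throwaway files:
tmp=$(mktemp -d)
printf '.oracle.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n' > "$tmp/cookies.txt"
cookies_file_ok "$tmp/cookies.txt" && echo "cookies.txt looks well-formed"
printf '<html>' > "$tmp/fake.zip"
looks_like_zip "$tmp/fake.zip" || echo "fake.zip is not a zip archive"
rm -rf "$tmp"
```

Running `cookies_file_ok cookies.txt` before wget and `looks_like_zip linux.x64_11gR2_client.zip` after it brackets the download nicely.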

Run the Oracle Client Installation

This final step sounds trivial… but once again I realized I needed a few sub-steps. I’m using Amazon Linux which is decidedly un-GUI. I had forgotten that the Oracle Client doesn’t have a simple interactive text version. It’s all-or-nothing silent install or GUI install.

Install x11:

sudo yum install xorg-x11-xauth
exit

Then log back in. But… don’t forget to use the -X option. I’m on Mac, so this part works easily. On Windows you can do the equivalent with PuTTY, but you’ll need to look up the details.

ssh -X -i mykey.pem ec2-user@ec2-123-456-246-579.us-west-1.compute.amazonaws.com

Test that x11 will work as intended:

sudo yum install xclock
xclock

If xclock pops up, then the Oracle Client installation should be good as well:

./client/runInstaller

Oracle Client Installer

And finally, don’t forget to choose the Administrator installation type. After all, the whole point was to get SQL*Loader, and that’s the only option where it’s included.

Bonus Appendix

Once you have SQLPlus installed, you’ll want rlwrap installed. It will allow you to hit the up arrow to get your command history. SQLPlus is miserable without it. The Amazon Linux repositories do not have rlwrap. But EPEL does. So here’s how to install it with a single line:

sudo yum -y install rlwrap --enablerepo=epel

Here’s a good way to transparently launch SQLPlus with rlwrap giving you access to your command history.

#Add these lines to .bashrc for both ec2-user and oracle:
alias sqlplus="rlwrap sqlplus"

Error Appendix

The first time I tried to run the install I got this error:

ubuntu@ip-10-48-138-63:~/wget_test/download.oracle.com/otn/linux/oracle11g/R2/client$ ./runInstaller
Starting Oracle Universal Installer...
...
>>> Ignoring required pre-requisite failures. Continuing...
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-08-06_07-26-42PM. Please wait ...ubuntu@ip-10-48-138-63:~/wget_test/download.oracle.com/otn/linux/oracle11g/R2/client$ Exception in thread "main" java.lang.NoClassDefFoundError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:164)

This is clearly a Java problem. It clearly has nothing to do with X11. Except… well… it does. Installing xterm or x11 (installing xorg-x11-xauth as indicated above) solved it for me.
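A small pre-flight check makes this failure easier to recognise. The sketch assumes that ssh -X sets the DISPLAY variable only when forwarding was actually negotiated, and that this needs xauth on the server. The function name `x11_ready` is made up.

```shell
# Hypothetical pre-flight check before launching the GUI installer.
x11_ready() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: install xorg-x11-xauth, log out, reconnect with ssh -X" >&2
        return 1
    fi
    if ! command -v xauth >/dev/null 2>&1; then
        echo "xauth not found on PATH" >&2
        return 1
    fi
    return 0
}

# Usage on the EC2 instance:
#   x11_ready && ./client/runInstaller
x11_ready || echo "X11 forwarding is not ready"
```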

9 July 2013

Ubuntu 12.04 on VMware Fusion

Filed under: Linux, Oracle — mdahlman @ 07:20

Summary

Installing Ubuntu 12.04 using VMware Fusion was easy enough. But it wasn’t easy to get it exactly the way I wanted it. So I ran through the process several times to get exactly what I wanted. It might be useful for other folks investigating Ubuntu. I wrote this as the first of two articles explaining how to install Ubuntu and Oracle. The subsequent article covers installing Oracle on Ubuntu.

Background

  • Download the relevant iso file: Ubuntu 12.04 LTS 64-bit
  • The standard install is indeed very simple. But it was insufficient for nicely preparing the machine for an Oracle installation.
  • I started with this Fantastic guide to installing Ubuntu on VMware Fusion. It was very helpful, and I want to give full credit to Eirik Didriksen and Hans Petter Langtangen for this excellent document.
    One early step was confusing because Fusion has changed, but that’s a pretty minor complaint.
  • But in the end it was mainly the need for custom partitions that drove me to document this Oracle-specific version of an Ubuntu installation.

Everything should apply outside of Fusion as well. If you want to install Ubuntu on a brand new machine, the same things apply. Just skip the Fusion sections.

Fusion Stuff

  • Obtain and install VMware Fusion. (Or VMware Workstation. But I tested with Fusion.)
  • Create New Virtual Machine.

New Virtual Machine

  • Continue without disc

Continue without disc

  • Use operating system installation disc or image: ubuntu-12.04.2-desktop-amd64.iso (Note that Eirik & Hans say “Create a custom virtual machine”. This is probably an alternative path to the same destination.)

Installation media

  • Linux / Ubuntu 64-bit

Choose operating system

  • NOT “Use Easy Install”

Not easy install

  • NOT “Finish” but rather “Customize Settings”
  • Save As: NOT the proposed name of “Ubuntu 64-bit” but rather “Ubuntu64Oracle11gXE”
    The default proposed name caused me some pain later. In hindsight I found it important to avoid spaces: “Ubuntu64Oracle11gXE”.
  • This pops up the general Fusion “Settings” window.
  • Change memory from 1024 MB to 2048 MB
  • Info: My hard disk size defaulted to 20 GB; I kept this default.
  • Info: My Networking defaulted to “Share with my Mac”; I kept this default.
  • Info: I did not specify a startup device, so it starts from the [unspecified-by-me] default device.
Customize settings

Customize memory

Customize shared folders

  • Startup the VM for the first time

Start the VM

At this point you are done with the “Fusion” configuration. The following steps are Ubuntu configuration, so they will apply equally to anyone using Ubuntu whether it’s in a virtual machine of some sort or not.

Ubuntu Installation Stuff

  • “Install Ubuntu” (not “Try Ubuntu”. I chose English.)

Install Ubuntu

  • “Download updates while installing” (You might as well update to the latest stuff while getting started.)
Preparing: default settings

Preparing: my choices

  • CRITICAL (for Oracle): “Something else”
    It’s possible to make changes later. It’s nicely documented how to fix partitions in this forum post. But I’m attempting to install everything correctly to prevent problems rather than waiting for the problems to arrive and then working around or solving them.

Installation type: something else

  • Set up partitions as shown in the detailed partitioning section below. The key idea is creating an ext3 partition.
Partitions: default setup

Partitions: good for Oracle XE

  • “Install Now” and it does its thing. It prompts for location, keyboard, etc.
  • NOT the proposed machine name of “vmadmin-virtual-machine” but rather “Ubuntu64Oracle11gXE”.
  • For example, I used these values:
    VMadmin
    Ubuntu64Oracle11gXE
    vmadmin
    vmpass  (Ubuntu lets you know that this is weak)
Install now

Who are you

 

  • Depending on network speed, spend a long time looking at a screen with this at the bottom:
    Retrieving file 12 of 57 (26s remaining)
  • Go get coffee. I went for lunch. The install followed by the reboot takes a while.

Welcome (retrieving)

  • After the initial reboot, log in and run the Update Manager. In my case there were 256 updates available, and I installed all of them. It requires another reboot.

Update manager

Detailed Partitioning Info

Oracle 11g does not support ext4. Ubuntu 12.04’s default filesystem is ext4. Doh!
I don’t know the exact implications of using ext4 with Oracle. Maybe it would mostly work just fine. Maybe it will fail at the most inconvenient possible moment and reduce your machine to a smoking pile of rubble. Using a supported file system seems like a good idea. It’s easy enough to do, and it’s a lot easier to get it right when you get started than to fix it later, so I did that. (But fixing it later is possible.)

  • By default everything goes into one big partition: /dev/sda
    Instead of accepting that, we’ll create a few separate partitions.
  • “New Partition Table…”, Continue
Partitions: default

Create new partition table

  • Now you see “free space”

Free space

  • Select “free space”, “Add…”
  • 13500 MB Ext4 at / (primary)
Free space: Add…

Add Primary ext4 partition

  • Select “free space”, “Add…”
  • 4000 MB Ext3 at /u01 (primary)
    Is it important to choose ‘primary’ rather than logical? Sergey says, “it does not matter much”.
    If you choose a logical partition then it will mount slightly differently: /dev/sda5
    It uses sda5 instead of sda2 because logical partitions on an MBR disk are numbered starting from 5. I suspect that either way is perfectly fine.
    Is it important to choose ‘ext3’? Yes!
    That’s the main reason we’re going through this partitioning.
    Is it important to choose ‘u01’ for the mount point? YES!!
    It’s vitally important to use ‘u01’. The Oracle installer will rely on this.

Add Primary ext3 partition

  • Select “free space”, “Add…”
  • All remaining space (3974 MB) for the swap partition (primary, swap area, no mount point)
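The 3974 MB figure falls out of simple arithmetic, assuming the 20 GB virtual disk is 20 × 1024³ bytes and the installer counts in decimal megabytes (both are assumptions about Fusion and the installer, though they do reproduce the number above). `remaining_mb` is a hypothetical helper for planning a different split.

```shell
# Hypothetical helper: subtract each planned partition size from the total.
remaining_mb() {
    local left="$1" used
    shift
    for used in "$@"; do
        left=$(( left - used ))
    done
    echo "$left"
}

# 20 GiB expressed in decimal MB (integer division): 21474
disk_mb=$(( 20 * 1024 * 1024 * 1024 / 1000000 ))

# Root (13500 MB) and /u01 (4000 MB) leave this much for swap:
remaining_mb "$disk_mb" 13500 4000    # prints 3974
```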

Add Primary swap partition

  • Change “Device for boot loader installation” to /dev/sda1
  • Return to the main workflow above.
    Find the step: “Install Now” and it does its thing. It prompts for location, keyboard, etc.
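Once the machine has booted for the first time, it is cheap to confirm that the partitions ended up with the intended filesystems. The sketch below reads lines in /proc/mounts format (device, mount point, type, options) and prints the type for a mount point. `fstype_of` is a hypothetical helper, and the demo uses a canned file so that it runs anywhere.

```shell
# Hypothetical check: print the filesystem type for a mount point, reading
# lines in /proc/mounts format ("device mountpoint fstype options dump pass").
# When a mount point appears more than once, the last entry wins.
fstype_of() {
    local mountpoint="$1" file="${2:-/proc/mounts}"
    awk -v mp="$mountpoint" '
        $2 == mp { t = $3 }
        END      { if (t != "") print t; else exit 1 }
    ' "$file"
}

# On the new machine:
#   fstype_of /u01     # should print ext3
#   fstype_of /        # should print ext4

# Demo with canned data:
sample=$(mktemp)
printf '%s\n' \
    '/dev/sda1 / ext4 rw,relatime 0 0' \
    '/dev/sda2 /u01 ext3 rw,relatime 0 0' > "$sample"
fstype_of /u01 "$sample"
rm -f "$sample"
```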

Perfectly partitioned

More Fusion Stuff

Install VMWare Tools

The VMWare tools are useful for copying files, copy/pasting with the clipboard, etc. It’s documented reasonably well in Knowledge Base Article 1022525. I won’t repeat the steps here; just don’t forget to do it.

Use a static IP address

I want my instance to always reboot with the same IP address. That makes it much easier to connect to services running on the instance. In my case I have a Tomcat instance on my Mac running Semarchy MDM pointing to this Oracle instance. By default the VM will get an IP address from the VMWare host. Depending on config settings it might get the same address as last time. In practice I found that my instance incremented its IP address by one each time. Lots of articles already exist describing this issue: this one, that one, another one, and even vmware knowledge base articles. They all say basically the same thing. And they’re all basically correct. But…

I found they were universally imprecise about what the “vm-hostname” actually is. And they universally ignored the issue of hostnames with spaces in them. If you worked through this full article, then you saw that I used a name with no spaces or special characters, Ubuntu64Oracle11gXE, even though alternative names were proposed at a few points during the process. That’s because I ran into problems with the suggested DHCP configuration options working correctly when I was on one network but getting ignored when I was on another network. I couldn’t ever track down precisely what the issue was. So I wanted to get everything correct the first time through and completely avoid troubleshooting it later.

  • Find the MAC address used by the guest VM for network connectivity:
ifconfig
  • Update dhcpd.conf like this:
sudo vi /Library/Preferences/VMware\ Fusion/vmnet8/dhcpd.conf
####### VMNET DHCP Configuration. End of "DO NOT MODIFY SECTION" #######
# START added by mdahlman
host Ubuntu64Oracle11gXE {
    hardware ethernet 00:0c:29:cc:7e:ab;
    fixed-address 192.168.191.11;
}
# END added by mdahlman
  • Edit /etc/hosts like this:
sudo vi /etc/hosts
  • Add this line:
192.168.191.11  Ubuntu64Oracle11gXE
  • Reboot the machine and confirm that it comes up with the address 192.168.191.11
  • Now it’s time to install Oracle. (Or just enjoy your Ubuntu instance if you don’t need Oracle.)
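The steps above can be sketched as two small helpers. The first assumes ifconfig labels the MAC address with "HWaddr" (as on Ubuntu 12.04) or "ether" (as on some other systems). The second refuses names with spaces or special characters, in line with the naming advice in this article. Both function names are hypothetical, and the MAC and IP address are the example values used above.

```shell
# Hypothetical helpers for the static-address steps.

# Pull the first MAC address out of ifconfig output read from stdin.
first_mac() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "HWaddr" || $i == "ether") { print tolower($(i + 1)); exit }
    }'
}

# Print a dhcpd.conf host stanza for the given name, MAC address and IP.
dhcp_host_entry() {
    local name="$1" mac="$2" ip="$3"
    case "$name" in
        ''|*[!A-Za-z0-9-]*)
            echo "host name should contain only letters, digits or hyphens: '$name'" >&2
            return 1 ;;
    esac
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
        "$name" "$mac" "$ip"
}

# On the guest you would run:  ifconfig | first_mac
# Demo with a canned ifconfig line:
mac=$(printf '%s\n' 'eth0      Link encap:Ethernet  HWaddr 00:0C:29:CC:7E:AB' | first_mac)
dhcp_host_entry Ubuntu64Oracle11gXE "$mac" 192.168.191.11
```

The printed stanza is what goes below the "DO NOT MODIFY SECTION" marker in dhcpd.conf on the Mac.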
