Right Outer Join

20 November 2013

Citi doesn’t get it

Filed under: JasperReports - mdahlman @ 22:12

I received an email today with this quote:

Once you register a purchase online with Citi Price Rewind, we will search our database of online merchants for a lower price for 30 days after the purchase date. If we find a price that is at least $25 less than what you paid, you can be eligible for a refund of the difference, up to $250 per item.

This is an email from someone who deeply “doesn’t get it”. Allow me to elaborate.

“Once you register a purchase…” Citi already knows all of my Citi purchases. All of them. They bill me for them, so they have to know them. But they still make me register a purchase. This is a waste of time. It’s silly.

“… search our database of online merchants …” If I buy a $300 mixer at Target, they know. If a million other people buy the same mixer, they know. But they don’t consider these purchases. They only look at online purchases. This is intentionally incomplete.

“… a price that is at least $25 less than what you paid …” They’ll keep it to themselves if I could have gotten the same item for $23 cheaper somewhere else? This is petty and mean. If I could have saved a nickel somewhere else, then they should tell me.

“… you can be eligible for a refund …” Sweet! I’ll get a refund! Oh wait… I’ll be eligible for a refund. What?

Desired situation:
I use my Citi card for purchases. Citicard looks out for me; if they find the item cheaper then they refund me the difference.

Actual situation:
I use my Citi card for purchases. If I think to manually go to “Price Rewind” then…

  • Read 1500 words explaining the fine print of what’s covered.
  • Then provide the details about it:
    How much did it cost?
    When did you buy it?
    Where did you buy it?
    This should surely be a skit on SNL. They want me to tell them the cost of something I just bought using their card? I should tell them the date and location of the transaction that they already know? This is cynical and stupid.
  • Having selected an item and consulted a lawyer … I then …
    wait 30 days.
    Then I receive my refund.
  • Oops, no. I don’t receive a refund. I receive an email indicating that I’m eligible for a refund. I’m then invited to upload a scanned copy of my receipt. I assume you all file all of your receipts for all purchases by date for future reference and uploading to cynical credit card offers. I do. I photocopy all of them and cross file them by date, merchant, product line, color, and average specific gravity of the products. Who doesn’t?

I appreciate the warm regards from Jud Linville. But his email inspires me to use other credit cards instead of my Citi card.

If they want to inspire me to use their credit card, then they should do something for me. It should require no effort from me. They should call within an hour of my purchase to say, “Item ABC is available for $X cheaper at store XYZ which is within 5 miles of your purchase.” They wouldn’t have to refund me a cent. But they would let me know that I could return my item and buy it cheaper somewhere else. That would be putting big data to a practical use which helps me instead of giving me useless, legalistic, nearly-impossible-to-use delayed benefits.

27 September 2012

JasperWorld 2012 favorite quote

Filed under: Quotes - mdahlman @ 09:09

JasperWorld 2012

Yesterday was the last day of JasperWorld 2012, and it was a big success. Lots of customers, community members, and partners learned (and taught!) lots of new things.

Most of the general sessions were very interesting, but particularly near and dear to my heart was the big data panel.

  • Don DeLoach, Infobright
  • Billy Bosworth, DataStax
  • John Kreisa, HortonWorks
  • Mark Hydar, VoltDB
  • Moderator: Claudia Imhoff, Boulder BI Brain Trust (BBBT)

All panelists provided good insight into big data today and in the next 3 years. But as I’m looking back through my notes now, I found my second favorite quote from the session.

You’d be amazed at what a guy can do with 6 weeks, $50k, and the cloud.
– Billy Bosworth

My favorite quote came immediately after that:

IT hates those guys.
– Billy Bosworth

13 February 2012

Hadoop Hive and iReport

Filed under: JasperReports - mdahlman @ 11:09

In articles that I write, I tend to focus on Jaspersoft technologies. I leave it as an exercise for the reader to figure out how to manage the backend data source whether it’s an XML file or a relational database or a big data solution like Hadoop.

I just came across an article with some really useful tips about configuring Hadoop Hive and then connecting to it from iReport. After configuring Hive, it then shows an interesting query example using exploding lateral views(!!). Don’t worry, the article explains what the lateral view HiveQL syntax means. It’s a great big picture view of getting Hadoop configured well and then performing reporting or analysis tasks on it.

Take a look at this short tutorial covering Hadoop Hive configuration and iReport. Maybe someday soon they will add the next step of performing more Ad Hoc analysis using JasperReports Server as well.

3 February 2012

Collections in reports on MongoDB

Filed under: iReport, JasperReports - mdahlman @ 15:25

Collections in Collections

MongoDB stores data in collections which correspond to tables in relational databases. These collections hold data like Strings and Numbers and… Collections. These Collections are generally Arrays. (Always Arrays?) A classic example is orders which have order_line_items. In a normalized relational database you would have one table order_line_items with a foreign key pointing to the table orders. A query might look like this:

     select o.order_id, o.customer_id, oli.order_line_item_id, oli.product_name, oli.quantity
     from orders o
          inner join order_line_items oli on ( oli.order_id = o.order_id )

The result of this query will be a tabular result set which JasperReports or any other report engine will have no trouble processing.

But that same data set is likely to be modeled in MongoDB as a single collection called orders. Each record will have an order_id, a customer_id, and an array of order_line_items. For many purposes this is ideal. A program can retrieve a few orders, it can iterate through the order_line_items, and it can process them however it likes. But this handling of Arrays within other records poses some pitfalls for many standard reporting scenarios.
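To make the contrast concrete, here is a minimal Java sketch of what the SQL join does implicitly: it turns one nested order document into one flat row per line item. It uses plain collections rather than a MongoDB driver, and the class name OrderFlattener is hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JasperReports or the MongoDB connector.
final class OrderFlattener {

    // Turns one nested order document into one flat row per line item,
    // which is the tabular shape a SQL inner join would produce.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> flatten(Map<String, Object> order) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Object items = order.get("order_line_items");
        if (!(items instanceof List)) {
            return rows; // no line items: an inner join yields no rows
        }
        for (Map<String, Object> item : (List<Map<String, Object>>) items) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("order_id", order.get("order_id"));
            row.put("customer_id", order.get("customer_id"));
            row.putAll(item); // line-item columns follow the order columns
            rows.add(row);
        }
        return rows;
    }
}
```

A report engine wants the flattened rows; MongoDB hands back the nested document. The rest of this article is about bridging that gap inside the report rather than in code like this.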

Start Simple

In the simplest case, the Arrays have N elements where N is known at design time. Think of the field ‘location’ storing longitude and latitude. We know that the Array ‘location’ will always contain two elements. We know ‘location[0]’ can be treated as the longitude, ‘location[1]’ is the latitude, and we can safely ignore ‘location[17]’ in all of our reports.

The Jaspersoft MongoDB connector handles this case fantastically well. Consider this example.

     db.orders.save({
       "id" : "1001",
       "cust" : "abc",
       "lines" : [
         { "line number" : "line1", "product" : "ProdA" },
         { "line number" : "line2", "product" : "ProdB" }
       ]
     })

MongoDB Query Editor

The fields are automatically determined by the "Fields Provider" mechanism.

I add a few orders following the pattern shown above. Then I query them using the simplest possible query in the iReport query designer:

     { 'collectionName' : 'orders' }

You can see that iReport does a bunch of work for me automatically. After I click “Read Fields” it retrieves all of the documents. It parses them to find the fields. It sees that ‘lines’ contains an array of items. It creates fields corresponding to these items: ‘lines.0.line number’, ‘lines.1.line number’, etc.
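The naming scheme is easy to picture with a small sketch. The Java below is not the connector’s actual code; it is only my assumption of the general idea: walk the document and build a dotted path for every leaf value, using the array index as a path segment. The class name FieldPaths is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: FieldPaths is a hypothetical name, not a connector class.
final class FieldPaths {

    // Returns every leaf value keyed by its dotted path, e.g. "lines.0.product".
    static Map<String, Object> leaves(Map<String, Object> doc) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk("", doc, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // Nested document: each key becomes a path segment.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(join(prefix, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array: each index becomes a path segment.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(join(prefix, Integer.toString(i)), list.get(i), out);
            }
        } else {
            out.put(prefix, node); // a leaf such as a String or a Number
        }
    }

    private static String join(String prefix, String segment) {
        return prefix.isEmpty() ? segment : prefix + "." + segment;
    }
}
```

Note that the set of paths depends entirely on the documents that were read: an order with two lines yields paths for indexes 0 and 1 and nothing more.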

Granted, it’s a simple case. But that’s a solid foundation to start from.

Handle Complexity

The orders example also shows why the default behavior is not good enough. At the time I’m designing the report, no order has more than 3 lines. But in the future I will have orders with many more lines. I need the report to handle N order lines correctly where “N” is not known at design time.

The first thing you must do is add another field to the report manually. The “Fields Provider” automatically makes all of the leaf-level nodes in the document available. That includes fields like “lines.0.line number” and “lines.0.product”. But you need to manually tell it you want the whole thing without reaching into its components.

Add a field called “lines” with data type “java.util.Collection”.

The MongoDB connector will now bring back your Array of N order_lines as desired. But how can you display them intelligently in a report?

Complexity via cop-out

I could treat the field ‘lines’ as a Collection, then simply convert it to a String and display this. In certain cases this could be useful. In general… yuck. I get something like this:

     [ { "line number" : "line1" , "product" : "ProdA"} , { "line number" : "line2" , "product" : "ProdB"} , { "line number" : "line3" , "product" : "ProdA"}]

In principle I could do some parsing in the report itself to extract useful information. Thinking about that too hard makes me queasy. Yuck. Fortunately JasperReports has some mechanisms in place to handle Collections much more easily.

List Component

Create "Table Dataset" and define the relevant fields

The list component is designed to handle simple lists of data. This example fits the bill perfectly. The key fact that many users don’t think of is that they can pass any Collection (like ‘lines’) into the component like this:

     new net.sf.jasperreports.engine.data.JRMapCollectionDataSource($F{lines})

This data source exists specifically to handle exactly this type of requirement. It’s perfect.

I created a new dataset in the report. I had to manually add the fields “line number” and “product” to this dataset. This is because there is no query associated with the dataset that iReport could use to determine fields automatically.

Once we pass $F{lines} to the dataset, laying out the list is simple. Here each line in my list includes the line number and product name.

Table Component

The table component in JasperReports hasn’t gotten the love it deserves. It’s a great solution to this problem. Dropping a Table component into the report design is simple.

Laying out the table is straightforward. It imposes a bit more structure than the List component. In my simple example here I make the List and the Table look the same. But the Table holds additional features that make it easy to include a header or footer. It also has more semantic meaning because it includes the idea of separate columns.

Subreport

A subreport can take the same data source as a Table. So the key idea behind working with Tables and Subreports is identical. Subreports provide more flexibility than a table to accommodate complex layouts. But there is some corresponding complexity in maintaining a report with a subreport. In my example where I simply want to list out order lines, a subreport is clearly overkill. But in other cases it may be more appropriate.

Custom Java Utility

In some cases you may want a simple way to do very specific things with the Collection. I encountered one JasperReports Server customer who simply wanted to create a comma separated list of values based on the contents of an Array coming from his MongoDB data source. It is possible to create a string like this as a variable which is part of the “Order Lines Dataset”. Then we could display the string using a List or a Table component. But this feels like a bit of a hack.

Typically the best way to do a bit of string processing like this is with an external function. I wrote a Java function to handle it. DON’T PANIC. There’s no reason that you need to write Java code to handle collections. I just wanted to include an example of it here for the sake of completeness. Here it is:

     // Requires java.util.Collection and java.util.Map

     /**
      * Takes a Collection of Maps and a String key
      * Returns a comma separated String of all values corresponding to that key
      * Maps that do not contain the key are skipped
      */
     public static String concatMapValues(Collection<Map<String,String>> coll, String key) {
       StringBuilder sb = new StringBuilder();
       for (Map<String,String> m : coll) {
         if (!m.containsKey(key)) {
           continue; // nothing to append for this Map
         }
         if (sb.length() > 0) {
           sb.append(", "); // separator only between values
         }
         sb.append(m.get(key));
       }
       return sb.toString();
     }
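
For a quick check outside of a report, the utility can be exercised with plain collections. In the sketch below the wrapper class CollectionUtils and the main method are scaffolding I made up for illustration, and I assume that Maps lacking the key should simply be skipped.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper class so the method can be compiled and run on its own.
final class CollectionUtils {

    // Joins the values stored under 'key' with ", ", skipping Maps without it.
    static String concatMapValues(Collection<Map<String, String>> coll, String key) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> m : coll) {
            if (!m.containsKey(key)) {
                continue; // assumption: a missing key contributes nothing
            }
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(m.get(key));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build three order lines shaped like the 'lines' Array in the sample data.
        List<Map<String, String>> lines = new ArrayList<>();
        for (String product : new String[] {"ProdA", "ProdB", "ProdA"}) {
            Map<String, String> line = new LinkedHashMap<>();
            line.put("product", product);
            lines.add(line);
        }
        System.out.println(concatMapValues(lines, "product")); // ProdA, ProdB, ProdA
    }
}
```

In a report, the same call would appear in a text field expression with $F{lines} as the first argument, assuming the compiled class is on the report’s classpath.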

Final Result

Now that we have multiple different ways to process the order lines, let’s take a look at the report in action. I used a slightly updated query from the trivial one that I show at the start of this article. This query returns all orders where one of the order lines includes “ProdA”.

     { 'collectionName' : 'orders',
       'findQuery' : { "lines.product" : "ProdA" } }

The final report has exactly the data I want. Well… it has it multiple times because I tried every different method. But normally I would just choose one. The report layout is shown as well. Again, it’s needlessly complex because I show two poor ways and four good ways to do the same thing.

Executed report showing 3 orders along with all order lines for each order (with multiple variations)

iReport in "Designer" mode showing the report layout

Sample Materials

You can try out the report for yourself. The sample data and .jrxml files are in this document. Get the Jaspersoft MongoDB connector separately.

Of course you’ll need iReport as well. JasperReports Server can be used to deploy the reports.

2 September 2011

Cool Reporting on MongoDB

Filed under: iReport, JasperReports - mdahlman @ 16:24

Interesting Reporting on MongoDB

Before you can do interesting reporting, you need to do simple reporting. You have to walk before you can run. Please start with the article Simple Reporting on MongoDB. That article explains where to get the MongoDB connector for iReport and JasperReports Server. When you have completed that, then read this article for more advanced and more interesting reporting techniques.

Simple Filters

Let’s start with a simple hard-coded filter.

{ 
  'collectionName' : 'accounts',
  'findQuery' : { 'name' : 'M & Y Takemura Communications, Ltd'  }
}

Filtering on a single value makes sense when that value is a complex document. There could be lots of information in that document to create a report. To make this truly interesting though, we need this query to be parameterized. So our first truly realistic sample query is the following:

{
  'collectionName' : 'accounts',
  'findQuery' : { 'name' : '$P{CUSTOMER_NAME}'  }
}
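
Conceptually, each $P{...} placeholder is replaced with the parameter’s value before the query runs. The Java sketch below imitates that with a regular expression. It is an illustration under my own assumptions (the class name QueryParams, plain string substitution, no escaping of quotes inside values), not the connector’s actual implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics parameter substitution in a query string.
final class QueryParams {

    // Matches placeholders such as $P{CUSTOMER_NAME} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$P\\{(\\w+)\\}");

    static String fill(String query, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(query);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value from being
            // interpreted as regex group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Filling CUSTOMER_NAME with the company name from the hard-coded example reproduces that first query exactly, which is the point: one report design, many customers.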

Field Selection and Dot Notation

Documents in MongoDB can be big. The queries above will return every field in the document. When this is not appropriate, it’s easy to create a query to return only the fields that you need.

Imagine the case where we have lots of users and we want a report showing which users had logged certain events. There could be lots and lots of events, but there’s only one particular event that I care about in this report. Rather than return the complete document for each user we instead return only the fields we care about.

{ 
  'collectionName': 'active_users',
  'findFields' : { 'id':1, 'date':1, 'events.ACHIEVEMENT_Minor':1 }
}

Enter this query into iReport’s query field, and then iReport is able to execute the query and return only the fields we’re interested in.

iReport Query Editor

The query editor makes it easy to test query syntax and to see query results

We could then combine this with a filter if we only want certain users returned.

{ 
  'collectionName': 'active_users',
  'findFields' : { 'id':1, 'date':1, 'events.ACHIEVEMENT_Minor':1 },
  'findQuery' : { 'events.ACHIEVEMENT_Major' : 1 }
}
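
Dot notation is simply a path through nested documents. The following Java sketch resolves such paths against nested Maps and keeps only the requested ones. The class name DotPath is hypothetical, and flattening the result to a path-to-value Map is my simplification to mirror how the fields appear in a report; MongoDB itself returns the projected fields in their nested shape.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating dot notation on nested Maps.
final class DotPath {

    // Follows a dotted path through nested Maps; returns null if a step is missing.
    static Object get(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // Keeps only the requested paths, flattened to path -> value.
    static Map<String, Object> project(Map<String, Object> doc, List<String> paths) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String path : paths) {
            Object value = get(doc, path);
            if (value != null) {
                out.put(path, value); // absent fields are simply left out
            }
        }
        return out;
    }
}
```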

Joins!

Joins? MongoDB doesn’t support joins. In general your data should be modeled so that you don’t need joins. If you need them, then perhaps you should be using a traditional RDBMS instead of MongoDB. But what about simple joins that are required in some types of reports? Subreports can be used in JasperReports to effectively join two collections together.

Imagine a collection that stores summary data by country and a collection that stores customers. I want to query the country collection to find my top countries for some time period and then query my customers collection to find the top customers in those countries.

It’s simple in JasperReports. I create a query in the top level report to return the countries that I want. Then for each country I run a subreport that returns the customers from that country. Here’s a simple version.

Parent query:

{ 'collectionName':'countries' }

The subreport query finds all accounts that are in that country and whose name starts with “B”. (It’s a silly example. Why would I want companies that start with “B”? But I wanted to show the syntax for Regular Expressions in queries.)

{
	'collectionName':'accounts',
	'findFields':{'name':1,'phone_office':1,'billing_address_city':1,'billing_address_street':1,'billing_address_country':1},
	'sort':{'billing_address_city':1,'name':1},
	'findQueryRegEx':{'billing_address_country':'/$P{AccountCountry}/','name':'/^B /'}
}

The above set of two queries gives me exactly what I want. It has the drawback of running “N + 1” queries. By that I mean that it runs the parent query once and then the subreport query once for each country. In some cases that could be bad. In these cases it would be better to use the “IN” syntax so that I only need to run each query once.

I can use the same parent query. But rather than pass the countries one at a time to the subreport we can pass the complete list of countries as a Collection. Then the subreport can run a single query to get all of the accounts that I want. The working reports are attached below, but here’s the query to show the idea.

{
	'collectionName':'accounts',
	'findFields':{'name':1,'phone_office':1,'billing_address_city':1,'billing_address_street':1,'billing_address_country':1},
	'sort':{'billing_address_country':1,'billing_address_city':1,'name':1},
	'findQuery':{ 'billing_address_country': {'$in':$P{AccountCountryCollection}} }
}
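
The difference between the two approaches is easy to see with an in-memory stand-in for the accounts collection. This Java sketch is only an analogy: AccountStore and its query counter are hypothetical, and no MongoDB driver is involved. It counts only the queries against accounts, so the parent query on countries is the “+ 1” on top of either number.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-memory stand-in for the accounts collection; counts how often it is queried.
final class AccountStore {
    private final List<Map<String, String>> accounts;
    int queryCount = 0;

    AccountStore(List<Map<String, String>> accounts) {
        this.accounts = accounts;
    }

    // Every call represents one round trip to the database.
    private List<Map<String, String>> find(Predicate<Map<String, String>> filter) {
        queryCount++;
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> account : accounts) {
            if (filter.test(account)) {
                result.add(account);
            }
        }
        return result;
    }

    // One query per country, like running the subreport once per parent row.
    List<Map<String, String>> perCountry(Collection<String> countries) {
        List<Map<String, String>> all = new ArrayList<>();
        for (String country : countries) {
            all.addAll(find(a -> country.equals(a.get("billing_address_country"))));
        }
        return all;
    }

    // A single query with set membership, like the '$in' operator.
    List<Map<String, String>> inCountries(Collection<String> countries) {
        return find(a -> countries.contains(a.get("billing_address_country")));
    }
}
```

Both methods return the same accounts. With N countries, perCountry issues N queries while inCountries issues one, which is the whole argument for passing the Collection to the subreport.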

Summary

MongoDB has more flexibility than most of today’s Big Data databases in its query syntax. That’s one thing that made it attractive as a reporting target for Jaspersoft. Hopefully this article gives a good idea of some of the possibilities that exist for reporting directly against a MongoDB data source.

Sample reports using all of the above queries are available here.
