Right Outer Join

3 February 2012

Collections in reports on MongoDB

Filed under: iReport, JasperReports — Tags: , , , — mdahlman @ 15:25

Collections in Collections

MongoDB stores data in collections which correspond to tables in relational databases. These collections hold data like Strings and Numbers and… Collections. These Collections are generally Arrays. (Always Arrays?) A classic example is orders which have order_line_items. In a normalized relational database you would have one table order_line_items with a foreign key pointing to the table orders. A query might look like this:

     select o.order_id, o.customer_id, oli.order_line_item_id, oli.product_name, oli.quantity
     from orders o
          inner join order_line_item oli on ( oli.order_id = o.order_id )

The result of this query will be a tabular result set which JasperReports or any other report engine will have no trouble processing.

But that same data set is likely to modeled in MongoDB as single collection called orders. Each record will have an order_id, a customer_id, and an array of order_line_items. For many purposes this is ideal. A program can retrieve a few orders, it can iterate through the order_line_items, and it can process them however it likes. But this handling of Arrays within other records poses some pitfalls for many standard reporting scenarios.

Start Simple

In the simplest case, the Arrays have N elements where N is known at design time. Think of the field ‘location’ storing longitude and latitude. We know that the Array ‘location’ will always contain two elements. We know ‘location[0]’ can be treated as the longitude, ‘location[1]’ is the latitude, and we can safely ignore ‘location[17]’ in all of our reports.

The Jaspersoft MongoDB connector handles this case fantastically well. Consider this example.

     db.orders.save({
       "id" : "1001",
       "cust" : "abc",
       "lines" : [
         { "line number" : "line1", "product" : "ProdA" },
         { "line number" : "line2", "product" : "ProdB" }
       ]
     })
MongoDB Query Editor

The fields are automatically determined by the "Fields Provider" mechanism.

I add a few orders following the pattern shown above. Then I query them using the simplest possible query in the iReport query designer:

     { 'collectionName' : 'orders' }

You can see that iReport does a bunch of work for me automatically. After I click “Read Fields” it retrieves all of the  documents. It parses them to find the fields. It sees that ‘lines’ contains an array of items. It creates fields corresponding to these items: ‘lines.0.line number’, ‘lines.1.line number’, etc.

Granted, it’s a simple case. But that’s a solid foundation to start from.

Handle Complexity

This is a good example of why the default behavior is not good enough. At the time I’m designing the report, no order has more than 3 lines. But in the future I will have orders with many more lines. I need the report to handle N order lines correctly where “N” is not known at design time.

The first thing you must do is add another field to the report manually. The “Fields Provider” automatically makes all of the leaf-level nodes in the document available. That includes fields like “lines.0.line number” and “lines.0.product”. But you need to manually tell it you want the whole thing without reaching into its components.

Add a field called “lines” with data type “java.util.Collection”.

The MongoDB connector will now bring back your Array of N order_lines as desired. But how can you display them intelligently in a report?

Complexity via cop-out

I could treat the field ‘lines’ as a Collection, then simply convert it to a String and display this. In certain cases this could be useful. In general… yuck. I get something like this:

     [ { "line number" : "line1" , "product" : "ProdA"} , { "line number" : "line2" , "product" : "ProdB"} , { "line number" : "line3" , "product" : "ProdA"}]

In principle I could do some  parsing in the report itself to extract useful information. Thinking about that too hard makes me queasy. Yuck. Fortunately JasperReports has some mechanisms in place to handle Collections much more easily.

List Component

Create "Table Dataset" and define the relevant fields

The list component is designed to handle simple list of data. This example fits the bill perfectly. The key fact that many users don’t think of is that they can pass any Collection (like ‘lines’) into the component like this:

     new net.sf.jasperreports.engine.data.JRMapCollectionDataSource($F{lines})

This data source exists specifically to handle exactly this type of requirement. It’s perfect.

I created a new dataset in the report. I had to manually add the fields “line number” and “product” to this dataset. This is because there is no query associated with the dataset that iReport could use determine fields automatically.

Once we pass $F{fields} to the dataset, laying out the list is simple. Here each line in my list includes the line number and product name.

Table Component

The table component in JasperReports hasn’t gotten the love it deserves. It’s a great solution to this problem. Dropping a Table component into the report design is simple.

Laying out the table is straightforward. It imposes a bit more structure than the List component. In my simple example here I make the List and the Table look the same. But the Table holds additional features that make it easy to include a header or footer. It also has more semantic meaning because it includes the idea of separate columns.

Subreport

A subreport can take the same data source as a Table. So the key idea behind working with Tables and Subreports is identical. Subreports provide more flexibility than a table to accomodate complex layouts. But there is some corresponding complexity in maintaining a report with a subreport. In my example where I simply want to list out order lines, a subreport is clearly overkill. But in other cases it may be more appropriate.

Custom Java Utility

In some cases you may want a simple way to very specific things with the Collection. I encountered one JasperReports Server customer who simply wanted to create a comma separated list of values based on the contents of an Array coming from his MongoDB data source. It is possible to create a string like this as a variable which is part of the “Order Lines Dataset”. Then we could display the string using a List or a Table component. But this feels like a bit of a hack.

Typically the best way to do a bit a string processing like this is with an external function. I wrote a Java function to handle it. DON’T PANIC. There’s no reason that you need to write Java code to handle collections. I just wanted to include an example of it here for the sake of completeness. Here it is:

     /**
      * Takes a Collection of Maps and a String key
      * Returns a comma separated String of all values corresponding to that key
      */
     public static String concatMapValues(Collection<Map<String,String>> coll, String key) {
       StringBuilder sb = new StringBuilder();
       for(Map<String,String> m : coll) {
         if (sb.length() > 0 && m.containsKey(key)) {
           sb.append(", ");
         }
       sb.append(m.get(key));
       }
       return sb.toString();
     }

Final Result

Now that we have multiple different ways to process the order lines, let’s take a look at the report in action. I used a slightly updated query from the trivial one that I show at the start of this article. This query returns all orders where one of the order lines includes “ProdA”

     { 'collectionName' : 'orders',
       'findQuery' : { "lines.product" : "ProdA" } }

The final report has exactly the data I want. Well… it has it multiple times because I tried every different method. But normally I would just choose one. The report layout is shown as well. Again, it’s needless complex because I show two poor ways and four good ways to do the same thing.

Executed report showing 3 orders along with all order lines for each order (with multiple variations)

iReport in "Designer" mode showing the report layout

Sample Materials

You can try out the report for yourself. The sample data and .jrxml files are in this document. Get the Jaspersoft MongoDB connector separately.

Of course you’ll need iReport as well. JasperReports Server can be used to deploy the reports.

Advertisements

3 November 2011

Filtering on multiple fields in MongoDB

Filed under: JasperReports — Tags: , — mdahlman @ 23:07

Filtering on multiple fields in MongoDB

This thread got me thinking about filtering on multiple fields in a MongoDB query. The author of the original question attempted to use the $and operator which was introduced in MongoDB 2.0. But that’s not needed for simple queries that filter on multiple fields. I created a simple report to show the idea.

Filtering on multiple fields at once in MongoDB

Filtering on multiple fields at once in MongoDB

The report is available here if you want to play with it. Here’s the sample data I used:

db.shiraz.save( { "ServiceType" : "sms", "ServiceHealthState" : "ok", "DocumentID" : "11" } )
db.shiraz.save( { "ServiceType" : "sms", "ServiceHealthState" : "ok", "DocumentID" : "12" } )
db.shiraz.save( { "ServiceType" : "sms", "ServiceHealthState" : "ko", "DocumentID" : "13" } )
db.shiraz.save( { "ServiceType" : "jms", "ServiceHealthState" : "critical", "DocumentID" : "21" } )
db.shiraz.save( { "ServiceType" : "jms", "ServiceHealthState" : "critical", "DocumentID" : "22" } )

If you haven’t used (or heard of!) the Jaspersoft connector for MongoDB, then start with this article.

22 September 2011

Date Queries in MongoDB Reports

Filed under: JasperReports Server — Tags: , , , — mdahlman @ 12:21

First Dates with MongoDB

Background

You need some software before you can make use of these ideas:

You should probably start by reading my previous articles before reading this one:

Date queries in MongoDB reports

This idea actually applies to all types of Java objects, not just Dates. But most data types likes Strings and simple numbers get converted magically to what you probably want. Dates are probably the most common object that is too complex to be handled by this automatic conversion process.

Let’s take what is perhaps the simplest possible example. I want a query that will return only the documents for which a specified field, status_date, is later than some date. It’s a simple JSON-style query in the MongoDbQuery language of the Jaspersoft MongoDB connector:

{ 
  'collectionName' : 'active_users',
  'findQuery' : { 'status_date' : {'$gte':$P{StartDate}} }
}

It’s easy enough to understand even if it’s a bit verbose with curly braces in curly braces in curly braces. My report has a java.util.Date parameter which gets passed into the query. In earlier versions of the connector—as pointed out by one astute reader—there was no way to do this. Attempts to embed syntax like this into the query resulted in JSON parsing exceptions: { ‘status_date’ : {‘$gte’: new ISODate(’2011-09-15T00:00:00.000Z’)’ } }

Sample Data

In case you’re relatively new to MongoDB, like me, it can smooth your path to testing to have a sample data set and sample report that are known to work well together. You can insert a these few documents on the MongoDB shell and then use the sample report listed below to get something that ought to work with minimal effort. Then you’re ready to start writing your own reports.

The first document is listed on many lines to make it easier to parse it by eye to see what’s in it. The second two documents are compressed onto a single line. The final “find” command is useful to confirm that the earlier inserts worked as expected.

db.active_users.save(
{
"status_date" : ISODate("2011-09-01T07:48:58Z"),
"id" : "123",
"events" : {
"ACHIEVEMENT_Minor" : 1,
"ACHIEVEMENT_Major" : 1,
"ACHIEVEMENT_Ultimate" : 0
},
"install_source" : {
"detail" : "campaign - Free iPad",
"vt1" : "email",
"vt2" : "facebook"
},
"rev" : {
"net_total" : 700
}
}
)
db.active_users.save({"status_date" : ISODate("2011-09-09T22:17:27Z"),"id" : "456","events" : {"ACHIEVEMENT_Minor" : 1,},"install_source" : {"detail" : "campaign - Free iPad","vt1" : "email","vt2" : "facebook"},"rev" : {"net_total" : 450}})
d = new Date()
db.active_users.save({"status_date" : d,"id" : "789","events" : {"ACHIEVEMENT_Minor" : 1,},"install_source" : {"detail" : "campaign - Like us on Facebook","vt1" : "email","vt2" : "facebook"},"rev" : {"net_total" : 222}})
db.active_users.find()

Sample Report

Reminder

The idea for this article (and for this new feature to the Jaspersoft MongoDB Connector) came from a reader comment. Please let me know when you run into questions, but for general questions about the MongoDB connector please post to the JasperReports Server forum instead of posting a comment here. That’s a better location for asking and answering questions.

 

2 September 2011

Cool Reporting on MongoDB

Filed under: iReport, JasperReports — Tags: , , , — mdahlman @ 16:24

Interesting Reporting on MongoDB

Before you can do interesting reporting, you need to do simple reporting. You have to walk before you can run. Please start with the article Simple Reporting on MongoDB. That article explains where to get the MongoDB connector for iReport and JasperReports Server. When you have completed that, then read this article more advanced and more interesting reporting techniques.

Simple Filters

Let’s start with a simple hard-coded filter.

{ 
  'collectionName' : 'accounts',
  'findQuery' : { 'name' : 'M & Y Takemura Communications, Ltd'  }
}

Filtering on a single value makes sense when that value is a complex document. There could be lots of information in that document to create a report. To make this truly interesting though, we need this query to be parameterized. So our first truly realistic sample query is the following:

{
  'collectionName' : 'accounts',
  'findQuery' : { 'name' : '$P{CUSTOMER_NAME}'  }
}

Field Selection and Dot Notation

Documents in MongoDB can be big. The queries above will return every field in the document. When this is not appropriate, it’s easy to create a query to return only the fields that you need.

Imagine the case where we have lots of users and we want a report showing which users had logged certain events. There could be lots and lots of events, but there’s only one particular event that I care about in this report. Rather than return the complete document for each user we instead return only the fields we care about.

{ 
  'collectionName': 'active_users',
  'findFields' : { 'id':1, 'date':1, 'events.ACHIEVEMENT_Minor':1 }
}

Enter this query into iReport’s query field, and then iReport is able to execute the query and return only the fields we’re interested in.

iReport Query Editor

The query editor makes it easy to test query syntax and to see query results

We could then combine this with a filter if we only want certain users returned.

{ 
  'collectionName': 'active_users',
  'findFields' : { 'id':1, 'date':1, 'events.ACHIEVEMENT_Minor':1 },
  'findQuery' : { 'events.ACHIEVEMENT_Major' : 1 }
}

Joins!

Joins? MongoDB doesn’t support joins. In general your data should be modeled so that you don’t need joins. If you need them, then perhaps you should be using a traditional RDBMS instead of MongoDB. But what about simple joins that are required in some types of reports? Subreports can be used in JasperReports to effectively join two collections together.

Imagine a collection that stores summary data by country and a collection that stores customers. I want to query the country collection to find my top countries for some time period and then query my customers collection to find the top customers in the those countries.

It’s simple in JasperReports. We create a query in the top level report to return the countries that I want. Then for each country I run a subreport that returns the customers from that country.  Here’s a simple version.

Parent query:

{ ‘collectionName’:’countries’ }

The subreport query finds all accounts that are in that country and whose name starts with “B”. (It’s a silly example. Why would I want companies that start with “B”? But I wanted to show the syntax for Regular Expressions in queries.)

{
	'collectionName':'accounts',
	'findFields':{'name':1,'phone_office':1,'billing_address_city':1,'billing_address_street':1,'billing_address_country':1},
	'sort':{'billing_address_city':1,'name':1},
	'findQueryRegEx':{'billing_address_country':'/$P{AccountCountry}/','name':'/^B /'}
}

The above set of two queries gives me exactly what I want. It has the drawback of running “N + 1” queries. By that I mean that runs the subreport query once for each country. In some cases that could be bad. In these cases it could be even better to use the “IN” syntax so that I only need to run each query once.

I can use the same parent query. But rather than pass the countries one at a time to the subreport we can pass the complete list of countries as a Collection. Then the subreport can run a single query to get all of the accounts that I want. The working reports are attached below, but here’s the query to show the idea.

{
	'collectionName':'accounts',
	'findFields':{'name':1,'phone_office':1,'billing_address_city':1,'billing_address_street':1,'billing_address_country':1},
	'sort':{'billing_address_country':1,'billing_address_city':1,'name':1},
	'findQuery':{ 'billing_address_country': {'$in':$P{AccountCountryCollection}} }
}

Summary

MongoDB has more flexibility than most of today’s Big Data databases in its query syntax. That’s one thing that made it attractive as a reporting target for Jaspersoft. Hopefully this article gives a good idea of some of the possibilities that exist for reporting directly against a MongoDB data source.

Sample reports using all of the above queries are available here.

Simple Reporting on MongoDB

Filed under: iReport, JasperReports — Tags: , , , — mdahlman @ 16:22

Big Data (MongoDB) with Reporting and Analytics

Big Data databases are being adopted with rapidly increasing frequency. Opinions on NoSQL and Hadoop are being generated nearly as quickly as the data going into these databases. This article doesn’t contribute a viewpoint on any of that.

I’m assuming that you already have data in MongoDB. You probably had a great reason for using MongoDB, and reporting on that data was probably not a top consideration. But now that the data is there, what options exist for reporting and analysis?

Option 1: Dump it to a traditional RDBMS

This technique is not exciting or sexy, but it’s ideal in many cases. Create ETL jobs to get the data out of MongoDB and into PostgreSQL, MySQL, Vertica, Teradata, Netezza, or whatever good old relational database you prefer. From there you have a plethora of tools at your disposal for slicing and dicing the data or for designing reports. Having acknowledged the utility of this technique for many requirements… I will proceed to ignore it.

Option 2: Connect your BI solution directly to MongoDB

You’ve got a lot of data in MongoDB. That’s why it’s called “Mongo” after all; that’s why the whole field is referred to as “Big Data”. So there are plenty of downsides to pulling all of the data back out and putting it somewhere else.

If requirements are simple enough, you can code it up yourself. Write some PHP scripts to create reports. The MongoDB PHP driver makes it easy enough to connect and get data.

If your requirements are a bit more extensive, then the build-it-yourself method has important drawbacks. A simple tabular report can be created quickly. But what happens when your users like it and then say, “Great, now can I get the same thing in PDF?” Or soon they’ll ask for charting. Or you’ll find you need to implement security on who can see which reports. There are plenty of benefits to using standard reporting and analysis tools for solving problems like these. The biggest barrier to this is that most Big Data databases (including MongoDB) are not accessible via standard methods like JDBC. That means traditional Business Intelligence (BI) tools can’t get directly to Big Data data sources.

Jaspersoft has addressed this problem for a number of Big Data databases including MongoDB. It offers connectors to a variety of datasources. Read the marketing fluff. See the connectors.

OK, that’s enough background. The rest of the article below this point is a hands-on guide to creating reports in iReport against a MongoDB data source.

Prerequisites

First Report

The installation of the connector is simple. It’s covered in the documentation included with the connector. After installing the plugin and restarting iReport, you’re ready to define your connection. The connection needs to know the host, port, and database that you’ll be connecting to. This is expressed in a JDBC-like url shown below.

Simple JDBC-like connection URL

Now you’re ready to create your first report. Choose File -> New to launch the report wizard. Choose the Coffee template (well… choose any template besides the blank ones). Then click “Launch Report Wizard”. When you reach the step of the wizard that requires a Query in the MongoDbQuery language you have reached your first real test. MongoDB doesn’t have a query language. What should you enter here?

For your first report, use the simplest possible query:

{ ‘collectionName’ : ‘myCollection’ }

Simple MongoDB Query

Click “Next”. iReport then runs your query. It connects to MongoDB, reads each document from the specified collection, finds each field and its datatype, and presents these fields to you. Documents in MongoDB do not necessarily have the same fields. The list of fields presented is the union of all fields found for each document.

The available fields are automatically discovered. Choose the ones you want.

If you’re dealing with a very large collection, then you perhaps don’t want to wait for iReport to read through every record. You can tell it to read only the first 100 documents like this:

{ ‘collectionName’ : ‘myCollection’,
‘rowsToProcess’ : 100 }

Note: the field ‘rowsToProcess’ is read only by iReport’s Fields Provider. That means when you run the report it will still process all documents from the collection.

Finish the report wizard and then preview the report. You have just created a report against a MongoDB database without having to first dump the data into MySQL. Woo hoo!

Your first report showing data from MongoDB should be working now!

Useful Reports

Creating the first report was easy. But let’s face it: a report that includes every record from a collection is not really interesting. It’s bound to be a humongous report. Let’s add filters to get genuinely useful reports.

Continue to my article Cool Reporting on MongoDB for some advanced techniques.

You can download the report from this article (and all of the reports from the second article) here.

Create a free website or blog at WordPress.com.