Open Source Business Intelligence news

BI feeds

Open Source BI uptake in UK Healthcare

James Dixon - Thu, 12/18/2008 - 02:36

The recent decision by the National Health Service in Islington to use Pentaho for BI is getting lots of attention.

As Glyn Moody often points out, the UK is a laggard in Europe in terms of open source adoption. Cases like this hopefully mark the turn of the tide.

      

Open source can keep a technology alive, less so a company

James Dixon - Wed, 12/17/2008 - 18:49

Cyan Worlds has decided to release the source code for Myst Online (Uru Live) as open source.

Dana Blankenhorn at ZDNet, in his ‘Can open source save gaming companies?’ post, seems to think that Cyan is trying to save the company from dying. I read it differently: I think they are trying to save the software from dying.

Here is a quote from Cyan:

But we’ve poured so much into UruLive, and it has touched so many, that we could not just let it whither and die. We still have hopes that someday we will be able to provide new content for UruLive and/or work on the next UruLive.

They seem to be viewing open source as a way to keep Myst alive. This is probably for personal reasons and also in the hope that at some point Cyan might be able to monetize it again.

I think Matthew Aslett at 451 Group reads it the way I do and points out that open source is not a panacea.

      

BI: The Year in Review

TDWI - Wed, 12/17/2008 - 08:00
For BI professionals, it was a year in which several long-simmering trends seemed to coalesce and boil over.

Major Data Warehousing Events of 2008 (and Predictions for 2009)

TDWI - Wed, 12/17/2008 - 08:00
BI analyst Michael Schiff looks at the accuracy of the predictions he made last year, examines the major BI events of 2008, and suggests new trends to watch in 2009.

Analysis: BI Transformation in 2009

TDWI - Wed, 12/17/2008 - 08:00
Financial disruption may hold promise for BI, but it will also transform BI itself: big tools to small tools, fewer analysts and more business users, and finally appreciation for human intelligence.

Follow Up to ‘Open Source Business Intelligence In an Economic Down-Turn’

James Dixon - Wed, 12/17/2008 - 02:57

Michael Schiff over at Enterprise Systems Journal seems to agree with my last post:

http://www.esj.com/business_intelligence/article.aspx?EditorialsID=9261

Prediction #3: Open Source Growth will Accelerate

Economic pressure will accelerate the growth of open source technology as well, especially as open source has now established itself in production deployments. Because many vendors are utilizing source technology in their applications in order to reduce costs or, as in the case of several data warehouse appliance vendors, partnering with open source business intelligence and data integration vendors to offer a more complete solution, the growth will be seen in both standalone and embedded environments.

      

SEX and BI

BIguru-online - Tue, 12/16/2008 - 14:57

What makes Business Intelligence successful? Why does one project turn out great and another become a complete failure? I wish I had the answer to this but I don’t. What I do know – see my previous blogs – is that BI has to be super fancy sexy. This means that any BI solution should look great, which greatly increases the chances of success. The last time I blogged on this topic I had 5000% more traffic on my blog (google term: BI SEXY). So to all my new readers: please respond to the best practices below and see whether you agree. Please react and become a BI GURU yourself.

 Design phase

  • Think about target groups (different users have different needs)
  • Think about usage (ad hoc and structured)

 Build phase

  • Go for quality instead of quantity
  • Mix your team (functional and technical)

 Implementation phase

  • Don’t stop after build.
  • Support BI by forming a competence center
  • Find high-level support (thought leadership and inspiration much more than sponsorship)

Q&A: Market Forces That Will Mold BI in 2009

TDWI - Tue, 12/16/2008 - 08:00
It’s been a year of great change, and there’s more ahead. JasperSoft’s CEO takes a look at four key trends for 2009 that will have an impact on BI.

Kettle at the MySQL UC 2009

Matt Casters - Thu, 12/11/2008 - 18:25

Hello Kettle fans,

Like Roland I got confirmation earlier this week that I could present my talk on “MySQL and Pentaho Data Integration in a cloud computing setting”, at the next MySQL user conference.

I’m very excited about the work we’ve done on the subject and it’s going to be great talking about it in April.

See you there!
Matt

Open Source Business Intelligence In an Economic Down-Turn

James Dixon - Thu, 12/11/2008 - 05:08

Could the current economic climate affect the balance of power between the entrenched proprietary vendors and the new open source competitors? It seems logical…

Business Intelligence is interesting in that it provides different value in boom-times and in down-turns. In boom-times companies are looking to maximize sales and growth by analyzing the results of marketing campaigns, weblogs, demographics etc. In economic down-turns companies are looking to analyze things such as costs, margins, and profitability of channels in order to maintain profitability while sales decline.

BI therefore has differing values during various economic scenarios. The BI tools and technologies remain the same in all these scenarios, but the budgets differ. During booms the budgets can be considerable-to-extraordinary. During down-turns the budgets become severely constrained.

In previous economic cycles budgets have not been a major differentiating factor amongst the BI vendors because the vendors have had, generally, similar pricing and business models. That is to say, the entrenched BI vendors have presented the same pricing in all economic scenarios despite the economic differences. However these old-school (or ‘historically-comfortable’ if you want a politically correct term) business models are as prone, if not more so, to economic down-turns as the markets that they serve. This is because it costs more to sell BI tools than it does to produce the technology. You can look at the quarterly reports of Cognos and Business Objects just before they were consumed and see that income from new licences was 35-40% of all income, that sales and marketing expenses were 35-40% of expenditure, whereas investment in product development was 15-25% of expenditure. These figures suggest that almost all the money a customer spends on new BI licences is spent by the vendor on marketing to, and selling to, the next round of customers.

What is different about today’s economic situation is that for many segments of the software market (for BI and also for CRM, ERP etc.) this is the first economic down-turn where substantial and viable open source and commercial open source offerings have been available.

Will the current economic climate provide a boost to these new offerings? We only have to wait…

      

QlikView's Rapid Time-to-Implementation Improves BI Value

TDWI - Wed, 12/10/2008 - 08:00
QlikTech believes its QlikView product will benefit disproportionately from a projected surge in BI spending.

Silver Creek Accelerates Product Data Integration

TDWI - Wed, 12/10/2008 - 08:00
Existing DI and DQ tools can't easily be adapted to address product data integration. What's needed, proponents say, is a better, dedicated tool.

Q&A: Integrating Content Management and BI

TDWI - Wed, 12/10/2008 - 08:00
Integrating BI, content management, and portals has long been a challenge for IT. Open source has a price advantage, but is it right for you?

Q&A: From Part-time to Prime-time: BI Evolution Continues

TDWI - Wed, 12/10/2008 - 08:00
A look at the trends that left their mark in 2008, plus what's ahead in 2009.

Pentaho Data Integration vs Talend (part 1)

Matt Casters - Thu, 12/04/2008 - 16:42

Hello data integration fans,

In the course of the last year or so there have been a number of benchmarks on blogs here and there that claimed a certain “result” pertaining to performance of both Talend and Pentaho Data Integration (PDI a.k.a. Kettle).  Usually I tend to ignore these results a bit, but a recent benchmark got so far off track that I had to finally react.

Benchmarking itself is a very time-consuming and difficult process and in general I advise people to do their own.  That being said, let’s attack a first item that appears in most of these benchmarks: reading and copying a file.

Usually the reasoning goes like this: we want to see how fast a transformation can read a file and then how fast it can also write it back to another file.  I guess the idea behind it is to get a general sense of how long it takes to process a file.

Here is a PDF that describes the results I obtained when benchmarking PDI 3.1.0GA and TOS 3.0.2GA.  The specs of the test box etc. are also in there.

Reading a file

OK, so how fast can you read a file, how scalable is that process and what are the options?  In all situations we’re reading a 2.4GB test file with random customer data.  The download location is in the PDF on page 4 and elsewhere on this blog.

Remember that the architectures of PDI and Talend are vastly different so there are various options we can set, various configurations to try…

1) Simply reading with the “CSV File Input” step, lazy conversion enabled, 500k NIO buffer: 150.8 seconds on average for PDI. Talend performs this in 112.2 seconds.

2) This test configuration is identical to 1) except that PDI now runs 2 copies of the input step.  Results: 94.2 seconds for PDI.  This test is not possible in Talend since the generated Java software is single threaded.

Reading a delimited file, time in seconds, lower is better

There is a certain scalability advantage in being able to read and process files on multiple CPUs and even multiple systems across a SAN.  There is a serious limitation in Talend since they can’t do that.  A 19% speed advantage for PDI is inconsequential for simple reads but brutal for more complex situations, very large files and/or lots of CPUs/systems involved.  For example, we have customers that read large web log files in parallel over a high speed SAN across a cluster of 3 or 4 machines.  Trust me, a SAN is typically faster than what any single box can process.
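To make the “2 copies of the input step” idea a bit more concrete, here is a minimal Python sketch of reading one delimited file in parallel.  It is not how PDI does it internally; it only illustrates the general technique of cutting a file into byte ranges that end on line breaks and handing each range to its own worker.  The function names (`chunk_ranges`, `count_rows`, `parallel_row_count`) are made up for this example, and counting rows stands in for real row processing.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(path, copies):
    """Split the file into up to `copies` byte ranges that each end on a line break."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, copies):
            target = size * i // copies
            if target <= cuts[-1]:
                continue
            f.seek(target)
            f.readline()  # skip ahead to the start of the next full line
            pos = min(f.tell(), size)
            if pos > cuts[-1]:
                cuts.append(pos)
    if cuts[-1] < size:
        cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def count_rows(path, start, end):
    """Count the lines in one byte range (a stand-in for real row processing)."""
    rows = 0
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            if not f.readline():
                break
            rows += 1
    return rows


def parallel_row_count(path, copies=2):
    """Read the file with `copies` workers, each handling its own byte range."""
    ranges = chunk_ranges(path, copies)
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return sum(pool.map(lambda r: count_rows(path, *r), ranges))
```

Calling `parallel_row_count("customers.csv", copies=2)` (the file name is only a placeholder) gives the same row count as a single sequential pass.  Keep in mind that Python threads mostly help with I/O waits; for CPU-heavy parsing you would probably need separate processes to see a real speed-up.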

Writing back the data

The test itself is kinda silly but since it is being carried around in the blogosphere, let’s set a reference, a copy command.   I simply copied the file and timed the duration.  That particular copy set a reference time of 122.2 seconds: a copy from my internal disk to an external USB 2.0 disk. (for the exact configurations see the PDF)
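If you want to set the same kind of reference on your own box, a timed copy is easy to script.  The sketch below is just one way to do it in Python; `timed_copy` is a hypothetical helper name, and the paths you pass in are up to you.

```python
import shutil
import time


def timed_copy(src, dst):
    """Copy src to dst and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    return time.perf_counter() - start
```

Run it a few times and average the results, since operating system file caches can make a second copy look much faster than the first.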

3) If reading in parallel is the fastest option for PDI, we retain that option.  Then we write the data back with a single target file.  PDI handles this in 196.2 seconds.  Talend can’t read in parallel so we don’t have any results there.

4) A lot of times, these newly generated text files are just temporary files for downstream processes.  As such it might (or might not) be possible to create multiple files as target.  This would increase the parallelism in both this challenge and the downstream tasks.  PDI handles this task in 149.3 seconds.  Again I didn’t find any parallelization options in TOS.

5) Since neither 3) nor 4) is possible in Talend I tried the single delimited reader / writer approach.  That one ran for 329.4 seconds.

Reading/writing a delimited file, time in seconds, lower is better

CPU utilisation

I also monitored the CPU utilisation of the various Talend jobs and Kettle transformations and came to the conclusion that Talend will never utilize more than 1 CPU while Kettle uses whatever it needs and can get its hands on.  For the single threaded scenario, the CPU utilization is on par with the delivered performance of both tools.  There doesn’t seem to be any large difference in efficiency.

Conclusion

Talend wins the first test with their single threaded reading algorithm.  I think their overhead is lower because they don’t run in multiple threads. (Don’t worry, we’re working on it :-))  In all the other, more complex situations, where you can indeed run in multiple threads, there is a severe performance advantage to using Kettle.  In the file reading/writing department for example, PDI running in 3 threads with lazy conversion beats Talend by being more than twice as fast in the best case scenario and 65% faster in the worst case.
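For those who like to check the arithmetic, the relative speeds follow directly from the timings of the numbered tests above.  The small Python sketch below only reuses those reported numbers; the dictionary keys and the `percent_faster` helper are names made up for this example.

```python
# Timings in seconds, copied from the numbered tests above.
timings = {
    "pdi_read_1_copy": 150.8,         # test 1, PDI
    "talend_read": 112.2,             # test 1, Talend
    "pdi_read_2_copies": 94.2,        # test 2, PDI
    "pdi_copy_single_target": 196.2,  # test 3, PDI
    "pdi_copy_multi_target": 149.3,   # test 4, PDI
    "talend_copy": 329.4,             # test 5, Talend
}


def percent_faster(slow, fast):
    """How much faster the `fast` run is, relative to its own duration, in percent."""
    return (slow / fast - 1.0) * 100.0


read_gain = percent_faster(timings["talend_read"], timings["pdi_read_2_copies"])
best_case = timings["talend_copy"] / timings["pdi_copy_multi_target"]
worst_case = percent_faster(timings["talend_copy"], timings["pdi_copy_single_target"])

print(f"Parallel read: PDI is {read_gain:.0f}% faster than Talend")
print(f"Copy, best case: PDI is {best_case:.1f}x as fast as Talend")
print(f"Copy, worst case: PDI is {worst_case:.0f}% faster than Talend")
```

With these inputs the parallel read comes out at about 19%, the best-case copy at about 2.2 times as fast, and the worst-case copy at roughly 68%, which is in the same ballpark as the rounded 65% quoted above.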

Please remember that my laptop is not by any definition “high end” equipment and that dual and quad core CPUs are commonplace these days.  It’s important to be able to use them properly.

The source code please!

Now obviously, I absolutely hate it when people post claims and benchmarks without backing them up.  Here you can find the PDI transformations and TOS jobs I used.  With a little bit of modification I’m sure you all can run your own tests.  Just remember to be critical, even concerning these results!  Trust me when I say I’m not a TOS expert :-)  Who knows, perhaps I used a wrong setting in TOS here or there.  All I can say is that I tried various settings and that this seemed the fastest for TOS.

Remember also that if even a simple “file copy” can be approached with various scenarios, this certainly goes for more complex situations as well.  Even the other tools out there deserve that much credit.  Just because Talend can’t run in multiple threads, that doesn’t mean that Informatica, IBM, SAP and all are not capable of doing so.

If I find the time I’ll post a part 2 and 3 later on.  Feel free to propose your own scenarios to benchmark as well.  Whatever results come of it, it will lead to the betterment of both open source tools and communities.

Until next time,
Matt

IDC Report: Dominant DW Vendors Face Challengers

TDWI - Wed, 12/03/2008 - 08:00
A new study from IDC shows which companies continue to dominate the data warehousing market and which are enjoying surging growth.

Selling BI by the Slice

TDWI - Wed, 12/03/2008 - 08:00
Mid-market shoppers at the recent Sage Summit Customer Conference wanted to try BI by the slice, and many didn't even call it BI. Vendors' pitches kept that in mind.

The Value of BI in a Weak Economy

TDWI - Wed, 12/03/2008 - 08:00
When times are tight, BI can actually save you money.

BPM and Beyond: The Human Factor of Process Management

TDWI - Wed, 12/03/2008 - 08:00
How Best-in-Class companies make the most of BPM

Flying high in economic storms

Matt Casters - Tue, 12/02/2008 - 21:01

Next week I’ll be in Orlando for another week of brainstorming, planning, scheming, plotting for world domination and yes, even coding.

Q : “What are you going to do next week?”
A : “The same thing I do every time when I’m in Orlando - Try to take over the world!”

So I went to kayak and entered my flight preferences: leave and return on Sunday giving me a full week over there.  I was almost shocked to see that the same flight I took 2 months ago now costs less than a third:

  • July 13th 2008 : BRU/MCO - MCO/BRU (over FRA) : 2,400 USD (summer time folks!)
  • October 12th 2008 : BRU/MCO - MCO/BRU (over IAD) : 2,000 USD
  • December 7th 2008 : BRU/MCO - MCO/BRU (over PHL) : 600 USD

Typically I’m not selecting the cheapest flight as that would put me on 18 hour layovers in Bangkok or something like that. In the past I’ve once spent 8 hours at Chicago airport and trust me, it’s not worth the 100 USD you can save.  You’ll spend it on Internet access, food, “beverages”, magazines, etc.

That being said, the December 7th flight is the cheapest flight with “only” 1 layover.

In the past I’ve noticed that the airlines added more and more options for me to fly to Orlando or at least across the Atlantic ocean.  Now that the economic downturn is upon us, perhaps there’s finally a bit of over-capacity.  After all, the last 5 flights I took from Brussels to the US had been fully booked flights.  That’s right folks: listening to hollering kids with Mickey Mouse ears for 9 hours straight.  Even noise canceling headsets have a hard time with that kind of noise.

Electronic System for Travel Authorization

Another thing of interest for the geeks among you is that you are now encouraged to apply for authorization to enter the US well in advance, replacing the hand-written green “Visa Waiver” documents.  Nobody is making a fuss about it, but registration will be mandatory for us Europeans from January 12th 2009 on, or so you can read.  I’m sure there are going to be freedom fighters here and there that are going to be up in arms over this sort of program, but personally I’m glad that we can finally fill in those green “waiver” documents electronically at home.  From the looks of it, there’s nothing on there that you don’t already fill in manually now. (It felt kinda familiar filling them in.)

I’ll let you know how they perceive my eagerness to fill in these “hidden” electronic documents at the US border next week :-)

Until then,
Matt
