I am currently working as an Actuarial Analyst, but I have also worked several y...

dgudkov · on Aug 17, 2015

Take a look at my product, EasyMorph (http://easymorph.com). It's a visual replacement for scripted data transformations. People use it to replace SAS and Visual Basic scripting. It also lets you create reusable modules. Contact me at <hnusername>@easymorph.com if it looks interesting to you.

shaunxcode · on Aug 17, 2015

Hey, I clicked through, read the tutorial, got excited about your examples... tried to download and found out it was Windows-only! I would have totally evaluated it further if there were an OS X/Linux option.

dgudkov · on Aug 17, 2015

Thanks for checking it out! Since we're targeting Tableau users, we will eventually release an OS X version.

nycdatasci · on Aug 19, 2015

Speaking of Tableau (which was founded on the concept of VizQL), how is this different? Doesn't Tableau basically enable knowledge workers to create data-centric web applications?

hbt · on Aug 17, 2015

Use a virtual machine or install Windows as a dual boot.

If this software solves a problem you are really having, nothing would stop you.

CognitiveLens · on Aug 17, 2015

If solving one problem involves creating another, it's good to be cautious.

mistermann · on Aug 18, 2015

Does this have an API that can be called from .NET?

I'm really liking some of the things Microsoft is doing with Power Query, but I don't like how it is (afaik) only callable from Excel or Power BI online. I'd like a similar capability, but more open: something that could be called via scripting, from SQLCLR, etc.

Another big hitch: Microsoft has not published proper APIs for manipulating PowerPivot models in Excel, and I don't think they intend to. I've heard one third party has reverse engineered the APIs (you can decompile the .NET binaries, but I haven't had the time to look at it yet).

tycho01 · on Aug 18, 2015

Would you perhaps have any info/reference about where I could learn more about this reverse engineered PowerPivot API? This sounds pretty exciting. :D

To expand on my question, I'd heard (from Rob 'PowerPivotPro' Collie, former product manager on the project IIRC) that the core had been written in 'unmanaged code' (probably C++?), so I believe reverse engineering it would be a significantly larger effort than just opening some of its DLLs in, say, dotPeek, at least as far as I've been able to tell.

mistermann · on Aug 19, 2015

The PowerPivot engine itself, I imagine, is in unmanaged code, but the code that just writes datasets and whatnot to the model is (from what I've heard) managed code, and indeed you can decompile the libraries and see all sorts of things. I've only looked around for about 10 minutes or so. I had also just read on some obscure thread that someone had successfully found the undocumented API call to write to the model, which is what I'm wanting to do, but I don't even know the name of the product that supposedly does this, sorry.

tycho01 · on Aug 19, 2015

Right, makes sense, I'll try and check out what's available then. :)

By writing to the model, you mean programmatically adding new measures or the like?

My interest is in programmatically querying models using DAX, though to this end I'd also want to look in the direction of Microsoft's DirectQuery mode in SQL Server, which supposedly did DAX-to-SQL conversion.

If one could use such a conversion plus MDX to start querying models on an Apache Spark cluster through pivot table/chart interfaces...

mistermann · on Aug 20, 2015

Not even measures. I just want to be able to create tables, define relations, etc., with the accompanying SQL or M script. I'm hopeful they'll let us do that some day, but I still don't quite believe they've changed their stripes entirely.

dgudkov · on Aug 18, 2015

Currently, EasyMorph supports integration through the command line only. We do not plan to add an API for the desktop client, but we will definitely build an EasyMorph Server API if we reach that point.

icebraining · on Aug 17, 2015

We use Pentaho Kettle for those kinds of transformations. It's FOSS, and connects to a whole bunch of programs and formats.

It's a graphical tool: you drag-n-drop modules, then configure and connect them, though it can also run scripts (it has JavaScript, Java, Bash and Ruby support, besides SQL, of course). After configuring the transformation/job, you can also run it from the terminal, which is useful for periodically re-running it.

http://community.pentaho.com/projects/data-integration/

mindcrime · on Aug 17, 2015

I've been doing a lot of work with Kettle as well, and it is a handy tool (albeit with a few warts).

What I think would be handy for use in an organizational setting, where "business users" might want to use some of the transforms, would be a way to publish transforms somewhere, making them discoverable and accessible to others. I don't want to make it sound like I'm talking about UDDI or anything (although, thinking about it, maybe you could use that), but just an easy way for a Joe Business User to get a list of available transforms, some explanation of what they do, what input they take, what they output, etc. And maybe a way to make changes to the "small stuff" (like the input and output path, for example) without having to load up Spoon and edit the ktr that way. Since transforms can be parameterized, that should be doable...
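
As a rough illustration of changing the "small stuff" without opening Spoon, here is a minimal Python sketch that builds a command line for Pan, Kettle's command-line transformation runner, with named parameters. The `-file=`, `-level=` and `-param:NAME=value` options are written from memory of the Linux `pan.sh` syntax and should be checked against the Pentaho documentation; the file name `daily_load.ktr` and the parameter names `INPUT_PATH` and `OUTPUT_PATH` are hypothetical.

```python
import shlex


def build_pan_command(ktr_path, params, pan="pan.sh", level="Basic"):
    """Return an argv list that runs a .ktr file through Pan.

    Assumes the Linux-style Pan options -file=, -level= and
    -param:NAME=value; verify these against the Pentaho docs.
    """
    argv = [pan, f"-file={ktr_path}", f"-level={level}"]
    # Sort the parameter names so the command line is reproducible.
    for name in sorted(params):
        argv.append(f"-param:{name}={params[name]}")
    return argv


# Hypothetical transformation and parameter names, for illustration only.
cmd = build_pan_command(
    "daily_load.ktr",
    {"INPUT_PATH": "/data/in.csv", "OUTPUT_PATH": "/data/out.csv"},
)

# Print a shell-quoted version for inspection; pass `cmd` itself to
# subprocess.run(cmd, check=True) to actually launch Pan.
print(shlex.join(cmd))
```

A small wrapper like this could sit behind a form or a catalog page, so a business user only fills in the parameter values while the transformation itself stays untouched.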

You could also picture combining this with a Yahoo Pipes-like web interface, to let you define your own chains of transforms and operations. And hell, a web-based interface for editing ktr files would be a pretty interesting thing too, if somebody would build it.

mhw · on Aug 18, 2015

Have a look at Alteryx (http://www.alteryx.com/) - it's pretty close to what you're describing, I think.

myoffe · on Aug 17, 2015

I haven't used it extensively, but SQL Server Integration Services (SSIS) looks like it does a lot of the things you're talking about.

wesd · on Aug 17, 2015

It does. There are other ETL tools as well.

https://en.wikipedia.org/wiki/Extract,_transform,_load#Tools

knn · on Aug 18, 2015

The Databricks platform should solve exactly your problem: reusable data pipelining/transformation. I saw a demo of it last night and it was extremely slick. Their product is amazing; it makes data pipelining incredibly easy compared to setting up a Hadoop cluster and running Hive, etc. (I don't work for them, but if any Databricks employee sees this, please hire me!) It runs on a Spark cluster over AWS, which is much more modern and powerful than SAS/Excel/SQL. Since you know how to program already, it shouldn't be too hard to pick up Spark (it even has Python bindings).

mcarroll_ · on Aug 18, 2015

@rgoddard - This may be a bit overkill, but check out Immuta (www.immuta.com). It's a data platform, built for data scientists, that enables you to query across many disparate sets of data using familiar patterns such as SQL, file system, etc. Our SQL interface allows you to hook into Excel, Tableau, Pentaho... so you could write your abstracted logic and connect to many data sources or mashed-up analytic results. Contact me at matt@immuta.com if you're interested after reading through the site.

jsandiego · on Aug 17, 2015

We are having a similar set of issues where I work (insurance industry). I'm always looking for folks to chat with and discuss similar issues.

drdoom · on Aug 17, 2015

Is there a way to get in touch with you?

jsandiego · on Aug 19, 2015

Sure, my email is ios at arrowheadgrp.com