Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I am currently working as an Actuarial Analyst, but I have also worked several years as a programmer.

As an analyst the tools I use are excel, access, SAS Enterprise guide and Oracle SQL developer. One of the big problems I face is that we have no good way to abstract away a process and really make it reusable.

My general work flow is using SAS to pull data from multiple sources, combine and run the data through some series of logic/calculations. Then take the resulting data, copy to excel for some additional analysis or report. This might be for a monthly/quarterly report or an analysis that needs to be update with the additional runout of data.

But these steps are all tightly coupled together. If I want to rerun the same logic on a different data set, or an updated data set I will copy and paste all of the files, update the queries. I have no way to bundle them together so that I can easily reuse with different data sources, or refreshed data.

Really want I want is someway to encapsulate different sets of data transformations/calculations into to functions to reuse them in different contexts and among different people.



Look at my EasyMorph (http://easymorph.com). It's a visual replacement for scripted data transformations. People use it to replace SAS and Visual Basic scripting. It also allows creating reusable modules. Contact me at <hnusername>@easymorph.com if it looks interesting to you.


Hey, I clicked through, read the tutorial, got excited about your examples.. tried to download and found out it was windows only! I would have totally evaluated it further if there were an os x/linux option.


Thanks for checking it out! As we're targeting Tableau users so eventually we will release an OS X version.


Speaking of Tableau (which was founded on the concept of VizQL), how is this different? Doesn't tableau basically enable knowledge workers to create data-centric web applications?


use a virtual machine or install windows as dual boot.

If this software solves a problem you are really having, nothing would stop you.


if solving one problem involves creating another, it's good to be cautious.


Does this have an API that can be called from .Net?

I'm really liking some of the things Microsoft is doing with Power Query, but I don't like how it is (afaik) only callable from Excel or PowerBI online. I'd like similar capability, but more open, and could be called via scripting, from SQLCLR, etc.

Another big hitch: Microsoft has not published proper API's for manipulating PowerPivot models in Excel, and I don't think they intend to - I've heard one 3rd party has reverse engineered the API's (you can decompile the .Net binaries, but I haven't had the time to look at it yet).


Would you perhaps have any info/reference about where I could learn more about this reverse engineered PowerPivot API? This sounds pretty exciting. :D

To expand on my question, I'd heard (from Rob 'PowerPivotPro' Collie, former product manager on the project IIRC) that the core had been written in 'unmanaged code' (probably C++?), so I believe reverse engineering it would be a significantly larger effort than just opening some of its DLLs in say DotPeek, at least from as far as I've been able to tell.


The PowerPivot engine itself I imagine is in unmanaged code, but the code that just writes datasets and whatnot to the model is (from what I've heard)managed code, and indeed you can decmpile the libraries and see all sorts of things, I've only looked around for about 10 minutes or so. And I had just read on some obscure thread that someone had successfully found the undocumented API call to write to the model, which is what I'm wanting to do - but I don't even know what the product name is that supposedly does this, sorry.


Right, makes sense, I'll try and check out what's available then. :)

By writing to the model, you mean programmatically adding new measures or the like?

My interest is in programmatically querying models using DAX, though to this end I'd also look to look in the direction of Microsoft's DirectQuery mode in SQL Server which supposedly did DAX-to-SQL conversion.

If one could use such a conversion plus MDX to start querying models on an Apache Spark cluster through pivot table/chart interfaces...


Not even measures, I'm just wanting to be able to create tables, define relations, etc, with the accompanying sql or m script. I'm hopeful they'll let us do that some day, but I still don't quite believe they've changed their stripes entirely.


Currently EasyMorph supports integration through command line only. We do not plan having API for the desktop client, but we will definitely make EasyMorph Server API if we reach that point.


We use Pentaho Kettle for those kinds of transformations. It's FOSS, and connects to a whole bunch of programs and formats.

It's a graphical tool - you drag-n-drop modules, then configure and connect them, though it can also run scripts (it has JavaScript, Java, Bash and Ruby support, besides SQL, of course) - but after configuring the transformation/job, you can also run it on the terminal, which is useful for periodically re-running it.

http://community.pentaho.com/projects/data-integration/


I've been doing a lot of work with Kettle as well, and it is a handy tool (albeit with a few warts).

What I think would be handy for use in an organizational setting, where "business users" might want to use some of the transforms, would be a way to publish transforms somewhere, making them discoverable and accessible to others. I don't want to make it sound like I'm talking about UDDI or anything (although, thinking about it, maybe you could use that), but just an easy way for a Joe Business User to get a list of available transforms, some explanation of what they do, what input they take, what they output, etc. And maybe a way to make changes to the "small stuff" (like the input and output path, for example) without having to load up Spoon and edit the ktr that way. Since transforms can be parameterized, that should be doable...

You could also picture combining this with something like a Yahoo Pipes like web interface, to let you define your own chains of transforms and operations as well. And hell, a web-based interface for editing ktr files would be a pretty interesting thing as well, if somebody would build it.


Have a look at Alteryx (http://www.alteryx.com/) - it's pretty close to what you're describing, I think.


I haven't used it extensively, but SQL Server Integration Services (SSIS) looks like it does a lot of the things you're talking about.


It does. There are other ETL tools as well.

https://en.wikipedia.org/wiki/Extract,_transform,_load#Tools


The databricks platform should solve exactly your problem - reusable data pipelining/transformation. I saw a demo of it last night and it was extremely slick. Their product is amazing, it makes data pipelining incredibly easy compared to setting up a hadoop cluster and running hive/etc. (I don't work for them - but if any databricks employee sees this, please hire me!) It runs on a spark cluster over AWS, which is much more modern and powerful than SAS/excel/sql. Since you know how to program already, it shouldn't be too hard to pick up spark (even has python bindings)


@rgoddard - May be a bit overkill but check out Immuta (www.immuta.com). Its a data platform, built for data scientists, that enables you to query across many disparate sets of data using familiar patterns such as SQL, file system, etc. Our SQL interface allows you to hook to Excel, Tableau, Pentaho...so you could write your abstracted logic and connect to many data sources or mashed up analytic results. contact me at matt@immuta.com if you're interested after reading through the site.


We are having a similar set of issues where I work (insurance industry). Always looking for folks to chat with/discuss similar issues


Is there a way to get in touch with you?


Sure email is ios at arrowheadgrp.com




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: