The change of focus is perhaps hard to understand without more context. Basically, there are two really different modes of programming:
Most of what we see on HN is about building applications, servers, websites etc. Big, monolithic things that take weeks to years and are deployed to somewhere else and used by lots of people. Most programming tools are built for this kind of work, where the time between writing the code and actually using it is days or months.
But the people we want to make programming accessible to are mostly knowledge workers. Their work is characterised by a mixture of manual work and automation, throw-away code and tools rather than applications. It's better supported by Excel, SQL, shell scripting etc than by the big languages and IDEs.
We realised that we can do much more good focusing on that kind of programming.
I am currently working as an Actuarial Analyst, but I have also worked several years as a programmer.
As an analyst the tools I use are Excel, Access, SAS Enterprise Guide and Oracle SQL Developer. One of the big problems I face is that we have no good way to abstract away a process and really make it reusable.
My general workflow is using SAS to pull data from multiple sources, then combining the data and running it through a series of logic/calculations. I then take the resulting data and copy it to Excel for some additional analysis or a report. This might be for a monthly/quarterly report or an analysis that needs to be updated with the additional runout of data.
But these steps are all tightly coupled together. If I want to rerun the same logic on a different data set, or an updated data set I will copy and paste all of the files, update the queries. I have no way to bundle them together so that I can easily reuse with different data sources, or refreshed data.
Really what I want is some way to encapsulate different sets of data transformations/calculations into functions so I can reuse them in different contexts and among different people.
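To make that concrete, here's a toy sketch in plain Python of what I mean by encapsulating the steps - the column names ("premium", "claims") and the loss-ratio calculation are just made-up illustrations, not my actual process:

```python
# Sketch: wrap each transformation step in a function so the same
# logic can be re-run against a fresh or different data set.

def load_rows(source):
    # In practice this would pull from SAS/Oracle; here it's just passed in.
    return list(source)

def add_loss_ratio(rows):
    # One reusable calculation step: claims / premium per record.
    return [dict(r, loss_ratio=r["claims"] / r["premium"]) for r in rows]

def run_report(source):
    # The whole pipeline, decoupled from any one data source,
    # so re-running on refreshed data is just another call.
    return add_loss_ratio(load_rows(source))

q1 = [{"premium": 100.0, "claims": 40.0}]
q2 = [{"premium": 200.0, "claims": 90.0}]
print(run_report(q1)[0]["loss_ratio"])  # 0.4
print(run_report(q2)[0]["loss_ratio"])  # 0.45
```

The point is that swapping in an updated data set means changing one argument, not copy/pasting and re-editing a pile of files and queries.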
Take a look at my EasyMorph (http://easymorph.com). It's a visual replacement for scripted data transformations. People use it to replace SAS and Visual Basic scripting. It also allows creating reusable modules. Contact me at <hnusername>@easymorph.com if it looks interesting to you.
Hey, I clicked through, read the tutorial, got excited about your examples... tried to download and found out it was Windows only! I would have totally evaluated it further if there were an OS X/Linux option.
Speaking of Tableau (which was founded on the concept of VizQL), how is this different? Doesn't tableau basically enable knowledge workers to create data-centric web applications?
Does this have an API that can be called from .Net?
I'm really liking some of the things Microsoft is doing with Power Query, but I don't like how it is (afaik) only callable from Excel or PowerBI online. I'd like similar capability, but more open, and could be called via scripting, from SQLCLR, etc.
Another big hitch: Microsoft has not published proper APIs for manipulating PowerPivot models in Excel, and I don't think they intend to - I've heard one 3rd party has reverse engineered the APIs (you can decompile the .NET binaries, but I haven't had the time to look at it yet).
Would you perhaps have any info/reference about where I could learn more about this reverse engineered PowerPivot API? This sounds pretty exciting. :D
To expand on my question, I'd heard (from Rob 'PowerPivotPro' Collie, former product manager on the project IIRC) that the core had been written in 'unmanaged code' (probably C++?), so I believe reverse engineering it would be a significantly larger effort than just opening some of its DLLs in say DotPeek, at least from as far as I've been able to tell.
The PowerPivot engine itself I imagine is in unmanaged code, but the code that just writes datasets and whatnot to the model is (from what I've heard) managed code, and indeed you can decompile the libraries and see all sorts of things - I've only looked around for about 10 minutes or so. And I had just read on some obscure thread that someone had successfully found the undocumented API call to write to the model, which is what I'm wanting to do - but I don't even know what the product name is that supposedly does this, sorry.
Right, makes sense, I'll try and check out what's available then. :)
By writing to the model, you mean programmatically adding new measures or the like?
My interest is in programmatically querying models using DAX, though to this end I'd also like to look in the direction of Microsoft's DirectQuery mode in SQL Server, which supposedly did DAX-to-SQL conversion.
If one could use such a conversion plus MDX to start querying models on an Apache Spark cluster through pivot table/chart interfaces...
Not even measures, I'm just wanting to be able to create tables, define relations, etc, with the accompanying sql or m script. I'm hopeful they'll let us do that some day, but I still don't quite believe they've changed their stripes entirely.
Currently EasyMorph supports integration through the command line only. We do not plan to have an API for the desktop client, but we will definitely make an EasyMorph Server API if we reach that point.
We use Pentaho Kettle for those kinds of transformations. It's FOSS, and connects to a whole bunch of programs and formats.
It's a graphical tool - you drag-n-drop modules, then configure and connect them, though it can also run scripts (it has JavaScript, Java, Bash and Ruby support, besides SQL, of course) - but after configuring the transformation/job, you can also run it on the terminal, which is useful for periodically re-running it.
I've been doing a lot of work with Kettle as well, and it is a handy tool (albeit with a few warts).
What I think would be handy for use in an organizational setting, where "business users" might want to use some of the transforms, would be a way to publish transforms somewhere, making them discoverable and accessible to others. I don't want to make it sound like I'm talking about UDDI or anything (although, thinking about it, maybe you could use that), but just an easy way for a Joe Business User to get a list of available transforms, some explanation of what they do, what input they take, what they output, etc. And maybe a way to make changes to the "small stuff" (like the input and output path, for example) without having to load up Spoon and edit the ktr that way. Since transforms can be parameterized, that should be doable...
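As a toy sketch of what such a catalogue could look like (plain Python, with invented names - nothing to do with Kettle's actual internals), the core is just transforms registered alongside their metadata:

```python
# Sketch of a discoverable transform catalogue: each entry records what
# the transform does, what it takes, and what it produces, so a business
# user can browse the list without opening Spoon.

TRANSFORMS = {}

def register(name, description, inputs, outputs):
    # Decorator that files the function away with its metadata.
    def wrap(fn):
        TRANSFORMS[name] = {
            "description": description,
            "inputs": inputs,
            "outputs": outputs,
            "run": fn,
        }
        return fn
    return wrap

@register("dedupe", "Remove duplicate rows", inputs=["rows"], outputs=["rows"])
def dedupe(rows):
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

# What "Joe Business User" would see:
for name, meta in TRANSFORMS.items():
    print(f"{name}: {meta['description']} ({meta['inputs']} -> {meta['outputs']})")
```

Bolting a small web UI on top of something like that - list, describe, set the small parameters, run - is the bit that seems to be missing.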
You could also picture combining this with something like a Yahoo Pipes-style web interface, to let you define your own chains of transforms and operations as well. And hell, a web-based interface for editing ktr files would be a pretty interesting thing as well, if somebody would build it.
The databricks platform should solve exactly your problem - reusable data pipelining/transformation. I saw a demo of it last night and it was extremely slick. Their product is amazing, it makes data pipelining incredibly easy compared to setting up a hadoop cluster and running hive/etc. (I don't work for them - but if any databricks employee sees this, please hire me!) It runs on a spark cluster over AWS, which is much more modern and powerful than SAS/excel/sql. Since you know how to program already, it shouldn't be too hard to pick up spark (even has python bindings)
@rgoddard - May be a bit overkill but check out Immuta (www.immuta.com). Its a data platform, built for data scientists, that enables you to query across many disparate sets of data using familiar patterns such as SQL, file system, etc. Our SQL interface allows you to hook to Excel, Tableau, Pentaho...so you could write your abstracted logic and connect to many data sources or mashed up analytic results. contact me at matt@immuta.com if you're interested after reading through the site.
I'd divide it as essentially "User of Packages" vs. "Writer of Packages", and of course, the dichotomy is not actually a clear one. But the choice was to suggest that there are some people for whom the programming language and its libraries are not an end in themselves, and "Just crack it open and write your own thing for X..." is essentially a non-starter.
I think it's useful to distinguish between the two groups, because not only do they have different skill sets, but they have different motivations. For example I will never directly be evaluated on the performance or style of my code in the way a programmer might be - only the paper that code helped me write.
Was there a problem inherent to building large applications that you found intractable, or is the shift solely due to focusing on entry-level accessibility?
We built a Foursquare clone recently and the BOOM guys built an extended version of HDFS and Hadoop (http://db.cs.berkeley.edu/papers/eurosys10-boom.pdf). It works out pretty well. The shift is not really about accessibility either - I've watched people do some pretty advanced data work in fields like physics and biology.
It's more about making computers into personal tools. If you look at the tools the average person uses - email, excel, google etc - they all work really well individually but they are really hard to extend or compose. Each application is a world unto itself and doesn't play with the outside world. What would really help people work is not the ability to build their own applications but the ability to move data around and glue tools together. It's kind of like applying the unix philosophy to office suites.
The shift is primarily due to the fact that relatively few people seemed to want to build large applications, including ourselves.
There are definitely some differences between building large apps and these more communication/analysis tasks, but we think the foundation itself applies to both. The language is an adaptation of the Dedalus[1] semantics, which the BOOM lab did some amazing things in distributed systems with [2]. If it can build a clone of hadoop, chances are it can build most things. We've built a number of our compilers in Eve, several of our editors were bootstrapped, we've built numerous examples, most recently a complete clone of Foursquare. Before we'd want others to try and do that though, we need our tooling to get a bit better. We expect that the Eve editor will get there eventually kind of out of necessity - we're going to bootstrap a lot of it this time too, starting with the compiler.
My take from the abstract of Dedalus is "and adds an explicit notion of logical time to the language"
If you're interested in building a tool that relies on distributed communication and data flow it makes sense to bake a notion of logical time into the system. Boolean logic has no notion of time. If you propose x != y, say, you could be saying that throughout the lifetime of the system x is never equal to y, or you could be comparing x to y at this instant in time. It depends of course on whether these are constants and/or variables.
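To illustrate the ambiguity (a plain-Python sketch, not Dedalus syntax - the idea of stamping every fact with a logical timestep is borrowed from it):

```python
# Making time explicit turns "x != y" into two distinct claims.
# Facts carry a logical timestep: (name, time) -> value.

facts = {
    ("x", 0): 1, ("y", 0): 2,
    ("x", 1): 3, ("y", 1): 3,
}

def differ_at(t):
    # "x != y at instant t"
    return facts[("x", t)] != facts[("y", t)]

def always_differ(times):
    # "x != y throughout the lifetime of the system"
    return all(differ_at(t) for t in times)

print(differ_at(0))           # True
print(always_differ([0, 1]))  # False: they collide at t=1
```

With timestamps in the facts themselves, the two readings are just two different queries rather than an ambiguity baked into the logic.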
Type theory shows that different logics map to different type systems so what may be holding programming back is that the logic of a system is not _dynamically_ selectable as the system evolves. Most (all?) programming languages have a simple boolean logic, mutable state, and just tons and tons of syntactic sugar on top of that. Obviously languages like Haskell and Clojure are more advanced (algebraic data types in the former and immutable data structures in the latter) but they still have a fixed/static way of being in the world if you know what I mean.
Natural language shows us that humans use many different types of logic contextually. Logic is not monolithic, maybe Eve is an admission of this?
Sorry if this makes no sense, it's just a hunch that's been percolating for a while.
One of the things we are still working on is expressing non-monotonic logic nicely (things like "birds can fly, but penguins can't, but Harry the Rocket Penguin can"). It's unpleasant in standard datalog but I think we can provide a nicer interface.
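As a sketch of the behaviour meant here (plain Python, not Eve syntax - one simple way to get defaults with exceptions is to let more specific rules override more general ones):

```python
# Default reasoning with exceptions: "birds fly, penguins don't,
# Harry the Rocket Penguin does." Rules are ordered from most
# general to most specific; the last matching rule wins.

rules = [
    ("bird", True),            # birds can fly by default
    ("penguin", False),        # ...but penguins can't
    ("rocket_penguin", True),  # ...except rocket penguins
]

def can_fly(kinds):
    answer = None
    for kind, flies in rules:
        if kind in kinds:
            answer = flies  # later (more specific) rules override
    return answer

print(can_fly({"bird"}))                               # True
print(can_fly({"bird", "penguin"}))                    # False
print(can_fly({"bird", "penguin", "rocket_penguin"}))  # True
```

Expressing that same override structure declaratively, without stratification gymnastics, is the part that's unpleasant in standard datalog.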
10 minutes into the Rich Hickey video and I can see why you responded to me with a link to it. This is indeed what I'm getting at; so many languages use fundamentally the same underlying logical and state model. Rich mentions single-dispatch, stateful OO. To that I would add boolean logic. Our systems are so riven by it we don't even see it. And I reckon it doesn't have to be that way! I can totally see why Eve is written in Rust from the 10 minutes of this talk that I've seen, and I can see that it is the incidental complexity of managing the lifetime of objects in your head in C++ that has forced this shift. Or, as per Hickey, Clojure-wards.
Still though, both Rust and Clojure presume an omnipresent bivalent atemporal logical discourse. If you're working on a different logic (or sets of logics?) in Eve then why not make them dynamically user-selectable at run-time in an intuitive manner :) Granted, I have _no earthly idea_ in practice what this means, but when you reflect on how humans manipulate concepts internally you see that we have the machinery for this built into us -- or learnt somehow at a very early age. Tapping into this fluid logical apparatus would be ever so neat.
Is the endgame here making eve applications automatically distributed or parallelized?
I ask because the monotonic logic that Dedalus excels at expressing is quite limiting. Unless you are in an execution environment where operation ordering/synchronization is expensive (i.e. among a set of distributed processes), the nice order-independent properties that CALM analysis gives you don't really buy you much.
I can't see us using CALM for anything in the near future. The focus right now is just on making the basic programming experience smooth. We chose Dedalus because the discrete, synchronous model of time makes it easy to separate things which are truly stateful from things which are not and to handle both in a live, interactive environment. CALM is just a bonus.
It seems like this might be a good fit with what Sandstorm is doing (making personal servers easy to use). Even if I'm writing a program for myself, I still want to access it from multiple computers and share the results.
Granted, shell is useful for concise interactive one-liners. Granted, Java is a terrible replacement for shell. But there's no kind of programming where shell is better than Perl, Python or Ruby - "big languages" which, unlike Java, were designed to be useful for scripting by people who aren't full-time programmers. Because they are "big languages" they also have the property that you can build bigger things on them, starting from your messy prototypes. It's better not to even try that with shell or Excel.
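For instance, word-frequency counting - the classic `sort | uniq -c | sort -rn` one-liner - stays nearly as short in Python, but leaves room to grow into a real tool (a toy sketch):

```python
# Equivalent of: tr ' ' '\n' < words.txt | sort | uniq -c | sort -rn
# Starts as a few lines; can later grow options, tests, a CLI, etc.

from collections import Counter

def word_counts(lines):
    return Counter(word for line in lines for word in line.split())

print(word_counts(["a b a", "b a"]).most_common(1))  # [('a', 3)]
```

The prototype and the eventual "bigger thing" live in the same language, which is the property shell and Excel lack.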
That is a much better and saner explanation than these "revolutionizing programming"-grandiosity talks.
Eve will always tend to be misunderstood on Hacker News or Reddit. It simply is not aimed at professional developers, nor intended to build production systems.
My opinion is that programming is literally defined by editing text files with difficult to comprehend source code. Anything that is easier or deviates from that is by definition not programming and therefore something users do.
The rationalization is that anyone using an easier tool to program computers without as much code must not be able to use code. Therefore us programmers are better than them.
It's very similar to the earlier era of punchcard programmers scoffing at assembly language programmers. Or hand-tool craftsmen sneering at mass-produced, component-based manufacturing.
That infantile belief system will persist until super-intelligent AIs revise it or (maybe) the next generation wises up.