Show HN: DTL: a language and JavaScript lib to transform and manipulate data (getdtl.org)
63 points by jk0ne on April 20, 2022 | hide | past | favorite | 20 comments
DTL is a project that began its life as part of another piece of software for a startup I founded a few years ago. For that project I needed a way to describe how to rewrite data in a portable way: I needed to be able to create the rules on the fly and store them in a database, and I needed them to be able to describe transformations I hadn't thought of, while remaining safe and predictable. Though the startup didn't survive, the language I made was so useful to me that I felt I had to extract it and make it available to everyone. DTL is the result. Though the npm module is relatively new, the language itself has been in use in production systems for years. Over the past couple of years I have been working to make it more accessible and useful to newcomers. Though it's really powerful, I tried to make it easy to use and simple to understand, so that you can get up to speed quickly and use only as much as you need.

To summarize: DTL is a JavaScript module and related CLI tools that are really handy for transforming data from one format to another. It's made to let you specify your transformations as data (JSON by default), which means they are easily shared between frontend and backend, as well as easily stored in databases, etc. It can be used as part of your project to transform data between APIs, or between the frontend and your database, and can do simple mappings as well as complex calculations. It can also be used for validation and is really handy for extracting useful information from large or complex datasets (there are some great examples of this you can try on the website). The CLI tool (dtl) is like jq on steroids, allowing you to slice, dice, and remap CSV, YAML, JSON, or even plaintext data, doing anything you can describe in a DTL transform. If you ever wished you could `grep` in complex data structures, today is your lucky day. :)
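To give a flavor of the "transformations as data" idea in plain JavaScript (this is a hypothetical sketch, not DTL's actual transform syntax; see getdtl.org for the real thing): the transform below is just a JSON object mapping output fields to dotted paths in the input, so it could be stored in a database or shipped between frontend and backend.

```javascript
// Hypothetical sketch of "transforms as data" -- NOT DTL's real syntax.
// The transform is plain JSON, so it can live in a DB, a config file,
// or travel between frontend and backend unchanged.
function getPath(obj, path) {
  return path.split('.').reduce((o, key) => (o == null ? undefined : o[key]), obj);
}

function applyTransform(transform, input) {
  const out = {};
  for (const [field, path] of Object.entries(transform)) {
    out[field] = getPath(input, path);
  }
  return out;
}

const transform = { name: 'user.fullName', city: 'user.address.city' };
const input = { user: { fullName: 'Ada Lovelace', address: { city: 'London' } } };
console.log(applyTransform(transform, input));
// { name: 'Ada Lovelace', city: 'London' }
```

Because the transform is inert data rather than code, loading one from an untrusted source is far less dangerous than loading a script.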

I'd love any feedback you have and if you think of anything it doesn't have that it should, I'd love to hear that too.



Looks very cool. Can see a use-case for it at Sky Ledge (IoT/operational insights platform) [1]. To date, we've been building out data ingest from devices for our customers, and for that it's easy enough for us to spin up a service (for more complex use-cases) or a Lambda (for simpler use-cases) to perform the transformation.

However, we're at a stage where we're focusing on developer experience and allowing anybody to create their own "control rooms". Ingest is a particular pain point (we don't want users having to spin up their own servers/lambdas to handle ingest).

Something like DTL where users can define their own transformation schema would be ideal. The syntax might be a bit esoteric and we'd probably want to support both uploading the transformation schema and also composing it via UI for less technical users.

[1] https://skyledge.com


That's a very similar use case to where DTL started.

We needed a way to link arbitrary APIs together without having to resort to custom code to rewrite the data between formats. DTL was born as a way to handle that translation without having to worry about untrusted code or limiting the end-user's ability to define how to remap the data. DTL can only operate on the data it is provided and cannot rewrite the input data as it runs, so it is pretty safe to use that way.
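The read-only guarantee described here can be illustrated in generic JavaScript (this is an assumed sketch of the property, not DTL's internals): if the engine only ever reads from its input, handing a transform a deeply frozen snapshot makes it impossible for it to mutate the caller's data.

```javascript
// Sketch of the safety property described above (assumption, not DTL internals):
// the transform sees a frozen snapshot of the input, so it cannot alter it.
function deepFreeze(obj) {
  if (obj !== null && typeof obj === 'object' && !Object.isFrozen(obj)) {
    Object.freeze(obj);
    Object.values(obj).forEach(deepFreeze);
  }
  return obj;
}

function safeApply(transformFn, input) {
  // structuredClone + deepFreeze: the transform gets a snapshot it cannot mutate
  return transformFn(deepFreeze(structuredClone(input)));
}

const data = { items: [1, 2, 3] };
const doubled = safeApply((d) => d.items.map((n) => n * 2), data);
// doubled is [2, 4, 6]; data is untouched
```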

I'd love to see DTL get used in the way it was originally built for. Happy to help / talk sometime if you want.


Seems fairly similar to lpeg[0] once you go deep diving into its more advanced usages.

…which is one of my "once I get motivated enough" projects: rip the VM out of the Lua library and use it standalone as a data-conversion pipeline library/command-line utility. The (now abandoned) Python port started down that route, but it seems they gave up after getting it somewhat working.

Probably a bit overkill if all you want to output is AST nodes but might be worth a look.

[0] http://www.inf.puc-rio.br/~roberto/lpeg/


Thanks. Yes, DTL's core textual syntax is described with PEG. I make use of the Peggy (https://peggyjs.org/) PEG processor to build up the AST that is used to actually process DTL.

There are C based PEG processors, which I've looked at once or twice also, but I haven't sat down to try to convert it. Mostly out of a desire to get the existing module to work well. A working module for one language is better than a partially working module for multiple. :P


Looks cool. But why use this instead of jq? You mention it's "like jq on steroids"; could you provide an example of a real-world use case where it solves a problem that couldn't be solved with jq?


There are a number of areas where DTL can be used where jq can't. The obvious ones are that DTL is a library and is therefore usable directly in your own code, both frontend and backend.

In terms of the command line, I'm sure advanced users of jq could get almost all, if not all, of DTL's command-line functionality out of jq; that said, I think DTL makes it a lot easier. DTL, I think, provides a more approachable syntax, especially when you are doing more than just extracting data as it already exists. The DTL builtins `grep` and `group` are great examples, as are `chain`, `derive`, and the various set operations present in DTL.

Add to that the fact that DTL is functional in nature and lets you pass your own transforms as controls to the various functions, and you can easily get very sophisticated with your remapping in ways that would be much more difficult (maybe impossible) with jq.
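The idea of passing transforms as controls can be sketched in plain JavaScript (a hypothetical illustration, not DTL's actual `grep` builtin or syntax): the predicate handed to a grep-like function is itself described as data, so it can be stored and shipped along with the rest of the transform.

```javascript
// Hypothetical sketch: a grep-like builtin whose predicate is plain data,
// not a closure -- so it can be serialized like the rest of the transform.
// This is NOT DTL's actual syntax or API.
function grep(list, predicate) {
  // predicate: { field, op, value } described as plain data
  const ops = {
    eq: (a, b) => a === b,
    gt: (a, b) => a > b,
  };
  return list.filter((item) => ops[predicate.op](item[predicate.field], predicate.value));
}

const people = [
  { name: 'Ada', age: 36 },
  { name: 'Grace', age: 85 },
];
const result = grep(people, { field: 'age', op: 'gt', value: 40 });
// [{ name: 'Grace', age: 85 }]
```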

In terms of an example, the most readily available one I have was turned into a DTL test (in dtl-expressions-advanced.js in the test suite): performing a soundex search on textual input data. DTL doesn't support soundex natively, but it was trivial to implement with a DTL transform, and because it's JSON, it can be imported into any transformation object that needs it.

And because DTL is a module AND a CLI, you can move your transforms back and forth between your code and your command line without re-translation, which I have found immensely useful... but YMMV.


jq is a DSL that is IMHO pretty hard to learn and remember. I kept trying to use it for years. I would find using a JS library much easier.


Really stoked to have stumbled upon this. HN has a way of serendipitously surfacing projects just when I need them.

CMS.gov keeps changing their data format for nursing home info and writing multistep validation and ingest is a nightmare.

It looks like this may help simplify things tremendously.


This is really quite fascinating.

Can it handle JSON with trailing commas and similar? jq hating that has been something of a bugbear of mine.

(also, are you the jayk I remember from back in the day?)


Thanks.

It can. The CLI can handle JSON5 (https://json5.org/), so trailing commas and comments are a-ok.

And yes, re: jayk. Almost certainly. :)

It has been an ambition of mine to turn this into a C lib that parses/processes the syntax into an AST so it can be used from multiple languages... but I haven't been able to set aside the time for that particular batch of work yet.


* It already does parse to an AST and caches the AST generated from an expression, so for a given expression the parsing is only done once, but that's just within JavaScript. I would like to extract that out and have DTL libraries in multiple languages... but that whole 'day job' thing makes that a ways off, I expect. :D


I've been thinking about problems like this quite a bit the past few years.

Notable that SQL::Abstract v2 goes from the DWIM syntax to an AQT (abstract query tree) that's pretty much always representable as JSON and then renders from there to SQL, with the intent to enable the front end and back end of such efforts to happen in completely different environments. (have a quick look at https://metacpan.org/pod/SQL::Abstract::Plugin::ExtraClauses for the stupid shit I've been able to pull off with that architecture)

Now sri has written mojo.js he's periodically trying to nerd snipe me to do a JS/TS implementation of that stack and honestly I'm getting increasingly tempted.

Meanwhile I'm working on a DSL host designed for config languages and also hopefully REPL-controlling-a-daemon type stuff which seems like it has a bunch of aesthetic similarities to your goals (like 'syntax only means one thing' which I was really pleased to see).

What I'm currently expecting to do for the DSL host thing is to have two primary production implementations (I'm currently focusing on a reference implementation that won't be either of them, as ever I tend to pick exceedingly hairy yaks) - one in golang for the "pure go is everything" people, and one in nim since that will hopefully get me reasonably performant C and javascript backends.

Whether any of this is actually a good idea is, of course, to be determined, but it's not like that's ever stopped me before.


The SQL::Abstract stuff is interesting. DTL has its origins in needing to do a very similar thing: being able to create a set of transformations and go back and forth between the real expression and a UI/object representing the action to be taken... and to make programmatic adjustments to them. Well, that and a stubborn refusal on my part to resort to `eval()` ;)

There were many times when I wasn't sure if DTL was a good idea, tbh. Thankfully, nearly every time I came close to abandoning it, something else came up that DTL made easy (Or, it forced me to fix something in DTL that prevented it from being easy.) I'm fairly confident it was a good idea at this point. Or if not good, at least useful. ;)


Looks very nice, how does it compare to arquero ? https://observablehq.com/@uwdata/introducing-arquero


Thanks. :)

From what I can tell, Arquero is intended to provide an SQL-like interface for querying tabular data, and it's in a very imperative format: do this, then with that do this, etc.

DTL, on the other hand, doesn't require your data to be in any particular format or quantity. It is designed to work with whatever data you have and is really intended for data transformation, rewriting data between formats.

Apart from that, DTL is designed to be self-contained and portable, meaning you can use DTL transforms server- or browser-side, you can transfer them, and you can store them in DBs. Their definition is not JavaScript, and all a DTL transform can do is transform its input data. They have no system or JavaScript access, which makes them far more secure to use than the equivalent JavaScript code.

I personally think that for most uses DTL is suited for, it is substantially easier to understand than even the equivalent JavaScript code.

DTL transforms are essentially formulas for how to arrive at your desired data, whereas imperative code is more of a 'do this, then this, then this', whose result is significantly harder to anticipate and understand.


how does this compare to jsonata?

https://jsonata.org/


I am not super-familiar with JSONata, so someone else can probably comment better here. That said, I think the biggest difference is that JSONata is really oriented towards querying, as evidenced by its XPath-oriented syntax. DTL can do querying, but it's really made for data transformation, as evidenced by its more JSON-oriented syntax.

I also think DTL is significantly more approachable and familiar, and therefore easier to learn and make use of. (I think XPath is terribly obtuse and unintuitive, personally.) DTL is also made to be self-contained: a single entrypoint in your JavaScript code, `DTL.apply_transform()`, can process any transform you can create.

DTL is extremely powerful, but it's deceptively simple, and you only need to learn as much as the task at hand requires. It has no special syntax you haven't encountered before, and it has no special modes. It can do things like map/reduce, grouping, etc., but it accomplishes this in a very functional form, requiring no special syntax beyond what you would already use to, for example, get a string's length. DTL also has a number of builtins that are very helpful when remapping data that you would otherwise have to implement yourself. And the entirety of any DTL transform can go wherever you can put a JSON block, including in your DB or a config file, which makes it significantly easier to use in your projects.
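As a rough illustration of the kind of grouping builtin described above (in generic JavaScript; DTL's actual `group` builtin may differ, so treat this as a sketch): a group-by is just a fold over the list, no special syntax needed.

```javascript
// A generic group-by in plain JavaScript, illustrating the style of builtin
// described above. This is a sketch, not DTL's actual `group` implementation.
function group(list, keyFn) {
  return list.reduce((acc, item) => {
    const key = keyFn(item);
    (acc[key] = acc[key] || []).push(item);
    return acc;
  }, {});
}

const records = [
  { state: 'CA', value: 1 },
  { state: 'NY', value: 2 },
  { state: 'CA', value: 3 },
];
const byState = group(records, (r) => r.state);
// byState.CA has two records, byState.NY has one
```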

DTL also has a very sophisticated CLI tool, allowing you to make direct use of it without writing any code whatsoever, so you can start working with your data immediately, without having to learn much at all. DTL also has a great REPL that lets you play with the syntax, complete with full help within the REPL.

In a general sense, DTL predates JSONata by several years (originally created in 2012) and has been in use that entire time, though I only published it to npm a little over a year ago.

So... TL;DR:

* Made for transforming complex data in and out, not just querying

* Much easier to use and more familiar

* Really useful builtin functions

* More approachable; use as much or as little as you need

* Embeddable

* Great CLI tools

But, to be fair, I am a little biased. ;)


Has this been tested for security?


It has been used in applications that have been security tested.

The module itself doesn’t use the JavaScript parser at all and has no mechanism for making JavaScript calls from within DTL expressions. This is by design: preventing arbitrary code execution was part of DTL's original requirements, and great care has been taken to maintain that safety.

That said, it carries the same risk as using any publicly available js module, and you should definitely investigate yourself and not take my word for it.


I think if you're aiming for adoption, an IDE environment would be necessary; otherwise the hurdle to learn a DSL is too high.



