The blog is just hugo so it's 100% static files over nginx.
The search engine is server-side rendered mustache templates via handlebars[1], served via spark[2]. It's basically all vanilla Java. I do raw SQL queries instead of ORM, which makes it quite a bit snappier than most Java applications. The sheer size of the database also mandates that basically every query is a primary key lookup. The code is written around that constraint.
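To illustrate the "every query is a primary key lookup" constraint, here is a minimal sketch in plain JDBC. It is not the actual search engine code: the class name `PrimaryKeyLookup`, the table `DOMAIN`, and the columns `ID` and `NAME` are all made up for the example, and it assumes you already have a `java.sql.Connection` from whatever driver you use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical example: DOMAIN, ID and NAME are assumed names, not a real schema.
class PrimaryKeyLookup {

    // Builds e.g. "SELECT ID, NAME FROM DOMAIN WHERE ID IN (?, ?, ?)" for three ids.
    // Only placeholders are generated here; the id values are bound separately.
    public static String batchQuery(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        StringJoiner placeholders = new StringJoiner(", ", "(", ")");
        for (int i = 0; i < idCount; i++) {
            placeholders.add("?");
        }
        return "SELECT ID, NAME FROM DOMAIN WHERE ID IN " + placeholders;
    }

    // Fetches names for a batch of primary keys in a single round trip.
    // Ids without a matching row are simply absent from the returned map.
    public static Map<Integer, String> namesById(Connection conn, List<Integer> ids)
            throws SQLException {
        Map<Integer, String> result = new HashMap<>();
        if (ids.isEmpty()) {
            return result;
        }
        try (PreparedStatement stmt = conn.prepareStatement(batchQuery(ids.size()))) {
            for (int i = 0; i < ids.size(); i++) {
                stmt.setInt(i + 1, ids.get(i)); // JDBC parameters are 1-indexed
            }
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    result.put(rs.getInt("ID"), rs.getString("NAME"));
                }
            }
        }
        return result;
    }
}
```

The idea in this sketch is that related data gets stitched together in application code from a few lookups by ID, rather than asking the database to JOIN or scan across very large tables.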
Although the search engine is a bit on the slow side since it's routed through cloudflare and I think I'm relatively far away from the closest datacenter so it adds like 100ms to the load times.
Love it. I've seen so many cases where engineers with just basic SQL knowledge (like myself; I'm no JOIN god) can run circles around the queries ORMs generate.
ORMs are great to spare you from writing heaps of error-prone boilerplate mapping code.
Which is kinda what they are for, that's why they're called "Object-Relational Mappers". Not "Object-Relational Query Generators". Because they suck at the latter.
ORMs help you the first three weeks. After that it is a lead ball around both feet, dragging you down into dark pits of bad performance, incomprehensible relations, and impossible debugging.
That anyone with more than 6 months of experience still drags Hibernate up from their chest is just absolutely beyond my comprehension.
I think if you're doing basic CRUD operations on small tables with relatively simple relations, then ORM is just fine. This is, to be blunt, what a lot of applications do, and so a lot of them justifiably use ORM.
That said, the moment you leave the small-table, simple-relations space (by e.g. having a table with a quarter billion rows), ORM is not a good choice.
In all seriousness, it's great to see something written in a "boring" language like Java, which seems to get a lot of hate in developer circles, hover at the top of HN.
Java really can perform amazingly well, especially if you minimise the use of unneeded libraries and frameworks. Super curious to see how your stack evolves as you get more load.
Best of luck to you on the journey!
PS: there's truly a world of difference between "Spring Boot Developer" and "Software Engineer with Java experience". I suspect a lot of people who hate Java or think it performs badly have only worked with the former group of people.
I'm a believer in the conservation of cool. You can build something cool in a boring language, or build something mundane in a cool language.
Java gets a lot of shit, some of it is merited but a lot of it isn't really fair to the language. There's a lot of Java developers that are kinda shit-tier copy-paste developers developing shit-tier copy-paste applications, because the language is so forgiving as to accommodate that, but it's also a competent language that you can do seriously impressive things with.
You can be insanely productive in Java because it's extremely stable and mature. You almost never have to deal with library churn or other upstream changes that urgently need to be fixed. I can think of exactly two instances in my professional life where that's happened: migrating off Java 1.8 and the oh-shit moment of needing to patch log4shell.
Over 10 years of professional Java (right from the EJB madness of the early 2000s through the last few years of Spring annotation hell) taught me that the key to using Java in personal (or startup) projects is simply to stick to core Java features and the bare minimum of libraries, and to be very skeptical of the hot air coming from outside the community.
My code from the late 2000s, with very few modifications to keep up with modern Java syntax, written in bare Java talking to Postgres in just raw SQL, runs circles around anything new I've tried or built with for modern web application backend stacks.
IDE support is top class. Threads are just awesome when done right. Static types make code incredibly easy to read and reason about even years later. The JVM has been super stable forever. A whole lot of features I need are just baked right into the language, but not obvious at first. Mvn just works. And my reluctance to use external libraries actually made me write the logic myself, which gave me a much better understanding of related concerns.
Congratulations and I wish more cool projects picked Java as their language, but I see them use Oracle's ownership as the strawman argument against it. I don't know enough about that.
I'm on 18, but will upgrade to 21 when it drops. Not really felt the need to upgrade to 19 or 20 as they haven't provided any features I'm interested in.
Yeah the static stuff being fast is less surprising - it was mainly the search results page that astounded me.
> served via spark[2]
Had not heard of this Spark (only the other Apache one). Will definitely take a look.
> Although the search engine is a bit on the slow side since it's routed through cloudflare and I think I'm relatively far away from the closest datacenter so it adds like 100ms to the load times.
I've hit the CF loading screen which introduces a big delay, but when I don't see that the loading is really instantaneous.
Yes, and it was not that well designed to be honest... the successor is quite a lot nicer and it's called Javalin[1].
Same philosophy but just got things right where Spark, being the "first" (in the Java world, using the design inherited by Sinatra[2]) had a few design issues.
For anything handling user input I'd be concerned about maintenance status for fixes. Even beyond the codebase itself, even just maintaining an up to date pom.xml can be important - seems theirs was last updated in July of last year. Very brief manual browse of it shows potential exposure to things like https://nvd.nist.gov/vuln/detail/CVE-2022-25647 - not sure if that's reachable in the codebase but there could be others.
Curious: Do you have plans to bring in FOSS LLMs for summarization and Q&A style queries anytime in the coming months?
Btw, I was half expecting that you quit because of FUTO grants (saw your post on their forums), but I guess it wasn't that. Either way, rooting for you!
Not in the short term, if for no other reason than not having the hardware for it. Maybe down the line. Would be neat if someone who was into that sort of stuff wanted to integrate with Marginalia though. I've got a free API ;-)
I do think LLMs have the potential to integrate well with search engines and I'd love to see a sort of open source search ecosystem emerge with different projects collaborating to exceed the sum of their parts.
Yeah I was in talks with FUTO and they agreed to help the project out a bit, I won't say more until I have money in hand. It's taking a while but it's not on their end, I just need to sort out some legal stuff first.
[1] https://github.com/jknack/handlebars.java
[2] https://sparkjava.com/