Why does neither Microsoft nor Google nor Apple take the lead and offer free LLM answers like Google offers free search results?
Is that because there are simply not enough GPUs out there to do this at scale?
If so, it will become really interesting once that constraint goes away. There might be a shift in the search space like there was a shift from analog to digital photography.
Search wasn't broken. Google needed a business plan and chose web ads. That is what broke search: the bifurcation of interests between selling ads and serving search results.
Who cares what's real if it's useful? We use fictions that are not "real" but are useful all the time: law codes, corporations, religions, etc.
When you're doing scientific research, or looking for public transport, the difference can decide your career's future or the likelihood of you getting home at night.
Well, too bad that that social construct supplies you the tokens that allow you to buy food.
Or are you implying that food is not real, because everything leading to it is also unreal, even when it's physical (e.g. money)? If that's the case, try surviving without it for a couple of months.
> and I've not seen yet Google maps fail to find me a public transport
This is because Google Maps doesn't use LLMs, but standard and deterministic indexing methods working on hard data provided by governments.
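That kind of transit lookup is classic deterministic graph search over published timetable data. A toy sketch of the idea (stop names and travel times are invented; real systems consume GTFS feeds from transit agencies):

```python
import heapq

# Toy transit graph: edges are (neighbor_stop, travel_minutes) -- the
# kind of hard data a government-published GTFS feed provides.
GRAPH = {
    "Central": [("Museum", 4), ("Harbor", 11)],
    "Museum":  [("Central", 4), ("Airport", 15)],
    "Harbor":  [("Central", 11), ("Airport", 7)],
    "Airport": [],
}

def fastest_route(start, goal):
    """Plain Dijkstra: the same input always yields the same answer."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        minutes, stop, path = heapq.heappop(queue)
        if stop == goal:
            return minutes, path
        if stop in seen:
            continue
        seen.add(stop)
        for nxt, dt in GRAPH[stop]:
            if nxt not in seen:
                heapq.heappush(queue, (minutes + dt, nxt, path + [nxt]))
    return None

print(fastest_route("Central", "Airport"))
```

No sampling, no temperature, no hallucinated bus lines: the answer is reproducible, which is exactly what you want when catching the last train home.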
Even if there are enough GPUs, they also need to be available at a cost that makes it financially viable, both in terms of purchase price and of power and cooling. Environmentally it might not look too good to offer LLM answers to a wider audience; there are already concerns about the water consumption of ChatGPT in some regions, see: https://fortune.com/2023/09/09/ai-chatgpt-usage-fuels-spike-...
This is about storing data in the form of "Facebook user XYZ looked at a page about travel to Antarctica" and the reason Facebook wants to store such data is to show them travel offers when they read their Facebook feed?
If so, can I download this data about me? Is there a way to download everything Facebook stores about me and then I will see all the websites and pages I visited that Facebook knows about?
It is about so much more. It is any personal data that Facebook stores that is not strictly necessary to provide the core products (such as messaging or sharing content).
You bought some medicine online and the store shared that with Facebook. It can be location data from images. In theory it could also be things like "you messaged a person who leans towards a particular political side". All of this is data that can be mined and used to calculate the highest-revenue advertisement at the specific moment you load a Facebook page.
Two high-profit questions that advertisement networks have historically wanted to answer are when someone is expecting a child, and when they are most likely to go to Disneyland.
> If so, can I download this data about me? Is there a way to download everything Facebook stores about me and then I will see all the websites and pages I visited that Facebook knows about?
If you live in the EU, you have the legal right to ask for all information Facebook has on you.
I would guess that Facebook considers the “about travel to Antarctica” tag not part of your information on Facebook, though. They’ll tell you that you visited a specific page, but not what they tagged that page with, even if it is your page (as I said, that’s a guess).
I think that getting a list of all the data a company holds on you is indeed allowed as part of the GDPR. In my experience of such things (and I have no direct experience of Facebook specifically), the result is often very, very dull.
You say "travel to Antarctica" (i.e. something meaningful and interesting to a human), but the computer says "ID:2sb374k44nmdld7394m44na7a63bba73hha3" (i.e. something meaningful only to a computer).
So instead of seeing a human-readable list of things you've looked at, you get a totally unintelligible list of primary keys that are used in some embedding-space in some algorithm somewhere that is used to pick the most salient ad to show you at that moment in time.
Like I said, I have no idea how Facebook does this, but I would be amazed if there was anything human-readable about the profile they've built on you. It's just too high-maintenance (and frankly pointless) to have human-readable labels for everything.
The data has a meaning to Facebook, whether it is stored in human-readable format or not. If there is a translation map or other data required to interpret your data, that should be included in a data export.
That is my point: there is likely no translation map. It's an ID for some data point/vector/embedding used in an algorithm, and it likely has no meaningful human interpretation or translation.
It is not secret code for "looks at Antarctica travel pages", it is a computer generated value for some intersection of thousands/millions of variables.
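To make that concrete, here is a made-up miniature of what such a profile-driven ad pick might look like. Nothing here reflects Facebook's actual internals; the IDs, vectors, and ad names are all invented. The point is that the export would contain opaque keys and numbers, with no "Antarctica" label anywhere:

```python
# Hypothetical data export: opaque keys pointing at learned vectors,
# with no human-readable translation map in sight.
profile = {
    "ID:2sb374k44nmdld7394m44na7a63bba73hha3": [0.12, -0.48, 0.91],
    "ID:9f1c02aa7b3e55d8c4417e6b2a90d3ff1e2a": [0.05, 0.33, -0.20],
}

# Candidate ads live in the same embedding space (also invented).
ads = {
    "ad_cruise_deals": [0.10, -0.50, 0.88],
    "ad_baby_gear":    [-0.70, 0.20, 0.10],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def best_ad(profile, ads):
    """Score each ad against every profile vector; pick the top match."""
    return max(ads, key=lambda ad: max(dot(ads[ad], v)
                                       for v in profile.values()))

print(best_ad(profile, ads))
```

The system "knows" this user should see cruise ads, but that knowledge only exists as geometry between vectors, so handing you the raw numbers satisfies a data-export request without telling you anything.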
> For example, if a user specifically asks for a URL's full text, it might inadvertently fulfill this request.
So this seems to imply two things:
1: Bing has access to text on websites which users don't. Probably because websites allow Bing to crawl their content but show a paywall to users?
2: The plugin has a different interface to Bing than what Bing offers via the web. Because on the web, you can't tell Bing to show the full text of the URL.
I have to contact my ISP. That's not the open web I subscribed to :) Until they fix it, I just keep reading HN. A website which works the way I like it.
There are various techniques automated agents (e.g. crawlers like Google's) can use. Ethical ones operate in agreement with, or following the guidance of, the content providers: they allow discovery, which suits both parties, without giving unrestricted access, which wouldn't always suit the provider.
We could hypothesize that in this case BWB is employing some of those techniques even though it isn't a discovery-enabling service but a content-using one, and so would be expected to present as an ordinary user and be subject to the same constraints.
Nothing you couldn't do with a decent VPN. But 'Open'AI has already achieved what it wanted from publicly demonstrating GPT. These days it is more focused on compliance with regulation, and on reducing functionality to the point of minimally staying ahead of the competition in released products, while steaming full ahead with developing more powerful and unrestricted AI for internal exploitation with very select partners.
In such a scenario, the true power of AI is the delta between what you can exploit and what your competition has access to. HFT would be a nice analogy.
Option 1 is definitely true, but I don't think paywalls are the issue. Bing has a "work search" option, to index and search sharepoint sites. My bet is there's a leak between public and private search.
Maybe some sites allow search engines to bypass paywalls so the full content gets indexed, and the plugin appears to be a whitelisted search engine to these sites?
A lot of sites just implement the paywall client-side with some JavaScript and CSS, so any kind of search indexer would still see the full text regardless of the user agent or source IP.
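A minimal illustration of that kind of client-side paywall (the page is invented): the full article sits in the HTML, and the "paywall" is just an overlay element that CSS/JS draws on top of it, so a dumb text extractor sees everything:

```python
from html.parser import HTMLParser

# Invented example page: the article text is delivered in full; only
# a client-side overlay hides it from human visitors.
PAGE = """
<html><body>
  <div class="paywall-overlay">Subscribe to keep reading!</div>
  <article>The complete article text, visible to any crawler.</article>
</body></html>
"""

class ArticleText(HTMLParser):
    """Collect the text inside <article>, as a naive indexer might."""
    def __init__(self):
        super().__init__()
        self.in_article = False
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article = True
    def handle_endtag(self, tag):
        if tag == "article":
            self.in_article = False
    def handle_data(self, data):
        if self.in_article:
            self.chunks.append(data)

parser = ArticleText()
parser.feed(PAGE)
print("".join(parser.chunks))
```

No user-agent spoofing needed: the indexer simply never runs the CSS/JS that hides the text.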
What is the hard thing about building an open, user-friendly Reddit alternative?
Hosting the posts shouldn't be that hard. Storage is so cheap these days. Is it the legal aspects of handling user generated content?
Ranking the posts is another issue. Is that where the value of Reddit lies?
Maybe one could build some hybrid thing which capitalizes on existing structures? I could imagine a frontend which only shows posts by users who signed their posts via their Hacker News accounts: they sign each post with a private key and publish the public key on their HN profile. This way, a new Reddit alternative could benefit from the karma distribution of the best community on the web today.
Hosting the content could maybe be done via one of the new decentralized systems like Mastodon, Nostr or Bluesky? Those inherently have open APIs, so it would be easy to build a frontend which aggregates the content into one simple UI.
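A sketch of such an aggregating frontend, with the API responses stubbed in as plain lists (a real client would fetch JSON over each platform's HTTP API; the posts and timestamps here are invented):

```python
# Stubbed responses standing in for Mastodon/Bluesky API calls.
mastodon_posts = [
    {"ts": 1700000300, "source": "mastodon", "text": "hello fedi"},
    {"ts": 1700000100, "source": "mastodon", "text": "first toot"},
]
bluesky_posts = [
    {"ts": 1700000200, "source": "bluesky", "text": "just landed here"},
]

def aggregate(*feeds):
    """One simple UI view: newest posts first, whatever the source."""
    merged = [post for feed in feeds for post in feed]
    return sorted(merged, key=lambda p: p["ts"], reverse=True)

for post in aggregate(mastodon_posts, bluesky_posts):
    print(post["source"], post["text"])
```

Because the protocols are open, the hard part isn't this merge logic; it's rate limits, identity, moderation, and convincing anyone to show up.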
* Hosting costs. Reddit was very lucky to have imgur pick up a lot of its bandwidth in its early days, but free image/video hosting sites are cyclical: absent a benevolent billionaire, the costs will rise with popularity, and the site will eventually need a source of revenue, which will introduce friction and start its inevitable decline in popularity.
* Moderation. Always a tightrope act. Most Reddit spin-offs of the past several years have been focused on minimizing moderation, which ends up attracting people who tend to get banned from other places before the site gets a chance to form its own identity and pick up steam.
* Network effects, which are basically a lottery. You can have a scalable service with great UI, and a solid moderation story, but you still need to get lucky and catch lightning in a bottle to take off. This is common knowledge, which makes it even harder to justify starting to develop or use a new social medium.
Personally, I like places like HN, which focus on good moderation without trying to scale up. We are blessed to have dang, but if the site were structured more like Reddit or a forum with different boards, I bet it would become unmanageable very quickly.
> Most Reddit spin-offs of the past several years have been focused on minimizing moderation, which ends up attracting people who tend to get banned from other places before the site gets a chance to form its own identity and pick up steam.
Mastodon is a good example and counterexample of this trend. Gab was the biggest Mastodon instance, largely populated by the kinds of people pre-Musk Twitter banned or limited (and their followers).
But the second (post-Musk) wave wasn't people who got banned, it was people leaving because they didn't like Musk and/or his changes to Twitter. And Reddit's own userbase came from Digg in much the same way.
Imagine if Mastodon had been easy to migrate to: Twitter would have collapsed like a popped balloon.
Reddit has a natural administrative/scalability partition boundary though, which makes federation much easier. I think a federated reddit would work better than Mastodon has.
>Reddit has a natural administrative/scalability partition boundary though, which makes federation much easier. I think a federated reddit would work better than Mastodon has.
To the point that quite a number of subreddits that have been banned, or that have been voluntarily shut down, have already set up their own clones on their own domains. t_d, drama, and fds are three notable examples that I can think of off the top of my head.
HN is functionally like a single topic forum, which makes it a little easier to have rules about what is generally allowable content. Many single topic forums still exist, though in recent years have had to take a back seat to Reddit et al. Maybe if Reddit goes completely to pot, people will look them up again.
That doesn't apply by default. While the US is making it ever easier for people to be litigious, the EU will mostly slap you with fines for doing bad things. You have the option to... not do bad things. (Yes, I'm sure there's some odd counterexample somewhere, but this holds in general)
The problem is that the law doesn't fine you for doing bad things; it fines you for breaking the law, which is not exactly the same.
At my previous company we had to hire a lawyer for more than a week on one project just to determine whether we were breaking the law, even though the site didn't do any bad things by normal people's standards.
In the end the suggestion was to just slap on a consent form, with consent rejection redirecting to some other site. That didn't make sense to me, but yeah, that's what the law says.
"Ask HN: Sites with the quality of Hacker News, but for more general topics?" (https://news.ycombinator.com/item?id=34302827) lists a couple. There's regular "what is the HN of <insert industry>" Ask HN questions, somebody would make a list.
The hardest part is going to be the community itself. Reddit (the board and shareholders) are betting that the community is too large to migrate to a better alternative.
Every time you upset your user base, you give them an opportunity to leave. Or you turn your advocates neutral: they stop being willing to do free work for you.
Reddit is dependent on free work. Moderators are a significant portion of that process, and they are the ones who depend on the API the most. If your moderators decide to do less work, your community starts going down in quality. If the community goes down in quality, users will decide that a smaller community is better than a poorly run one, and someone else will capture that use case.
The question is if that will happen before an IPO. Given the climate, that IPO may be two years away.
Not only does it rely on the labor of mods dealing with a lot of bad behavior; the admins also enforce a "code of conduct" on the mods and threaten the loss of the sub for non-compliance.
Do you visit niche or specialized topics? They have far better communities than any of the default ones. My observation is that comment quality scales with the complexity of the topic.
It’s extremely hard to rebuild most of these high quality communities elsewhere. Often, it seems only the troublemakers/outliers are willing to move to another platform. They simply become a stain on the alternative platform.
Yeah. A new platform needs to offer something fun or interesting in its own right to attract users. Otherwise it's going to be an island prison for the worst members of the old community.
A golden rule of using Reddit has been to unsubscribe from all the default subreddits and subscribe to niche interest ones. /r/politics etc are huge but they’re dumpster fires.
There’s essentially two reddits in one: the default one and the enjoyable one. You don’t get the enjoyable one by default.
The issue is also that the right people need to come.
Generally speaking, the issue with new startup competitors for social networks is that the first people they attract a critical mass of are people who got banned from the other sites for spam or excessive toxicity, and once they’re there they spook potential new people. It’s the online version of the “Nazi bar” problem.
The internet is still young, but full of social media sites that were once successful and then failed: Myspace, Digg, Friendster, Orkut, Google+, Vine, LiveJournal, etc.
I promise you, Reddit won't be around in 50 years. And it may not be much in 5 the way they're disrespecting their core audience.
There are reddit-likes on the fediverse (which means they are using the ActivityPub protocol). The fact that software like kbin works on ActivityPub means that users on other fediverse software like Mastodon (twitter-like) can interact with posts on kbin. Discussion, interaction and discoverability are thus not limited to just the small community on kbin.
https://kbin.social/
> What is the hard thing about building an open, user-friendly Reddit alternative?
One already exists in Lemmy.
I suspect a big hurdle is dealing with all of the laws & regulations that exist in the United States. I've already seen one good sized mastodon instance vanish forever because hostile actors flooded it with actual child abuse material. And despite #fediblock, new instances with hate speech spring up all the time.
Bootstrapping an alternative. Growing it from nothingness, being easily welcoming but not overrun by spam and malicious content. Getting to a critical mass before losing the goodwill with users and runway with whoever pays for this. This is a very significant moat, one that makes Reddit's leadership believe they can turn the screw without worrying about competition for now.
> What is the hard thing about building an open, user-friendly Reddit alternative?
This 'alternative' needs to be able to attract and move both new and existing Reddit users, replicating its network effect and retaining them so that they do not go back to Reddit.
> Hosting the content could maybe be done via one of the new decentralized systems like Mastodon, Nostr or Bluesky? Those inherently have open APIs, so it would be easy to build a frontend which aggregates the content into one simple UI.
Before these generative AI systems this was not a problem, and free APIs on social networks were fine. Now, thanks to generative AI, free open-access APIs on social networks don't make that much sense anymore.
They just enable AI systems to train on the platform at little to no cost, and accelerate the grifters, scammers, and bots flooding and overloading the network, which in turn increases the costs of spam, moderation, servers, and low-quality content. It doesn't scale for humans alone to keep that in check once API access is totally free, whether on the largest instances or on another Reddit alternative.
This is the point critics are missing when they argue about Bitcoin's intrinsic value. Bitcoin has no value other than its network effects. Facebook, Reddit, Bitcoin, etc. are valuable because of their usage. Which makes them hard to replace even if they're not perfect.
Creating momentum is hard, even with a better product. Even before the internet, slightly better technologies did not necessarily get enough traction to unseat the incumbents.
Any indie makers here doing their taxes on their own?
So far, I have always worked with a tax consultant. But I wonder if it really makes sense to pay thousands of Euros just to put numbers into forms. It feels like it is something that one should be able to automate.
QuickBooks and GnuCash handle the accounting, which supplies the data needed for your taxes. However, they won't spit out your tax forms as that's a separate task and not likely something open source is going to tackle anytime soon given how convoluted, arbitrary and constantly changing tax laws can be. Most open source developers would rather go in for multiple elective root canals before dealing with that.
Yes, it's possible to do your own taxes but it's a headache the first time or two you do it. I have never found any open source software to help with it beyond things like LibreOffice Calc to do various tax worksheets.
There was taxbird for filing (there is another application with the same name that does something different, you want(ed) https://de.wikipedia.org/wiki/Taxbird ). But that has been killed by the tax authorities: They shut down the interface used there, so nowadays only proprietary applications exist.
But from experience, don't bother with any software, German taxes are sufficiently weird that only the most trivial cases will really work out, even if you pay lots of money for it. Get a tax accountant. It isn't just putting numbers into forms, it is more the problem of structuring your life around possible tax savings. Software won't help with that.
If you only ever need filing and putting numbers into forms, you can file your taxes on the official website: https://www.elster.de . You'd need a login anyways to authorize your tax accountant or enable your software to submit its data.
Does it? If you pretend IPv6 doesn't exist, sure, but that's like pretending UDP doesn't exist because all of your applications use TCP, or only logging traffic going to port 80 because you don't have HTTPS yet.
Every firewall I've come across has a default deny rule for incoming IPv6 traffic, giving the firewall the same properties as any IPv4 network. Host firewalls are the same; anything from Windows Firewall to UFW and firewalld has presets to block all traffic except for the applications you've whitelisted. Once you get to huge enterprise routers managing routable IPv4 and IPv6 addresses the situation may become different, but it's still not that much overhead.
The biggest problem with securing IPv6 seems to be ignoring it, assuming that makes it disappear. If you configure your firewall to drop all IPv4 traffic not on a whitelist but somehow manage to forget to add the same rule for IPv6, you should re-evaluate your networking knowledge and maybe get up to speed with how the internet has changed since 2015.
It's also all kinds of code that interacts with the internet in all kinds of ways. Extending all that code to two kinds of IP, writing tests, setting up two types of IP in development, staging and production, monitoring real-life implications... that would be a huge cost with no benefit at all.
If you're writing code, you'll be either manually specifying the IP address family (so there's no real IPv6 risk) or you're probably using middleware that does all the hard parts for you anyway. If anything, I'm annoyed how hard it is to get a socket listening on both IPv4 and IPv6 in many low level libraries. I just want a socket to receive data on, who cares what address family it's from.
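For what it's worth, in Python that "one socket for both families" is a few lines; this is a sketch, and whether `IPV6_V6ONLY` can actually be disabled is platform-dependent (Linux allows it by default, some BSDs don't):

```python
import socket

def dual_stack_listener(port):
    """One listening socket that accepts both IPv4 and IPv6 clients,
    on platforms that permit disabling IPV6_V6ONLY."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    # 0 = also accept IPv4, surfaced as ::ffff:a.b.c.d mapped addresses.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("::", port))
    sock.listen(5)
    return sock

sock = dual_stack_listener(0)  # port 0: let the OS pick a free port
print(sock.getsockname())
sock.close()
```

The annoyance is that many low-level libraries don't expose this knob, so you end up managing two sockets when one would do.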
In my experience, IPv6 Just Works (tm) with modern software. There are some mid 00's frameworks for blacklisting abusive hosts that can't parse IPv6 addresses, or don't understand the /64 subnet you need to treat as a single IP address, but that's all I've ever run into. If anything, that gave me an excuse to finally get rid of an old Perl network filter running on my server.
I'm not sure how many tests the average piece of software needs that deals with the type of address family connecting. I suppose it matters if you want to test your rate limiting middleware or your logging library? That should only matter for the vendored code of course because modern libraries all have those tests themselves already. It's not like you need to run and write every test twice, only one or two very specific subcomponents if any.
If you're writing firewalls or kernels or router firmware then yeah you'll have your hands full with this stuff, but that's far from the standard developer experience. In those cases, IPv6 is a reality as much as TCP and UDP are.
To add a trivial example: if an application is coded well, it’s ready to connect to hosts both on ipv4 and ipv6, in the sense that when resolving a dns name, it will ask for addresses of any kind (unless it supports being explicitly told to only use ipv4).
So now you’re getting a record with multiple ip addresses, some of which are ipv6, but ipv6 is blocked… there you go with random connection delays and possibly timeouts.
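A sketch of the client-side fallback loop in question: walk every address `getaddrinfo` returns, AAAA and A alike, and let a blocked family cost at worst one timeout before the next address is tried (Happy Eyeballs clients hide this delay by racing both families in parallel; this simple version doesn't). The demo binds a listener to 127.0.0.1 only, so even if `localhost` resolves to `::1` first, the failed IPv6 attempt falls through to IPv4:

```python
import socket

def connect_with_fallback(host, port, timeout=2.0):
    """Try each resolved address in order; return the first that connects."""
    last_err = OSError("no addresses returned")
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:  # e.g. IPv6 not available at all
            last_err = err
            continue
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            sock.settimeout(None)
            return sock
        except OSError as err:  # refused, timed out, unreachable...
            last_err = err
            sock.close()
    raise last_err

# Demo: an IPv4-only listener; the client still gets through.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = connect_with_fallback("localhost", server.getsockname()[1])
print(client.family)
client.close()
server.close()
```

On the local machine the failed attempt is an instant connection-refused; across a network where IPv6 packets are silently dropped, each fallback costs the full timeout, which is exactly the random delay being described.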
IPv6 exists and it's getting more and more adoption, no matter how many people keep their heads in the sand...
Including the case where you have something else in the middle already - for example, if you're fronting a website through cloudflare, then you can only have IPv6 on your server and still support dual-stack for clients:)
> but that's like pretending UDP doesn't exist because all of your applications use TCP, or only logging traffic going to port 80 because you don't have HTTPS yet.
In fairness, neither of those things would be unreasonable stances, given those conditions.
It does take more work to run dual stack. I wanted to avoid duplicating efforts when I set out to learn ipv6, so I disabled ipv4 routing on my network and just ran ipv6 with DNS64/NAT64 to provide my clients access to legacy services. In this configuration, most of my traffic was end-to-end ipv6. I tested iOS, Android, Windows, and Ubuntu clients with no issues, even my aging printers support it!
The only issue I have is the same issue found in the article, but in reverse: now that I have a securely configured ipv6 network, how can I ensure that my hosts are fully prevented from communicating over a rogue ipv4 net?
It went pretty much unnoticed on HN that Europe recently voted to make all crypto payments illegal unless the seller collects the personal data of the buyer, independent of the amount. So there will be a track record of everything bought via crypto.
Is it only a matter of time until cash is going away globally, and states have access to everything their people buy?
Regarding end-to-end encryption: it does not prevent a government from reading your messages anyway. They could instruct Meta (or whichever company is in control of the app you use) to send them the messages you write directly from your phone. Or from the phone of the receiver. Or to send them the private key from your phone. They could also ask Apple or Google to do so, since those have access to everything on your phone.
Everything can already be tracked via crypto, that procedure of attributing a name to it just makes the process easier.
Additionally, everything you do buy is already tracked. Even with cash.
But unlike the naysayers, I think these things already encroaching on our lives give us even more reason to push for stronger E2E support as a default. Assuming it's done properly, and not "I just need to ask Google for the keys".
> Regarding end-to-end encryption: it does not prevent a government from reading your messages anyway. They could instruct Meta (or whichever company is in control of the app you use) to send them the messages you write directly from your phone. Or from the phone of the receiver. Or to send them the private key from your phone. They could also ask Apple or Google to do so, since those have access to everything on your phone.
There is a huge difference between "use the encryption key you already have to decrypt this message" and "implement changes in your software that allow attacking this person".
Last I heard, US courts couldn't force companies into doing anything, only to reveal information, or to mandate secrecy. The idea of a warrant canary is 100% based on the idea that the government cannot force the company to publish a statement it does not wish to publish.
Obviously you can't pick up a grocery delivery and remain anonymous.
But in theory if you could pay for groceries with monero (you can't afaik), you could pay from the same wallet you conducted a hack from or purchased a darknet server to host leaked data with. The grocery store wouldn't know the originating wallet or any of its other activity.
Anyone who thinks grocery stores are going to allow monero payments is pretty naive, but Monero at least makes these flows possible.
For example, if Walmart decided to let people pick up orders with monero payments, you'd log in with your walmart account (linked with real name), place your order, pay to the generated wallet address with Monero, and then show ID on pickup. Walmart would have no way of knowing where the money came from.
I don't have any Monero or particularly like it (but mainly because I consider proof of work unsustainable and wasteful). But you have to admit the privacy implications are interesting, and the technology is impressive.
I think it would be much more likely that a startup comes along and makes the transaction tracking efficient, like with a trusted payment app, than finding some way around Orwellian government motives.