I think a bigger problem than 38% of webpages being dead is that many entities, groups, and businesses now use Facebook pages almost exclusively and have no other web presence outside of Facebook. In other words, a Facebook account becomes a requirement to interact with them.
The same happened with forums. They're all subreddits, Facebook groups or Discord chats now. A lot of valuable information is kept hidden in those groups now, and it makes me really sad.
I love forums. I've kept the DIY Book Scanner forum online since... 2009? Recently (the last two years) these damn AI scrapers have killed phpBB over and over again. They got me kicked off my shared web hosting plan by abusing search and other forum features.
I upgraded to a VPS for $500. The other admin spent 15-20 hours fixing, troubleshooting, and transferring. And you know what? At the end of all this, I paid to give my data to these jerks, to keep it online for them to harvest. The forums are dead quiet.
Now I figure Discord is fine: they'll just sell the data to AI companies directly, so the burden won't fall on me.
Reddit at least shows up in searches. I also think it's important not to look at the past with rose colored glasses. I think some random forum is much more likely to disappear than a subreddit.
I don’t think it’s rose colored glasses. Google saw the value of forums as a source of information when it bought and indexed Deja News’ Usenet archive. A lot of pop and early Internet culture resided there. This was then turned into Google Groups, underfunded, targeted at businesses, and more or less buried.
Independent forums (phpBB and the like) often came up in searches before these communities moved to Facebook Groups, where they’re mostly set to private due to spammers.
Similarly, there was a time when Google indexed tweets more or less live, so you could find information on very recent events. I think Twitter asked for money, and that was the end of that.
Now I think Reddit, and maybe Stack Overflow, are the only things helping Google be anything more than an extremely hostile version of the yellow pages. I fear Reddit might at some point withdraw their content from Google and that’ll be the end of it.
Unfortunately, keeping things up to date, secure, and free of spam is a lot of effort. It's very compelling to take your content to where the eyeballs already are, especially when you can let someone else take care of the hard parts for “free”.
Car forums are still alive, but yeah, the shift from thread discussions to comment and/or video discussions really kills a lot of knowledge. It’s great to find old forum posts showing you how to work on your car. It’s tiresome to skip through videos to find what you need, or even to search Reddit.
The big thing about Discord is that you can chat with people now, but the knowledge is not in a good format to come back to later.
Sports forums too. I find that user stickiness there is pretty good because the Reddit sports subs are simply too large, and have an Eternal September/zero-friction problem, as joining it is a click away.
However, if you managed to find the forum on a search engine and took the trouble to sign up for an account, you are more likely to abide by the general vibe of the place, rather than Redditize it with shallow, meme-y comments that reliably get a lot of upvotes.
I think the challenge is finding these rare, valuable places in a sea of noise. Gone are the days that you'd stumble on them, with Google et al keeping you blasting down freeways with no chance of turning into a quiet cul-de-sac where you might see the perfect home for sale, or at least rent it for a while. I still find value and joy on the internet, but it's much rarer and typically the hold-overs from a previous era, or tenuous things like following a handful of YTers while ad blocking still works.
I'm probably in the minority of people who appreciate that trend. Valuable information being hidden means that community comes before information. If you want to gain access to other people's knowledge you have to opt-in and interact with and understand the people who made it, and that creates an incentive to contribute back and use knowledge in an appropriate way.
The open internet seems increasingly predatory: a place where some gigantic ML company just vacuums up your stuff or resells your content for ad revenue. Parasitic.
I don't mind that, and I honestly think it's a natural reaction that people guard their information. It's sort of like a medieval monastery version of the internet, where people recognize that information is cultivated rather than just some commodity you scrape off the web.
The reason why platforms such as Discord, Facebook, etc don't give open access to your content isn't because they don't want a predatory ML company to vacuum up your content, rather it's because they want to sell access to your content to the predatory ML companies. Meta already trains its own AI on Facebook and Instagram user content. Discord's business model (host an unlimited amount of data for free in perpetuity) is inherently unsustainable, so it's likely they'll start selling user data to ML companies within a few years (assuming the AI boom lasts that long).
That's a really human-centered view of what we have in forums.
I think many of us (wrongly) have a tech-centered view of online communities -- witness the multitude of "Show HN" posts that are "look! I made an online place for humans to congregate and discuss X".
The tech stack matters little (if at all), but bootstrapping the community's trust and culture (and maintaining both) are most of the heavy lifting, and the differentiator for success.
I recently (re)discovered an HN post by one of the core community moderators of deviantArt, and its success was made possible by its culture:
AI will make all that even worse.
Data staying hidden behind nice UX is VERY bad news.
Maybe all that will lead to an equivalent of open source, but for data.
Not only are a lot of communities hidden because of Discord (at least with Reddit they were more discoverable); the worst part is that they are unsearchable or behind a paywall.
Like, the "join my Discord if you pay at least $3/mo!" thing is pretty innocent, but you are gatekeeping a community that used to be public.
If we're talking about something like a content creator focused on a hobby, or PC problems, you can see how Google will become even more useless.
Reddit was the least bad choice between it and Discord, but it has failed at its "I want to be a social network" ambitions.
Even the idea of payment to access a community is just absurd. If I'm an integral part of a tight-knit community I can see myself participating in common expenses, but I would never pay for access. At that point you're just a consumer buying a service.
>Even the idea of payment to access a community is just absurd.
Is it?
If we put aside the common notion that "everything on the internet is and should be free-as-in-beer and fuck you if you disagree", is it really that absurd?
Communities more often than not prefer setting up some kind of filtering to weed out certain people, and a paywall is one of those filters.
I only use Facebook to stay in touch with widely dispersed family members. Nothing else. One peek a day to see what's up. Assuming you have an account, I find this makes the task much easier:
And Meta keeps things endlessly? Not just a hyper-compressed picture and a set of references to local files? That part of the siloed web vanishes too, just less dangly and obvious.
Are there any businesses of any notable size that are using Facebook alone? Local businesses near me have plenty of info on Google Maps. Their website, if they have one, is usually out of date, but calling them directly answers my questions.
Also, that was 38% of a web filled with diversity, no hidden agendas, and amateurs (in the best sense of the word). That number is probably now .00001% of a much bigger, far more homogeneous web. A web 1.0 site > today's walled-garden "group page".
I've been to restaurants where they only have the menu in digital and uploaded to FB. And they looked at me as if I was a weirdo when I told them I don't use FB.
Many times I've recommended that my clients use Facebook instead of their own websites, which were overkill. Often, having your own website is a waste of money.
You used to be able to see a custom feed of a selected friend list, but since they removed that option the site has been completely unusable, unless perhaps you do something like remove 90% of your "friends" and groups, but that would hurt usability in different ways.
Ooooh, so that's what happened. I just recently restarted using my Facebook account after about 6 years of not really using it. I found it odd that I was only scrolling past Android game ads, some DIY videos and some rage-motivated generic posts from accounts I don't know...
I liked the Facebook that showed me the humblebrag posts of my friends/connections more (and I'm not being sarcastic).
It was very usable with custom lists up until recently. Their help pages still reference the ability to browse updates from custom friend lists (at least when I checked a couple of weeks ago), but the actual feature has been dead for a while now. Guess they didn't like that people were able to minimize pointless engagement and doomscrolling.
It really is. I’m following two groups and a handful of people. I never see posts from any of them and it’s difficult for me as a software engineer to navigate the site.
> From a user perspective Facebook's feed is spam.
The topic is FB groups. They aren't spam, at least not the ones I'm a member of. Some groups may be quiet, some are active, but I don't recall coming across spam posts from any of them. One particular group has a rule that members can promote their business once a week, enforced by the group's admin.
Maybe it’s different where you are, but around here that filter would mean I could almost only patronize large chains. Small businesses have Facebook or maybe insta (which is much worse, Facebook business pages grant far more access to a non-logged-in user) and no website. Restaurants might have a barely-updated website (the updates are on Facebook) that links to some third party ordering service, maybe.
But it's clear that continuing to use Facebook in that manner will only strengthen the isolation effect. Voting with your wallet and going against the machine invariably involves some level of personal sacrifice. For me, sacrificing patronage is incredibly easy to do. There is more to life than commercialization.
My girlfriend says she only uses Facebook to interface with small stores, who use it as a sole point of contact or distribution. Let that sink in for a moment. Breaking this cycle will require hard work.
I suspect Instagram or Facebook gets them 10x the eyeballs of having a website, at 1/20th the effort, zero cost, and nearly zero skill or expertise at anything tech-related.
I suspect both can be true at the same time. In the case of Instagram it still seems silly to miss out on potential customers by only posting on a non-public platform though.
Facebook definitely makes more sense to me. It only stops me if I try to go browse back through all their photos or something. I can look at posts and any of their… I dunno how Facebook works, but featured or whatever images, for menus or current sales or what have you, no problem. Insta stops me if I try to scroll past the first screenful of content, and doesn’t have as much info available outside of posts (most of which I can’t see).
I have avoided a place with only Insta, simply because I couldn’t see anything I needed to.
Many small businesses live on a shoe-string though and the cost of developing and maintaining a website is prohibitive.
Their self-administered Facebook page isn't anything to write home about and likely generates zero business, but it is free so long as they resist the temptation to boost posts and have an extra 3 people see them for only $36.
Some of the interactive stuff on old BBC election coverage still almost works to this day.
Hard to imagine that with many sites now, 20 years on. It's not even that it's impossible with the technology; it's probably closer to how writing got worse after the invention of the word processor. Everything is managed and structured now, so the freedom / bubble needed to make things good in a way that can't be easily explained is gone.
Be sure to donate some quid to the Internet Archive (archive.org) to support their efforts to preserve (not just) old content, then do your best to make local copies of anything you find of value, just in case they disappear one day. A good number of mostly technical pages I have in my bookmarks file, that grew steadily and has been moved during installations for over 20 years, now point to their latest complete backup before the said page went silent. The Internet Archive is a huge boon to everyone.
I realized I was overusing bookmarks. I now save webpages (perhaps as PDF) if they contain information I want to refer to later, such as an insightful article, technical information, a humorous bit, or the like.
Bookmarks are good only for links to things for which only the most current version is worth accessing. That’s my banking websites, a shopping site, my employer’s remote desktop system, etc.
There's also https://archivebox.io which can take your bookmarks and archive them in many ways. Unfortunately, back when I tried it, it was a bit buggy. I wish there were a better solution to build a nice archive of the sites I visit most often, just in case.
I save webpages as PDF because that retains the images and fonts of the original page. One issue I run into is that sticky headers/footers used on websites often obscure the top/bottom text of the page when exported to PDF. This can be addressed by using uBlock Origin to remove the sticky DOM elements before saving, but it's a bit of a hassle.
Others have recommended ArchiveBox; I will recommend using any bookmarking tool that fires off a web request to the Wayback Machine to archive a page when you create the bookmark.
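For anyone who wants to wire that up themselves, here's a minimal sketch in Python, assuming all you need is to fire a GET at the Wayback Machine's public Save Page Now endpoint (https://web.archive.org/save/<url>); the function names here are made up for the example:

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

SAVE_ENDPOINT = "https://web.archive.org/save/"

def save_page_now_url(url: str) -> str:
    """Build the Save Page Now capture URL for a bookmarked page."""
    # Percent-encode the target URL but keep the scheme/path separators intact.
    return SAVE_ENDPOINT + quote(url, safe=":/?&=")

def archive_bookmark(url: str, timeout: int = 30) -> int:
    """Ask the Wayback Machine to capture `url`; returns the HTTP status code."""
    req = Request(save_page_now_url(url),
                  headers={"User-Agent": "bookmark-archiver-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

The full Save Page Now service also has an authenticated API with more options (outlinks, screenshots), but for casual bookmarking a plain GET like the above is, as far as I know, enough.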
Not a noticeable amount in this age when even expensive SSD storage has multiple gigabytes. Even pages with multiple images just aren’t that big on a typical hard drive.
I wish the Internet Archive would split itself into two entities: one that simply archives web sites, and the other that does everything else (e.g., edgy IP testing of ebooks and video games). That way if the "other" entity gets sued into oblivion, the web sites remain. I think what the former is doing is a critical service for humankind, and I do donate, but I worry about their future.
I have run a news website since 2019. Every hour, I have a crawler look for dead links. I replace about one link a day with a link to archive.org. The funniest ones are the day after an election when all the candidate websites go blank. The saddest are the government websites that go offline from 3am to 5am every week.
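The replace-with-archive.org step can be sketched against the Wayback Machine's public availability API (https://archive.org/wayback/available). This is not the parent's actual crawler, just a hedged outline with invented function names:

```python
import json
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def is_dead(url: str, timeout: int = 10) -> bool:
    """Treat network errors and HTTP status >= 400 as a dead link."""
    try:
        req = Request(url, method="HEAD",
                      headers={"User-Agent": "linkcheck-sketch/0.1"})
        with urlopen(req, timeout=timeout) as resp:
            return resp.status >= 400
    except (HTTPError, URLError, OSError):
        return True

def availability_query(url: str) -> str:
    """Build the Wayback availability API query for a given URL."""
    return "https://archive.org/wayback/available?" + urlencode({"url": url})

def wayback_replacement(url: str, timeout: int = 10) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if there is none."""
    with urlopen(availability_query(url), timeout=timeout) as resp:
        data = json.load(resp)
    snap = data.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None
```

An hourly job would then walk the site's links, call `is_dead`, and swap in `wayback_replacement` where one exists.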
I'm surprised it's not more. 2013 was long after the days of hobbyist websites of the early net, and into the time when most new sites were business driven. Given how long businesses last I'd expect many more sites to be long gone 11 years later. I guess maybe the death of a lot of community-building spaces (angelfire, Geocities, etc) probably counts for a lot of them going.
What would be particularly interesting would be to graph how long websites last. I suspect quite a lot of the content from the early days is still around, and this period (2008–2018) is the peak of sites vanishing.
I hope not all things last forever. A while back I stumbled upon my first .com, from the 90s, which was hosted on Angelfire and dutifully rehosted by archive.org and it went about how you'd imagine.
Despite being in 4th grade when my little friend and I made the webpage, things on there (while fine for the era) are just not okay by today's standards, even if I understand the context for what led to them being there. It was nothing terrible, just distasteful in the blissfully unaware way a 4th grader in the '90s would be. I realize that stuff will probably never be off my conscience and I just have to deal with it and hope nobody sees it.
I have similar material. If it's reassuring, we all were just kids/teens and learning of the world. I feel a lot for the youth after us that made the Internet more accessible and, at times, more permanent.
Everything on the internet is intrinsically ephemeral. Embrace that instead of fighting against it. If you want to archive stuff, make offline copies. PDF/A (especially the -1 and -2 versions) is a format explicitly designed for archiving and works well for static content.
I think it is a bit of a shame that mirroring is not more readily built into the web stack (= HTTP/HTML); if you could trivially make links that included a local copy (as a fallback?), this linkrot would be a far lesser concern. The way, for example, Wikipedia links everything through archive.org is a bit of a hack imho.
Agree. Sometimes you just experiment with something, put up a tiny website somewhere... forget about it until you decide it's no longer relevant for whatever reason, and you pull the plug on it... it's not a bad thing. But it's great to have stuff like web archives, to keep our collective memory of worthwhile content. I especially hope that accurate accounts of events get preserved, as originally written, somewhere they can't be changed. Rewriting history seems to be a favourite these days, and preserving the original accounts as things were happening can help combat this. Even if an account was not completely accurate, it can help us understand the actions of contemporary actors, i.e. you may be able to understand what they thought was true at the time, even if that was later revealed to be incorrect.
I view this as a serious failing of the internet that we collectively should have done a better job of avoiding. In most cases I believe the content itself is in fact still available somewhere and it’s simply the link that broke. Some kind of two layer system like the DOI system used for libraries would be helpful for cases like that:
This is a feature, not a bug. It would be a terrible life in a world that does not forget or forgive. It’s also good that some preservation effort is necessary for worthy content: the value of it gets more appreciated.
> It would be a terrible life in a world that does not forget or forgive
This is an orthogonal concern, and arguably is mainly about privacy
> It’s also good that some preservation effort is necessary for worthy content: the value of it gets more appreciated.
This same argument seems to imply that virtually everything should be expensive. Cheap storage is bad because we don't appreciate the value of the files we store. Expensive healthcare is good because it really makes us appreciate our organs.
> worthy content
The hard part is looking into the future to determine which content will be considered worthy then. So far no human civilization has managed to figure that out. They mostly seemed to focus on preserving the image of how amazing their kings were.
Simple: store everything. I'll give you an example: the clay tablets written in cuneiform discovered at Ur. They were disposed of by the Sumerians, possibly because they had fulfilled their purpose and were simply thrown away. They deal with important things like commercial transactions, but also unimportant things like a personal letter or a poem. These unimportant things taught us pretty much all we know about the Sumerian language: syntax, vocabulary, regional variations, etc. In archaeological terms, refuse is a huge treasure trove, precisely because no one chose what to throw away. Everything is there. It's entirely up to the archaeologist to comb through it and come up with an interpretation.
>This same argument seems to imply that virtually everything should be expensive
Non-zero doesn’t mean “unaffordable” or “expensive”.
> The hard part is looking into the future…
The hardest part is to understand that the content we want to preserve carries more valuable information about us than about itself.
Scientific knowledge can be discovered again, it’s not something to worry about. The preservation shapes the future views of us, leaving the trace in the history of those who preserve, their life and their experiences. Maybe they just needed to accept their mortality and irreversible flow of time?
I don't know. Scientific knowledge is expensive to rediscover. It requires a lot of false starts and often involves a great deal of luck/randomness. Historically, periods of flourishing are often associated with the rediscovery or importing of large collections of wisdom. For example revivals of ancient works or the introduction of works from afar due to newly developed trade routes. If ideas were easy to rediscover, we shouldn't expect those events to have much impact.
There is certainly a cost of storing data, and cost should enter the equation. But we're losing a lot of data for reasons other than cost and we don't have a reasonable way of assigning a value to the lost data.
My point is, we are engaging in a very unnatural process, trying to preserve something against the second law of thermodynamics. We are going to lose the data and things are going to break no matter what. We cannot change nature, but we can accept it.
> We are going to lose the data and things are going to break no matter what. We cannot change nature, but we can accept it.
If man was content with the nature of things he would never fly, or go to the moon, or any of the other myriad accomplishments humanity has made. If we can preserve clay tablets from thousands of years ago we can find some way to keep the information we produce today for posterity.
I’m not arguing with that. We can and will preserve some information. But we must not be obsessed with saving everything. Imagine humankind a million years from now; if we pick the 10 most important facts about each century, that will be a hundred thousand facts to remember. Maybe we’ll develop the ability to know and use them all, but from a modern human perspective that’s already too much. Now, can you reduce the 21st century to 10 facts? How much of those zettabytes of information would be worth keeping for a million years? For ten thousand years? For a thousand years?
I can't imagine an archaeologist or historian studying e.g. Pierre de Fermat ever saying, "Good thing we have so few documents and artifacts! That way we know what was really important to him."
What's the proof of Fermat's Last Theorem again? Doesn't matter, it was just a footnote anyway, so let's not bother preserving it. It doesn't matter that it took our smartest minds 358 years of trial and error to rediscover the proof. It can always be discovered again.
We don’t know if the proof did exist. The complexity of the one we have is an indication that he did not tell the truth, whether that was a genuine mistake, a joke or something else. For sure it wasn’t the longest time gap between a problem statement and a solution.
Yes, rebuilding civilization from scratch would be a difficult task, taking centuries if not millennia, if no knowledge were preserved. However, we do preserve it and do spend considerable effort on it, which cannot be said about our culture and individual experiences.
Maybe the complexity of the one we have is an indication that we haven't rediscovered it. We discovered a worse one. Maybe it will take our smartest minds another 350 years to rediscover his. If preserving data was easier in the 1600s we could just grep through Fermat's hard drive and we would know for sure!
Let's keep making preservation easier, and preserving as much as we can. Maybe much of it is worthless, but I guarantee there's at least one document we think is worthless today, that historians 500 years from now will be glad we preserved anyway.
Where do you think we have lived for the last 5 thousand years? We have clay tablets written in cuneiform that were excavated from refuse at Ur, and thanks to those we know the little we know about Sumer. The invention of writing made the exercise of forgetting impossible. This has been thoroughly studied by anthropologists like Jack Goody, James Carey, David Olson, Barry Powell and other writers like Walter Ong. We live, in fact, in a terrible world that is mostly trapped in the past, where cultural complexity grows in onion layers. Anyone can go back to the past and yearn for it. We can always go back to the past through our stored knowledge, but that past will mean different things to different people, as they have not experienced it. Since the invention of the printing press we have lived in a constant state of information inflation. Medieval scholars used to complain that with the printing press anyone could read and write books, scholastics were scandalised by the rise of the vernaculars, Michelangelo complained about Flemish painters and their vacuous form of art, and so on.
What is worth mentioning here is the rate at which decay is occurring. The article mentions that 38% of sites that existed in 2013 are no more; that's a decade. How much of that is noise and how much is useful information, or at the very least "interesting" content, we don't know. It's gone. How much of that info has been saved by the large web scrapers, or how much is stored by Google or Twitter, is also unknown to us.
How do you define worthy content? A tweet with a million views even though it's just a semi-naked actress? A tweet with 300 views about a groundbreaking discovery? We celebrated like there was no tomorrow when the internet brought down the gatekeepers (the newspaper, book, magazine, TV and radio editors), just to get swamped in noise, conspiracy theories, memes, TikTok and so on. The problem is that we can barely cope with the huge amounts of information thrown at us, and we are too many, with tastes too different to ever agree on what's worthy and what's not.
The "feature", as you've called it, may be by design, but that doesn't mean it is useful or morally correct.
It is highly subjective. I’m very curious about the past, but I don’t care if in 10,000 years nobody knows the name of Newton or Mandela while some YouTube blogger is somehow a legend.
> How can entropy be morally correct or incorrect?
You said the disappearance of content was a feature, not a bug. If it is a feature it was designed. I understood your comment as implying that somebody created this feature.
You now speak about entropy, but which one? Boltzmann's or Shannon's? This doesn't have anything to do with bit rot or the like. When you write a book, you cannot unwrite it. It's a fait accompli. But if I create a website, load it with a bunch of content, and after a few years don't pay for the domain, server space, etc., it will be deleted; at that point a webcrawler may have copied all the info, or not, we do not know, or somebody may have thought it was worth it and saved a link to it. At a fundamental level, who decides what stays and what is deleted, and who owns it if the webcrawler stored it without your permission? These are all moral questions, not technical ones...
> You said the disappearance of content was a feature, not a bug. If it is a feature it was designed.
It wasn’t designed, it’s just a very common metaphor about the perception of things rather than the way they came to life.
It means that we should embrace it instead of trying to fix it.
> You now speak about entropy, which one?
Boltzmann. A system where information is preserved forever through the arrangement of energy states is highly improbable, so regardless of individual moral choices and the effort it will fall apart by the laws of nature.
The predominant sentiment I'm seeing in comments here is 'what does it matter?', while at the same time, in discussions here about search engine result quality, the most popular comments inevitably express a decline in quality (influenced by SEO, AI, spam, among other things), along with a desire for surfacing interesting, human-made content that gets buried by modern search algorithms.
On HN we see every day interesting, first page content marked with a title representing its year, sometimes dating back a decade or more. Its age isn't apparently a detracting aspect to this audience if the content is still worth sharing.
And from the article, the headline figure doesn't represent irrelevant or undesirable content either, but Wikipedia references, news articles, and government pages, along with more predictably ephemeral things like Twitter posts.
We're lucky there is archive.org but since it's not indexed like a regular search engine the only tether to old pages are still-live links found via regular search engines/sites (including HN). Essentially unless sites continue to exist that contain links to archived content the chances of future discovery becomes slim.
My stance is if you find something interesting that you expect is worthwhile sharing try saving it in the most convenient single file page format available to you (MHTML, SingleFile, PDF), to have your own copy. For MHTML at least it also saves the original page URL in its metadata. Saving to online archives is also great but admittedly higher friction (and can sometimes result in things like IP restrictions on archive.org even when saving just a handful of pages in a row, ime).
This is because the majority of webpages are on commercial (for-profit) sites now, and for-profit companies do not build anything that lasts. Part of this is their use case requirements: CA TLS has to be the only way to access the sites. And since CA TLS is extremely fragile and short-lived, so are the sites. But additionally, any dynamic site also has a short lifetime. HTML files-in-folders sites will last till the heat death of the universe unmaintained. A PHP or nodejs/etc site will last a year or two after it stops being maintained.
The core of the web pages on the internet are still there. It's just that the thick layer of commercial cr'app' websites built up on top are transient.
From the point of actually trying to keep something online, it's basically constantly having to fight against code rot.
You don't update your server or database or runtime/framework/library? You'll get hacked and will drown in CVEs. You do try to do these updates? Have fun rewriting bits of your code, because the old version of a framework/library is no longer supported and there are breaking changes, which mean needing a partial rewrite.
Your best bet around that might be one of the relatively stable databases like SQLite, a micro-framework on the back end for a RESTful API, a simple solution for auth like basicauth/mTLS/... at the web server level in front of your API, and then something without a toolchain on the front end, like jQuery. I mean this unironically; unless you only maintain a very few sites, in which case you probably have a bit more time on your hands.
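To illustrate how small that kind of boring stack can be, here's a hedged sketch of a tiny REST-ish notes API using only the Python standard library (sqlite3, with http.server standing in for the micro-framework); all names, routes, and the schema are invented for the example:

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# One in-process connection; use a file path instead of ":memory:" to persist.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")

def add_note(body):
    """Insert a note and return its new row id."""
    cur = db.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    db.commit()
    return cur.lastrowid

def list_notes():
    """Return all notes as a list of dicts, oldest first."""
    rows = db.execute("SELECT id, body FROM notes ORDER BY id").fetchall()
    return [{"id": r[0], "body": r[1]} for r in rows]

class NotesHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, obj):
        payload = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def do_GET(self):
        if self.path == "/notes":
            self._send_json(200, list_notes())
        else:
            self.send_error(404)

    def do_POST(self):
        if self.path != "/notes":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        self._send_json(201, {"id": add_note(body["body"])})
```

Wire it up with `HTTPServer(("127.0.0.1", 8080), NotesHandler).serve_forever()` and put the basicauth/mTLS in front at the web server level, as suggested above.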
Feels to me like the only content that can have any sort of longevity without constant investment of time is static sites - where updating your web server or moving to a different one is trivial and there are no write operations involved in most of the processes (maybe setup logrotate or just delete the logs occasionally).
It's an issue of servers, not JS. A BackboneJS app written in CoffeeScript back in 2010 would still run just fine today if it was hosted on S3/CloudFront. Replace the framework/language/year with anything, and it'd still hold true.
But if the page needed to fetch data from an unmaintained API server that ran out of disk space, lost its DB network connection, got rebooted by a VPS provider, or any other issue, that site will probably never work again.
Yeah, it's mostly a matter of database/server exploits. My personal WordPress site got white-hat hacked; they left a friendly note telling me to update. I switched to static instead.
HTML benefits from the fact that browsers have always bent over backwards to make sense of whatever freak noise they're fed. Try the same with your average compiler/interpreter and it'll send you right back to your desk with a list of admonishments. Trouble is, those admonishments change, and require updates. The HTML inside a PHP program from 2005 probably works, but the PHP probably doesn't.
Some sites that I (and some coworkers) wrote for were basically frozen fairly recently. I told everyone they should make their own copies of anything they care about, because I'd put money on the sites needing some maintenance work a few years from now to keep them secure etc. At that point I'm pretty sure that the person responsible wouldn't spend the resources to patch things up and would just shut off the sites.
What does it have to do with that software being free? I haven't found proprietary software to be more reliable or easier to maintain integration with.
> Have fun rewriting bits of your code, because the old version of a framework/library is no longer supported and there are breaking changes, which mean needing a partial rewrite.
You don't get to choose if a third-party decides to rewrite an API interface, deprecate an entire library, etc.
You know that proprietary code has a cost because you pay for it. "Free" software is added to projects without much thought of what happens down the road.
There was a really nice music school/venue/coffee shop in the town I used to live in. It shut down, and they took their Facebook and Instagram pages down. The only evidence it ever existed is in posts announcing events on other Facebook pages, in memories of people who went there, and on still surviving business listings that are likely to go away.
>> "Server information sidebar: This site is being served from an Ubuntu box with 2GB of RAM. The server is currently provided by several people"
This is either added since his death, and it's maintained by supporters, or it was there to begin with and one of the several people took over maintenance and funding.
But that still leaves the question of who. Maybe they want to remain anonymous.
While I am aware that nothing is permanent and everything takes maintenance, I sure feel like there ought to be a bit _more_ permanent way to publish things.
I run a few daily word games. They're static sites that could continue forever as long as they have players, but sometimes I think about what would happen if I died or stopped paying attention. Domains would expire. Maybe some people could still play cached versions, and archive site versions would still exist.
I host them on GitHub Pages, so if I exposed that URL then they could probably exist for as long as GitHub does, or until GitHub breaks something that requires the owner to click a button to fix.
I could probably publish a free desktop version on various stores, and that might last as long as the store does, or until OS upgrades break it.
It would be great if there was a way to publish something such that it exists as long as anyone cares about it.
> It should be a standard for the open internet, that any respectable page has an "export complete archives" as a clickable button. (But it's mainly the opposite, today: adtech corporations don't want your written works to be free, they want them incarcerated in their revenue-generating walled gardens—and if ten billion human's worth of written history gets erased at the end, well, too bad for them!) [0]
Can't you export your data from social media platforms? I haven't tried it, but did Google this for Facebook[1] and it looks as though you can?
I remember people saying that once it is on the web it is there forever. But my experience is how fast web sites disappear. Or get redirected to a nonsensical site in the orient. I think maintaining the domain name is the big problem.
However I am proud to say that you can still see my very first published web page from 1995 if you know the rather obscure url... http://admin.benwillies.com/ticker/
I wrote this page as a proof of concept for a friend of mine who was a financial consultant, but unfortunately the humor was a turn-off instead of getting him excited about this newfangled thing called the Web.
If one looks at the graph, 20% of sites were dead in 3 years.
This isn't a facebook thing, most facebook migration happened a decade ago. Instead this is companies closing, campaign websites shuttering, urls being changed, community events being over, and so on.
I've been thinking about this for a couple of years. With our (recent) history becoming more and more digital, we are losing more of our history. There's a lot of creativity of all forms that is digital and online-only, lasting only as long as the creator supports it. No longer are we viewing physical pieces of art, but ephemeral pieces of art. (I'm including websites made for personal use, by hobbyists, etc.)
I know Archive.org archives a lot, but they can't archive everything, especially all the small personal websites.
A natural consequence of federation. You could attempt to archive everything and host it. But ultimately, each host is responsible for what it hosts and keeping itself available.
That is what I like about federation. All the incentives are individualized. I think the scale that was needed for business in the last century has disappeared. Businesses previously needed scale and reach, but today it is very easy for an individual to make a decent living in a niche.
> Nearly one-in-five tweets are no longer publicly visible on the site just months after being posted. In 60% of these cases, the account that originally posted the tweet was made private, suspended or deleted entirely. In the other 40%, the account holder deleted the individual tweet, but the account itself still existed
This seems extraordinarily high. Since they can't do that estimation on all tweets ever sent, it must be a bias of their sample.
I blogged from 2000 onward but hit my stride with Wordpress in 2004. I posted thousands of times. I had a Google PR of 8 at one point, but it tended to be 6/7.
In 2020 I decided to shutter the site.
The database with the content is still there, it's just inaccessible.
I contacted Google, Bing, Archive.org and others and had them remove my content.
Then I removed everything visible.
Very very happy I did, especially with the rise of AI.
Yeah, I've noticed this - I made a funny little art project in 2013 wordclouding parodies of This is Just to Say (and later other famous poems and literary quotes that often get parodied) because wordclouding is the least interesting kind of visualization there is:
and quite a lot of the content I used is no longer found, so all that remains is the word cloud but not the original poem.
It's a bit worse than the article details though because often the web page is there, or if not is available from IA, but much of the content was actually inside of comments and so forth.
(project was originally inspired because back then you couldn't go anywhere on the web without someone hitting you over the head with their This is Just to Say parodies)
Honestly, seeing startups die and people abandoning their blogs (of which I'm also guilty, despite my best intentions; I just happen to pay for my domains 10 years at a time), I would have expected the number to be much higher.
I post my site content Markdown to an open Git repo for this reason. Anyone can pull and build my pages. I think Git should stay for at least another 100 years. https://github.com/hatdropper1977/john.soban.ski
Attention is limited. We cannot see everything on the Internet. We do not have enough time for that.
There is a lot of valuable and interesting data on the Internet, but it is not visible. A high-quality, low-profile blog that stopped updating in 2015 will certainly not rank high in Google.
Media platforms and search engines monetize content. YouTube channels need to churn out new content every week or so to stay relevant and watchable.
Our society produces content, not quality, not products.
SEO can be gamed; it is impossible to create an objective index of valuable content. Bad actors will hack the game, spam results, and destroy quality for profit.
Google's search engine most often connects users with media sites, with news sites, with the middlemen. More often than not it does not connect users with products directly. Type "search engine" into the search box and you may find not only actual search engines but articles like "Best search engines in 2024" or "Best SEO tricks to boost your page".
Google does not have any incentive to fix this. Search engines are dead tech and will be replaced by chatbots in a few years. People will not search for content; content will be generated on demand.
I wanted to find wargames-related pages. It is quite impossible to find anything interesting concerning Warhammer on the normie internet (outside of Facebook).
The second thing is that I cannot find anything Amiga-related.
This solved my initial problem. I have also found out that many interesting pages are gone. I think that Google directing our attention toward "content" broke good-quality pages.
Right now I am using Google less and less, because I use my bookmark manager more and more.
My solutions may not be as complex as Common Crawl, but they are enough for me, for now. I am still working on my program. It has been a fun and interesting experience, and I have learned a lot: about the Open Graph protocol, about schema, about web scraping, etc. Maybe this will inspire people to be more self-sufficient, and to self-host more.
In times of walled gardens we need more standards and more open data to keep what remains of the old wild west of the Internet.
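As an illustration of the kind of self-hosted tooling described above, here is a minimal Open Graph extractor using only the Python standard library; the sample HTML is made up:

```python
# Minimal Open Graph metadata extractor, stdlib only.
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs from a page."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property", "")
        if prop.startswith("og:") and "content" in attrs:
            self.og[prop] = attrs["content"]

def extract_og(html):
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og

# Made-up example page, as a stand-in for a fetched document.
sample = """<html><head>
<meta property="og:title" content="My Amiga Page">
<meta property="og:type" content="website">
</head><body></body></html>"""
```

A bookmark manager can store these `og:` fields alongside the URL, so the title and description survive even if the page itself later disappears.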
Are we saying we're going to keep making hard drives so we can save everything ever produced? I see the value in many things, but I worry about the burden of expecting everything to be saved forever.
In some cases, a lot of valuable information that doesn't exist anywhere else. A big German immigration forum vanished this winter. There was a lot of valuable information for people navigating tricky bureaucratic processes.
I am not a lawyer, but as far as I know it depends on your jurisdiction and the exact code in question.
Many legal systems don't know "fair use" and by default you have effectively zero rights to do anything with copyrighted materials without explicit permission.
The license will tell you what is allowed (and if it's a standard one, you can assume it is in accordance with the law).
You can take small snippets, but not the whole page. But why do you want to release someone else's work publicly? Just save it on your own device if you find it useful.
Courts are more likely to find that nonprofit educational and noncommercial uses are fair.
Using a creative work is less likely to support a claim of fair use than using a factual work (such as a technical article or news item).
If the use employs only a small amount of copyrighted material, fair use is more likely.
And since code tends to be more idea than expression most of it can be considered to not fall under copyright after the application of the Abstraction, Filtration, Comparison doctrine.
Of course if you piss off an entity with a bunch of money to throw at lawyers it could be a bigger issue, regardless if you’re in the right, because defending yourself can rack up legal fees.
I do copyright/patent/trade secret inspections of source code for a living.
EDIT: Yet again, downvoted for just stating the truth... The irony is that 10 years ago and before the controversies around LLMs this comment would not have garnered negative attention because the forum was all for weaker copyrights... when copyright affected musicians instead of programmers' bottom lines... Sigh...
Fair use is an affirmative defense: you only get to pull it out after they sue you, and your claim is judged on a case by case basis. It’s not the magic spell of protection people seem to think it is.
Yup, I know that, but I encourage people not to be afraid of republishing little bits of code for educational or archival purposes. They should know they have fair use to lean on in the incredibly unlikely situation of ending up in court... unless of course they're painting a very large target on their backs and making quite a bit of money from the IP of a large corporation!
The entire world benefits from sites like the Internet Archive and the commons-friendly approach of fair use. I recommend changing the laws in your country!
EDIT: I did a little bit of investigation and there are similar limits on copyrights in Germany known as "limitations on protected rights" that seem to carve out things like educational use and archiving, but I don't know anything about German law. I would find it surprising if most Western nations didn't have something similar to fair use unless there was an active interest in damaging the public's access to information.
As many have stated, I would have assumed more than 38%. But good-quality content is rare, and dynamic content made the number of page combinations effectively infinite 20 years ago.
Maintaining 25-year-old URLs is a bit cumbersome, and I sometimes wonder if it's worth it. Most of the traffic seems to come from bots, and they do seem to learn some of the 301s. Still, it seems to be good for SEO, etc.
Some users also get redirected to the content at the new URLs. It feels a bit like helping an elderly person across the street to where the shop is.
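For what it's worth, keeping decades-old URLs alive like this is usually just a small set of permanent redirects in the web server config; the nginx directives below are one way to do it, and the paths are made up:

```
# nginx: keep 25-year-old URLs alive with permanent (301) redirects.
# Paths are illustrative.
location = /old/article.html {
    return 301 /articles/article;
}
location = /cgi-bin/page.pl {
    return 301 /pages/page;
}
```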
Anyway, I hope that bots and humans trust our services more.