Totally, I think there's a lot of retroactive justification for whatever's familiar, whether it be microservices or monoliths. They both have advantages and disadvantages -- we're at a point where deploying and running either is straightforward on most clouds.
That said, I think interesting possibilities exist where the two approaches are blended. I work for a cloud [1] that supports a concept of "services" grouped within an app. Each of those services is a web server that can be configured to sleep [2] under specific conditions -- effectively serverless, but without losing shared memory or the other efficiencies of running a more traditional web server.
The grouping of services also provides a way to spin off parts of an application while keeping it within the same deployment process.
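To illustrate the idea generically -- this is a hypothetical sketch, not Noop's actual API (see [2] for that) -- the sleep condition can be as simple as "no requests for N seconds," while in-process state like caches survives between requests as long as the service is awake:

```python
import time

class IdleMonitor:
    """Hypothetical sketch of a 'sleepy service': the platform puts the
    web server to sleep once an idle condition is met, and wakes it on
    the next request. While awake, shared in-memory state is preserved --
    the efficiency a pure function-per-request model gives up."""

    def __init__(self, idle_timeout_s):
        self.idle_timeout_s = idle_timeout_s
        self.last_request = time.monotonic()
        self.cache = {}  # shared in-memory state, kept while awake

    def record_request(self):
        # Called on every incoming request; resets the idle clock.
        self.last_request = time.monotonic()

    def should_sleep(self, now=None):
        # Sleep condition: no requests for `idle_timeout_s` seconds.
        if now is None:
            now = time.monotonic()
        return (now - self.last_request) >= self.idle_timeout_s
```

In practice the platform, not the application, enforces the condition and handles wake-up, so the app code stays a plain web server.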
I don't know of an internal feedback tool that isn't more general purpose. I think a lot of what you're describing is the-practice-of-software-development. There are lots of possible solutions; what works best for _your_ team will be somewhat unique to them.
The company I work for [0] provides application hosting that includes automatic preview URLs, which can also be associated with a GitHub PR. The PR gets a link to the environment, with discussion below. Subsequent releases to the environment are also registered with the PR, so it's easy to catalogue and annotate developments/fixes.
As others have said, it depends a lot on the product. A lot of the people on PH are other people building stuff. If it makes sense to launch there, join some PH communities on LI; the more you're already engaged on PH, I think the better off you'll be.
I've found it's harder to read signals from ads for early-stage releases. You need something to compare against... Then again, maybe a super narrow target with a very clear conversion might yield some useful engagement.
One of the most difficult challenges with incidents is dispelling the initial conjecture. Something bad happens and a lot of theories flood the discussion. Engineers work to prove or disprove those theories, but any one story might take on a life of its own outside the dev team. What ends up happening post-incident is a lot of work not only to show that the problem was the result of XYZ, but also that it definitely wasn't the result of ABC.
I was responsible for wsj.com for a few years. The homepage, articles and section fronts were considered dial-tone services (cannot under any circumstances go down). My job was to lead the transition from the on-prem site to the redesigned cloud site. As you can imagine there were a few hiccups along that journey.
One particular incident we encountered was when reporters broke the news of a few unrelated industry computer system failures (including in finance). Because one was a financial system, people visited wsj.com, and the spike in traffic was so large it knocked us out. Now other news outlets were reporting wsj.com as down. Unfortunately, this created a perception that the incidents were all part of a coordinated hacking event.
For each minute the site had an interruption of service, I would need to spend hours post-incident making sure the causes were understood and verified, and that stakeholders knew what they were.
All in all, the on-call experiences were fine. Sure, people were tired if incidents happened in the middle of the night, but the team was supportive and there was a culture of direct problem solving that didn't add _extra_ stress.
Noop [0] is a cloud platform that runs entire application ecosystems (including edge routing) locally and deploys the same setups globally -- it's a departure from the plug-and-play paradigm of Kubernetes, but it means a lot less integration work. Full disclosure: I work @Noop.
Heroku and Reclaim are far from the only two options available. The appropriate choice depends entirely on the team's available expertise and the demands of the applications under development.
There are a lot of disagreements pitting one solution against another. Even if one hosting solution were better than another, the problem is that there are SO MANY solutions across so many axes of tradeoffs that it's impossible to determine an appropriate solution (Heroku, Reclaim, etc.) without considering the application and its context of use.
Heroku has all sorts of issues -- it's super expensive and limited in functionality -- but if it happens to be what a developer team knows and it works for their needs, Heroku could save them lots of money even considering the high cost.
The same is true for Reclaim. _If_ you're familiar with all of the tooling, you could host an application with more functionality for less money than Heroku.
The company [0] I work for gave a talk on this [1]. We're going a bit beyond analyzing logs: because we have more contextual information about the running software, we can compare application state over time and infer whether the application appears to be experiencing an "incident".
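The talk [1] covers the real approach; purely as a sketch of the underlying idea -- flagging when current state deviates sharply from a historical baseline (my generic illustration, not our implementation) -- it could look like:

```python
from statistics import mean, stdev

def looks_like_incident(baseline, current, z_threshold=3.0):
    """Flag an 'incident' when a current metric (e.g. error rate)
    deviates from its historical baseline by more than `z_threshold`
    standard deviations. A deliberately simplified illustration."""
    if len(baseline) < 2:
        return False  # not enough history to compare against
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Perfectly flat history: any increase is notable.
        return current > mu
    return (current - mu) / sigma > z_threshold
```

A real system would compare many signals at once (latency, error rates, deploy markers) rather than a single threshold, but the shape of the inference is the same.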
To me, it's about finding the right level of abstraction. At some point we developers have to trust that the underlying system will do what it says it's going to do.
There is a difference between knowing that an infra component will work as intended and knowing how the choice of that component will impact the application.
Consider storage... is it local SSD or an object bucket? Is it used as scratch space, for caching, or for blob storage?
To add to GP, not only will bad or messy architecture arise, but I've seen serious bills rack up because devs decided to use something without understanding the impact. Architected a different way, the system would have saved the company more than the cost of one FTE.
A counterpoint is that being a professional software developer is more than shipping features; you have to ship something that is profitable in the long run, which requires a broader perspective than your desired narrow focus.
> A counterpoint is that being a professional software developer is more than shipping features; you have to ship something that is profitable in the long run, which requires a broader perspective than your desired narrow focus.
I think what's more common, particularly in mature companies, is that the responsibilities get split between those who manage the complexity of infrastructure and software developers. Software devs might dictate the infrastructure requirements in their own terms -- like "systemA should have access to systemB" -- but they're not going to be setting up the AWS security group.
I guess what I'm wondering is more about the opposite of the "cool/neat" stories: cases where orgs either prohibited development in non-standard langs (and whether that was good/bad) or allowed development in non-standard langs (and whether that was good/bad).
What criteria are used to determine when it's OK to deviate from in-house langs? What justification is used to prevent deviation at all costs?
Introducing a new technology -- a new language, a new database (like MongoDB), or a new framework -- is a VERY expensive proposition.
You need to solve all of the following:
- Deployment. How will you run the new thing in production (and QA and staging)?
- Upgrades. How will you upgrade to new versions in the future? Who will be responsible for that?
- For databases, backups. How will you back up data? How will you make those backups available for things like data warehouse reporting or replication to test environments?
- Monitoring. How will you monitor systems built with the new technology, handle alerts, etc.? How will logging work?
- Profiling and debugging: what tools will you use for this, particularly in production?
- Testing: how will you run the new tech in your CI environments, and your local development environments?
- Development environments: how will you ensure all of your engineers can productively develop with the new platform?
- Expertise and education: who on your team will be the experts to help support the new tech and onboard your other engineers?
- Standards: what common patterns, idioms, styleguides etc will you adopt? Who will make decisions about these, and how will they be enforced?
A productive engineering environment should have widespread, well understood answers to each of these questions for every technology in their stack. The smaller the "approved stack" the easier it is to do that.
As a result of all of this, I think the criterion for introducing something new is to ask whether the cost of all of the above is justified by the expected improvement provided by the new tool.
That usually means it needs to either make some feature possible that was impossible without it, or provide a multiple on productivity -- not just 1.5x, probably 3x or higher.
At my job we have a set of pre-approved languages and frameworks. If you want to do something outside of it we have an architecture review board you can pitch your idea to, but you better have a good explanation why the existing tools don’t solve the problem and why your suggestion does.
1. https://noop.dev
2. https://noop.dev/docs/sleepy-services/