Thank you! We're not hiring just yet, but feel free to log onto the Hamilton OS slack and send us your resume -- happy to chat about future possibilities or connect you with people that are hiring in a similar space!
If anyone else decides to give this a try on video files with multiple audio tracks, there doesn't seem to be an easy way to tell it to select a certain track.
I got it working by manually adding `-map 0:2` (`2` being the track ID I'm interested in) when calling ffmpeg.
You'll have to make that edit in both `videogrep/transcribe.py` and `moviepy/audio/io/readers.py`.
I'm also not sure how easy adding real support for that would be, considering that moviepy doesn't currently have a way to support it (https://github.com/Zulko/moviepy/issues/1654).
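If you'd rather not patch both libraries, another option is to extract the track you want to a WAV file first and transcribe that. Below is a minimal Python sketch of building that ffmpeg call. The helper name `extract_track_cmd`, the mono downmix, and the 16 kHz default are my own assumptions, not anything videogrep does. Note that `0:2` is the absolute stream index in the file; ffmpeg's stream specifiers also accept `0:a:1` to mean "the second audio stream", which can be easier to reason about.

```python
import shlex


def extract_track_cmd(video_path: str, stream_index: int, out_wav: str,
                      sample_rate: int = 16000) -> list[str]:
    """Build (but don't run) an ffmpeg command that writes one input stream to WAV.

    Hypothetical helper: `-map 0:N` keeps only stream N of input 0,
    `-vn` drops video, `-ac 1` downmixes to mono, `-ar` resamples.
    """
    return [
        "ffmpeg", "-y",               # -y: overwrite the output file if it exists
        "-i", video_path,
        "-map", f"0:{stream_index}",  # pick the single stream we care about
        "-vn",
        "-ac", "1",
        "-ar", str(sample_rate),
        out_wav,
    ]


cmd = extract_track_cmd("episode01.mkv", 2, "episode01.wav")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```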
I just started playing around with the transcription part after seeing this blog post. Consider giving it a try.
I'm not sure how well most subtitle sources will work with this. I don't think they generally embed the word timings needed for picking out fragments, just line timings. The blog post mentions this being the case for `.srt` specifically. I'm not 100% sure, though; someone with a better understanding of subtitle formats could correct me.
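To illustrate the line-timing limitation: an `.srt` cue only carries one start and one end time, so the best you can do without audio is guess. Here's a crude Python sketch that parses SRT timestamps and spreads a cue's duration over its words, weighted by word length. The length-weighting heuristic is my own assumption and will drift from real speech; it also splits on whitespace, so Japanese text would need a tokenizer first.

```python
import re

# SRT timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds)
_TS = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def srt_time_to_seconds(ts: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    m = _TS.fullmatch(ts.strip())
    if m is None:
        raise ValueError(f"not an SRT timestamp: {ts!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return h * 3600 + mi * 60 + s + ms / 1000


def approximate_word_times(start: float, end: float, text: str):
    """Guess per-word (word, start, end) times inside one cue.

    SRT stores no word boundaries, so each word simply gets a share of
    the cue duration proportional to its character length.
    """
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    times, t = [], start
    for w in words:
        dur = (end - start) * len(w) / total_chars
        times.append((w, round(t, 3), round(t + dur, 3)))
        t += dur
    return times
```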
FWIW I'm finding the video transcription to be working quite well (and I even decided to use Japanese-speaking media because I wanted to see how well vosk handles it).
It might be my system, but the transcription is unfortunately a bit slow and single-threaded. I put GNU `parallel` in front of the transcription step to speed up processing an entire season.
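If GNU `parallel` isn't available, roughly the same effect can be had from Python by running one process per episode in a small worker pool. This is a sketch: `run_all` is a hypothetical helper, and the demo command is a placeholder you'd replace with whatever you actually run per file. Threads are enough here because each thread just waits on a child process, and the child processes run on separate cores.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(files, build_cmd, max_workers=4):
    """Run build_cmd(path) as a subprocess for each file, max_workers at a time.

    Returns a dict mapping each file to its process exit code.
    """
    def run_one(path):
        return path, subprocess.run(build_cmd(path)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, files))


if __name__ == "__main__":
    # Placeholder command: swap in your real per-file transcription call.
    demo = lambda path: [sys.executable, "-c", f"print('transcribing', {path!r})"]
    print(run_all(["ep01.mkv", "ep02.mkv", "ep03.mkv"], demo, max_workers=2))
```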
I hope the subtitle website I'm searching will provide multiple formats, and I understand that producing a .vtt with word fragments would take a lot more effort. Running a diff between the vosk text and the subtitle text might help iron out ambiguities.
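A rough Python sketch of that diff idea, using the standard library's `difflib.SequenceMatcher` to align the two word sequences and copy timings across wherever they agree. It assumes you've already flattened vosk's per-word output into `(word, start, end)` tuples; `align_words` is a hypothetical name. Subtitle words the recogniser heard differently keep `None` timings, which is exactly the ambiguous part left to resolve.

```python
from difflib import SequenceMatcher


def align_words(vosk_words, subtitle_text):
    """Attach recogniser timings to subtitle words wherever the texts agree.

    vosk_words: list of (word, start, end) tuples.
    Returns (subtitle_word, start, end) tuples; start/end are None for
    words that didn't match.
    """
    sub_words = subtitle_text.split()
    heard = [w.lower() for w, _, _ in vosk_words]
    wanted = [w.lower() for w in sub_words]
    aligned = [(w, None, None) for w in sub_words]
    matcher = SequenceMatcher(a=heard, b=wanted, autojunk=False)
    for block in matcher.get_matching_blocks():  # last block has size 0
        for k in range(block.size):
            _, start, end = vosk_words[block.a + k]
            aligned[block.b + k] = (sub_words[block.b + k], start, end)
    return aligned
```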
I've been following Arrow and Datafusion dev for a little bit, mostly because the architecture and goals look interesting.
What I'd be curious about is one of the possible use cases mentioned in the Readme: ETL processes. I have yet to come across any projects that are building ETL/ELT/pipeline tools that leverage Datafusion. I might not be looking in the right places.
Would anyone have insight into whether this is simply unexplored territory, or just not as good a fit as other use cases?
I have done a lot of work in the ETL space in Apache Spark to build Arc (https://arc.tripl.ai/) and have ported a lot of the basic functionality of Arc to Datafusion as a proof-of-concept. The appeal to me of the Apache Spark and Datafusion engines is the ability to a) separate compute and storage and b) express transformation logic in SQL.
Performance: in those early experiments, Datafusion would frequently finish processing an entire job _before_ the SparkContext could be started, even on a local Spark instance. Obviously this is at smaller data sizes, but in my experience a lot of ETL is about repeatable processes, not necessarily huge datasets.
Compatibility: those experiments were done a few months ago, and the SQL compatibility of the Datafusion engine has improved extremely rapidly (WINDOW functions were recently added). There is still some missing SQL functionality (for example, to run all the TPC-H queries: https://github.com/apache/arrow-datafusion/tree/master/bench...), but it is moving quickly.
I spent some time evaluating Arc for my team's ETL purposes and I was really impressed. I hesitated somewhat to move forward with it because it seemed really tied into the Spark ecosystem (for great reasons). We just weren't at all familiar with deploying and operating Spark, so we ended up rolling our own scripts on top of an existing Airflow cluster for now.
Besides performance reasons, are there any other advantages to porting Arc to run on top of datafusion? If the porting effort was shared somewhere I'd love to dig in and see what the proof-of-concept looks like.
Hi eduren. Give me a few days and I'll see what I can publish as a WIP repo. The aim of Arc was always to allow swapping the execution engine whilst retaining the logic (hence SQL), so this should hopefully be easy.
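To illustrate the "keep the logic in SQL, swap the engine" idea, here's a toy Python sketch. It is not Arc's or DataFusion's API: the `Engine` protocol, `SqliteEngine`, and the stage list are hypothetical, and the standard library's `sqlite3` stands in for a real engine. The point is that the job itself is just an ordered list of named SQL stages, so a DataFusion- or Spark-backed class exposing the same two methods could run it unchanged.

```python
import sqlite3
from typing import Protocol


class Engine(Protocol):
    def register(self, name: str, columns: list[str], rows: list[tuple]) -> None: ...
    def sql(self, query: str) -> list[tuple]: ...


class SqliteEngine:
    """Stand-in engine backed by an in-memory SQLite database."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")

    def register(self, name, columns, rows):
        # Toy code: table/column names are trusted, values go through placeholders.
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        self.conn.execute(f"CREATE TABLE {name} ({cols})")
        self.conn.executemany(f"INSERT INTO {name} VALUES ({marks})", rows)

    def sql(self, query):
        return self.conn.execute(query).fetchall()


# The job: ordered (output_view, SQL) stages with no engine-specific code.
STAGES = [
    ("clean_orders", "SELECT id, customer, amount FROM orders WHERE amount > 0"),
    ("customer_totals", "SELECT customer, SUM(amount) AS total FROM clean_orders "
                        "GROUP BY customer ORDER BY customer"),
]


def run_job(engine: Engine, stages: list[tuple[str, str]]) -> list[tuple]:
    """Materialise each stage as a view and return the last stage's rows."""
    result: list[tuple] = []
    for view, query in stages:
        engine.sql(f"CREATE VIEW {view} AS {query}")
        result = engine.sql(f"SELECT * FROM {view}")
    return result
```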
Rust stuff tends to be a bit more resource efficient than Java.
Currently using DataFusion from Rust, and being more resource efficient means we can use smaller machines, which means our costs go down. Deploying services is also faster (smaller docker images, faster startup times) and puts less extraneous load on our machines.
I imagine Arc, and thus downstream users, would see similar benefits.
ETL pipeline is a perfect fit for Datafusion and its distributed version Ballista. Personally, this is the main reason I am investing my time into Datafusion.
Amazon Braket – A fully managed service that allows scientists, researchers, and developers to begin experimenting with computers from multiple quantum hardware providers in a single place. Bra-ket notation is commonly used to denote quantum mechanical states, and inspired the name of the service.
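For anyone who hasn't met the notation: a ket |ψ⟩ is a state vector, the bra ⟨ψ| is its conjugate transpose, and the "bra-ket" ⟨ψ|φ⟩ is their inner product. A tiny Python sketch with two-dimensional (single-qubit) states; the function name `braket` is just for illustration.

```python
import math


def braket(psi, phi):
    """Inner product <psi|phi>: conjugate the bra's amplitudes, multiply, and sum."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


ket0 = [1 + 0j, 0j]                                  # |0>
ket1 = [0j, 1 + 0j]                                  # |1>
plus = [1 / math.sqrt(2) + 0j, 1 / math.sqrt(2) + 0j]  # |+> = (|0> + |1>) / sqrt(2)

print(braket(ket0, ket1))           # orthogonal states give 0
print(abs(braket(ket0, plus)) ** 2)  # probability of measuring 0 in state |+>, about 0.5
```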
I was thinking that this name would make hallway conversations tougher (no, it's bracket with the "c"), but I'm guessing (and it's just a guess, I know nothing about this field) that the people actually interested in this service know what braket means.
The easiest way to get people to buy less is to raise prices.
If we had a carbon tax that correctly priced the environmental impact of goods, it would decrease consumption. Without having to shame people into removing themselves from the economy.
Yes. I've become convinced that Pigovian taxes [0] (connected to a basic dividend) are the answer to climate change, and to ecological externalities in general. (In addition to greasing the wheels of political viability, a dividend ensures that paying the true cost of carbon is not a de-facto regressive tax, as that cost hits the working class the hardest.)
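A worked example of why the dividend keeps this from being regressive, as a small Python sketch. It assumes the whole tax is passed through to consumer prices and all revenue is rebated as an equal per-person dividend; the emission figures and the price per tonne are made-up illustration values, not real data.

```python
def fee_and_dividend(emissions_t, price_per_t):
    """Net budget change per person under a carbon fee rebated equally.

    emissions_t: tonnes of CO2 embodied in each person's consumption.
    Positive result = net gain, negative = net loss.
    """
    paid = [e * price_per_t for e in emissions_t]
    dividend = sum(paid) / len(paid)  # every person gets the same rebate
    return [dividend - p for p in paid]


# Four hypothetical households, fee of 50 per tonne.
print(fee_and_dividend([4, 8, 12, 40], 50))
```

Anyone emitting less than the average comes out ahead, and because the revenue is fully rebated, the net transfers sum to zero.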
"Increasing the retail price of tobacco products through higher taxes is the single most effective way to decrease consumption and encourage tobacco users to quit." [0]
Carbon taxes could help, but end-of-life (EOL) taxes on the producer would be more effective at actually driving change. The consumer doesn't have the decision-making power to choose lower-impact materials in the products or packaging they buy, but the producer does. If we tax producers on the disposal cost and other negative externalities resulting from the use and EOL of their products, they will likely choose different materials.
Raising prices is effective but can have serious side-effects.
E.g. raising gas prices will dramatically hurt citizens living in the countryside or outside cities without public transport, and force them to move into the cities, which in turn causes higher demand for housing and rising rents.
Isn’t that the point though, to change behavior? People who live in the countryside and work in the city have only been able to do that because of improperly priced fuel that enables them to do so. Adjusting the price of fuel to reflect the true cost would drive the change in behavior that we need to have. You can’t expect things to change without making actual changes, and it will of course require a transition period as people adjust.
But if you live and work in the countryside, higher gas prices will crush you, because they affect not only you personally but also school transport for the kids, transporting goods to and from the rural supermarket, infrastructure maintenance and development, agriculture and farmers, etc.
There are so many little things that people take for granted that will have an impact on those living in the countryside.
And in the end you just shift the problem to the cities, where the influx of more people causes rising housing costs, higher unemployment, more miserable people, and higher food prices because farmers give up.
Living rural areas and farmers are a very important factor in a happy nation, IMHO.
The post I replied to specifically mentions people who live in the country who would then have to move closer to cities. That is the profile of a commuter with a city job, not a farmer. Farmers don’t have to commute into the city every day, and thus are not the people who are being discussed here.
Yes, obviously higher fuel prices would affect those who need to drive further due to longer distances between things (i.e. the countryside), but it will also encourage those who only do so by choice to make different choices, which is what we desperately need.
Raising prices does this, but the side effect could be putting individuals who really need a product in a hard spot. Creating laws on single-use non-biodegradable material, and/or on the amount of it, could be beneficial. This will increase prices slightly on those goods.
Carbon taxes should be levied in such a way that most people who are buying the basic necessities would actually see a growth in their income.
The lower middle class should see no change. The middle class should see a net loss if they don't change their habits. Anyone above that would see a substantial loss.
Give each person a carbon ration: if you don't use it fully, you get money back. If you use more than your fair share, you pay significantly large taxes that go directly to the people who use less and to infrastructure.
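One way the ration idea could settle up, sketched in Python. Everything here is a hypothetical design choice: a flat per-person ration, a penalty rate per tonne of overuse, a fixed share of the penalty pot going to infrastructure, and the rest paid to under-users in proportion to the ration they left unused. The numbers in the usage line are made up.

```python
def settle_rations(usage, ration, penalty_per_t, infra_share=0.5):
    """Settle a personal carbon ration.

    Over-users pay penalty_per_t for each tonne above the ration. A fraction
    infra_share of the pot funds infrastructure; the rest goes to under-users
    in proportion to their unused ration.
    Returns (per-person net transfers, infrastructure budget).
    """
    over = [max(u - ration, 0) for u in usage]
    unused = [max(ration - u, 0) for u in usage]
    pot = penalty_per_t * sum(over)
    total_unused = sum(unused)
    if total_unused == 0:
        infra, payout = pot, 0.0  # nobody to refund: everything to infrastructure
    else:
        infra = pot * infra_share
        payout = pot - infra
    net = []
    for o, s in zip(over, unused):
        refund = payout * s / total_unused if total_unused else 0.0
        net.append(refund - penalty_per_t * o)
    return net, infra


# Four hypothetical people, ration of 8 t, penalty of 100 per tonne over.
print(settle_rations([2, 5, 10, 15], ration=8, penalty_per_t=100))
```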
"Give each person a carbon ration": agreed. Achieving this for each individual could be the hard part, whereas requiring companies to "behave" and/or limiting how much consumers can buy could be a quicker win that achieves the same goal.
Pricing a good out of someone's budget range seems entirely like removing them from the economy. What's more, your tax will hit the most economically vulnerable people in society the hardest. It's hard for me to believe people won't feel ashamed when the things they enjoy are suddenly beyond their reach.
>Without having to shame people into removing themselves from the economy.
No, you'll just remove them from the economy without their consent, by introducing regulation to artificially lower supply. Everyone is against this: the companies who won't make as much profit and the consumers who won't be able to purchase the goods that they want. Good luck with that.
>No, you'll just remove them from the economy without their consent, by introducing regulation to artificially lower supply
A few things:
1. Presumably any carbon tax would have to be secured and defended by our democratic institutions. Thus we would have consent (or as close as you can get to large-scale consent in our multi-actor society). While I agree that regulating basic consumption for large swaths of the economy has a bit of an authoritarian bent to it, I'm not sure how else we incentivize ourselves to decrease consumption.
2. Lowered supply is not a given. Companies would be incentivized to find production chains, energy sources, and materials that had a lower impact (and thus a lower tax). Less impactful products would be able to price themselves under the high-impact products and satiate the demand.
3. (EDIT: added) Consumption itself is not the enemy. The thing we want to minimize is negative externalities. It just so happens that under our current system, manipulating levels of consumption is the only lever our society has for affecting industrial emissions.
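A toy illustration of point 2, with made-up numbers: a dirtier product A and a cleaner but costlier product B. Prices are in integer cents and emissions in kg CO2 per unit; all figures and function names are hypothetical. The break-even tax is the rate above which the cleaner product undercuts the dirtier one.

```python
from fractions import Fraction


def after_tax_price(base_cents, emissions_kg, tax_cents_per_kg):
    """Shelf price once the carbon tax is passed through."""
    return base_cents + emissions_kg * tax_cents_per_kg


def break_even_tax(base_a, em_a, base_b, em_b):
    """Tax per kg at which cleaner product B (em_b < em_a) costs the same as A."""
    return Fraction(base_b - base_a, em_a - em_b)


# A: 10.00 with 5 kg CO2; B: 11.00 with 1 kg CO2; tax 0.40 per kg.
print(after_tax_price(1000, 5, 40), after_tax_price(1100, 1, 40))
print(break_even_tax(1000, 5, 1100, 1))
```

Here B already wins at 0.40/kg because the break-even rate is 0.25/kg, so demand can shift to B without total supply falling.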
A carbon tax will shift purchases to the government. So unless the goal is to have one group buying less, a carbon tax won’t matter too much, since any drop in consumption will be offset by using the new tax revenue to do and buy stuff.
If we want to buy less, we need to shrink the economy, including government spending.
Personally, I’m just trying to build more things and gather more things myself. That means buying less and saving money.
The money collected from the carbon tax would have to be earmarked for things that improve our ecological situation: carbon sequestration, replanting forests, buying and protecting land, etc.
I've got recent experience with data engineering / pipeline startups and am wondering whether you're hiring your first engineers at this time.