The first two demo videos are interesting examples of using StyleCLIP's global directions to guide an image toward a "smiling face," as described in that paper, combined with smooth interpolation: https://github.com/orpatashnik/StyleCLIP
I ran a few chaotic experiments with StyleCLIP a few months ago that would work very well with smooth interpolation: https://minimaxir.com/2021/04/styleclip/
The previous approaches learned screen-space textures for different features, plus a feature mask to compose them.
Now it seems to actually learn the topology lines of the human face [0], as 3D artists would learn them [1] when they study anatomy. It also uses quad grids and even places the edge loops and poles in similar places.
There are some interesting 2D cues our eyes use for 3D. If something stands on the ground, the horizon line crosses it at the viewer's eye height, so part of it is above the horizon and part below. Parallax is a 2D phenomenon.
After StyleGAN2 came out, I couldn't imagine what improvements could be made over it. This work is truly impressive.
The comparisons are illuminating: StyleGAN2's mapping of texture to specific pixel locations looks very similar to poorly implemented video-game textures. Perhaps future GAN improvements could come from tricks used in non-AI graphics development.
I've noticed the same thing with ESRGAN -- teeth are always awful. I'm looking forward to the day when someone figures out how to fix that; I have a few sentimental images taken with a cell phone that I would love to see upscaled and cleaned up.
If ReLU-introduced high-frequency components are indeed the culprit, won't using a "softened" ReLU (one without a discontinuity in the derivative at 0) everywhere solve the problem, too?
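For what it's worth, here is a minimal sketch of that experiment in PyTorch, swapping every ReLU for Softplus (a smooth approximation whose derivative is continuous everywhere). The function name and beta value are my own choices, not anything from the paper:

    import torch.nn as nn

    # Hypothetical experiment: replace every ReLU in a model with Softplus,
    # a smooth approximation of ReLU with a continuous derivative at 0.
    # beta controls how closely Softplus hugs ReLU (larger = closer).
    def soften_relus(module: nn.Module, beta: float = 10.0) -> None:
        for name, child in module.named_children():
            if isinstance(child, nn.ReLU):
                setattr(module, name, nn.Softplus(beta=beta))
            else:
                soften_relus(child, beta)

One caveat: any pointwise nonlinearity, smooth or not, still creates new frequency content; smoothness just makes the extra harmonics fall off faster. So this would likely reduce the aliasing rather than eliminate it, which may be why the paper keeps ReLU and bounds the new frequencies with oversampling and filtering instead.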
I wonder if you could make the noise inputs work again by using the same process as for the latent code - generate the noise in the frequency domain, and apply the same shift and careful downsampling. If you apply the same shift to the noise as to the latent code, then maybe the whole thing will still be equivariant? In other words, it seems like the problem with the per-pixel noise inputs is that they stay stationary while the latent is shifted, so just shift them also!
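Sketching what that might look like (my own construction, nothing from the paper): define the noise once in the frequency domain, then realize any subpixel translation as a Fourier phase ramp, using the same shift that is applied to the latent:

    import torch

    # Hypothetical shift-equivariant noise: the noise is fixed once in the
    # frequency domain, and a translation by (dx, dy) pixels is applied as a
    # phase ramp (the Fourier shift theorem), so the noise can move in
    # lockstep with the shifted latent code.
    def shifted_noise(noise_freq: torch.Tensor, dx: float, dy: float) -> torch.Tensor:
        h, w = noise_freq.shape
        fy = torch.fft.fftfreq(h).view(-1, 1)
        fx = torch.fft.fftfreq(w).view(1, -1)
        phase = torch.exp(-2j * torch.pi * (fx * dx + fy * dy))
        return torch.fft.ifft2(noise_freq * phase).real

    noise_freq = torch.fft.fft2(torch.randn(64, 64))    # one fixed realization
    frame = shifted_noise(noise_freq, dx=0.5, dy=0.0)   # half-pixel shift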
I wonder if there are lessons from this that could carry over to the 1-D domain for audio; as far as I know, aliasing is a frequent challenge when using deep learning methods for audio (e.g. simulating non-linear circuits for guitar amps).
You can see what they're saying about the fixed-in-place features with the beards in the first video, but StyleGAN gets the teeth symmetry right whereas this work seems to have trouble with it. Why don't the teeth in StyleGAN slide around like the beard does?
That's likely the GANSpace/SeFa part of the manipulation.
> In a further test we created two example cinemagraphs that mimic small-scale head movement and facial animation in FFHQ. The geometric head motion was generated as a random latent space walk along hand-picked directions from GANSpace [24] and SeFa [50]. The changes in expression were realized by applying the “global directions” method of StyleCLIP [45], using the prompts “angry face”, “laughing face”, “kissing face”, “sad face”, “singing face”, and “surprised face”. The differences between StyleGAN2 and Alias-Free GAN are again very prominent, with the former displaying jarring sticking of facial hair and skin texture, even under subtle movements.
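For intuition, such a cinemagraph is conceptually just a few lines. A hypothetical sketch, where `generator` stands in for a trained GAN and `direction` for a hand-picked GANSpace/SeFa axis:

    import numpy as np

    # Hypothetical sketch of the "latent space walk along hand-picked
    # directions" described above. `generator` is a stand-in for a trained
    # GAN; `direction` stands in for a semantic axis found by GANSpace/SeFa.
    rng = np.random.default_rng(0)
    z0 = rng.standard_normal(512)             # base latent code
    direction = rng.standard_normal(512)      # placeholder semantic direction
    direction /= np.linalg.norm(direction)

    # Smooth back-and-forth walk -> looping cinemagraph, one frame per step.
    for t in 2.0 * np.sin(np.linspace(0, 2 * np.pi, 60)):
        z = z0 + t * direction
        # frame = generator(z)  # render each frame with the trained generator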
That's starting to be high enough quality that you could consider using it for some Hollywood-grade special effects. That beach morph is pretty impressive. Faces are perhaps not quite there yet, because we are so biologically hyper-attuned to them, but you could make one heck of a drug-trip scene or a Doctor Strange-esque sequence with much less effort using some of these techniques. In the near enough future, that effort might even come down to the range of YouTuber videos.
First, that's not the same technique and it's not being used for the same purpose.
Second, Hollywood doesn't care about that problem. They will take the best application of the technique, and they don't care if they have to apply a few manual touchups on the result. As long as there is one way of using the system to do the sort of thing they showed in the sample, it won't matter to them that they can't embed a full video game into the neural network itself. They only care about the happy path of the tech.
Someone's probably already starting the company now to use this in special effects, or putting someone on research in an existing company.
> Second, Hollywood doesn't care about that problem.
Hmm, I wasn't trying to nay-say anything here. I mostly agree with your original comment.
See also how in GAN Theft Auto they are sort of getting the light reflections for free, without having to explicitly teach the network about that part of physics.
This group of researchers consistently demonstrates a degree of empirical rigor that is unmatched by any other ML lab in industry or academia: remarkable empirical results as always, reproducible experiments, open-source and well-engineered codebases, and valuable insights about low-level learning dynamics and high-level emergent artifacts. Applied ML wouldn't have such a bad rap if more researchers held themselves to similar standards.
You, and everyone like you, who are gushing with praise and hypnotized by pretty images and a nice-looking pdf, are doing damage by saying that this is correct and normal.
The thing that's useful to me, first and foremost, is a model. Code alone isn't useful.
Code, however, is the recipe to create the model. It might take 400 hours on a V100, and it might not actually result in the model being created, but it slightly helps me.
There is no code here.
Do you think that the pdf is helpful? Yeah, maybe. But I'm starting to suspect that the pdf is in fact a tech demo for nVidia, not a scientific contribution whose purpose is to be helpful to people like me.
Okay? Model first. Code second. Paper third.
Every time a tech demo like this comes out, I'd like you to check that those things exist, in that order. If it doesn't, it's not reproducible science. It's a tech demo.
I need to write something about this somewhere, because a large number of people seem to be caught in this spell. You're definitely not alone, and I'm sorry for sounding like I was singling you out. I just loaded up the comment section, saw your comment, thought "Oh, awesome!" clicked through, and went "Oh no..."
> I go to the github. Maybe model download link is there. I see zero code
Paper was released today. Chill. They said they will release the code in September (I'm guessing late September). The paper is also a pre-print. They're probably aiming for CVPR and don't want to get scooped.
> Model first. Code second. Paper third.
That's how you produce ML code and documentation, but that is not how you release it. I guarantee you that they are still tuning and making the model better. They were still updating ADA until pretty recently (the last commit touching code on the PyTorch version was 4 months ago).
I originally wasn't in CS, and when I first came over I wasn't in ML. We never had code. The fact that ML publishes models AND checkpoints is a godsend. I love it. Makes work so much easier and helps the community advance faster. I love this, but just chill. The paper isn't peer-reviewed. It is a pre-print. They're showing people what they've done in the last 6 months. It's part publicity stunt, part flex, part staking claim, but it is also part sharing with the community. Even without the code we learn a lot because they attached a paper to it. So chill.
The template screams NeurIPS, though. The page limit for that would be 9 pages; this is 9.5. They might have started adding things after the first deadline, anticipating an extra page for the camera-ready?
I mean, that's just a bit of paper astrology of course. But if I'm right, then the author notification is September 28 and camera ready will be due in October, assuming it is accepted. So in that case releasing code (end of) September makes sense.
Edit: regardless of the (good) work NVidia have been doing over the last years, there is an issue here about big teams breaking the blind review process by putting themselves on the front page of not just HN, but by now probably also the relevant Twitter, Facebook, and Reddit pages. They know full well that a release by NVidia will gain attention, and by the time review really gets started it's very likely any reviewer in their field will know exactly who they're reviewing.
> The template screams NeurIPS though. Page limit for that would be 9 pages, this is 9.5,
That's a fair point and I'm not sure why I didn't consider that they would release a pre-print after they had submitted it. (This is a total fumble on my part)
> there is an issue here about big teams breaking the blind review process by putting themselves on the front
I don't see that as actually breaking the blind review part. There are many more abuses that de-anonymize themselves. Most transformer research is done by big labs because they need the processing power and are the only ones who can afford such equipment (though there was a paper that did transformers on CPUs). Just training ImageNet is out of bounds for a lot of people (I have a few A6000s and it still takes me days). A trivial example is that Google will use JFT and will include it everywhere. If you're qualified to review, you're probably going to be able to de-anonymize the lab. I do think we need to do more to make a more level playing field, but that's an extremely difficult thing to do. More resources just enable you to do more. But maybe we shouldn't metric-hack as much, which would slow things down a little.
The fact that you had to say "chill" three times indicates that you're trying to convince yourself, not me.
None of what you said is responsive to what I wrote. I think it's an opinion piece, but I'm not sure.
The issue here is the scientific method. I've listed the things that are required, as I see them. And I've also listed the reasons why I haven't been able to verify they exist here, despite trying for two years.
I'm glad that you like ML hacking, and I like it too. But models aren't a godsend; they're "the most basic, bare-minimum requirements of reproducibility."
Your reaction shouldn't be "I'm incredibly grateful you'd be willing to do this." It should be "You're required to do this, because if I can't verify your claims, your claims might be mistaken."
To leave it off on a softer note, normally I'd bond with you, ML hacker to ML hacker. Because I love ML, and I love hearing what you've been up to in ML. It's the best job in the world, as far as I'm concerned. (Could any other career give you the opportunity to be a developer advocate for high-performance computing in such an interesting way? https://github.com/google/jax/issues/2108#issuecomment-86623... Definitely looking for more examples of "Github Larping," if you know of any.)
If you agree that the scientific method is the reason ML moves forward, all I'm doing here is protecting it.
The scientific method is being followed here. Code is not needed for the scientific method to be followed. Neither is data. Literally every other field is able to advance without public code or data (in fact, so do most areas of CS). There's absolutely no reason to believe that they won't release their code. They have a history of doing so. Models and checkpoints are not the bare minimum for reproducibility. They describe their model well enough in the paper. There's enough written in the paper (which is 30 pages) to reproduce the model. Will it be easy? No. But it can be done. And to be clear, I'm saying that the status quo of code being released is a godsend. This is not the norm in literally every other field/subfield. Code helps with reproducibility (and so should be encouraged) but is not required.
If you require someone else's code to reproduce results, then you're not convincing me you're a good ML researcher or programmer.
I retract my claims. You're right. Thanks for calling me out.
I will say that it's... a gargantuan effort to do the things that you're proposing. But as someone who has done them: you're right, you can. (BigGAN-Deep took a year to track down the bug https://github.com/google/compare_gan/issues/54)
BigGAN-Deep is a decent example of the thing I was really worried about: replication. I thought it'd be really easy to "just implement the paper." But no one had. Mooch did, but not at the same scale as the DeepMind release.
Maybe you're right about me, too. You're convincing me that I'm not a very good ML programmer. It's probably best to bow out on whatever high notes I've achieved.
Karras' work is fantastic. I don't know why this preview of things to come was where I chose to do this. Thank you, nVidia group, for working so hard.
Hey man, I respect that. I also understand your frustration. Reproduction is difficult. I'm going through it right now with a paper that has no code attached. You bet I'm pulling out my hair. I just think taking your frustration out on this paper is not the right vector. Please continue to call out papers that aren't reproducible. Please continue to push forward higher standards. But also recognize where we are and where we've come from. And most importantly, pick your battles. The passion is right, and I agree with the spirit of what you wrote, just not the direction.
And I'm not trying to say you suck. But you said you've been studying the subject for only 2 years. So I am going to check you. It's easy to grow an ego, but it often isn't useful. Sucking at something is the first step to being somewhat good at something. And you're clearly past the step of "sucking" but not to the step of "wizard." I don't know where you are between there tbh. But I do understand the frustration haha. That is normal.
Side note: usually it is good practice to note that you edited comments. It was rather confusing to look back and see something different.
> If you require someone else's code to reproduce results, then you're not convincing me you're a good ML researcher or programmer.
I call bullshit. In computer science, not releasing the code of an algorithm whose output you describe is akin to maliciously obfuscating your methods. No serious paper should be accepted without a script to reproduce the exact same results again.
> In computer science, not releasing the code of an algorithm whose output you describe is akin to maliciously obfuscating your methods.
Well tell that to my advisor (it's also something I've done in the past). So my experience doesn't reflect your claim.
> No serious paper should be accepted without a script to reproduce the exact same results again.
You do realize that this is a pre-print, right? If it went to NeurIPS, then they did release the code to the reviewers and will release it to the public later.
The repetition here is a common rhetorical device, not necessarily an indication of self-doubt.
That said, I agree with your overall position on ML publications. So much of what we see is a tech demo protected by some kind of moat: a private commercial dataset, prohibitive processing requirements, missing code, or a combination of the above. These aren’t science, they’re advertisements.
You're clearly disillusioned with the general accessibility of ML research, but I don't think your cynicism is warranted here. Take a look at their prior works[1], and I think you'll agree they go above and beyond in making their work accessible and reproducible. There is no reason to doubt the open-source release of this work will be any different. As to why the release is delayed, I'd speculate it's because they put a significant additional amount of work into releases and because releasing code in a large corporation is a bureaucratic hassle.
> There is no reason to doubt the open-source release of this work will be any different.
Then this is not a scientific contribution yet.
We must wait and see.
The most important tenet of science is to doubt. I didn’t even read the name on the paper before I wrote my comment. Yes, I know this group. They’re why I got into ML, along with the group from OpenAI who published GPT-2. Because A+ science.
Their claims here are likely wrong unless and until proven otherwise. This isn’t a hardline position. It’s been my experience across many codebases, during my two years of trying to reproduce many ideas.
I agree that that is an example of A+ science. But why do you think they’re publishing this now, today? Either because of a conference deadline or because of nVidia pressure. Neither of those is related to helping me achieve the scientific method: reproducing the idea in the paper to verify their claims.
All I can do is kind of try to reverse engineer some vague claims in a pdf, without those things.
--
Let me tell you a little bit about my job, because my time with my job may soon come to an end. I think that might clear up some confusion.
My job, as an ML researcher, is to learn techniques that may or may not be true, combine them in novel ways, and present results to others.
Knowledge, Contribution, Presentation, in that order.
The first step is to obtain knowledge. Let's set aside the question of why, because why is a question for me personally, which is unrelated.
Scientific knowledge comes when Knowledge, Contribution, and Presentation are all achieved in a rigorous way. The rigor allows people like me to verify that I have knowledge. Without this, I have mistaken knowledge, which is worse than useless. It's an illusion – I'm fooling myself.
When I got into ML two years ago, I thought that knowledge would come from reading scientific papers. I was wrong.
Most papers are wrong. That's been my experience for the past two years. My experience may be wrong. Maybe others obtain rigorous scientific knowledge through the paper alone.
But researchers happen to obtain a dangerous thing: prestige. Unfortunately, prestige doesn't come from helping others obtain knowledge. It comes from that last step -- presentation.
The presentation on this thread is excellent. It's another Karras release. I agree; there's no reason to doubt they'll be just as rigorous with this release as they were with StyleGAN2.
But knowledge doesn't come from presentation. Only prestige.
Prestige makes a lot of new researchers try very hard to obtain the wrong things.
If all of these were small concerns, or curious quirks, they'd be a footnote in my field guide. But I submit that these things are front and center to the current state of affairs in 2021. Every time a release like this happens, it generates a lot of fanfare and we come together in celebration because ML Is Happening, Yay!
And then I try to obtain the Knowledge in the fanfare, and discover that it's either absent or mistaken. Because there are no tools for me to verify their claims -- and when I manage to verify them anyway, I often see that they don't work!
That's right. I kept finding out that these things being claimed, just aren't true. No matter how enticing the claim is, or whether it sounds like "Foobars are Aligned in the Convolution Digit," the claim, from where I was sitting, seemed to be wrong. It contained mistaken knowledge -- worse than useless.
Unfortunately, two years with no salary takes a toll. I could spend another few years doing this if I wanted to. But I wound up so disgusted with discovering that we're all just chasing prestige, not knowledge, that I'd rather ship production-grade software for the world's most boring commercial work, as long as the work seems useful and the team seems interesting. Because at least I'd be doing something useful.
Expecting fully executable code to accompany every publication is kind of unique to the modern ML research scene. As someone from a very different computational research field, where zero code is the norm, not the exception, this reads as a somewhat entitled rant. Reimplementation of a paper is actually a test of the robustness of the results. If you download the code of a previous paper, there may be assumptions hidden in the implementation that aren't obvious. So I would argue that simply downloading and re-executing the author's implementation does not constitute reproducible research.
I know it is costly, but for actual reproduction, reimplementation is needed.
I wasn't sure whether to post my edit as a separate comment or not, but I significantly expanded my comment just now; it helps explain my position.
I'd be very interested in your thoughts on that position, because if it's mistaken, I shouldn't be saying it. It represents whatever small contribution I can make to fellow new ML researchers, which is roughly: "watch out."
In short, for two years, I kept trying to implement stated claims -- to reproduce them in exactly the way you say here -- and they simply didn't work as stated.
It might sound confusing that the claims were "simply wrong" or "didn't work." But every time I tried, achieving anything remotely close to "success" was the exception, not the norm.
And I don't think it was because I failed to implement what they were saying in the paper. I agree that that's the most likely thing. But I was careful. It's very easy to make mistakes, and I tried to make none, as both someone with over a decade of experience (https://shawnpresser.blogspot.com/) and someone who cares deeply about the things I'm talking about here.
It takes hard work to reproduce the technique the way you're saying. I put all my heart and soul into trying to. And I kept getting dismayed, because people kept trying to convince me of things that either I couldn't verify (because verification is extremely hard, as you well know) or were simply wrong.
So if I sound entitled, I agree. When I got into this job, as an ML researcher, I thought I was entitled to the scientific method. Or anything vaguely resembling "careful, distilled, correct knowledge that I can build on."
I think that not being able to reproduce the results claimed in a paper is not specific to ML research. While working as a post-doc at a top university research lab, I spent years trying to understand how it can be that some software that was supposed to correspond to a well-cited paper did not even come close to reproducing the results of said paper, and that the primary author went on to become a prof at a top university in the US. In short, scientific fraud is also quite common in academic papers.
> I spent years trying to understand how it can be that some software that was supposed to correspond to a well-cited paper did not even come close to reproducing the results of said paper
This was my exact experience. I didn’t understand why I kept having it, and kept blaming myself for not being careful enough. My code must be wrong, or the data, or something.
Nah. It was the idea.
Kept feeling like a kick in the gut, until here we are today, when I’m warning everyone that Karras, of all people, might publish such a thing.
I really appreciate that you posted this, because I’m so happy I wasn’t alone in the feeling of “what’s going on, here…?”
I agree; most top conferences nowadays publish reviews openly, and I think that addresses this issue. Also, it is easier said than done: this is endemic in so many different academic settings, not just in the US but also in Europe.
I get your frustrations with this state of affairs, but for the reasons I mentioned above, I don't think providing the model and code is a panacea here. Maybe the last few years have also set an unrealistic expectation for the pace of progress. In my (former) field of theoretical neuroscience, if a paper was not reproducible, this knowledge kind of slowly diffused through the community, mostly through informal conversations with people who tried to reproduce or extend a given approach. But this takes several years, not the kind of timescale that modern ML research operates on.
Fwiw I think actual knowledge is there in the ML literature, but it's not in these benchmark-chasing, highly tuned papers. It's more high-level stuff, like basic architecture building blocks: GANs and Transformers, for example. They undeniably work, and the knowledge needed to implement them can probably be conveyed in a few pages at most. No need for an implementation to be provided by the author, really.
I have no particular expertise here, but I wonder if you've learned to accept a mostly-broken process? We have the Internet, so why settle for slow diffusion over years instead of rapid communication?
Why should graduate students have to spend years trying to reproduce stuff that turns out to be no good? Nobody should have to put up with getting their time wasted like that.
I think it is a social problem, not a technical one. A healthy research field should have some level of cooperation between participants. If you go ahead and publish a "this does not reproduce" paper, you can easily ruin someone's career, so in most cases you don't. I know this is not the platonic ideal of science, but it is the reality, especially of smaller research communities. I agree this is not ideal, but not sure if I would call the process broken though.
This concern over ruining someone’s career itself seems like a symptom of a broken process? Making it safe to openly discuss failures is important.
In at least some big companies in the private sector we have “blameless postmortems” where we describe what went wrong in an operational failure without blaming the participating employees.
Sure, having blameless postmortems would be amazing, but I think the informal process I described is probably the closest you will get to it. The reason being that any given subfield can't make this decision in isolation, because the people to whom it matters (funding agencies, faculty search committees, journal editors to a degree) are not part of the field, and when they see such a 'blameless postmortem' they will think 'whoa, this person really messed up, we'd better stay away'.
Maybe I am wrong though, and a better culture is possible, like the shift to preprints has happened in a lot of fields and was probably previously unthinkable. So good on you for taking an idealistic stance, I am probably just being grumpy. That being said, whatever culture changes may be beneficial, I stand by my original point that simply dumping code and model alongside the paper is not unambiguously good and may even obscure problems.
To be honest, I don't think the StyleGAN papers are benchmark-chasing. If you read StyleGAN[0], StyleGAN2[1], StyleGAN2-ADA[2], and this paper, there is a clear story. They call out the mistakes in the previous papers and resolve them. The papers themselves even admit where they fall short. But it is research; problems don't get solved all at once. If you pay attention to these 4 papers, it is very clear Karras has a well-defined research focus and direction. He's showing his progress over time, sharing it with the community, and learning from the community as well. This is how research should happen.
Sometimes reimplementation is impossible without the code, and the paper goes on to win awards because it's by a famous scientist. Then, if the reimplementation doesn't work, most of the time the graduate students are blamed instead of the original work.
There are always assumptions. At least with public code and models those assumptions are laid bare for all to see and potentially expose any bad assumptions.
I think the discrepancy between what actually happens in the code and the main ideas in the paper is a great point, and it touches on the parent commenter's goal of attaining knowledge. For instance, even if the results are reproducible, maybe the proposed key idea in the paper is not the piece of code with the largest impact on performance. With the code available, you might be able to discover exactly what the discrepancy is, if there is one; otherwise you might be stuck at "it doesn't reproduce".
Yeah, but look at the first comment, which said the opposite of what you describe as normal. The first comment praised the fact that the code was published; the second pointed out that no, it's not, followed by a rant.
In his defense, the spirit of his rant was valid; the letter made it sound entitled.
> When I got into ML two years ago, I thought that knowledge would come from reading scientific papers. I was wrong.
I'm in the middle of a PhD and this is always an issue. It takes a while to learn how to read papers and to gather enough background knowledge that you can read between the lines (publications are limited; you can't put everything in a paper. This is why having code is so great: it accelerates the process). You're two years into your journey, and this is often when things _start_ turning the other direction.

There's a reason PhDs take so long, and that's with experts (hopefully) helping you learn how to read papers, telling you which papers to read (which is a challenge in and of itself), having the ability to spend full time on learning, and learning how to build background knowledge on a subject while learning the state of the art. There's a reason ML pays the big bucks. It takes a long time to gather expertise, it is fucking difficult, and it has direct applications that can lead to useful products today (a big component of why you get paid big bucks).

It is also easy to lose track of your progress. I remember the first research paper I read was complete gibberish to me. I'm 3 years into my PhD and now I can understand papers in my niche. But for a long time a lot of stuff didn't click. This is normal. It takes time to learn, and 2 years isn't that much (especially when you have a full-time job). Making contributions in your first year of a PhD is atypical, even in your second year. It only happens at top universities where people have a lot of help and resources.
Research is hard. It takes years to become an expert and learn how to read papers. Don't give up, but calm down and recognize that, given more time, things will make more sense.
When I was writing a paper I had to include all source code in a state to be published, otherwise it wouldn't be accepted. I guess today the bar is much lower.
What was released was a pre-print, not a publication. Every top ML conference requires releasing code, but this is not the norm for CS research, nor for research in general.
(Agreed, fwiw. What’s going on here isn’t a criticism of this work specifically, but the trend of everyone thinking that this is science generally. For example, it’s true the code is coming in September. And, you and I both know it’s probably gonna have a model release, just because it’s more impressive, big-name Karras nVidia work. But it might not have a model release. I give that at least 40% odds. If it doesn’t, then everything I said above will be true about that too, Karras or not. People keep doing that, and we have to call out that this is approximately useless for you and me. Actually, I was going to say maybe it’s useful for you, but you’re the language model hacker and I’m the GAN hacker, and I assure you, code alone is useless for me. If it’s useful to you, I would love to know how it helps you verify the scientific method.)
I am not into ML, but from time to time I like to look at how this is made, and I remember only once seeing both the code and a model, which I thought was an exception to the "norm". Good that more people are calling this out!
Thank you for calling this out. It's critically important that people understand the difference between model, code, and paper and what they mean.
It's also important that people understand that even if code is provided, it's commercially useless. From the NVAE license as an example[1]:
> The Work and any derivative works thereof only may be used or intended for use non-commercially.
It's a great example of the difference between open source (which it is) and free software (which it is not). So we're back to square one, where it is probably best to clean-room the implementation from the paper, which is nearly useless for reproducing the model.
Unfortunately, I must call you out too, my friend. With love.
Because it’s crucially important that we protect the scientific method here.
The sole goal is to help people like me reproduce the model. If I can’t reproduce the model, I can’t verify the paper.
When I saw “commercial” and then “open source” in your comment, I said “oh no…”
My duty is to the scientific method, so I don’t care if it’s the most restrictive code on the planet as long as I can use it to reproduce the model in the paper.
Because at that point, I have a baseline for evaluating the paper’s claims.
The reason I assume the paper is false until proven otherwise is because the paper often doesn’t have enough detail to reproduce the model shown in the videos of these tech demos. Meaning, if they’re releasing it to help me, the ML researcher, then they’re failing to tell me how to evaluate their claims rigorously.
(That said, it’s breaking my heart that I can’t agree with you here, because I want to so badly. I’ve felt similarly for years that scientific contributions need to be “free as in beer” commercially. But I recognize signs of zealotry when I see them, and I can’t let my personal views creep in, because people like me would stop listening if I was here e.g. arguing vehemently that nVidia needed to be delivering us something commercially viable along with a high quality codebase. The price for entry to the scientific method isn’t so high.)
It's more than that. Here there's a fake demo we can only consider an advertisement. If they wanted it to be a scientific paper, it should be reproducible and counter-checked. What is the point of making a paper full of screenshots?
It's not just a knowledge for knowledge's sake issue here, it's that it's not even knowledge they're publishing. They're publishing nothing.
If they made a license that says the code can only be used for peer review and counter-validation, then that'd be knowledge. Whether it's knowledge for its own sake is another, secondary problem.
You won't need that much source code to verify their claims.
Their central improvement is that they limit the high frequencies generated by ReLU through an upsample-ReLU-filter-downsample sequence.
Their theoretical section explains quite well why high frequencies can be proven mathematically to cause issues. And their practical implementation using filters to cut those off is very straightforward.
If someone tells you "The microphone recording had 50Hz noise so I used a filter to remove it", that's pretty much good enough for someone with experience in the field to replicate their results. This is the equivalent in AI. They uncovered a simple basic issue that everyone else overlooked, but once you know it, it seems obvious in retrospect.
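To make that concrete, here is a rough sketch of the oversampled-nonlinearity idea in PyTorch. This is a loose approximation of my own, not NVIDIA's implementation (which uses carefully designed windowed-sinc filters and fused CUDA kernels):

    import torch
    import torch.nn.functional as F

    # Rough sketch: upsample so the harmonics ReLU creates land below the
    # raised Nyquist limit, apply ReLU, low-pass, then downsample again.
    def alias_suppressed_relu(x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature maps
        x = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
        x = F.relu(x)
        # Crude stand-in for a proper low-pass filter: average then decimate.
        return F.avg_pool2d(x, kernel_size=2)

The bilinear upsampling and average pooling here are crude stand-ins; the paper's point is that both resampling steps need properly designed low-pass filters, or the aliasing just sneaks back in.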
For various reasons, I left data out of the requirements because most interesting research uses data unavailable to the community. CLIP is such an example, and it's A+ science: Model, Code, Paper.
Having the model is enough to verify the paper's claims, and also to experiment with new approaches (since you can fine-tune the model).
That said, I make this concession as a "meet you halfway" compromise between hard-line positions: "We can't release models, because we trained them on private data" and "You must release both models and data."
In other words, you're technically correct, but in my estimation it would do more harm to the end goal: the whole reason the scientific method is useful is because it makes the world more useful.
The world would be less useful if fewer commercial companies participated in the scientific method. It's an inclusive group, not an exclusive clique. All you have to do, is give me the tools to verify your claims.
Yeah, for the goal of verifying claims that is good enough indeed.
I'm not sure it is good enough for a variety of other possibly useful use cases, though: eliminating bias in an existing model, correcting a flaw in the training code, creating a different model and proving it is better by training on the same data, etc.
It would be nice if there were more public/libre data sets for ML stuff.
I hardly understand this comment. Statisticians have been publishing and arguing about models for 100 years now. No one required code to verify the authenticity of research. I suppose it is the sorry state of machine learning research that the methodology is so poor that a person cannot verify the research from the paper.
Interesting that this method makes use of Equivariant Neural Networks. Taco Cohen recently published his PhD thesis [1], which combines a dozen or so papers he authored on the topic.