AllenNLP started before transformers, so it provided high-level abstractions for experimenting with model architectures, which is where much of NLP research was happening at the time. Transformers definitely changed the playing field, as they became the basis for most models!
I'll give you specific examples where AllenNLP overdid it, while HuggingFace did better just by keeping things simple.
Vocabulary class. HuggingFace just used a Python dictionary. I can't think of one person who said they needed a higher-level abstraction. It turns out a Python dictionary is pickle-able and saving it to a text file is one line of code, while the AbstractSingletonProxyVocabulary is not, and nobody wants to care about it in the first place.
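To make that concrete, here's a toy sketch (the vocab contents and file names are made up) of everything a plain dict gives you for free:

```python
import json
import pickle

# A toy "vocabulary" as a plain dict: token -> id.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}

# Pickle-able out of the box.
with open("vocab.pkl", "wb") as f:
    pickle.dump(vocab, f)

# And saving to a text file really is one line.
json.dump(vocab, open("vocab.json", "w"))

# Lookups with a sane default for unknown tokens.
ids = [vocab.get(tok, vocab["[UNK]"]) for tok in "hello there world".split()]
print(ids)  # [2, 1, 3]
```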
Tokenizer class. HuggingFace just returns a Python dictionary of strings and integers. I can't think of one person frustrated by it. It's printable, picklable, and everything in between that people can fiddle with. And boy, where do I even start on AllenNLP's overengineering of Tokenizers.
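For reference, this is roughly what that looks like in transformers (real API; the printed ids are illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The result is a dict-like object: string keys mapping to lists of ints.
enc = tokenizer("HuggingFace keeps it simple")
print(enc["input_ids"])       # e.g. [101, 17662, 12172, ...]
print(enc["attention_mask"])  # [1, 1, 1, ...]

# Printable, picklable, and easy to fiddle with.
print(dict(enc))
```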
Trainer class vs. HuggingFace example scripts. The scripts are just much more readable, tweakable, debuggable, etc. HF didn't bother with AbstractBaseTrainer class bs.
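To illustrate why scripts win for hacking: the heart of those examples is essentially a plain PyTorch loop. Here's a stripped-down sketch (the model, data, and hyperparameters are placeholders, not any real HF script):

```python
import torch

# Placeholder model and data; in a real script these would be a
# transformer and a real DataLoader.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss_fn = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(4)]

for epoch in range(3):
    for batch, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch), labels)
        loss.backward()
        optimizer.step()  # every step is right here to print, tweak, or break on
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```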
It just shows they never understood the playing field.
- First, I don't think anyone thought AllenNLP was a good choice for high-performance production systems. Again, HuggingFace clearly understood the problem and built a fast tokenizer in Rust (see the sketch after this list).
- A math, physics, linguistics, or even CS PhD student who knows the basics of coding would prefer bare-bones scripts. They just want to hack something together and focus on research. Writing good code is not their objective.
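On that Rust point: the fast tokenizers are exposed through the same Python API via the use_fast flag (a real parameter; measure the speedup yourself):

```python
from transformers import AutoTokenizer

# use_fast=True (the default) loads the Rust-backed tokenizer from the
# `tokenizers` library; use_fast=False falls back to the pure-Python one.
fast = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=True)
slow = AutoTokenizer.from_pretrained("bert-base-uncased", use_fast=False)

print(type(fast).__name__)  # BertTokenizerFast
print(type(slow).__name__)  # BertTokenizer
```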
AllenNLP was written for research, not for production. Many of the design choices reflect that.
As far as the vocabulary goes, a lot of AllenNLP components are about experimenting with ways to turn text into vectors. Constructing the vocabulary is part of that. When pre-trained transformers became a thing, this wasn't needed anymore. That's part of why we decided to deprecate the library: very few people experiment with how to construct vocabularies anymore, so we no longer want to live with the complexity.
Hugging Face's APIs really aren't that great; I hear lots of people complain about them. All HF did was make transformers very accessible and shareable, with a neat UI.
When we started AllenNLP, PyTorch was just starting to emerge as a competitor to Tensorflow and we made the difficult decision to support PyTorch. In hindsight this was a great decision as the majority of top research is done in PyTorch today.
Tango primarily supports PyTorch, but unlike AllenNLP, is flexible enough to support other deep learning libraries as well. For example, we're adding support for JAX so we can easily leverage TPUs.
From what I've seen, Tango is a general DAG/pipeline framework that happens to have some facilities for PyTorch. I don't see anything deep-learning specific. You could execute sklearn or whatever.
Maybe we need to re-work the docs if the DAG aspects stick out to you so much. The main functionality is the cache. If you have a complex experiment, you can still write the code as if all the steps were fast, and let them be slow only the first time you run it. The DAG stuff is also nice, but less important.
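For readers wondering what that means mechanically, here's a toy sketch of the caching idea (illustration only, not Tango's actual API; the step and its inputs are made up):

```python
import hashlib
import json
import pickle
from pathlib import Path

CACHE = Path("step_cache")
CACHE.mkdir(exist_ok=True)

def cached_step(fn):
    """Skip a slow step if it already ran with the same inputs."""
    def wrapper(*args):
        key = hashlib.sha256(json.dumps([fn.__name__, args]).encode()).hexdigest()
        path = CACHE / key
        if path.exists():
            return pickle.loads(path.read_bytes())  # instant on every rerun
        result = fn(*args)                          # slow only the first time
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper

@cached_step
def preprocess(corpus_name):
    # Imagine hours of work here.
    return {"corpus": corpus_name, "examples": 10_000}

print(preprocess("wikitext"))  # slow on the first run, cached afterwards
```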
That said, you could execute sklearn. If that's what your experiment needs, it's the right thing to do. That flexibility is also what lets us support JAX: https://github.com/allenai/tango/pull/313
The DL-specific stuff is in the components we supply. Like the trainer, dataset handling stuff, file formats, and increasingly, https://github.com/allenai/catwalk.
Isn't that an issue? For instance, if someone made a video 'out of your domain' (e.g., with a different model than the internal training examples), how would the model perform? Would the AUC be impacted? What is the PPV? It seems common in these results that people are experiencing false positives; I did as well. If the percentage of fake news that we read is 10% and the model (AUC + operating point on an unpublished test set) has 92% sensitivity and specificity, we would still expect nearly half of the model's positives to be false positives. If the "accuracy" is computed on an unbalanced dataset, what is to be taken from it?
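Spelling out that arithmetic (all numbers are the hypothetical ones above):

```python
prevalence = 0.10      # assumed fraction of news that is fake
sens = spec = 0.92     # assumed sensitivity and specificity

true_pos = sens * prevalence                # 0.092
false_pos = (1 - spec) * (1 - prevalence)   # 0.072
ppv = true_pos / (true_pos + false_pos)     # ~0.561

print(f"PPV ~ {ppv:.0%}; ~{1 - ppv:.0%} of flagged items are false positives")
# PPV ~ 56%; ~44% of flagged items are false positives
```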
What happens is that it essentially collapses, because it requires a set of people to train the model. That means those people, with their biases, are training an AI to determine what is fake and what isn't.
Sounds like a pretty bad idea, especially if they decide to be gatekeepers of factual articles. It would require the entire team to know their own biases one way or another, regardless of whether they think they're "right" or not.
When I read "flaxseed oil is the only drying oil that's edible", tung oil came to mind as a candidate for seasoning cast iron. It's easy to buy 100% tung oil that's reasonably priced and FDA-approved for food contact (see http://www.realmilkpaint.com/oil.html).