I really want to understand this post, but I can't. Could you please direct me, and noobs like me, to resources that would let us read this? (Help anyone climb the knowledge ladder.) ELI3
-
EDIT - GPT helped me understand better:
--
>>> "This model is special because it can do similar tasks as the big models but requires much less computational power1. It’s like having a small but powerful engine that can do the work of a big one. This makes it more accessible for more people to use it"
---
>>> "TinyGPT-V is built on another model called Phi-2 and uses pre-trained vision modules from BLIP-2 or CLIP1. It has 2.8 billion parameters (these are like the model’s brain cells) and can be further compressed to fit on devices with 8GB memory1. This means you could potentially run this model on your personal computer or even some high-end smartphones1"
----
>>> "In summary, TinyGPT-V is a step towards making powerful AI models more accessible and efficient, which could lead to their use in a wide range of real-world applications1. The authors have also shared their code and training weights for others to use and learn from1"
-----
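To make the "2.8 billion parameters on an 8GB device" claim concrete, here is some quick back-of-the-envelope math (weights only; activations, KV cache, and runtime overhead add more on top):

    # Rough weight-memory footprint of a 2.8B-parameter model
    # at different precisions. Weights only; real usage is higher.
    PARAMS = 2.8e9  # TinyGPT-V's reported parameter count

    bytes_per_param = {
        "fp32": 4,    # full precision
        "fp16": 2,    # half precision, the usual inference default
        "int8": 1,    # 8-bit quantization
        "int4": 0.5,  # 4-bit quantization
    }

    for precision, nbytes in bytes_per_param.items():
        gib = PARAMS * nbytes / 2**30
        print(f"{precision}: ~{gib:.1f} GiB of weights")

    # fp32: ~10.4 GiB, fp16: ~5.2 GiB, int8: ~2.6 GiB, int4: ~1.3 GiB,
    # which is why a quantized 2.8B model fits comfortably under 8 GB.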
This gets really interesting if you fan out the implications over time.
Here is my thinking:
Assume this paper results in a way of building "compression-alyzed vision" into a model (a tiny, compressed view into a model).
Then, in a few years, one can imagine "laser views" that slice through fractals of models to find the result. That would give you tiny agents with a heat-seeking fractal laser that can navigate giant data by knowing instantaneously what to exclude (meaning the path is defined by the walls you already know you do not want to hit, so every step you take is one that moves you forward).
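To make the "navigating by exclusion" intuition concrete, here is a toy sketch (a plain pruned search, purely illustrative of the idea, not any real system): a search where a dead-end check discards whole subtrees instantly, so the only steps actually taken are forward steps.

    # Toy rendering of "the path is defined by the walls you won't hit":
    # depth-first search over binary choices, where a dead-end oracle
    # prunes a subtree the moment it is known to be excludable.
    def search(path, depth, is_dead_end, is_goal):
        if is_dead_end(path):        # instantly exclude this subtree
            return None
        if is_goal(path):
            return path
        if depth == 0:
            return None
        for choice in (0, 1):
            found = search(path + [choice], depth - 1, is_dead_end, is_goal)
            if found is not None:
                return found
        return None

    # Example: find the bit string [1, 0, 1, 1] while pruning any prefix
    # that already disagrees with the target (the "walls").
    target = [1, 0, 1, 1]
    dead_end = lambda p: p != target[:len(p)]
    goal = lambda p: p == target
    print(search([], 4, dead_end, goal))  # [1, 0, 1, 1]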
--
Or am I stating something obvious to all you brainiacs? (no shame, I like thinking out loud)
I am no brainiac, and it isn't super clear from your post what you're describing, but here is some info that might help you better convey your question:
This is a neural net built by conjoining Phi-2 [the best available small LLM, if you'll pardon the contradiction in terms] with pre-trained vision models like BLIP or CLIP. Models are piles of weights[/parameters] that are generated by training on datasets.
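If it helps to see the conjoining mechanically, here is a toy sketch of the usual recipe (stand-in modules and dimensions, not TinyGPT-V's actual code): project frozen vision features into the LLM's embedding space and prepend them to the text tokens, so only the small projection needs training.

    # Toy vision-language "conjoining": frozen encoder + frozen LLM
    # embeddings, glued together by one trainable projection layer.
    import torch
    import torch.nn as nn

    VISION_DIM, LLM_DIM = 768, 2560  # e.g. CLIP-style features -> Phi-2 width

    class ToyVisionLanguageModel(nn.Module):
        def __init__(self):
            super().__init__()
            # Stand-ins for the pre-trained, frozen components.
            self.vision_encoder = nn.Linear(3 * 224 * 224, VISION_DIM)  # pretend CLIP
            self.llm_embed = nn.Embedding(51200, LLM_DIM)               # pretend Phi-2
            for p in list(self.vision_encoder.parameters()) + list(self.llm_embed.parameters()):
                p.requires_grad = False
            # The only new, trainable piece: map image features into LLM space.
            self.projection = nn.Linear(VISION_DIM, LLM_DIM)

        def forward(self, image, token_ids):
            img_feat = self.vision_encoder(image.flatten(1))     # (B, VISION_DIM)
            img_tokens = self.projection(img_feat).unsqueeze(1)  # (B, 1, LLM_DIM)
            txt_tokens = self.llm_embed(token_ids)               # (B, T, LLM_DIM)
            # Prepend the visual "token" to the text sequence; a real model
            # would now run this through the LLM's transformer stack.
            return torch.cat([img_tokens, txt_tokens], dim=1)

    model = ToyVisionLanguageModel()
    out = model(torch.randn(1, 3, 224, 224), torch.randint(0, 51200, (1, 8)))
    print(out.shape)  # torch.Size([1, 9, 2560])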
Work has already shown that training a multimodal model from the start results in a smaller, more effective model. If you want to know more, check out recent work from CVPR [a vision machine learning conference] from '23[0] and upcoming work for this year[1].
edit to add:
The work of MS researcher Chunyuan Li[2] is worth keeping an eye on; in particular, recent work like LLaVA-Interactive[3], a multimodal, multi-task AI system, might be what you're trying to describe with your laser/fractal-view phrasing.