My CS master's included a lot of ML & AI, but that was some time ago. While the underlying science hasn't changed much, the tooling recently has. To collect my thoughts and keep on track, my goals are:
Work to sift through the cruft. AI now smells a ton like the old "Cyber" or "Big Data" days. There are so many companies out there claiming to provide AI services and capabilities. It's a salable keyword that I think is abused too much.
Solve real-world business use cases. How does one leverage AI models to help a business grow?
Review and put to use some of the fancy new frameworks and tools that are out there.
Create a totally on-prem solution.
Include federation, trust, sharing of training data & models, resource discovery, and annotated inference endpoints for programmatic consumption.
TODO / Progress:
RAG GUI & LLM Management for local knowledge bases (In Progress)
Choose and deploy a vector DB. (In Progress)
https://qdrant.tech/ - (choice made)
OpenAPI / gRPC / Restful endpoint management (In Progress)
Python based local LLM integration. (DONE)
Chaining of models with execution plans. (Pushed to v2.0)
ONNX model management and preparation for use as simplified web service endpoints for ingestion by client applications. (DONE)
ONNX model parsing and relational DB to React web app to view details. (DONE)
Render and execute Jupyter notebooks. Initially just Python based but built in a way to extend to Rust, C#, and other languages. (DONE for Python)
In-browser Python editor and execution GUI. (DONE)
In-browser Pytorch model trainer (DONE)
Support for CUDA based acceleration. I need to make sure CUDA supporting hardware is detected and leveraged. (DONE)
Execute AI code in an enclave that keeps the host operating system safe from malicious code. (DONE)
Capable of creating various collections of operating environments with specific library versions. New libraries can break old known working AI code. (DONE)
Capable of coordinating and setting limits on host CPU time, cores, and memory. (DONE)
Create a pipeline / scheduler to coordinate multiple execution requests.
Concurrent execution of multiple AI models works. (DONE)
Quartz.net or custom non-preemptive scheduler for AI tasks (Pushed to v2.0)
Package my tool for others to use. (Long Way Off!)
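The sandboxing and resource-limit items above can be sketched at their simplest with POSIX rlimits. This is a minimal illustration, assuming a Linux host; `limited_run` is a hypothetical helper, not my actual enclave code, and a real enclave needs far more isolation (filesystem, network, namespaces) than a CPU cap:

```python
import resource
import subprocess
import sys

def limited_run(argv, cpu_seconds=5):
    """Run a command in a child process with a hard CPU-time cap.

    Hypothetical sketch: preexec_fn applies the rlimit in the child
    only, so the host process keeps its own limits.
    """
    def apply_limits():
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    result = subprocess.run(
        argv, capture_output=True, text=True, preexec_fn=apply_limits
    )
    return result.stdout

print(limited_run([sys.executable, "-c", "print('sandboxed ok')"]))
```

Memory (RLIMIT_AS) and core counts (CPU affinity) can be capped the same way; containers bundle all of this up, which is why they're the usual answer here.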
Before RAG completion I wanted to make sure that at a basic level I could ask questions about information I passed to my local LLM.
I passed some info about Ligers and got some pretty good responses!
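That basic test is really just context-stuffing: paste the local knowledge into the prompt by hand, which is exactly what RAG will automate. A minimal sketch of what I mean, where the endpoint URL and payload shape assume an OpenAI-compatible local server (llama.cpp, Ollama, etc.) and are assumptions, not my exact setup:

```python
import json
import urllib.request

def build_prompt(context: str, question: str) -> str:
    """Stuff local knowledge into the prompt; RAG automates finding the context."""
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

def ask_local_llm(prompt: str, url="http://localhost:8080/v1/completions"):
    """Send the prompt to a local OpenAI-compatible server.
    URL and payload shape are assumptions for illustration."""
    payload = json.dumps({"prompt": prompt, "max_tokens": 200}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

prompt = build_prompt("Ligers are lion-tiger hybrids.", "What is a liger?")
```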
I'd like to tackle both, but I'm leaning towards RAG. OK, for sure RAG.
While fine-tuning is neat, it requires access to model weights, and not all available models provide them.
Fine-tuning also demands serious hardware, since the weights are updated as you train.
From here I need to settle on a vector database and ensure the embedding model my chosen LLM uses will work with it.
Resources:
From this article I want open source, free (obviously), active development, and reasonable speed. I don't need the fastest option, just something purpose-built rather than a vector wrapper bolted onto a relational DB the way PostgreSQL does it.
Qdrant appears to be the front-runner for me. Docker images and lots of support.
I decided to take a step back and hack at a new version of the model GUI. I figured a model might be used in many capacities. So I'm working on an approach where models can be curated in many ways.
OK, I did some GUI learnding (Ralph Wiggum) also.
Highlights:
Proper react class / react functional component abstraction and organization.
React-Bootstrap usage. It's kinda neat. I need to get into theming next.
Implementing something I call plans. Plans:
Prep a single model in many ways for usage.
Chain models in specific and conditional ways.
Bind assets like DLLs, domain data, etc., for use in the "in" and "out" tensor prep code.
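The chaining idea above can be sketched as an ordered list of steps where each output feeds the next input, with optional conditions. This is a hypothetical minimal shape for a "plan", not my actual implementation:

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Step:
    run: Callable[[Any], Any]
    when: Callable[[Any], bool] = lambda _: True  # condition for conditional chaining

@dataclass
class Plan:
    """Hypothetical sketch: an ordered, conditional chain of model/prep steps."""
    steps: list = field(default_factory=list)

    def execute(self, data):
        for step in self.steps:
            if step.when(data):
                data = step.run(data)  # output of one step feeds the next
        return data

plan = Plan([
    Step(run=lambda x: x * 2),
    Step(run=lambda x: x + 1, when=lambda x: x > 5),  # only fires on large values
])
print(plan.execute(4))  # 4*2 = 8, 8 > 5 so +1 -> 9
```

Real steps would be bound model sessions and tensor prep code instead of lambdas, but the execution skeleton is the same.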
I'm currently working on a web GUI for model management. This encompasses all the prep needed to integrate models as web services for client applications to then use.
DONE:
Model ingestion with parsing of all relevant details and tensor info into a relational DB.
Execution of models and retrieval of inference results on the backend.
Browser coding of C# for input & output tensor preparation. Models need very specific inputs, which takes code to massage. The ResNet image classifier, for example, needs image x/y scaling/pruning and tensor values within an expected mean and standard deviation.
Bind expected mimetypes for inputs. If a model expects JPEGs for image classification, then ensure only images are uploaded to the model.
Basic inference works! Dog image is correctly categorized!
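The input-prep and mimetype checks above boil down to a couple of small transforms. A sketch in Python (my actual prep code is C#): the mean/std values are the ImageNet per-channel constants commonly used with ResNet-style classifiers, and `check_upload` is a hypothetical guard, not my endpoint code:

```python
import mimetypes

# ImageNet per-channel normalization constants commonly used for ResNet.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

ALLOWED = {"image/jpeg", "image/png"}

def check_upload(filename: str) -> bool:
    """Reject uploads whose guessed mimetype the model doesn't expect."""
    mime, _ = mimetypes.guess_type(filename)
    return mime in ALLOWED

def normalize_pixel(rgb):
    """Scale 0-255 channel values to 0-1, then center by mean and std,
    the range ResNet-style classifiers expect."""
    return [((c / 255.0) - m) / s for c, m, s in zip(rgb, MEAN, STD)]

print(check_upload("dog.jpg"))
print(normalize_pixel([124, 116, 104]))
```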
TODO:
More needs to be wired up between the GUI and backend.
GUI work. (gross)
I created a web GUI to view and execute Jupyter notebooks. A Jupyter notebook is a collection of human-readable domain content, data sets, and the code needed to execute things. Think of a live calculus book with notes, live graphs, and code to walk a student through how an integral works. These notebooks can be used for any domain.
The basics:
Web based
Select and upload notebook via web GUI
WIP rendering of the notebook (more complete as I find the time). This is just rendering the embedded Markdown.
Code is parsed from notebook
New secure execution environment is spun up on the server
Code is passed to execution environment
Code is executed
Output from execution is passed back to web GUI
Execution environment is disposed
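The "code is parsed from notebook" step is simpler than it sounds, because .ipynb is just JSON. A minimal sketch of pulling executable source out of a notebook, which is roughly what the pipeline above does before handing code to the execution environment:

```python
import json

def extract_code_cells(nb_json: str):
    """Return the source of each code cell in a Jupyter notebook.

    The .ipynb format is JSON: a "cells" list, where each cell has a
    "cell_type" and its "source" as a list of lines.
    """
    nb = json.loads(nb_json)
    return [
        "".join(cell["source"])
        for cell in nb.get("cells", [])
        if cell.get("cell_type") == "code"
    ]

# Tiny inline notebook for illustration.
notebook = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Notes\n"]},
        {"cell_type": "code", "source": ["x = 1\n", "print(x)\n"]},
    ]
})
print(extract_code_cells(notebook))
```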
Growing list of features:
Web based
Support for many languages planned; only Python for now.
Themes
Download of edited script
Local browser state saving added in case browser crashes
Execution of code works in the same way as the Jupyter Notebook Runner above
PyTorch can now properly detect CUDA-capable hardware wherever my server is deployed. If no NVIDIA hardware is found, code is executed on the CPU only.
My NVIDIA GTX 1080 graphics card might be old, but it does a good job accelerating matrix math. PyTorch is all tensors, i.e. multi-dimensional matrices, and CUDA is the layer that binds the PyTorch AI code to the matrix capabilities of the graphics card.
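The detection itself is a one-liner in PyTorch. A sketch of the fallback logic, wrapped so it also runs where torch isn't installed:

```python
def pick_device() -> str:
    """Prefer CUDA when PyTorch sees NVIDIA hardware, else fall back to CPU."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # torch not installed; CPU is the only option anyway.
        return "cpu"

device = pick_device()
print(f"running on {device}")
# In real PyTorch code the device string is then used to place work:
#   model.to(device); batch = batch.to(device)
```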
I just need an excuse to buy an NVIDIA 4090. Too bad they're almost $2k. Womp womp....
I can hack AI Python from my phone if I want. I'd rather do a Wordle on the commode, but you do you.
I ran a test of 4 long-running (and different) AI models set up to run CUDA-backed PyTorch on my single NVIDIA GPU.
All tests ran at the same time and fully shared the single GPU! No errors reported.
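The shape of that test looks like the sketch below, with toy stand-in tasks in place of the real models (the actual runs were CUDA-backed PyTorch jobs, which the GPU driver time-slices across processes):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_model_task(name: str) -> str:
    """Stand-in for a long-running model; the real test ran four
    different CUDA-backed PyTorch models against one GPU."""
    time.sleep(0.1)  # pretend to crunch tensors
    return f"{name}: done"

# Launch all four at once and collect results in submission order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_model_task, ["resnet", "bert", "yolo", "gpt"]))

print(results)
```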
Training and downloading of models is now supported. Right now only basic PyTorch models are supported, but I've also been tinkering with the ONNX models used by Microsoft Azure.
Up next, some sort of GUI to execute these models and get results. This might take a while!
I need to think about:
How do I handle the model interface specifications?
What types of web endpoints might an application need for passing data to and executing against models?
Where to save models, growing training data sets, and model versions? Over time as I get more data to train with I'd want to compare different model versions and how accurate they are.
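For the versioning question, one obvious layout is models/<name>/v1, v2, ... so old versions stick around for accuracy comparisons as the training data grows. A hypothetical helper sketching that scheme (the layout and `next_version_dir` name are assumptions, not a decision I've made):

```python
import tempfile
from pathlib import Path

def next_version_dir(model_root: Path) -> Path:
    """Create and return the next models/<name>/vN directory.

    Scans existing v* subdirectories and picks max + 1, so versions
    are never overwritten and can be compared later.
    """
    existing = [
        int(p.name[1:]) for p in model_root.glob("v*") if p.name[1:].isdigit()
    ]
    version = max(existing, default=0) + 1
    path = model_root / f"v{version}"
    path.mkdir(parents=True)
    return path

# Demo against a throwaway directory.
root = Path(tempfile.mkdtemp()) / "resnet"
first = next_version_dir(root)
second = next_version_dir(root)
print(first.name, second.name)  # v1 v2
```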
I've been reviewing a few of the available AI libraries. Long story short, I've needed to brush up on my Python.
PyTorch is the front runner for me. Nicely supported, many libraries, CUDA support, and no EOL in sight.
Scikit-Learn looks really neat with lots of machine learning capabilities. When I have time I'll for sure look more at this project!
Pandas seems nice, though it's more a data-wrangling library than an ML framework like PyTorch. I'll give it a shot once I have a bit more PyTorch experience under my belt.
TensorFlow (and possibly Keras) from Google seems to be on the way out. I believe it's being replaced internally by another Google tool called JAX.