Building a GenAI application 'from scratch' can be a daunting process considering the stack involved. Fortunately, many tools, services, and libraries exist to accelerate a full-stack GenAI solution. It is also worthwhile to weigh building against buying.

Let's first look at the components that need to be put together.

The stack

| Layer | Component | Description |
| --- | --- | --- |
| Layer 4: Management | 📊 Monitoring | Tools for monitoring the AI system's performance and health. |
| | 🛡 Compliance | Uses observability to ensure the system operates within legal and ethical boundaries. |
| Layer 3: Application | 🖥 UI/UX Front ends | GUIs and interfaces designed for streamlined interaction with GenAI models. |
| | 📝 System evaluators | Systems for assessing the performance and effectiveness of AI systems. |
| | 🧩 Orchestration Tools | Languages and services to create and coordinate LLM chains, agent workflows, and memory. |
| | 🗄 Vector Database | Methods of storing, indexing, and retrieving documents. |
| | 📊 Prompt Management | Systems to manage and refine the prompts used in conversational AI. |
| | 🔧 Model Optimization | Methods of adapting models to fulfill customer requirements. |
| Layer 2: Models | 🚀 Model Serving | Services to deploy and coordinate model inference at scale. |
| | 💻 Computation | Providers of computational resources, specifically GPUs, for AI processing. |
| | 🔄 ML Ops | ML operations enabling efficient coordination of model training and tracking. |
| | 🏋️ Model Training | Tools for training models. |
| | 📊 Model comparisons | Methods of evaluating and comparing models against baselines and benchmarks. |
| | 🧠 Pretrained Models | Pre-built models offering a range of capabilities and uses. |
| | 📚 AI software libraries | Higher-level libraries that enable AI/ML training. |
| Layer 1: Data | 🧼 Data Processing | Tools for cleaning, normalizing, and preparing data for analysis. |
| | 🔄 ETL + Data Pipelines | Tools to find, extract, transform, and load data, and to manage data flow. |
| | 🗃 Databases | Services for structured data storage and retrieval. |
| | 📈 Data set solutions | Sources of data for training and using models effectively. |
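To make the vector-database row above concrete, here is a minimal sketch of similarity search over toy embeddings. Everything is pure Python; the document names and 3-dimensional "embeddings" are fabricated for illustration, and a real vector database would use an embedding model plus an approximate index rather than a linear scan:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k document ids most similar to the query vector.

    `index` maps document id -> embedding vector. A real vector
    database replaces this linear scan with an approximate index.
    """
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" -- real ones come from an embedding model.
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax":  [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # the two animal docs rank above the tax doc
```

The same retrieve-by-similarity step is what sits underneath retrieval-augmented generation further down this page.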

How to start?

When developing AI-enabled products, consider the following components:

1. Requirements

The client's requirements are determined by the specific target audience you're catering to. Concentrating on a smaller audience helps to minimize initial requirements and might assist in the quick creation of a minimum viable product (MVP). The needs of the audience can be expanded or altered as required. Typically, the requirements demand quick and satisfactory results.

Compute Requirements

There are two primary, and often competing, factors to consider when assessing model deployment requirements.

  • Latency
  • Accuracy

Keep in mind that performance will be evaluated not just on model computation, but on the entire orchestration and the end-user UI/UX.
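One way to keep latency honest is to measure it end-to-end around the model call rather than trusting provider-reported numbers. A minimal sketch, where `call_model` is a stand-in (here simulated with a fixed delay) for whatever API or local model you use:

```python
import time

def call_model(prompt):
    """Stand-in for a real model/API call (simulated 50 ms delay)."""
    time.sleep(0.05)
    return f"response to: {prompt}"

def timed_call(prompt):
    """Return the model output together with wall-clock latency in ms."""
    start = time.perf_counter()
    result = call_model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

result, ms = timed_call("hello")
print(f"{ms:.0f} ms")  # budget for the whole pipeline, not just model compute
```

In practice you would wrap the full request path (retrieval, orchestration, rendering) in the same timer, since that is what the user experiences.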



2. Servable Model

The models must be capable of delivering the required content with an acceptable latency to meet the requirements.

You might decide to rely on an API to handle model responses. Alternatively, you may use a pre-trained model. To reduce development costs, smaller/cheaper models may be preferred to get a working solution.

However, for wider-scale deployment it will be crucial to optimize your models' serving. Services that handle this optimization for you, like OpenRouter, may be helpful.
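Because the serving choice often changes as you scale (local model for the MVP, hosted API later, or vice versa), it can help to hide the backend behind one interface. A hedged sketch, with both backends as stand-ins rather than real clients:

```python
from abc import ABC, abstractmethod

class ModelBackend(ABC):
    """Common interface so application code doesn't care where inference runs."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class LocalBackend(ModelBackend):
    """Stand-in for a small local model run on your own hardware."""
    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"

class HostedBackend(ModelBackend):
    """Stand-in for a hosted API (e.g. an aggregator such as OpenRouter)."""
    def __init__(self, api_key: str):
        self.api_key = api_key  # a real backend would issue HTTP requests
    def generate(self, prompt: str) -> str:
        return f"[hosted] {prompt}"

def answer(backend: ModelBackend, prompt: str) -> str:
    """Application code depends only on the interface, not the provider."""
    return backend.generate(prompt)

print(answer(LocalBackend(), "What is RAG?"))
```

Swapping providers then becomes a one-line change at the call site instead of a rewrite.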

Orchestration and Back-end compute

Your solution will require orchestrating the GenAI interactions, fusing memory and other information into each request. These components may run alongside, or independently from, your back end. You may also need additional tools and libraries for your solution.
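The "fusing memory" part of orchestration can be sketched minimally: keep the conversation turns and prepend them to each new prompt. The `call_model` callable here is a placeholder (an echo function) standing in for a real LLM call:

```python
class Conversation:
    """Minimal orchestration: keep memory and fuse it into each prompt."""
    def __init__(self, call_model):
        self.call_model = call_model  # any callable: prompt -> reply
        self.memory = []              # list of (role, text) turns

    def ask(self, user_text):
        # Fuse prior turns into the prompt so the model sees context.
        context = "\n".join(f"{role}: {text}" for role, text in self.memory)
        prompt = f"{context}\nuser: {user_text}" if context else f"user: {user_text}"
        reply = self.call_model(prompt)
        self.memory.append(("user", user_text))
        self.memory.append(("assistant", reply))
        return reply

# An echo "model" stands in for a real LLM; it reports how many user turns it saw.
convo = Conversation(lambda p: f"echo[{p.count('user:')} user turns]")
convo.ask("hello")
print(convo.ask("again"))  # second turn sees the first in memory
```

Real orchestration frameworks add prompt templating, tool calls, and memory truncation on top of exactly this loop.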

Front-end Interface

Finally, you'll need to present the results to the end-user effectively. Look into our discussion on front ends for best practices and excellent solutions for your model output.

Remember that needs will evolve as your understanding of all the above factors shifts. So it's crucial to start with a base that you can iterate from, especially if your solution involves a data flywheel.

Monitoring Gen()AI

For reasons of quality, ethics, and regulation, it is both useful, and at times required, to record both the inputs to and outputs from an LLM. Particularly in systems that may be used in anything but low-risk settings, monitoring is an essential component of Gen()AI. Also known as LLM observability, monitoring can use people-in-the-loop, as well as automated systems, to observe and adapt the system to inputs and outputs that are undesired or dangerous.
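The record-and-flag idea can be sketched as a thin wrapper around the model call. The block list and log sink here are illustrative only; a production system would ship these records to an observability backend and use far more sophisticated checks:

```python
import time

FLAGGED_TERMS = {"password", "ssn"}  # illustrative block list, not exhaustive

def monitored_call(call_model, prompt, log):
    """Record every input/output pair and flag suspicious content.

    `log` is any list-like sink; real systems would forward these
    records to an observability/monitoring service instead.
    """
    output = call_model(prompt)
    record = {
        "ts": time.time(),
        "input": prompt,
        "output": output,
        "flagged": any(t in prompt.lower() or t in output.lower()
                       for t in FLAGGED_TERMS),
    }
    log.append(record)
    return output

log = []
monitored_call(lambda p: p.upper(), "what is my password?", log)
print(log[0]["flagged"])  # True -> route this exchange to human review
```

Flagged records are exactly where people-in-the-loop review plugs in.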


Timeline Considerations

It should have been done yesterday, yes. But how soon is the solution actually needed?

Budget Considerations

The allocated budget will affect your tool's monetization strategy.

Useful References

LLMs from scratch provides a quality series of Jupyter notebooks revealing how to build LLMs from scratch.

Emerging Architectures for LLM Applications A detailed discussion of the components and their interactions using orchestration systems.


LLM Patterns An impressively thorough and well-written discussion on LLMs and patterns within them.

Important patterns mentioned (references to discussions herein):
* Evaluating and comparing
* Retrieval Augmented Generation (RAG)
* Fine tuning
* Caching to reduce latency
* Guardrails to ensure output (and input) quality
* Data Flywheel to use data collection and feedback to improve the model and experience
* Cascade: breaking models up into smaller, simpler tasks instead of big ones
* Monitoring to ensure value is being derived
* Effective (defensive) UX to ensure the models can be used well
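Of the patterns above, caching is the simplest to sketch: memoize responses keyed on the exact prompt so repeated requests skip model latency entirely. The `expensive_model_call` below is a stand-in that also counts how often the "model" actually runs:

```python
import functools

@functools.lru_cache(maxsize=1024)
def cached_generate(prompt):
    """Cache identical prompts so repeated requests skip model latency.

    Exact-match caching is the simplest form of the pattern; real systems
    often key on a normalized or semantically-hashed prompt instead.
    """
    return expensive_model_call(prompt)

calls = []
def expensive_model_call(prompt):
    calls.append(prompt)  # track how often the "model" actually runs
    return f"answer: {prompt}"

cached_generate("what is RAG?")
cached_generate("what is RAG?")   # served from cache, no second model call
print(len(calls))  # 1
```

The trade-off is staleness: a cached answer never reflects model or data updates until it is evicted.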

Here are some other overviews to assist you in understanding the practical aspects of Generative AI, particularly with regards to GPT and large language models.