Building a GenAI application 'from scratch' can be a daunting process considering the stack involved. Fortunately, many tools, services, and libraries exist to accelerate a full-stack GenAI solution. It is also worthwhile to weigh building against buying.

Let's first look at the components that need to be put together.

The stack

| Layer | Component | Description |
| --- | --- | --- |
| Layer 4: Management | 📊 Monitoring | Tools for monitoring the AI system's performance and health. |
| | 🛡 Compliance | Uses observability to ensure the system operates within legal and ethical boundaries. |
| Layer 3: Application | 🖥 UI/UX Front ends | GUIs and interfaces designed for streamlined interaction with GenAI models. |
| | 📝 System evaluators | Systems for assessing the performance and effectiveness of AI systems. |
| | 🧩 Orchestration Tools | Languages and services to create and coordinate LLM chains, agent workflows, and memory. |
| | 🗄 Vector Database | Methods of storing, indexing, and retrieving documents. |
| | 📊 Prompt Management | Systems to manage and refine the prompts used in conversational AI. |
| | 🔧 Model Optimization | Methods of adapting models to fulfill customer requirements. |
| Layer 2: Models | 🚀 Model Serving | Services to deploy and coordinate model inference at scale. |
| | 💻 Computation | Providers of computational resources, specifically GPUs, for AI processing. |
| | 🔄 ML Ops | ML operations enabling efficient coordination of model training and tracking. |
| | 🏋️ Model Training | Tools for training models. |
| | 📊 Model comparisons | Methods of evaluating and comparing models against baselines and benchmarks. |
| | 🧠 Pretrained Models | Pre-built models offering a range of capabilities and uses. |
| | 📚 AI software libraries | Higher-level libraries that enable AI/ML training. |
| Layer 1: Data | 🧼 Data Processing | Tools for cleaning, normalizing, and preparing data for analysis. |
| | 🔄 ETL + Data Pipelines | Tools to find, extract, transform, and load data, and to manage data flow. |
| | 🗃 Databases | Services for structured data storage and retrieval. |
| | 📈 Data set solutions | Sources of data for training and using models effectively. |
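To make the vector-database row above concrete, here is a minimal sketch of similarity search over toy embeddings. Everything is pure Python; the document names and 3-dimensional "embeddings" are fabricated for illustration, and a real vector database would use an embedding model plus an approximate index rather than a linear scan:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the k document ids most similar to the query vector.

    `index` maps document id -> embedding vector. A real vector
    database replaces this linear scan with an approximate index.
    """
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" -- real ones come from an embedding model.
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax":  [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.0, 0.0], index, k=2))  # the two animal docs rank above the tax doc
```

The same retrieve-by-similarity step is what sits underneath retrieval-augmented generation further down this page.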

How to start?

When developing AI-enabled products, consider the following components:

1. Requirements

The client's requirements are determined by the specific target audience you're catering to. Concentrating on a smaller audience helps to minimize initial requirements and might assist in the quick creation of a minimum viable product (MVP). The needs of the audience can be expanded or altered as required. Typically, the requirements demand quick and satisfactory results.

Compute Requirements

There are two primary, and often competing, factors to consider when assessing model deployment requirements.

  • Latency
  • Accuracy

Keep in mind that performance will be evaluated not just on model computation, but on the entire orchestration and the end-user UI/UX.
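One way to keep latency honest is to measure it end-to-end around the model call rather than trusting provider-reported numbers. A minimal sketch, where `call_model` is a stand-in (here simulated with a fixed delay) for whatever API or local model you use:

```python
import time

def call_model(prompt):
    """Stand-in for a real model/API call (simulated 50 ms delay)."""
    time.sleep(0.05)
    return f"response to: {prompt}"

def timed_call(prompt):
    """Return the model output together with wall-clock latency in ms."""
    start = time.perf_counter()
    result = call_model(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

result, ms = timed_call("hello")
print(f"{ms:.0f} ms")  # budget for the whole pipeline, not just model compute
```

In practice you would wrap the full request path (retrieval, orchestration, rendering) in the same timer, since that is what the user experiences.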



2. Servable Model

The models must be capable of delivering the required content with an acceptable latency to meet the requirements.

You might decide to rely on an API to handle model responses. Alternatively, you may use a pre-trained model. To reduce development costs, smaller/cheaper models may be preferred to get a working solution.

However, for wider-scale deployment it will be crucial to optimize your models' serving. Services that handle this optimization for you, like OpenRouter, may be helpful.
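Because the serving choice often changes as you scale (local model for the MVP, hosted API later, or vice versa), it can help to hide the backend behind one interface. A hedged sketch, with both backends as stand-ins rather than real clients:

```python
from abc import ABC, abstractmethod

class ModelBackend(ABC):
    """Common interface so application code doesn't care where inference runs."""
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class LocalBackend(ModelBackend):
    """Stand-in for a small local model run on your own hardware."""
    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"

class HostedBackend(ModelBackend):
    """Stand-in for a hosted API (e.g. an aggregator such as OpenRouter)."""
    def __init__(self, api_key: str):
        self.api_key = api_key  # a real backend would issue HTTP requests
    def generate(self, prompt: str) -> str:
        return f"[hosted] {prompt}"

def answer(backend: ModelBackend, prompt: str) -> str:
    """Application code depends only on the interface, not the provider."""
    return backend.generate(prompt)

print(answer(LocalBackend(), "What is RAG?"))
```

Swapping providers then becomes a one-line change at the call site instead of a rewrite.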

Orchestration and Back-end compute

Your solution will require orchestrating the GenAI interactions, fusing memory and other information into each request. These components may run alongside, or independently from, your back end. You may also need additional tools and libraries for your solution.
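The "fusing memory" part of orchestration can be sketched minimally: keep the conversation turns and prepend them to each new prompt. The `call_model` callable here is a placeholder (an echo function) standing in for a real LLM call:

```python
class Conversation:
    """Minimal orchestration: keep memory and fuse it into each prompt."""
    def __init__(self, call_model):
        self.call_model = call_model  # any callable: prompt -> reply
        self.memory = []              # list of (role, text) turns

    def ask(self, user_text):
        # Fuse prior turns into the prompt so the model sees context.
        context = "\n".join(f"{role}: {text}" for role, text in self.memory)
        prompt = f"{context}\nuser: {user_text}" if context else f"user: {user_text}"
        reply = self.call_model(prompt)
        self.memory.append(("user", user_text))
        self.memory.append(("assistant", reply))
        return reply

# An echo "model" stands in for a real LLM; it reports how many user turns it saw.
convo = Conversation(lambda p: f"echo[{p.count('user:')} user turns]")
convo.ask("hello")
print(convo.ask("again"))  # second turn sees the first in memory
```

Real orchestration frameworks add prompt templating, tool calls, and memory truncation on top of exactly this loop.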

Front-end Interface

Finally, you'll need to present the results to the end-user effectively. Look into our discussion on front ends for best practices and excellent solutions for your model output.

Remember that needs will evolve as your understanding of all the above factors shifts. So it's crucial to start with a base that you can iterate from, especially if your solution involves a data flywheel.

Monitoring Gen()AI

For reasons of quality, ethics, and regulation, it is both useful, and at times required, to record both the inputs to and outputs from an LLM. Particularly in systems that may be used in anything but low-risk settings, monitoring is an essential component of Gen()AI. Also known as LLM observability, monitoring can use people-in-the-loop, as well as automated systems, to observe and adapt the system to inputs and outputs that are undesired or dangerous.
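The record-and-flag idea can be sketched as a thin wrapper around the model call. The block list and log sink here are illustrative only; a production system would ship these records to an observability backend and use far more sophisticated checks:

```python
import time

FLAGGED_TERMS = {"password", "ssn"}  # illustrative block list, not exhaustive

def monitored_call(call_model, prompt, log):
    """Record every input/output pair and flag suspicious content.

    `log` is any list-like sink; real systems would forward these
    records to an observability/monitoring service instead.
    """
    output = call_model(prompt)
    record = {
        "ts": time.time(),
        "input": prompt,
        "output": output,
        "flagged": any(t in prompt.lower() or t in output.lower()
                       for t in FLAGGED_TERMS),
    }
    log.append(record)
    return output

log = []
monitored_call(lambda p: p.upper(), "what is my password?", log)
print(log[0]["flagged"])  # True -> route this exchange to human review
```

Flagged records are exactly where people-in-the-loop review plugs in.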


Timeline Considerations

It should have been done yesterday, yes. But how soon is the solution actually needed?

Budget Considerations

The allocated budget will affect your tool's monetization strategy.

Useful References

LLMs from scratch provides a quality series of Jupyter notebooks revealing how to build LLMs from scratch.

Emerging Architectures for LLM Applications A detailed discussion of the components and their interactions using orchestration systems.


LLM Patterns An impressively thorough and well-written discussion on LLMs and patterns within them.

Important patterns mentioned (references to discussions herein):
* Evaluating and comparing
* Retrieval Augmented Generation (RAG)
* Fine tuning
* Caching to reduce latency
* Guardrails to ensure output (and input) quality
* Data Flywheel to use data collection and feedback to improve the model and experience
* Cascade: breaking models up into smaller, simpler tasks instead of big ones
* Monitoring to ensure value is being derived
* Effective (defensive) UX to ensure the models can be used well
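Of the patterns above, caching is the simplest to sketch: memoize responses keyed on the exact prompt so repeated requests skip model latency entirely. The `expensive_model_call` below is a stand-in that also counts how often the "model" actually runs:

```python
import functools

@functools.lru_cache(maxsize=1024)
def cached_generate(prompt):
    """Cache identical prompts so repeated requests skip model latency.

    Exact-match caching is the simplest form of the pattern; real systems
    often key on a normalized or semantically-hashed prompt instead.
    """
    return expensive_model_call(prompt)

calls = []
def expensive_model_call(prompt):
    calls.append(prompt)  # track how often the "model" actually runs
    return f"answer: {prompt}"

cached_generate("what is RAG?")
cached_generate("what is RAG?")   # served from cache, no second model call
print(len(calls))  # 1
```

The trade-off is staleness: a cached answer never reflects model or data updates until it is evicted.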

Here are some other overviews to assist you in understanding the practical aspects of Generative AI, particularly with regards to GPT and large language models.