
Pre-trained models

🚧 It is impossible to keep up manually 🚧

Because it is not possible to manually maintain open-source references, we refer the reader to the Hugging Face Open LLM Leaderboard.

Still, we highlight several important and foundational models below.


Because of the costs associated with aggregating sufficient data and performing large-scale training, it is often preferable to start with pre-trained models. These may be open source or closed source in origin, and choosing between the two is an important decision driven by project requirements.

Whichever pre-trained model you use, it is important to compare and evaluate candidates to ensure they meet technical, customer, and organizational requirements.
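As a starting point, here is a minimal sketch of a side-by-side comparison using Hugging Face transformers. The model names and prompts are placeholders; substitute your actual candidates and evaluation set.

```python
# A minimal sketch: run the same prompts through several candidate models
# and inspect outputs side by side. Model names and prompts are placeholders;
# substitute your own candidates and evaluation set. Requires `transformers`
# and `accelerate` (for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer

CANDIDATES = ["meta-llama/Llama-2-7b-chat-hf", "mistralai/Mistral-7B-Instruct-v0.1"]
PROMPTS = ["Summarize in one sentence: The quick brown fox jumps over the lazy dog."]

for name in CANDIDATES:
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")
    for prompt in PROMPTS:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=64)
        print(f"--- {name} ---")
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For anything beyond a quick smoke test, pair this kind of spot check with standardized benchmarks such as those behind the leaderboard above.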

Below we share important models.

API-based model usage

Open Source

Llama Herds

Llama 3

Trained on 15T multilingual tokens, with up to 405B trainable parameters, the Meta team introduces a powerful family of new models with generally SOTA performance, especially among open-source models. They employ a more powerful data selection and synthesis strategy, with a generally simple post-training procedure using SFT, rejection sampling, and DPO optimization.
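Meta's exact post-training code is not public; as a reference point, the DPO objective used in that recipe can be sketched generically in PyTorch, following the published formulation (Rafailov et al., 2023), not Meta's implementation.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Generic DPO objective (Rafailov et al., 2023) -- not Meta's exact code.

    Each argument is a tensor of summed log-probs of the chosen/rejected
    responses under the trained policy or the frozen reference model.
    """
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to prefer chosen over rejected responses,
    # regularized toward the reference model by beta.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()
```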

They also use 4D parallelism, combining tensor parallelism (TP), pipeline parallelism (PP), context parallelism (CP), and data parallelism (DP).
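For intuition, here is a sketch of how such a 4D layout can be expressed with PyTorch's device-mesh API. The mesh sizes are illustrative, not Llama 3's actual configuration.

```python
# Sketch only: expressing a 4D parallel layout with PyTorch's device mesh
# (torch >= 2.2, run under a multi-GPU distributed launch such as torchrun).
# The mesh sizes are illustrative, not Llama 3's actual configuration.
from torch.distributed.device_mesh import init_device_mesh

# 2 x 2 x 2 x 2 = 16 GPUs, factored into the four parallelism axes.
mesh = init_device_mesh(
    "cuda",
    mesh_shape=(2, 2, 2, 2),
    mesh_dim_names=("dp", "pp", "cp", "tp"),
)
dp_group = mesh["dp"].get_group()  # process group for data-parallel collectives
tp_group = mesh["tp"].get_group()  # process group for tensor-parallel collectives
```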


Finally, the models are also multimodal.

Multimodal

Molmo

A very high-quality set of open-source models. They achieve very high-quality image captioning by using voice recordings, ensuring image annotations are detailed and accurate. Blog | Paper

Text-focused

Llama is a library and set of models with an expanding community, owing to the generally open-source nature of the high-quality Llama 2 models.

Llama 2: Open Foundation and Fine-Tuned Chat Models. A nearly open-source set of 7B-70B models with quality performance.

Sept 2023: Mistral Transformer

Announcement | Hugging Face

Qwen

Open source: Qwen-72B and Qwen-1.8B, including Base, Chat, and quantized versions.

🌟 Qwen-72B has been trained on high-quality data consisting of 3T tokens, boasting a larger parameter scale and more training data to achieve a comprehensive performance upgrade. Additionally, we have expanded the context window length to 32K and enhanced the system prompt capability, allowing users to customize their own AI assistant with just a single prompt.

🎁 Qwen-1.8B is our additional gift to the research community, striking a balance between maintaining essential functionalities and maximizing efficiency, generating 2K-length text content with just 3GB of GPU memory.

🤗 https://huggingface.co/Qwen 🤖 https://github.com/QwenLM/Qwen
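For reference, a minimal usage sketch following the quickstart pattern in the Qwen repository; the `chat` helper comes from Qwen's custom code pulled in by `trust_remote_code`, and exact details may change between releases.

```python
# Minimal sketch following the Qwen repository's quickstart pattern; the
# `chat` helper is provided by Qwen's remote code, not core transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen-1_8B-Chat"  # the smaller release mentioned above
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, device_map="auto", trust_remote_code=True
).eval()

response, history = model.chat(tokenizer, "Who are you?", history=None)
print(response)
```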

Vision-focused

InternLM2

Paper: https://arxiv.org/pdf/2403.17297.pdf

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k "Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.
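For intuition, here is a minimal sketch of the "Needle-in-a-Haystack" test idea the abstract references: hide one fact inside long filler text and check retrieval. The `generate` call is a hypothetical stand-in for any long-context model, not the paper's exact harness.

```python
# Sketch of the "Needle-in-a-Haystack" idea: hide one fact (the needle) at a
# chosen depth inside long filler text (the haystack) and check whether the
# model retrieves it. `generate` below is a hypothetical stand-in for any
# long-context model call; this is not the paper's exact harness.
def build_haystack(needle: str, filler: str, depth: float, target_chars: int) -> str:
    haystack = (filler * (target_chars // len(filler) + 1))[:target_chars]
    insert_at = int(depth * len(haystack))  # depth in [0.0, 1.0]
    return haystack[:insert_at] + " " + needle + " " + haystack[insert_at:]

needle = "The magic number is 42."
filler = "The sky was clear and the grass was green. "
for depth in (0.0, 0.5, 1.0):
    context = build_haystack(needle, filler, depth, target_chars=100_000)
    prompt = context + "\n\nQuestion: What is the magic number? Answer:"
    # answer = generate(prompt)        # hypothetical long-context model call
    # print(depth, "42" in answer)     # pass if the needle was retrieved
```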

Speech-to-text

Moshi: a speech-text foundation model for real-time dialogue

Closed Source

Evaluation of OpenAI o1: Opportunities and Challenges of AGI

OpenAI reveals the next generation of improved models by directly integrating 'chain of thought' into the generation process. This allows the model to 'think before it answers', enabling complex reasoning tasks and more transparent explanations. Importantly, o1 scales its performance with increased compute during inference, as opposed to during training. This leads to SOTA results on a number of benchmarks.

They also introduce the AGI-benchmark 1.0 to enable AI research and evaluation across 27 distinct categories spanning five major cognitive faculties: Reasoning, Creation & Design, Diagnosis, Reflection, and Planning.

Importantly, they also demonstrate inference-time scaling laws showing improved accuracy as more tokens are used at inference/generation time.

![image](https://github.com/user-attachments/assets/2a1d10ee-63c4-483f-be67-5170ee5c4d78)

These appear to be generally reproducible here. Their system card is here.
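o1's internals are not public, but a common open approximation of spending more compute at inference is self-consistency sampling: draw several reasoning chains and majority-vote the final answers. A minimal sketch, with `sample_answer` as a hypothetical model call:

```python
# Sketch of one open approximation to inference-time scaling: self-consistency.
# Sample several reasoning chains and majority-vote the final answers; accuracy
# typically improves as n_samples (i.e., inference compute) grows.
# `sample_answer` is a hypothetical stand-in for any model call that returns
# a final answer string from one sampled chain of thought.
from collections import Counter

def self_consistent_answer(prompt: str, sample_answer, n_samples: int = 16) -> str:
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```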

Gemini

Report | Tech Report