Models

Browse our entire catalog of models:

Automatic Speech Recognition

Automatic speech recognition (ASR) models convert a speech signal, typically an audio input, to text.

Model | Description
whisper-tiny-en
Beta
Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.
whisper
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Image Classification

Image classification models take an image input and assign it labels or classes.

Model | Description
resnet-50
50-layer deep image classification CNN trained on more than 1M images from ImageNet

Translation

Translation models convert a sequence of text from one language to another.

Model | Description
m2m100-1.2b
Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation

Image-to-Text

Image-to-text models output text from a given image. Image captioning and optical character recognition are the most common applications of image-to-text.

Model | Description
llava-1.5-7b-hf
Beta
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.
uform-gen2-qwen-500m
Beta
UForm-Gen is a small generative vision-language model primarily designed for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.

Text-to-Image

Text-to-image models generate images from input text. They can be used to generate and modify images based on text prompts.

Model | Description
dreamshaper-8-lcm
Beta
Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.
stable-diffusion-v1-5-img2img
Beta
Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generates a new image from an input image with Stable Diffusion.
stable-diffusion-v1-5-inpainting
Beta
Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.
stable-diffusion-xl-base-1.0
Beta
Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts.
stable-diffusion-xl-lightning
Beta
SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

Text Classification

Sentiment analysis or text classification is a common NLP task that classifies a text input into labels or classes.

Model | Description
distilbert-sst-2-int8
Distilled BERT model that was finetuned on SST-2 for sentiment classification
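
As a rough illustration of how a classifier's raw outputs become "labels or classes", the sketch below applies a softmax to a list of logits and pairs each probability with a label. The function names, the label strings `NEGATIVE` and `POSITIVE`, and the logit values are assumptions made for this example; they are not taken from any specific model's output schema.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_scores(logits: List[float], labels: List[str]) -> List[Tuple[str, float]]:
    """Pair each label with its probability, highest score first."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    pairs = list(zip(labels, softmax(logits)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits for a two-class sentiment model (illustrative values only).
result = to_label_scores([-1.2, 2.3], ["NEGATIVE", "POSITIVE"])
top_label, top_score = result[0]
print(top_label, round(top_score, 3))
```

The first entry of `result` is the predicted class; the remaining entries keep the scores of the other classes in case a caller wants to apply its own threshold.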

Object Detection

Object detection models can detect instances of objects like persons, faces, license plates, or others in an image. This task takes an image as input and returns a list of detected objects, each one containing a label, a probability score, and its surrounding box coordinates.

Model | Description
detr-resnet-50
Beta
DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).
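
The result shape described above (a list of objects, each with a label, a probability score, and box coordinates) can be sketched as a small data structure. The `Detection` class, its field names, the `(xmin, ymin, xmax, ymax)` box layout, and the `filter_detections` helper are all assumptions for illustration; a real model's response may use different names and coordinate conventions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str
    score: float  # probability score between 0 and 1
    box: Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def filter_detections(detections: List[Detection], min_score: float = 0.5) -> List[Detection]:
    """Keep detections at or above min_score, most confident first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Hypothetical detections for a single image (illustrative values only).
sample = [
    Detection("person", 0.97, (12.0, 30.0, 140.0, 310.0)),
    Detection("dog", 0.42, (150.0, 200.0, 260.0, 300.0)),
    Detection("car", 0.81, (300.0, 120.0, 520.0, 260.0)),
]

kept = filter_detections(sample, min_score=0.5)
for d in kept:
    print(d.label, d.score, d.box)
```

Filtering by score is a common post-processing step, because detectors often return many low-confidence candidates alongside the useful ones.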

Text Generation

Family of generative text models, such as large language models (LLMs), that can be adapted for a variety of natural language tasks.

Model | Description
deepseek-coder-6.7b-base-awq
Beta
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.
deepseek-coder-6.7b-instruct-awq
Beta
Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese.
deepseek-math-7b-base
Beta
DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training on math-related tokens sourced from Common Crawl, together with natural language and code data for 500B tokens.
deepseek-math-7b-instruct
Beta
DeepSeekMath-Instruct 7B is an instruction-tuned mathematics model derived from DeepSeekMath-Base 7B. DeepSeekMath is initialized with DeepSeek-Coder-v1.5 7B and continues pre-training on math-related tokens sourced from Common Crawl, together with natural language and code data for 500B tokens.
discolm-german-7b-v1-awq
Beta
DiscoLM German 7b is a Mistral-based large language model with a focus on German-language applications. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.
falcon-7b-instruct
Beta
Falcon-7B-Instruct is a 7B-parameter causal decoder-only model built by TII based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets.
gemma-2b-it-lora
Beta LoRA
This is a Gemma-2B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
gemma-7b-it-lora
Beta LoRA
This is a Gemma-7B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
gemma-7b-it
Beta LoRA
Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.
hermes-2-pro-mistral-7b
Beta Function calling
Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.
llama-2-13b-chat-awq
Beta
Llama 2 13B Chat AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Llama 2 variant.
llama-2-7b-chat-fp16
Full precision (fp16) generative text model with 7 billion parameters from Meta
llama-2-7b-chat-hf-lora
Beta LoRA
This is a Llama 2 base model that Cloudflare dedicates for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.
llama-2-7b-chat-int8
Deprecated
Quantized (int8) generative text model with 7 billion parameters from Meta
llama-3-8b-instruct-awq
Beta
Quantized (int4) generative text model with 8 billion parameters from Meta.
llama-3-8b-instruct
Beta
Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.
llama-3.1-8b-instruct-awq
Beta
Quantized (int4) generative text model with 8 billion parameters from Meta.
llama-3.1-8b-instruct-fp8
Beta
Llama 3.1 8B quantized to FP8 precision
llama-3.1-8b-instruct
Beta
The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
llamaguard-7b-awq
Beta
Llama Guard is a model for classifying the safety of LLM prompts and responses, using a taxonomy of safety risks.
mistral-7b-instruct-v0.1-awq
Beta
Mistral 7B Instruct v0.1 AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Mistral variant.
mistral-7b-instruct-v0.1
LoRA
Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters
mistral-7b-instruct-v0.2-lora
Beta LoRA
The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.
mistral-7b-instruct-v0.2
Beta LoRA
The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1: 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no Sliding-Window Attention.
neural-chat-7b-v3-1-awq
Beta
This model is a 7B-parameter LLM fine-tuned on the Intel Gaudi 2 processor from mistralai/Mistral-7B-v0.1 on the open-source dataset Open-Orca/SlimOrca.
openchat-3.5-0106
Beta
OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT - a strategy inspired by offline reinforcement learning.
openhermes-2.5-mistral-7b-awq
Beta
OpenHermes 2.5 Mistral 7B is a state-of-the-art Mistral fine-tune, a continuation of the OpenHermes 2 model, trained on additional code datasets.
phi-2
Beta
Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of Synthetic and Web datasets for NLP and coding.
qwen1.5-0.5b-chat
Beta
Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.
qwen1.5-1.8b-chat
Beta
Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud.
qwen1.5-14b-chat-awq
Beta
Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.
qwen1.5-7b-chat-awq
Beta
Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method, currently supporting 4-bit quantization.
sqlcoder-7b-2
Beta
This model is intended to be used by non-technical users to understand data inside their SQL databases.
starling-lm-7b-beta
Beta
We introduce Starling-LM-7B-beta, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). Starling-LM-7B-beta is trained from Openchat-3.5-0106 with our new reward model Nexusflow/Starling-RM-34B and policy optimization method Fine-Tuning Language Models from Human Preferences (PPO).
tinyllama-1.1b-chat-v1.0
Beta
The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T.
una-cybertron-7b-v2-bf16
Beta
Cybertron 7B v2 is a 7B MistralAI-based model, the best in its series. It was trained with SFT, DPO and UNA (Unified Neural Alignment) on multiple datasets.
zephyr-7b-beta-awq
Beta
Zephyr 7B Beta AWQ is an efficient, accurate and blazing-fast low-bit weight quantized Zephyr model variant.

Text Embeddings

Feature extraction models transform raw data into numerical features that can be processed while preserving the information in the original dataset. These models are well suited to building vector search applications or Retrieval Augmented Generation workflows with Large Language Models (LLMs).

Model | Description
bge-base-en-v1.5
BAAI general embedding (bge) models transform any given text into a compact vector
bge-large-en-v1.5
BAAI general embedding (bge) models transform any given text into a compact vector
bge-small-en-v1.5
BAAI general embedding (bge) models transform any given text into a compact vector
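
To show why compact vectors are useful for vector search, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors, the document ids, and the helper names are made up for illustration; real embedding models produce vectors with hundreds of dimensions, and a production system would normally use a vector database rather than a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (illustrative values only).
query = [1.0, 0.0, 0.0]
docs = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.5, 0.5, 0.0],
}

ranked = rank_documents(query, docs)
for doc_id, similarity in ranked:
    print(doc_id, round(similarity, 3))
```

In a Retrieval Augmented Generation workflow, the top-ranked documents from a step like this would typically be passed to a text generation model as context.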

Summarization

Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text.

Model | Description
bart-large-cnn
Beta
BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.