Radiance Cloud AI Inference Engine

Introduction

Radiance provides an OpenAI-compatible API for running high-performance inference on distributed edge nodes. You can use standard OpenAI client libraries by simply changing the base_url.

Base URL
https://api.radiance.cloud/v1

Authentication

All API requests (except /models) require an API key. You must include this key in the Authorization header.

Authorization: Bearer YOUR_API_KEY

Endpoints

GET List Models

Retrieve a list of available models, their specific capabilities, and pricing.

curl https://api.radiance.cloud/v1/models

Response Example

{
  "object": "list",
  "data": [
    {
      "id": "DeepSeek-V3",
      "object": "model",
      "name": "DeepSeek V3",
      "context_length": 131072,
      "pricing": { "prompt": "0.3", "completion": "1.0" },
      "supported_sampling_parameters": ["temperature", "top_p", "max_tokens"]
    },
    {
      "id": "Llama-3.3-70B-Instruct",
      "pricing": { "prompt": "0.15", "completion": "0.50" }
    }
  ]
}
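
To consume this payload programmatically, a plain GET is enough: per the Authentication section, /models is the only route that does not require a key. A minimal sketch in Python using the requests library (the pricing unit is not stated above, so the values are printed as-is):

import requests

# /models is the one endpoint that does not need an Authorization header.
resp = requests.get("https://api.radiance.cloud/v1/models", timeout=10)
resp.raise_for_status()

for model in resp.json()["data"]:
    # Not every entry carries the full field set (see the abbreviated second
    # entry above), so use .get() for optional fields.
    print(model["id"], model.get("context_length"), model.get("pricing"))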

POST Chat Completions

Create a model response for a chat conversation. Fully compatible with OpenAI's Chat API.

Request Example: DeepSeek V3

curl https://api.radiance.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "DeepSeek-V3",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement."}
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'

Request Example: Llama 3.3 (with frequency penalty)

curl https://api.radiance.cloud/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Write a creative story."}],
    "temperature": 0.8,
    "frequency_penalty": 0.5,
    "presence_penalty": 0.2
  }'

Parameters

| Name | Type | Description |
| --- | --- | --- |
| model | string | The ID of the model (e.g., DeepSeek-V3, Llama-3.3-70B-Instruct). |
| messages | array | A list of messages comprising the conversation so far. |
| temperature | number | Sampling temperature (0.0 to 2.0). Higher values make output more random. |
| top_p | number | Nucleus sampling: only tokens in the top_p probability mass are considered. |
| max_tokens | integer | The maximum number of tokens to generate. |
| stream | boolean | If true, partial message deltas are sent as server-sent events (SSE); see the streaming sketch below. |
| frequency_penalty | number | -2.0 to 2.0. Penalizes new tokens based on their frequency in the text so far. Supported by Llama 3.3/3.2. |
| presence_penalty | number | -2.0 to 2.0. Penalizes new tokens based on whether they already appear in the text. Supported by Llama 3.3/3.2. |
| repetition_penalty | number | Values > 1.0 penalize repetition. Supported by Qwen 2.5; see the extra_body example under Client Libraries. |
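
Setting stream to true delivers the response incrementally over SSE. With the OpenAI Python SDK this is just stream=True on the create call; a minimal sketch, assuming Radiance's chunks follow OpenAI's delta format:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.radiance.cloud/v1",
    api_key="your-api-key"
)

# stream=True switches the SDK to SSE mode and yields incremental chunks.
stream = client.chat.completions.create(
    model="DeepSeek-V3",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
    stream=True
)

for chunk in stream:
    # Each chunk carries a delta; content is None on role-only and final chunks.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)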

Response Example

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "DeepSeek-V3",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nQuantum entanglement is a phenomenon..."
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}

Client Libraries

You can use the official OpenAI Python or Node.js SDKs.

Python Example
from openai import OpenAI

client = OpenAI(
    base_url="https://api.radiance.cloud/v1",
    api_key="your-api-key"
)

response = client.chat.completions.create(
    model="Qwen-2.5-72B-Instruct",
    messages=[
        {"role": "system", "content": "You are a coding expert."},
        {"role": "user", "content": "Write a Rust function to reverse a string."}
    ],
    temperature=0.2
)

print(response.choices[0].message.content)
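
The repetition_penalty parameter from the table above has no named argument in the OpenAI SDK, but the Python client can pass provider-specific fields through extra_body. A sketch, assuming Radiance reads the extra field from the request body:

response = client.chat.completions.create(
    model="Qwen-2.5-72B-Instruct",
    messages=[{"role": "user", "content": "Write a creative story."}],
    temperature=0.8,
    # repetition_penalty is not a standard OpenAI parameter, so it is passed
    # via extra_body and forwarded verbatim in the JSON request body.
    extra_body={"repetition_penalty": 1.1}
)

print(response.choices[0].message.content)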