warning

🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.

Guide to Python Engine

Introduction​

To run a python program, we need a python environment and a python interpreter running as a separate process from the main cortex process. All requests to the python program are routed through cortex over the HTTP API.

The python-engine acts as a process manager and manages all python processes. Each python program is treated as a model and has its own model.yml template.

Python engine cpp implementation​

The python-engine implements the EngineI interface with the following mapping:

  • LoadModel: Load the python program and start the python process
  • UnloadModel: Stop the python process
  • GetModelStatus: Send health check requests to the python processes
  • GetModels: Get the running python programs

Besides the EngineI interface, the python-engine also implements the HandleInference and HandleRouteRequest methods:

  • HandleInference: Send inference request to the python process
  • HandleRouteRequest: Route any type of request to the python process

The python engine is a built-in engine of cortex-cpp, so it is loaded automatically when a model is loaded. Users don't need to download the engine or load/unload it as they do with the llama-cpp engine.

Python program implementation​

Each python program is treated as a python model, and each python model has its own model.yml template:


id: ichigo-0.5:fp16-linux-amd64
model: ichigo-0.5:fp16-linux-amd64
name: Ichigo Wrapper
version: 1
port: 22310
script: src/app.py
log_path: ichigo-wrapper.log
log_level: INFO
command:
- python
files:
- /home/thuan/cortexcpp/models/cortex.so/ichigo-0.5/fp16-linux-amd64
depends:
- ichigo-0.4:8b-gguf-q4-km
- whispervq:fp16-linux-amd64
- fish-speech:fp16-linux-amd64
engine: python-engine
extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348

Parameter | Description | Required
id | Unique identifier for the model; typically includes version and platform information. | Yes
model | Specifies the variant of the model, often denoting size or quantization details. | Yes
name | The human-readable name for the model, used as the model_id. | Yes
version | The specific version number of the model. | Yes
port | The network port on which the python program will listen for requests. | Yes
script | Path to the main python script to be executed by the engine. This is a relative path to the model folder. | Yes
log_path | File location where logs of the python program's execution are stored. log_path is a relative path inside the cortex data folder. | No
log_level | The level of logging detail (e.g., INFO, DEBUG). | No
command | The command used to launch the python program, typically starting with python. | Yes
files | For python models, the path to the folder that contains all python scripts, model binaries, and the environment needed to run the program. | No
depends | Dependencies required by the model, specified by their identifiers. The dependencies are other models. | No
engine | Specifies the engine to use, which in this context is python-engine. | Yes
extra_params | Additional parameters the model may require, often including device IDs and the network ports of dependency models. These extra_params are passed when running the python script. | No

Ichigo python with cortex​

Ichigo python is a built-in model in cortex that supports chat with audio.

Download models

Ichigo python requires 4 models to run:

  • ichigo-0.5
  • whispervq
  • ichigo-0.4
  • fish-speech (this model is only required if you want to use the text-to-speech mode)

First, you need to download these models. Remember to choose the correct version based on your device and operating system. For example, for linux amd64:


> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"ichigo-0.5:fp16-linux-amd64"}'
> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"ichigo-0.4:8b-gguf-q4-km"}'
> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"whispervq:fp16-linux-amd64"}'
> curl --location '127.0.0.1:39281/v1/models/pull' \
--header 'Content-Type: application/json' \
--data '{"model":"fish-speech:fp16-linux-amd64"}'

Start model​

Each python model runs its own server on the port defined in model.yml; you can update model.yml to change the port. The ichigo-0.5 model has extra_params that need to be defined correctly:

extra_params:
  device_id: 0
  fish_speech_port: 22312
  ichigo_model: ichigo-0.4:8b-gguf-q4-km
  ichigo_port: 39281
  whisper_port: 3348

To start the model, just send an API request:


> curl --location '127.0.0.1:39281/v1/models/start' \
--header 'Content-Type: application/json' \
--data '{
"model":"ichigo-0.5:fp16-linux-amd64"
}'

Cortex will then also start all of ichigo's dependency models.

Check Status​

You can check the status of the model by sending an API request:


curl --location '127.0.0.1:39281/v1/models/status/fish-speech:fp16-linux-amd64'

Inference​

You can send an inference request to the model with:


> curl --location '127.0.0.1:39281/v1/inference' \
--header 'Content-Type: application/json' \
--data '{
  "model": "ichigo-0.5:fp16-linux-amd64",
  "engine": "python-engine",
  "body": {
    "messages": [
      {
        "role": "system",
        "content": "you are a helpful assistant, you must answer questions short and concise!"
      }
    ],
    "input_audio": {
      "data": "base64_encoded_audio_data",
      "format": "wav"
    },
    "model": "ichigo-0.4:8b-gguf-q4-km",
    "stream": true,
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 2048,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "stop": [
      "<|eot_id|>"
    ],
    "output_audio": true
  }
}'
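
For reference, here is a minimal Python sketch of the same request. It is illustrative only: it assumes a local file named input.wav, uses the requests package, and disables streaming so that a single JSON response comes back. The endpoint and body fields are the ones shown in the curl example above.

import base64
import json

import requests

# Encode a local audio file the way the "input_audio.data" field expects it.
with open("input.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "ichigo-0.5:fp16-linux-amd64",
    "engine": "python-engine",
    "body": {
        "messages": [{"role": "system", "content": "you are a helpful assistant"}],
        "input_audio": {"data": audio_b64, "format": "wav"},
        "model": "ichigo-0.4:8b-gguf-q4-km",
        # Streaming is disabled here to keep the example to one response object.
        "stream": False,
        "max_tokens": 2048,
        "output_audio": True,
    },
}

response = requests.post(
    "http://127.0.0.1:39281/v1/inference",
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload),
)
print(response.json())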

Stop Model​

You can stop the model by sending an API request:


> curl --location '127.0.0.1:39281/v1/models/stop' \
--header 'Content-Type: application/json' \
--data '{
"model":"ichigo-0.5:fp16-linux-amd64"
}'

Cortex will also stop all dependencies of this model.

Route requests​

Besides that, cortex also supports routing any kind of request to the python program through the route request endpoint.


curl --location '127.0.0.1:39281/v1/route/request' \
--header 'Content-Type: application/json' \
--data '{
  "model": "whispervq:fp16",
  "path": "/inference",
  "engine": "python-engine",
  "method": "post",
  "transform_response": "{ {%- set first = true -%} {%- for key, value in input_request -%} {%- if key == \"tokens\" -%} {%- if not first -%},{%- endif -%} \"{{ key }}\": {{ tojson(value) }} {%- set first = false -%} {%- endif -%} {%- endfor -%} }",
  "body": {
    "data": "base64 data",
    "format": "wav"
  }
}'
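
The same call can be made from Python. The sketch below is illustrative only (it assumes the requests package) and simply reuses the fields from the curl example, keeping the transform_response template as a plain string.

import json

import requests

# Template copied from the curl example above; it emits only the "tokens" key.
transform_response = (
    '{ {%- set first = true -%} {%- for key, value in input_request -%} '
    '{%- if key == "tokens" -%} {%- if not first -%},{%- endif -%} '
    '"{{ key }}": {{ tojson(value) }} {%- set first = false -%} '
    '{%- endif -%} {%- endfor -%} }'
)

payload = {
    "model": "whispervq:fp16",
    "path": "/inference",
    "engine": "python-engine",
    "method": "post",
    "transform_response": transform_response,
    "body": {"data": "base64 data", "format": "wav"},
}

response = requests.post(
    "http://127.0.0.1:39281/v1/route/request",
    headers={"Content-Type": "application/json"},
    data=json.dumps(payload),
)
print(response.json())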

Add new python model​

Python model implementation​

The implementation of a python program can follow this implementation. The python server should expose at least 2 endpoints:

  • /health: for checking the status of the server.
  • /inference: for inference purposes.

Example of the main entrypoint src/app.py:


import argparse
import os
import sys
from pathlib import Path
from contextlib import asynccontextmanager
from typing import AsyncGenerator, List

import uvicorn
from dotenv import load_dotenv
from fastapi import APIRouter, FastAPI

from common.utility.logger_utility import LoggerUtility
from services.audio.audio_controller import AudioController
from services.audio.implementation.audio_service import AudioService
from services.health.health_controller import HealthController


def create_app() -> FastAPI:
    # Register the routers exposed by this server (at least /health and /inference).
    routes: List[APIRouter] = [
        HealthController(),
        AudioController()
    ]
    app = FastAPI()
    for route in routes:
        app.include_router(route)
    return app


def parse_argument():
    parser = argparse.ArgumentParser(description="Ichigo-wrapper Application")
    parser.add_argument('--log_path', type=str,
                        default='Ichigo-wrapper.log', help='The log file path')
    parser.add_argument('--log_level', type=str, default='INFO',
                        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR', 'TRACE'], help='The log level')
    parser.add_argument('--port', type=int, default=22310,
                        help='The port to run the Ichigo-wrapper app on')
    parser.add_argument('--device_id', type=str, default="0",
                        help='The device id to run the Ichigo-wrapper app on')
    parser.add_argument('--package_dir', type=str, default="",
                        help='The package-dir to be extended to sys.path')
    parser.add_argument('--whisper_port', type=int, default=3348,
                        help='The port of the whisper vq model')
    parser.add_argument('--ichigo_port', type=int, default=39281,
                        help='The port of the ichigo model')
    parser.add_argument('--fish_speech_port', type=int, default=22312,
                        help='The port of the fish speech model')
    parser.add_argument('--ichigo_model', type=str, default="ichigo:8b-gguf-q4-km",
                        help='The ichigo model name')
    args = parser.parse_args()
    return args


if __name__ == "__main__":
    args = parse_argument()
    LoggerUtility.init_logger(__name__, args.log_level, args.log_path)

    env_path = Path(os.path.dirname(os.path.realpath(__file__))
                    ) / "variables" / ".env"
    AudioService.initialize(
        args.whisper_port, args.ichigo_port, args.fish_speech_port, args.ichigo_model)
    load_dotenv(dotenv_path=env_path)
    app: FastAPI = create_app()

    print("Server is running at: 0.0.0.0:", args.port)
    uvicorn.run(app=app, host="0.0.0.0", port=args.port)
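
The HealthController and AudioController routers imported above are not shown here. As a rough sketch only (the real controllers live in the model repository and may differ), a health router could look like this, assuming FastAPI's APIRouter:

from fastapi import APIRouter


class HealthController(APIRouter):
    def __init__(self):
        super().__init__()
        # Cortex polls this endpoint when handling GetModelStatus health checks.
        self.add_api_route("/health", self.health, methods=["GET"])

    async def health(self):
        return {"status": "OK"}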

The parse_argument function must include these parameters to integrate with cortex:

  • port
  • log_path
  • log_level

The python server can also have extra parameters, which need to be defined in the extra_params section of model.yml. When the server is started, these parameters are overridden by the values in model.yml.
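
As an illustration only (the exact invocation is assembled by cortex), with the model.yml above the python program would be launched roughly as:

python src/app.py --port 22310 --log_path ichigo-wrapper.log --log_level INFO --device_id 0 --fish_speech_port 22312 --ichigo_model ichigo-0.4:8b-gguf-q4-km --ichigo_port 39281 --whisper_port 3348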

When the python code is finished, you need to trigger this CI so that the latest code is pushed to the cortexso HuggingFace organization. After it is pushed to HF, users can download and use it. The CI will clone and check out the appropriate branch of your repo and navigate to the correct folder based on the input parameters. The CI needs 5 parameters:

  • Path to model directory in github repo: the path to the folder that contains all the model scripts for running the python program
  • Name of repo to be checked out: the name of the github repo
  • Branch to be checked out: the name of the branch to be checked out
  • Name of huggingface repo to be pushed: the name of the huggingface repo to push to (e.g. cortexso/ichigo-0.5)
  • Prefix of hf branch: the prefix of the branch name (e.g. fp16)

Python venv package​

For packaging the python venv, you need to prepare a requirements.txt and a requirements.cuda.txt file in the root of your project. The requirements.txt file should contain all the dependencies for your project, and the requirements.cuda.txt file should contain all the dependencies that require CUDA. The requirements.txt is used to build the venv for macOS, while requirements.cuda.txt is used to build the venv for Linux and Windows.
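
For illustration only (this is not the actual file of any cortexso model), a minimal requirements.txt for the app.py example above would at least list the packages it imports:

fastapi
uvicorn
python-dotenv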

After that is done, you need to trigger this CI. When the CI finishes, the venv for the 4 supported OSes is built and pushed to HuggingFace, where users can download and use it. The CI will clone and check out the appropriate branch of your repo and navigate to the correct folder containing requirements.txt based on the input parameters. The CI needs the following parameters:

  • Path to model directory in github repo: the path to the folder that contains all the model scripts for running the python program
  • Name of repo to be checked out: the name of the github repo
  • Name of model to be released: the name of the model the venv is being built for (e.g. whispervq)
  • Branch to be checked out: the name of the branch to be checked out
  • Name of huggingface repo to be pushed: the name of the huggingface repo to push to (e.g. cortexso/ichigo-0.5)
  • Prefix of hf branch: the prefix of the branch name (e.g. fp16)