GitHub - kvcache-ai/ktransformers: A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Key Points
- KTransformers is a research project focused on efficient inference and fine-tuning of large language models by leveraging CPU-GPU heterogeneous computing.
- It is structured into two core modules: `kt-kernel` for high-performance inference, with CPU optimizations such as AMX/AVX kernels and MoE support, and `kt-sft` for resource-efficient fine-tuning, including LoRA and LLaMA-Factory integration.
- The framework demonstrates significant performance improvements, such as fine-tuning massive models with reduced GPU memory and achieving high inference throughput through hybrid hardware utilization.
KTransformers is a research project focused on achieving efficient inference and fine-tuning of large language models (LLMs) through CPU-GPU heterogeneous computing. The project is structured into two primary modules: kt-kernel for high-performance inference and kt-sft for fine-tuning.
The core methodology of KTransformers revolves around leveraging the strengths of both CPUs and GPUs to handle the computational and memory demands of large-scale LLMs, especially Mixture-of-Experts (MoE) models. This involves strategically placing different parts of the model or different computational tasks on the most suitable hardware.
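As an illustration of this placement idea, the sketch below assigns MoE experts to GPU or CPU by observed activation frequency. This is a hypothetical policy written for clarity, not the actual KTransformers API; the function and parameter names are invented.

```python
# Hypothetical hot/cold expert placement policy (illustrative only, not
# the KTransformers API): the most frequently activated experts are kept
# on the GPU, the rest stay in CPU memory.

def place_experts(access_counts, gpu_slots):
    """Assign each expert id to 'gpu' or 'cpu' by activation frequency.

    access_counts: dict mapping expert id -> observed activation count
    gpu_slots: number of experts that fit in GPU memory
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    hot = set(ranked[:gpu_slots])
    return {eid: ("gpu" if eid in hot else "cpu") for eid in access_counts}

placement = place_experts({0: 900, 1: 15, 2: 430, 3: 2}, gpu_slots=2)
# experts 0 and 2 are the hottest, so they land on the GPU
```

A real system would refine this with online statistics and migration costs, but the core trade-off is the same: GPU memory is spent only on the experts that earn it.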
`kt-kernel` - High-Performance Inference Kernels:
This module provides CPU-optimized kernel operations for heterogeneous LLM inference. Its technical contributions and features include:
- AMX/AVX Acceleration: It implements optimized kernels specifically designed to exploit Intel AMX (Advanced Matrix Extensions) and AVX512/AVX2 instruction sets. These optimizations are crucial for accelerating matrix multiplication and other core operations for INT4/INT8 quantized inference on CPUs, significantly boosting throughput for integer precision models.
- MoE Optimization with NUMA Awareness: For Mixture-of-Experts models, `kt-kernel` provides efficient inference mechanisms, including NUMA (Non-Uniform Memory Access)-aware memory management that optimizes data placement and access patterns to minimize latency when experts or their states are distributed across CPU memory nodes. This enables heterogeneous expert placement: frequently accessed ("hot") experts can reside on the GPU while less frequently accessed ("cold") experts stay on the CPU to save GPU memory.
- Quantization Support: It supports CPU-side INT4/INT8 quantized weights for memory efficiency and faster CPU computation. Additionally, it integrates with GPU-side GPTQ quantization, allowing a flexible quantization strategy across the heterogeneous system.
- Easy Integration: It offers a clean Python API, enabling integration with frameworks like SGLang for production serving scenarios, facilitating CPU-GPU hybrid inference for large MoE models.
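To make the INT8 support above concrete, here is a minimal sketch of symmetric INT8 quantization, the kind of integer representation such kernels operate on. This is plain Python for illustration; the real kernels work on packed tensors with per-row or per-group scales, not single-scale Python lists.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization (illustrative
# only): floats are mapped to the [-127, 127] integer range via one scale,
# and dequantization recovers them up to quantization error.

def quantize_int8(weights):
    """Map float weights to int8-range values with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02, 1.27]
q, s = quantize_int8(w)          # q holds values in [-127, 127]
approx = dequantize_int8(q, s)   # close to w, within quantization error
```

Integer matrix multiplication over such values is exactly what AMX tile and AVX512-VNNI instructions accelerate, which is where the CPU-side throughput gains come from.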
`kt-sft` - Fine-Tuning Framework:
This module focuses on resource-efficient fine-tuning, particularly for ultra-large MoE models, by integrating with the popular LLaMA-Factory framework. Its key technical aspects include:
- Resource Efficiency:
`kt-sft` allows fine-tuning of extremely large models, such as the 671B-parameter DeepSeek-V3, with significantly reduced GPU memory requirements (e.g., 70GB of multi-GPU VRAM) by offloading a substantial portion of the model's parameters and intermediate states to system RAM (e.g., 1.3TB). This heterogeneous memory management is critical for training models that would otherwise exceed typical GPU memory capacities.
- LoRA Support with Heterogeneous Acceleration: It supports full Low-Rank Adaptation (LoRA) fine-tuning, a parameter-efficient technique. "Heterogeneous acceleration" means that the LoRA adapters, or the underlying base-model computations, can be distributed and optimized across CPU and GPU resources to maximize training efficiency.
- LLaMA-Factory Integration: The seamless integration with LLaMA-Factory provides a robust and familiar environment for users to conduct fine-tuning, leveraging KTransformers' backend optimizations without requiring extensive changes to their existing LLaMA-Factory workflows.
- Production Readiness: The framework supports features essential for production environments, including chat, batch inference, and metrics evaluation, demonstrating its applicability beyond research prototyping.
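The LoRA computation behind the fine-tuning support above can be sketched as the update y = Wx + (alpha/r)·B(Ax), where only the small A and B matrices are trained. That is what makes the offloading split natural: the frozen base weights W can live in CPU RAM while the trainable low-rank path runs on the GPU. This is a plain-Python sketch of the math, not the kt-sft implementation.

```python
# Sketch of a LoRA forward pass: y = W x + (alpha/r) * B (A x).
# W is the frozen base weight (offloadable to CPU RAM); only the small
# low-rank matrices A (r x d) and B (d x r) are trained.

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=16, r=2):
    base = matvec(W, x)               # frozen base projection
    delta = matvec(B, matvec(A, x))   # trainable low-rank update
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1, 0], [0, 1]]                  # toy 2x2 frozen weight
A = [[1, 0], [0, 0]]                  # toy rank-2 adapter factors
B = [[0, 0], [1, 0]]
y = lora_forward(W, A, B, [1, 2])
```

Initializing B to zeros makes the adapter a no-op at the start of training, which is the standard LoRA initialization; training then only updates A and B, a tiny fraction of the 671B base parameters.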
In summary, KTransformers provides a comprehensive framework for optimizing LLM workflows by intelligently partitioning and executing tasks across CPU and GPU hardware, with specialized optimizations for quantization, MoE models, and memory-constrained fine-tuning.