
P800 96GB PCIe
VRAM: 96 GB
ARCH: XPU‑P

Manufacturer
Baidu Kunlun is a line of AI accelerators from the Chinese company Baidu, designed for cloud computing, edge devices, and AI workloads including neural network training and inference. The chips are based on Baidu's proprietary XPU architecture and target high performance on NLP, computer vision, and multimodal models.

Architecture and Specifications
The first-generation Kunlun chip is built on Samsung Foundry's 14 nm process with 2.5D packaging and HBM2 memory (up to 16 GB at 512 GB/s). The base K200 card delivers 256 TOPS at INT8 and 64 TOPS at INT16/FP16, with a 150 W TDP and a PCIe 4.0 x8 interface. On benchmarks such as BERT and YOLOv3 it achieves 2–3 times the throughput (QPS) of an NVIDIA T4. The architecture combines XPU-SDNN engines for tensor operations with XPU clusters of SIMD cores for scalar work.

New Developments
In 2025, Baidu announced the Kunlun M100, aimed at inference for mixture-of-experts (MoE) models and scheduled for early 2026, and the M300, aimed at training multimodal models and planned for 2027. Kunlun chips support the PaddlePaddle, TensorFlow, and PyTorch frameworks through a graph compiler and the XPU C/C++ programming interface. Tianchi 256 supernodes interconnect up to 256 chips with 4 times the bandwidth of previous versions.

Applications and Performance
Kunlun chips are optimized for Ernie Bot and for real-world NLP, vision, and speech workloads, and the software stack masks the differences from NVIDIA GPUs. They are deployed in Baidu's cloud and deliver 3 times the inference efficiency of FPGA and GPU solutions.
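As a back-of-envelope sketch, the K200 figures above can be turned into peak efficiency (TOPS per watt) and a roofline ridge point (operations per byte of memory traffic). This assumes the K200 card runs at the full 512 GB/s HBM2 bandwidth quoted for Kunlun above, which the text does not state explicitly. The constant and function names are hypothetical, chosen for illustration only, and all values are peak figures rather than measured performance.

```python
# Back-of-envelope figures derived only from the K200 numbers quoted above.
# Assumption: the K200 uses the full 512 GB/s HBM2 bandwidth described for Kunlun.
# All names here are hypothetical; values are peak, not measured.

INT8_TOPS = 256   # peak INT8 throughput, tera-operations per second
FP16_TOPS = 64    # peak INT16/FP16 throughput, as quoted
TDP_WATTS = 150   # thermal design power, watts
HBM_GBPS = 512    # assumed HBM2 bandwidth, gigabytes per second


def tops_per_watt(tops: float, watts: float) -> float:
    """Peak efficiency: tera-operations per second per watt."""
    return tops / watts


def ridge_point(tops: float, gb_per_s: float) -> float:
    """Roofline ridge point in operations per byte.

    A kernel whose arithmetic intensity (ops per byte read from HBM) is
    below this value is bandwidth-bound; above it, compute-bound.
    """
    ops_per_s = tops * 1e12
    bytes_per_s = gb_per_s * 1e9
    return ops_per_s / bytes_per_s


if __name__ == "__main__":
    for name, tops in (("INT8", INT8_TOPS), ("FP16", FP16_TOPS)):
        print(f"{name}: {tops_per_watt(tops, TDP_WATTS):.2f} TOPS/W, "
              f"ridge point {ridge_point(tops, HBM_GBPS):.0f} ops/byte")
    print(f"INT8/FP16 peak ratio: {INT8_TOPS / FP16_TOPS:.0f}x")
```

Under these assumptions the INT8 ridge point works out to 500 ops/byte and the FP16 one to 125 ops/byte, so memory-heavy inference kernels would likely be limited by HBM bandwidth well before they reach peak INT8 throughput.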
https://www.kunlunxin.com/ → Products