Zhipu GLM-5V-Turbo Technical Report: Surpasses Claude Opus 4.6 on Design2Code, Generates Code Directly from Screenshots
According to Beating Monitoring, Zhipu AI has released the GLM-5V-Turbo technical report. The model went live on the Z.ai API and OpenRouter in early April; this release is a supplementary methodological disclosure, and the model itself has not been open-sourced. GLM-5V-Turbo is Zhipu's first multimodal programming foundation model, supports a context length of roughly 200K tokens, and can plug into agent frameworks such as Claude Code and OpenClaw. Unlike most approaches that bolt vision onto a language model as an add-on, it integrates visual perception into the entire reasoning, planning, tool-invocation, and execution pipeline from the pre-training stage onward.
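Since the model is served through OpenRouter, which exposes an OpenAI-compatible chat endpoint, a screenshot-to-code call would look roughly like the sketch below. The model id "z-ai/glm-5v-turbo", the prompt, and the image URL are illustrative assumptions; only OpenRouter's general calling convention is taken as given.

```python
# Minimal sketch of a screenshot-to-code request via OpenRouter's
# OpenAI-compatible API. The model id below is a hypothetical placeholder;
# check OpenRouter's model list for the actual identifier.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="z-ai/glm-5v-turbo",  # assumption, not a confirmed model id
    messages=[{
        "role": "user",
        "content": [
            # Text instruction plus an attached UI screenshot
            {"type": "text",
             "text": "Generate the HTML/CSS that reproduces this UI screenshot."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/screenshot.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```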
The model architecture has three key design elements. First is a new visual encoder, CogViT, pre-trained via dual-teacher distillation from SigLIP2 and DINOv3 and then aligned with contrastive learning on 8 billion bilingual Chinese-English image-text pairs. Second is multimodal multi-token prediction (MMTP), which replaces the direct transmission of visual embeddings with a shared learnable <|image|> special token, reducing communication complexity across pipeline stages and stabilizing training. Third is joint reinforcement learning over more than 30 tasks, spanning three levels: perception, reasoning, and agent execution.
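The report as summarized here does not spell out the distillation objective, but a dual-teacher setup of the kind described for CogViT can be sketched as follows. The module structure, per-teacher projection heads, cosine loss, and equal weighting are all assumptions for illustration, not the paper's confirmed recipe.

```python
# Hypothetical sketch of a dual-teacher distillation loss: a student vision
# encoder is matched against frozen SigLIP2 and DINOv3 teacher features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTeacherDistillLoss(nn.Module):
    """Project student patch features into each teacher's embedding space
    and penalize cosine distance to the (detached) teacher features."""

    def __init__(self, student_dim: int, siglip_dim: int, dino_dim: int):
        super().__init__()
        self.to_siglip = nn.Linear(student_dim, siglip_dim)  # head for teacher 1
        self.to_dino = nn.Linear(student_dim, dino_dim)      # head for teacher 2

    def forward(self, student_feats: torch.Tensor,
                siglip_feats: torch.Tensor, dino_feats: torch.Tensor,
                w_siglip: float = 0.5, w_dino: float = 0.5) -> torch.Tensor:
        # 1 - cosine similarity, averaged over all patches in the batch;
        # teachers are frozen, so their features are detached targets.
        loss_siglip = 1 - F.cosine_similarity(
            self.to_siglip(student_feats), siglip_feats.detach(), dim=-1).mean()
        loss_dino = 1 - F.cosine_similarity(
            self.to_dino(student_feats), dino_feats.detach(), dim=-1).mean()
        return w_siglip * loss_siglip + w_dino * loss_dino

# Usage with dummy shapes: batch of 2 images, 196 patches each.
loss_fn = DualTeacherDistillLoss(student_dim=1024, siglip_dim=1152, dino_dim=768)
loss = loss_fn(torch.randn(2, 196, 1024),
               torch.randn(2, 196, 1152),
               torch.randn(2, 196, 768))
```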
The gains from the RL phase are broadly distributed: 2D image localization +4.8%, video understanding +5.6%, 3D localization +7.7%, OCR +4.2%, chart understanding +7.7%, GUI agents (OSWorld) +4.9%, and multimodal search tool invocation +3.5%. The team notes in the paper that, unlike the cross-domain interference commonly seen in SFT, multi-task RL improves each capability steadily and in tandem, with reasoning patterns learned in one domain even transferring to others.
Specific benchmark scores: Design2Code 94.8, surpassing Claude Opus 4.6; OSWorld 62.3 and AndroidWorld 75.7; MMSearch 72.9 and BrowseComp-VL 51.9 on multimodal search. On pure-text programming, its CC-Bench-V2 backend (22.8), frontend (68.4), and code-repository exploration (72.2) scores all beat its pure-text base model GLM-5-Turbo. On MMSearch-Plus it scored 30.0, nearly 8x the previous-generation GLM-4.6V, and on the self-developed visual deep-search benchmark ImageMining it scored 30.7.