US Government Says China’s Best AI Models Lag Behind. Experts Aren’t So Sure


In brief

  • CAISI’s evaluation ranked DeepSeek V4 Pro eight months behind the U.S. frontier, using an IRT-based scoring system across nine benchmarks including two private, unverifiable datasets.
  • The cost comparison excluded all U.S. models deemed too expensive or too weak—leaving only GPT-5.4 mini, against which DeepSeek was still cheaper on five out of seven benchmarks.
  • Stanford’s 2026 AI Index found the U.S.-China performance gap on public leaderboards had collapsed to 2.7%.

A U.S. government institute published its verdict on China’s most powerful AI: eight months behind, and the more time passes, the wider the gap gets. The internet read the methodology and started asking questions.

CAISI—the Center for AI Standards and Innovation, a unit inside NIST—released its evaluation of DeepSeek V4 Pro on May 1. The conclusion: DeepSeek’s open-weight flagship “lags behind the frontier by about 8 months.”

CAISI also calls it the most capable Chinese AI model it has evaluated to date.

The scoring system

CAISI doesn’t average benchmark scores like most evaluators do. Instead, it applies Item Response Theory—a statistical method from standardized testing—to estimate each model’s latent capability by tracking which problems it solves and which it doesn’t, across nine benchmarks in five domains: cybersecurity, software engineering, natural sciences, abstract reasoning, and math.
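To make the idea concrete, here is a minimal sketch of IRT-style scoring under a two-parameter logistic (2PL) model. It is not CAISI's implementation, whose details have not been published. The item parameters (discrimination `a`, difficulty `b`) and the two response patterns are hypothetical values chosen for illustration. The point is that two models with the same raw percentage correct can receive different ability estimates depending on which problems they solve.

```python
import math

def p_solve(theta, a, b):
    """2PL probability that a model with ability theta solves an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, lo=-10.0, hi=10.0, iters=100):
    """Maximum-likelihood ability estimate via bisection.

    The derivative of the log-likelihood, sum(a * (x - p)), decreases
    monotonically in theta, so its root can be bracketed and bisected.
    Assumes the response pattern is neither all-correct nor all-wrong.
    """
    def score(theta):
        return sum(a * (x - p_solve(theta, a, b))
                   for x, (a, b) in zip(responses, items))

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:   # likelihood still rising: ability is higher
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical items as (discrimination a, difficulty b) pairs.
ITEMS = [(0.5, -1.0), (0.5, 0.0), (2.0, 0.0), (2.0, 1.0)]

# Both hypothetical models solve 2 of 4 items (50% raw score).
model_a = [1, 1, 0, 0]   # solves the two weakly discriminating items
model_b = [0, 0, 1, 1]   # solves the two strongly discriminating items

theta_a = estimate_ability(model_a, ITEMS)
theta_b = estimate_ability(model_b, ITEMS)
print(f"model_a ability: {theta_a:.2f}, model_b ability: {theta_b:.2f}")
```

In this sketch the item parameters are taken as known. A full evaluation would estimate item and model parameters jointly from all responses, which is presumably where the choice of benchmarks and models starts to affect the final ranking.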

The IRT-estimated Elo scores: GPT-5.5 at 1,260 points, Anthropic’s Claude Opus 4.6 at 999. DeepSeek V4 Pro scores around 800 (±28), which is very close to GPT-5.4 mini at 749. In CAISI’s system, DeepSeek sits closer to the old generation of GPT mini than to Opus.

The points system scores models the way standardized tests score students: not by raw percentage correct, but by weighting which problems they solve and which they miss. The resulting estimate only means something relative to other models in the same evaluation. More points indicate a more capable model overall, and the top model’s score serves as the reference point for judging the rest.
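The relative reading can be checked with simple arithmetic on the scores reported above. The sketch below computes each model's gap to the top score. It also includes the conventional Elo expected-score formula (base 10, 400-point scale), which is an assumption here: the article does not say how CAISI scales its IRT-estimated Elo, so the formula may not apply to these numbers.

```python
# Scores as reported in the article (DeepSeek's is approximate, +/- 28).
SCORES = {
    "GPT-5.5": 1260,
    "Claude Opus 4.6": 999,
    "DeepSeek V4 Pro": 800,
    "GPT-5.4 mini": 749,
}

def gap_to_reference(scores):
    """Points each model trails the best-scoring model in the same evaluation."""
    best = max(scores.values())
    return {name: best - score for name, score in scores.items()}

def elo_expected_score(rating_a, rating_b):
    """Conventional Elo expected score of A against B.

    Assumption: the standard 400-point logistic scale, which CAISI
    may or may not use.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

gaps = gap_to_reference(SCORES)
to_mini = SCORES["DeepSeek V4 Pro"] - SCORES["GPT-5.4 mini"]
to_opus = SCORES["Claude Opus 4.6"] - SCORES["DeepSeek V4 Pro"]

for name, gap in gaps.items():
    print(f"{name}: {gap} points behind the reference")
print(f"DeepSeek is {to_mini} points above GPT-5.4 mini and {to_opus} below Opus 4.6")
```

On these figures DeepSeek trails GPT-5.5 by 460 points, sits 199 points below Opus 4.6, and is 51 points above GPT-5.4 mini, which is the basis for saying it sits closer to the mini model than to Opus.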

CAISI’s results cannot be fully reproduced because two of the nine benchmarks are non-public, and those two are where the gap is widest. For example, GPT-5.5 scored 71% on CTF-Archive-Diamond, one of CAISI’s cybersecurity tests, while DeepSeek registered around 32%.

On public benchmarks, the picture shifts. GPQA-Diamond—PhD-level science reasoning, scored as percentage correct—placed DeepSeek at 90%, one point behind Opus 4.6’s 91%. Math olympiad benchmarks (OTIS-AIME-2025, PUMaC 2024, SMT 2025) put DeepSeek at 97%, 96%, and 96%. On SWE-Bench Verified—real GitHub bug fixes, scored as percentage resolved—DeepSeek scored 74% to GPT-5.5’s 81%. DeepSeek’s own technical report claims V4 Pro matches Opus 4.6 and GPT-5.4.

For the cost comparison, CAISI filtered out any U.S. model that performed significantly worse or cost significantly more per token than DeepSeek. Only one model cleared the bar: GPT-5.4 mini. That’s the entire U.S. frontier, filtered to a single entry.

DeepSeek came out cheaper on five of seven benchmarks, even against OpenAI’s tiniest and least capable model.

The counterargument: Is the gap bigger or smaller?

Criticizing CAISI’s methodology doesn’t fully vindicate DeepSeek. The AI developer under the pseudonym Ex0bit pushed back directly: “There’s no ‘gap’, and no one’s 8 months behind. We’ve been trolled on every closed U.S drop and flexed on with open weights.”

— Eric (@Ex0byt) May 2, 2026

The Artificial Analysis Intelligence Index v4.0—a rating system tracking frontier model intelligence across 10 evaluations—shows OpenAI near 60 points and DeepSeek in the low 50s as of May 2026, compressed far tighter than a year ago.

Based on standardized benchmarks, the Artificial Analysis methodology shows the gap is actually getting smaller.

When DeepSeek first emerged in January 2025, the question was whether China had already caught up. U.S. labs scrambled to respond. Stanford’s 2026 AI Index, released April 13, reports that the Arena leaderboard gap between Claude Opus 4.6 and China’s Dola-Seed-2.0 Preview is shrinking, with the two now separated by only 2.7%.

CAISI plans to release a fuller IRT methodology write-up in the near future.




