The sector that Yann LeCun and Zhu Xiaohu were not optimistic about is quietly making money overseas.
Text by | Zhou Xinyu
Edited by | Su Jianxun
Source | AI Emergence (ID: AIEmergence)
Cover image source | Company official website
"The belief that video models are unprofitable is a collective misjudgment by investors."
Since the start of 2025, the most widely circulated AI wealth-creation stories have come from two sectors: Agents, represented by Manus, and AI hardware, represented by Plaud.
However, beyond these appealing stories about Agents and hardware, an older sector that had once cooled off, video generation models, is now propelling a group of domestic AI companies to take off:
According to monitoring by Feifan Data, in June 2025 the ARR (Annual Recurring Revenue) of Kuaishou's Keling AI across app and web reached 100 million US dollars. Among startups, the ARR of MiniMax's Conch AI and of Shengshu Technology's Vidu, on the web alone, has also reached about 10 million US dollars.
Multiple insiders revealed to "AI Emergence" that the actual subscription revenues of these products are even higher.
What is more, positive cash flow, which large language models have yet to achieve, is arriving first in the video generation sector.
Feifan Data shows that the monthly turnover of PixVerse, the video generation model of Aishi Technology, has reached 840,000 US dollars. According to the company's official statement, subscription revenue can cover most of its costs and expenses, and its cash flow is close to positive.
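As a rough cross-check of these figures, a monthly number can be annualized by multiplying it by 12. The sketch below applies that simple run-rate convention to the 840,000 US dollar monthly turnover cited above; it assumes the whole monthly figure is recurring, which the source does not confirm, and the function name is purely illustrative.

```python
def annualized_run_rate(monthly_revenue_usd: float) -> float:
    """Annualize a monthly revenue figure using the simple x12 run-rate convention.

    Assumption: the entire monthly figure is recurring revenue.
    """
    return monthly_revenue_usd * 12


# Monthly turnover of PixVerse as cited from Feifan Data.
pixverse_monthly_usd = 840_000

implied_arr = annualized_run_rate(pixverse_monthly_usd)
print(f"Implied ARR: ${implied_arr:,.0f}")  # prints "Implied ARR: $10,080,000"
```

Under that assumption, the implied figure of roughly 10 million US dollars is on the same order as the web-only ARR reported for other startups in this sector.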
At the 2025 Beijing AI Innovation Conference, Huang Weilin, head of image and video generation at ByteDance's Seed team, gave an optimistic prediction: the annualized revenue (ARR) of leading video generation products is expected to reach 100 million US dollars this year and may grow to between 500 million and 1 billion US dollars next year.
But just a year ago, Sora-like video generation models were out of favor in China. Large-scale video models were too expensive and their returns too uncertain for ordinary companies to afford.
Tencent Technology reported that when Wang Changhu, the former head of visual technology at ByteDance, founded Aishi Technology, Zhu Xiaohu of Jinshajiang Venture Capital tried to talk him out of it: "You'd better go back to work. There is no chance for large models in China." In September 2024, when MiniMax launched its video generation model, Conch AI, the market was bearish on it because it was caught between Kuaishou and ByteDance.
Even Baidu, a representative big-company player, judged at its Q3 2024 directors' meeting that video models are "unprofitable": "The investment cycle for video generation like Sora is too long. It may take 10 or 20 years to get business returns." Yann LeCun, Meta's chief AI scientist, also criticized video models from a technical perspective for their limitations in understanding the physical world.
An investor who once passed on investing in Wang Changhu offered a judgment that represented the consensus at the time: the ROI of video models cannot turn positive in the short term, and startups will be squeezed out by two or three large companies, just as in the language model sector.
Indeed, in 2024 many domestic video startups were on the verge of collapse: they struggled to raise funds and could not find product-market fit (PMF). For example, Luying Technology, an AI video startup backed by Redpoint China and BlueRun Ventures, was acquired in December 2024.
However, in less than a year, the ARR of Aishi Technology changed that investor's mind. He told "AI Emergence" that he "regretted it deeply": "The belief that video models are unprofitable is a collective misjudgment by investors."
The wealth-creation experience of these Chinese AI video companies, which have proven public opinion wrong with real money, can be summarized as the combined effect of three elements: sector, market, and marketing.
Let's start with the sector.
Facts have proven that even though video generation technology is at an earlier stage than language technology, consumers are more forgiving of it. The reason is that video generation is a sector where demand is driven by aesthetics.
"Each company's data strategy, and even immature technology and biases in training data, will lead to a different video generation style," an investor told us. "Video creation is a market with diverse aesthetics, and each video model has its own consumers."
For example, many users have found that Kuaishou's Keling AI is very good at generating shots of food and food live-streaming. This is thought to be related to the wealth of food live-streaming footage on Kuaishou's short-video platform.
As for the market, going global has long been the consensus in the AI industry, especially into European and American markets, where users have greater willingness and ability to pay and are more receptive to new products.
For example, MiniMax's Conch AI met resistance from creators when it introduced subscription payments in China, yet it had already gained six times as many users overseas as at home and reached an ARR of tens of millions of US dollars.
However, in addition to the inherent advantages of the overseas market, the "cost-performance" niche that domestic video models occupy overseas is also worth noting.
Many industry insiders believe that the relatively limited funds and computing power have forced domestic AI video startups to spend a lot of effort on cost optimization, which has instead given them a price advantage when going global.
For example, when generating videos of the same length and resolution, models from startups such as Conch AI and Vidu cost only one-tenth to one-sixth as much as Sora.
Finally, let's look at marketing.
Video social platforms such as TikTok and YouTube play a crucial role in the growth strategies of AI video companies.
An employee of Aishi Technology once told "AI Emergence" that an important turning point in the accelerated growth of its video model PixVerse came at the end of 2024, when its Venom special effect drew more than 100 million total views on short-video platforms such as TikTok and Douyin. An investor also mentioned that Pika's "Pinch" special effect and Conch AI's "Half-cat" were key drivers of growth.
Pika's Pinch special effect. Image source: Pika's official Xiaohongshu account
"For current model companies, ranking high on technology leaderboards is no longer enough to drive growth," the same investor summarized. "Everyone should actively find or even create scalable demand."
In video creation, creators need more than productivity gains; they also need incentives such as traffic. "The popular formats created by AI video companies actually meet creators' need for incentives."
The good news for entrepreneurs is that, according to a ranking released by a16z, in January 2025 Conch AI (ranked 12th) drew more user visits than OpenAI's Sora (ranked 23rd) and Kuaishou's Keling AI (ranked 20th).
This means that, unlike the oligopolistic language model sector, the competitive landscape of the video generation sector is still unsettled, and startups still have plenty of opportunities.
Wang Changhu once told "AI Emergence" that although the video generation sector is developing rapidly, it is currently at a stage between GPT-2 and GPT-3. Many technical difficulties remain to be overcome at this stage, and these are opportunities for startups.
However, even as companies replicate the self-sustaining business model, they should recognize that the early-entry advantage in the video generation sector is gradually fading. Moreover, the pressure on existing video model companies to stay in the game will increase.
In March 2024, Wang Changhu judged in a media interview that startups entering the market later would struggle to get a chance. His reasoning: "If a startup fails to raise enough funds and accumulate users, a team, and technology in the first stage of development, it may not have enough resources to stay in the subsequent competition."
An AI investor indirectly corroborated this point. She told "AI Emergence" that even though the landscape in video generation is not yet set, it is difficult to allocate investment to new entrants, "unless a company becomes a dark horse like DeepSeek."
She also pointed out that the amount of financing that video companies can obtain is an order of magnitude less than that of language model companies. "As time goes by, the resource disadvantages of startups will become more prominent." She summarized, "Keling AI and Jimeng AI have an advantage in continuous technological iteration."
This harsh reality is also forcing the AI video startups already in the game to become self-sustaining faster.
This article is from the WeChat official account "36Kr", written by Zhou Xinyu, and is published with authorization.
