Summary

IFLYTEK is a well-deserved leader in artificial intelligence in China and maintains an internationally advanced level of technology in speech and language, natural language understanding, machine learning and reasoning, and autonomous learning. The IFLYTEK IT infrastructure team must provide stable, high-performance training storage platforms for the company's AI teams and business units while also managing nearly one thousand high-performance GPU servers. The performance of the storage platform used for training directly affects the training efficiency of those business units.

Challenge

To meet the training requirements of different AI business units, the training platform must have the following characteristics:

  • High-bandwidth, low-latency read and write performance to keep GPUs efficiently utilized.
  • Support for mixed read and write workloads spanning billions of small and large files.
  • Support for concurrent access from thousands of high-performance computing nodes.
  • Seamless data access for containerized training tasks.

Results

Unlimited scalability

Data capacity reached nearly 10 PB within a few months, storing 14 billion audio, video, and image files used for training. The peak bandwidth of a single cluster has reached 10 GB/s.
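To put these figures in perspective, the following back-of-the-envelope sketch derives the implied average file size and the time needed to read the whole data set once at peak bandwidth. It assumes decimal units (1 PB = 10^15 bytes) and that the full capacity is occupied by the 14 billion files; neither assumption comes from IFLYTEK, so treat the results as rough orders of magnitude only.

```python
# Rough figures derived from the numbers quoted in this case study.
# Assumptions (not from the source): decimal units, and the entire
# ~10 PB is taken up by the 14 billion training files.

PB = 10**15  # bytes in a petabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)

capacity_bytes = 10 * PB
file_count = 14 * 10**9
peak_bandwidth = 10 * GB  # bytes per second, single cluster

# Average size per file, which indicates a small-file-heavy workload.
avg_file_bytes = capacity_bytes / file_count

# Time to read every byte once at the quoted peak bandwidth.
full_scan_seconds = capacity_bytes / peak_bandwidth

print(f"average file size: {avg_file_bytes / 1000:.0f} KB")
print(f"full read at peak bandwidth: {full_scan_seconds / 86400:.1f} days")
```

Under these assumptions the average file is well under 1 MB, which is consistent with the small-file-heavy workload described in the Challenge section.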

Reduce training time

Compared with other storage systems, YRCloudFile's high bandwidth and low latency keep computing resources such as GPU servers saturated, reducing the time for a single training run from one week to two days.

Improve training accuracy

With each training run shortened, algorithm engineers can perform more iterations on the model. More iterations, combined with algorithm optimization, allow IFLYTEK to achieve better training accuracy.

Providing Outstanding Next-generation Cloud Storage and Comprehensive Premium Services