Design of an MKV Player Based on the SMP8654 Platform

MKV is a relatively new multimedia packaging format that supports many video and audio encoding formats. It can encapsulate 16 or more audio tracks in different formats, together with subtitle streams in different languages, into a single file, and it is widely used for high-definition movies. More and more videos and movies use MKV as their packaging format, so support for MKV has become an important measure of a high-definition player's capability. This paper proposes a design and implementation of an MKV player based on the SMP8654 platform, and optimizes it for the characteristics of embedded systems and high-definition media so that MKV files play back smoothly.

1 MKV package format

MKV stands for Matroska Video, a relatively new multimedia packaging format. A multimedia packaging format is also called a multimedia container. Unlike encoding formats such as H.264, MPEG-2, and MPEG-4, it only provides a "shell" around the encoded media and does not itself perform any encoding. Matroska is a standard developed by the open source Matroska Development Team. It comprises three file types: MKV (Matroska Video), MKA (Matroska Audio), and MKS (Matroska Subtitles). The latter two hold only audio and only subtitles, respectively, and are rarely used. The purpose of MKV is to replace traditional packaging formats such as AVI. AVI is a packaging format introduced by Microsoft in 1992. The name stands for Audio Video Interleave, meaning that video and audio data are stored interleaved in one file. An improved version, AVI 2.0, followed in 1996. The AVI format has many restrictions: it can hold only one video track and one audio track, plus a few additional tracks such as text, and it provides no control functions. In general, the traditional packaging formats represented by AVI are outdated: they can contain only a few audio and video formats, are not open enough, and scale poorly. These limitations led to the birth of new multimedia packaging formats such as Matroska.

Compared with traditional packaging formats, MKV has the following advantages: support for variable bit rate (VBR), error detection and recovery, soft subtitles, streaming, strong openness and cross-platform compatibility, and support for more than 16 audio and subtitle streams. The biggest feature of Matroska is that it can accommodate almost all types of video, audio, and subtitle streams. In addition to H.264, it can hold MPEG-4, MPEG-2, AC-3, AAC, and other video and audio formats. Even content from the rather closed RealMedia and QuickTime formats can be included, with the audio and video repackaged to achieve better results.

Because of these advantages, and with the spread of the Internet and of high-definition movies, the MKV format has come into wide use, and more and more high-definition movies on the Internet adopt it. However, MKV is a standard developed and promoted by an open source organization and lacks the backing of large commercial companies, so there is no complete and efficient reference implementation of MKV playback. The problem is especially severe on embedded platforms, where performance and resources are limited. Although many high-definition players already support the MKV format, most implementations offer only partial support and play back inefficiently. When playing high-bitrate movies, they suffer from stutter and frozen pictures, which spoils the viewing experience. This paper proposes a design and implementation of an MKV player based on the SMP8654 platform, and optimizes it for the characteristics of embedded systems so that MKV files play back smoothly.

2 Overall design of hardware platform and software

The hardware platform is built around the SMP8654 chip, which connects over the bus to peripherals such as RAM, a SATA hard disk, Flash memory, and input and output devices. The SMP8654 is a multimedia playback SoC from Sigma Designs. It integrates a powerful multimedia processor, a robust content protection system, a new DDR2 memory controller, multiple on-chip CPUs, and a complete set of system peripheral interfaces. For media playback, the SMP8654 provides an advanced decoding engine that fully supports high-definition video, with hardware decoding of MPEG-1, MPEG-2, MPEG-4, H.264, WMV9, VC-1, and AVS. It also supports high-performance graphics acceleration, multi-standard audio decoding, and advanced display processing. To help third-party manufacturers develop applications, Sigma Designs provides a development kit and a development framework for the chip. The work in this article is secondary development on top of that framework.

On the software side, a complete playback system is too complex to implement by driving the underlying hardware directly, so it needs the support of an operating system. In this project the operating system is uClinux and the file system is romfs. uClinux is a Linux variant tailored to embedded systems. It retains the main advantages of standard Linux, namely stability and strong networking, while being less complex, and it mainly targets microcontrollers without an MMU (Memory Management Unit). Romfs is a file system designed for embedded systems. It is small, reliable, and fast to read, and it is commonly used in embedded systems.


Playing a media file generally involves the following steps: system initialization, file type detection, file parsing, hardware decoder setup, and audio and video decoding. File parsing and audio and video decoding are the key steps. Because the SMP8654 integrates complete audio and video hardware decoders, decoding is done mainly in hardware. The software only needs to deliver the audio and video data to the corresponding decoding buffers as required. The overall software architecture is shown in Figure 2.

3 System key technology design and implementation

3.1 MKV file parsing

MKV file parsing means parsing the individual components of the MKV format to obtain the necessary audio and video parameters and the media data. MKV is an encapsulation format, and the actual video and audio data are wrapped inside particular sub-elements. To reach that data the file must first be parsed, and parsing continues throughout playback. Whether the file is parsed efficiently and correctly determines whether the data is read accurately, which in turn affects playback quality. The MKV format uses variable-length encoding. This reduces storage space, but it also makes parsing more difficult.

The MKV file format is based on EBML (Extensible Binary Meta Language), an extensible binary meta-language similar in concept to XML. It stores integers in a variable-length form to save space. The basic structure of EBML is a typical TLV (type-length-value) structure in which every element has three parts: ID, size, and data.

The ID identifies the element type, the size gives the size of the data part that follows, and the data part holds the actual content of the element identified by the ID. Both ID and size are variable-length coded integers. The length in bytes of an encoded integer is 1 + (number of leading zero bits in its first byte). There are at most 7 leading zero bits, so an encoded integer occupies at most 8 bytes and can carry a value of up to 56 bits. Numbers larger than 56 bits are not allowed in the file.
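The length rule above can be sketched in a few lines of Python. This is a minimal illustration, not the player's actual code: the names `vint_length` and `read_vint` and the `keep_marker` flag are hypothetical. It assumes the usual EBML convention that the length-marker bit (the first 1 bit) is stripped from size fields but kept as part of element IDs.

```python
def vint_length(first_byte: int) -> int:
    """Length in bytes of an EBML variable-length integer: 1 + leading zero bits."""
    if first_byte == 0:
        # Eight leading zeros would mean a 9-byte integer, which is not allowed.
        raise ValueError("invalid length marker")
    leading_zeros = 8 - first_byte.bit_length()
    return 1 + leading_zeros


def read_vint(data: bytes, offset: int = 0, keep_marker: bool = False):
    """Decode one variable-length integer and return (value, length_in_bytes)."""
    length = vint_length(data[offset])
    if offset + length > len(data):
        raise ValueError("truncated variable-length integer")
    value = int.from_bytes(data[offset:offset + length], "big")
    if not keep_marker:
        # Clear the marker bit so that only the 7 * length value bits remain.
        value &= (1 << (7 * length)) - 1
    return value, length
```

For example, `read_vint(bytes([0x40, 0x02]))` returns `(2, 2)`: one leading zero bit means two bytes, and 14 value bits remain after the marker is removed. With `keep_marker=True`, the four bytes `1A 45 DF A3` decode to the ID `0x1A45DFA3`.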

The salient feature of the MKV file format is its modular, structured storage. Each higher-level element consists of several lower-level elements, down to the most basic elements, and every element is a TLV structure. A standard MKV file consists of two parts: the EBML Header and the Segment. The EBML Header is composed of sub-elements such as EBMLVersion and DocType and carries information such as the file version and document type. The Segment holds the actual video and audio data of the media file, and its data part is divided into sub-elements such as SeekHead, Tracks, and Cluster (Table 1). All elements can be processed by one uniform procedure. Borrowing the idea of TCP/IP protocol layering, the work of each layer is done by one function, which uses lower-layer functions to do its job and can in turn be called by higher-layer functions. Parsing starts at the top level of the file. Whenever the parser at one level encounters a sub-element, it calls the corresponding function to parse the next level down, and this continues until the end of the file is reached. The complete MKV parsing and calling process is shown in Figure 3. Header_Parse and Segment_Parse are the parsing functions for the top-level elements of the file, Cluster_Parse, Tracks_Parse, and so on are the parsing functions for the elements one level down, and ebml_read_element_id and ebml_read_element_length are the parsing functions for the lowest-level basic elements.
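The layered parsing idea can be illustrated with a short Python sketch. It is a simplification under stated assumptions: instead of one function per element type (Header_Parse, Segment_Parse, and so on), a single hypothetical function `parse_elements` recurses into any element listed in `MASTER_IDS`. The numeric IDs are taken from the Matroska element specification. Elements of unknown size, which can occur in streamed files, are not handled.

```python
# IDs of elements whose data part contains further elements (not exhaustive).
MASTER_IDS = {
    0x1A45DFA3: "EBML Header",
    0x18538067: "Segment",
    0x1654AE6B: "Tracks",
    0xAE: "TrackEntry",
    0x1F43B675: "Cluster",
}


def read_vint(data: bytes, pos: int, keep_marker: bool):
    """Lowest layer: decode one variable-length integer, return (value, new_pos)."""
    first = data[pos]
    if first == 0:
        raise ValueError("invalid length marker")
    length = 1 + (8 - first.bit_length())
    value = int.from_bytes(data[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, pos + length


def parse_elements(data: bytes, start: int, end: int, depth: int = 0, out=None):
    """Walk the elements in data[start:end] and record (depth, id, size) tuples."""
    if out is None:
        out = []
    pos = start
    while pos < end:
        elem_id, pos = read_vint(data, pos, keep_marker=True)   # IDs keep the marker
        size, pos = read_vint(data, pos, keep_marker=False)     # sizes drop it
        if pos + size > end:
            raise ValueError("element overruns its parent")
        out.append((depth, elem_id, size))
        if elem_id in MASTER_IDS:
            # A higher-level element hands its data part to the next level down.
            parse_elements(data, pos, pos + size, depth + 1, out)
        pos += size
    return out
```

Applied to a minimal EBML Header that contains only a DocType element with the value "matroska", the function returns one entry at depth 0 for the header and one at depth 1 for DocType.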

3.2 Setting the core audio and video parameters of the hardware decoder

Tracks describes each multimedia stream contained in the file. Each stream is described by one TrackEntry, and all TrackEntry elements must be placed inside a single Tracks element. A TrackEntry mainly includes: TrackNumber (the ID that identifies which stream a block belongs to), TrackType (video, audio, or subtitle), TimeScale (timestamp unit), CodecID (encoding format), and CodecPrivate (private data required by certain encoding formats). For video, it also includes PixelWidth, PixelHeight, and so on. For audio, it also includes Channels, SamplingFrequency, and so on. These parameters determine whether the audio and video can be decoded and played correctly. They must be obtained during parsing and then passed to the hardware through the hardware control functions.
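As an illustration of how these parameters might be collected, the Python sketch below decodes the data part of one TrackEntry into a dictionary. The function name `parse_track_entry` and the flat dictionary layout are hypothetical. The element IDs and the TrackType values follow the Matroska element specification. The timestamp-unit field is omitted, and the hardware control functions of the SMP8654 framework are not shown because the source does not describe them.

```python
import struct

UINT, STRING, FLOAT, BINARY, MASTER = range(5)

# Element ID -> (name, data type) for the fields discussed above.
TRACK_FIELDS = {
    0xD7: ("TrackNumber", UINT),
    0x83: ("TrackType", UINT),          # 1 = video, 2 = audio, 17 = subtitle
    0x86: ("CodecID", STRING),
    0x63A2: ("CodecPrivate", BINARY),
    0xE0: ("Video", MASTER),
    0xB0: ("PixelWidth", UINT),
    0xBA: ("PixelHeight", UINT),
    0xE1: ("Audio", MASTER),
    0xB5: ("SamplingFrequency", FLOAT),
    0x9F: ("Channels", UINT),
}


def _read_vint(data: bytes, pos: int, keep_marker: bool):
    first = data[pos]
    if first == 0:
        raise ValueError("invalid length marker")
    length = 1 + (8 - first.bit_length())
    value = int.from_bytes(data[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value, pos + length


def parse_track_entry(payload: bytes) -> dict:
    """Collect decoder parameters from the data part of one TrackEntry."""
    params = {}
    pos = 0
    while pos < len(payload):
        elem_id, pos = _read_vint(payload, pos, keep_marker=True)
        size, pos = _read_vint(payload, pos, keep_marker=False)
        body = payload[pos:pos + size]
        pos += size
        name, kind = TRACK_FIELDS.get(elem_id, (None, None))
        if name is None:
            continue                                  # skip fields not needed here
        if kind == MASTER:
            params.update(parse_track_entry(body))    # flatten Video/Audio children
        elif kind == UINT:
            params[name] = int.from_bytes(body, "big")
        elif kind == STRING:
            params[name] = body.rstrip(b"\x00").decode("ascii")
        elif kind == FLOAT:
            # Sketch only: assumes a 4-byte or 8-byte big-endian float.
            params[name] = struct.unpack(">f" if size == 4 else ">d", body)[0]
        else:
            params[name] = body
    return params
```

The resulting dictionary (for example TrackType, CodecID, PixelWidth, and PixelHeight for a video track) holds the values that would then be handed to the decoder setup code.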

Cluster elements contain the actual media data. One Cluster usually holds a few seconds of media data, and a file contains thousands of Clusters. Each Cluster contains several BlockGroups. From the starting PTS and duration of the Cluster and the BlockGroup, the actual PTS (presentation timestamp) of the current Block can be calculated. The PTS determines when a frame is presented and is the key information for audio and video synchronization. It should be passed to the hardware decoder together with the corresponding video or audio data.
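A minimal Python sketch of the timestamp calculation follows. It relies on details of the Matroska layout that the text above does not spell out and that are therefore assumptions here: each Block's data part begins with the track number (a variable-length integer), a signed 16-bit timestamp relative to the Cluster, and one flags byte, and timestamps are counted in units of a file-wide scale whose default is 1,000,000 ns. The function names are hypothetical.

```python
import struct


def parse_block_header(payload: bytes):
    """Return (track_number, relative_timestamp, flags, header_length) of a Block."""
    first = payload[0]
    if first == 0:
        raise ValueError("invalid length marker")
    length = 1 + (8 - first.bit_length())
    track = int.from_bytes(payload[:length], "big") & ((1 << (7 * length)) - 1)
    # Signed 16-bit big-endian offset from the Cluster timestamp, then one flags byte.
    relative, flags = struct.unpack_from(">hB", payload, length)
    return track, relative, flags, length + 3


def block_pts_ns(cluster_timestamp: int, relative: int, scale_ns: int = 1_000_000) -> int:
    """Absolute presentation time of a Block in nanoseconds."""
    return (cluster_timestamp + relative) * scale_ns
```

For a Cluster that starts at 5000 (5 s at the default scale) and a Block with a relative timestamp of 40, `block_pts_ns(5000, 40)` gives 5,040,000,000 ns, that is 5.04 s.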

3.3 Performance optimization

Movies packaged in MKV are usually high-definition movies with a resolution of 1920 × 1080. Even with advanced encoding formats such as H.264, the bit rate is still very high. MKV also supports variable bit rate, which reduces file size, but a sharply fluctuating bit rate makes smooth playback harder. In high-definition files the bit rate is generally between 10 and 30 Mbit/s and can reach 60 Mbit/s. At such bit rates, playback is prone to stalls and stutter unless special measures are taken. We address this problem from two directions: parsing and playback.

In terms of parsing, efficiency determines whether data reaches the buffer in time. If parsing takes too long, the buffer stays empty for a while and playback stalls. An MKV file usually contains one video stream, several audio streams, and several subtitle streams. Only one audio stream and one subtitle stream are selected during playback, so the data of the other streams can be treated as invalid. During parsing, the identifier in the Block header shows whether the Block holds data that is currently needed for playback. If it does, parsing continues and the audio or video data is sent to the buffer. If it does not, the Block is not parsed and the file pointer moves directly to the next Block, which greatly speeds up both parsing and data reading.
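The skipping strategy can be sketched in Python as follows. This is an illustration under assumptions, not the player's code: it walks a flat sequence of SimpleBlock elements (ID 0xA3 in the Matroska specification), reads only the track number at the start of each Block, and seeks past Blocks of tracks that are not selected. The function names are hypothetical, and BlockGroup nesting is left out for brevity.

```python
import io

SIMPLE_BLOCK_ID = 0xA3


def read_vint(stream, keep_marker: bool = False):
    """Read one variable-length integer from a stream; return None at end of stream."""
    head = stream.read(1)
    if not head:
        return None
    if head[0] == 0:
        raise ValueError("invalid length marker")
    length = 1 + (8 - head[0].bit_length())
    raw = head + stream.read(length - 1)
    if len(raw) != length:
        raise ValueError("truncated variable-length integer")
    value = int.from_bytes(raw, "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1
    return value


def read_selected_blocks(stream, selected_tracks):
    """Return ([(track, frame_data), ...], skipped_count) for the selected tracks."""
    frames, skipped = [], 0
    while True:
        elem_id = read_vint(stream, keep_marker=True)
        if elem_id is None:
            break
        size = read_vint(stream)
        if size is None:
            raise ValueError("element without a size field")
        payload_start = stream.tell()
        if elem_id != SIMPLE_BLOCK_ID:
            stream.seek(payload_start + size)      # not a Block: step over it
            continue
        track = read_vint(stream)
        if track not in selected_tracks:
            stream.seek(payload_start + size)      # unused stream: jump to next Block
            skipped += 1
            continue
        stream.seek(3, io.SEEK_CUR)                # skip relative timestamp and flags
        frames.append((track, stream.read(payload_start + size - stream.tell())))
    return frames, skipped
```

Because only the first bytes of an unselected Block are read, the cost of skipping it does not grow with the size of the frame it carries.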

In terms of playback, the normal processing flow is to read one frame of data, send it to the hardware decoder, and read the next frame only after the hardware decoder signals that it is empty. This works for files with low bit rates. With higher resolutions and bit rates, however, the time spent on parsing, reading, and decoding increases, and the decoder can run out of data while the next frame is still being fetched. To solve this problem, we designed a FIFO buffer in memory that behaves like a sliding window (Figure 4). The buffer can store several frames (one frame corresponds to one Block, and the number of frames that fit depends on factors such as the frame size). While the buffer is not full, a Block is read from the file and parsed, and its actual data is appended to the tail of the buffer. When the hardware buffer is found to be free, the data at the head of the FIFO is sent directly from memory to the hardware buffer without reading the file. Because the FIFO holds several frames, it provides a cushion: data can still be delivered on time when the bit rate fluctuates, which avoids the stalls and stutter caused by an empty hardware buffer.
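The FIFO can be sketched in Python as below. The class name `FramePrefetcher`, the byte-based capacity limit, and the method names are assumptions made for illustration. The source does not state how the real buffer is sized or how the hardware signals that it is free.

```python
from collections import deque


class FramePrefetcher:
    """Software FIFO that parses ahead of the hardware decoder."""

    def __init__(self, frame_source, capacity_bytes: int):
        self._source = iter(frame_source)   # yields parsed frames (bytes objects)
        self._capacity = capacity_bytes
        self._used = 0
        self._fifo = deque()
        self._pending = None                # frame already read but not yet queued

    def refill(self) -> None:
        """Append frames to the tail until the FIFO is full or the source ends."""
        while True:
            if self._pending is None:
                self._pending = next(self._source, None)
                if self._pending is None:
                    return                  # source exhausted
            fits = self._used + len(self._pending) <= self._capacity
            if self._fifo and not fits:
                return                      # full: keep the frame for the next call
            # An oversized frame is still admitted when the FIFO is empty,
            # otherwise playback could never make progress.
            self._fifo.append(self._pending)
            self._used += len(self._pending)
            self._pending = None

    def next_frame(self):
        """Pop the head frame for the hardware buffer, or None if the FIFO is empty."""
        if not self._fifo:
            return None
        frame = self._fifo.popleft()
        self._used -= len(frame)
        return frame

    def __len__(self) -> int:
        return len(self._fifo)
```

A playback loop would call `refill()` whenever it has spare time and `next_frame()` whenever the hardware buffer reports free space, so a burst of large frames is absorbed by the frames already queued in memory.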

4 Conclusion

This article describes the characteristics of the MKV packaging format in detail and proposes a design and implementation of an MKV player based on the SMP8654. The design has been verified to play high-definition MKV files smoothly and has been applied in actual products. As further work, we plan to build on the MKV player to design a general media player framework for multiple packaging formats, integrating other formats such as FLV and FLAC and offering better scalability, so that support for additional packaging formats can be added easily.
