vdr-plugin-softhddevice-drm-gles 1.4.0
Developer Documentation - softhddevice-drm-gles

This document contains technical documentation for developers.

Playmode Graph

This graph shows how VDR changes the playmode and which commands are called on the device. It is the base graph for the following state diagrams. Simply walk through the graph from top to bottom. Every box starts with the VDR command (Play(), Pause(), Forward(), Backward()) and lists the current playmodes the command is executed on.

graph TD
%% Playmodes

    Player --> ToPlayPlay["**Play():**<br>pmPause<br>pmPlay<br>pmSlow(forward)"]
    Player --> ToPlayPause["**Pause():**<br>pmStill<br>pmPause"]
    ToPlayPlay --------> DevicePlay["DevicePlay()"]
    ToPlayPause --------> DevicePlay["DevicePlay()"]

    Player--> ToFreezeForward["**Forward():**<br>pmSlow(forward)"]
    ToFreezeForward --------> DeviceFreeze["DeviceFreeze()"]

    Player--> ToMuteForward["**Forward():**<br>pmStill<br>pmPause"]
    ToMuteForward ------> DeviceMute["DeviceMute()"]

    Player --> ToClearPlay["**Play():**<br>pmStill<br>pmFast(forward)<br>pmFast(backward)<br>pmSlow(backward)"]
    Player --> ToClearPause["**Pause():**<br>pmFast(forward)<br>pmFast(backward)<br>pmSlow(backward)"]
    Player--> ToClearForward["**Forward():**<br>pmPlay<br>pmFast(forward)<br>pmFast(backward)<br>pmSlow(backward)"]
    Player --> ToClearBackward["**Backward():**<br>pmPlay<br>pmFast(forward)<br>pmFast(backward)<br>pmSlow(forward)<br>pmSlow(backward)"]
    Player --> ToMuteBackward["**Backward():**<br>pmStill<br>pmPause"]

    Player --> ToClearStillPicture["**Goto():**<br>Mark editing<br>(pause after edit)"]
    ToClearStillPicture ---> DeviceClear["DeviceClear()"]
    DeviceClear ---> ClearToStillPicture["**Goto():**<br>Mark editing<br>(pause after edit)"]
    ClearToStillPicture ----> DeviceStillPicture["DeviceStillPicture()"]

    Player --> ToClearStillPicture2["**Goto():**<br>Mark editing<br>(play after edit)"]
    ToClearStillPicture2 ---> DeviceClear["DeviceClear()"]
    DeviceClear ---> ClearToPlayStillPicture["**Goto():**<br>Mark editing<br>(play after edit)"]
    ClearToPlayStillPicture ----> DevicePlay["DevicePlay()"]

    Player --> ToClearSkip["**SkipSeconds():**<br>Yellow/Green jumping"]
    ToClearSkip ---> DeviceClear["DeviceClear()"]
    DeviceClear ---> ClearToPlaySkip["**SkipSeconds():**<br>Yellow/Green jumping"]
    ClearToPlaySkip ----> DevicePlay["DevicePlay()"]

    ToClearPlay ---> DeviceClear["DeviceClear()"]
    ToClearPause ---> DeviceClear["DeviceClear()"]
    ToClearForward ---> DeviceClear["DeviceClear()"]
    ToClearBackward ---> DeviceClear["DeviceClear()"]
    ToMuteBackward ------> DeviceMute["DeviceMute()"]

    Player --> ToFreeze["**Pause():**<br>pmPlay<br>pmSlow(forward)"]
    ToFreeze --------> DeviceFreeze["DeviceFreeze()"]

    DeviceClear ---> ClearToPlayPlay["**Play():**<br>pmStill<br>pmFast(forward)<br>pmFast(backward)<br>pmSlow(backward)"]
    DeviceClear ---> ClearToPlayBackward["**Backward():**<br>pmFast(backward)"]
    DeviceClear ---> ClearToPlayForward["**Forward():**<br>pmFast(forward)"]
    ClearToPlayPlay ----> DevicePlay["DevicePlay()"]
    ClearToPlayForward ----> DevicePlay["DevicePlay()"]
    ClearToPlayBackward ----> DevicePlay["DevicePlay()"]

    DeviceClear ---> ClearToMuteForward["**Forward():**<br>pmPlay<br>pmFast(backward)<br>pmSlow(backward)"]
    DeviceClear ---> ClearToMuteBackward["**Backward():**<br>pmPlay<br>pmFast(forward)<br>pmSlow(forward)"]
    ClearToMuteForward --> DeviceMute["DeviceMute()"]
    ClearToMuteBackward --> DeviceMute["DeviceMute()"]

    DeviceMute --> ToTrickSpeedForward["**Forward():**<br>pmStill<br>pmPause<br>pmPlay<br>pmFast(backward)<br>pmSlow(backward)"]
    DeviceMute --> ToTrickSpeedBackward["**Backward():**<br>pmStill<br>pmPause<br>pmPlay<br>pmFast(forward)<br>pmSlow(forward)"]
    ToTrickSpeedForward --> DeviceTrickSpeed["DeviceTrickSpeed()"]
    ToTrickSpeedBackward --> DeviceTrickSpeed["DeviceTrickSpeed()"]

    DeviceClear ---> ClearToFreezePause["**Pause():**<br>pmFast(forward)<br>pmFast(backward)<br>pmSlow(backward)"]
    DeviceClear ---> ClearToFreezeBackward["**Backward():**<br>pmSlow(backward)"]
    ClearToFreezePause ----> DeviceFreeze["DeviceFreeze()"]
    ClearToFreezeBackward ----> DeviceFreeze["DeviceFreeze()"]

    DevicePlay --> pmPlay["pmPlay"]
    DeviceFreeze --> pmPause["pmPause"]
    DeviceStillPicture --> pmStill["pmStill"]
    DeviceTrickSpeed --> pmTrick["pmFast(forward)<br>pmFast(backward)<br>pmSlow(forward)<br>pmSlow(backward)"]

    %% Styling
    classDef player fill:#90caf9,stroke:#1976d2,stroke-width:2px,color:#000000
    classDef endpoint fill:#e57373,stroke:#d32f2f,stroke-width:3px,color:#000000
    classDef commands fill:#81c784,stroke:#388e3c,stroke-width:2px,color:#000000

    class Player player
    class DeviceClear,DeviceMute commands
    class DevicePlay,DeviceFreeze,DeviceTrickSpeed,DeviceStillPicture endpoint

State Diagram

This is the model of the state machine implemented in softhddevice.cpp.

stateDiagram-v2
    [*] --> Detached: Initialize

    Detached --> Stop: AttachEvent

    Stop --> Buffering: PlayEvent
    Stop --> Detached: DetachEvent

    Buffering --> Play: BufferingThresholdReachedEvent

    Play --> TrickSpeed: TrickSpeedEvent
    Play --> Stop: StopEvent<br>DetachEvent
    Play --> Play: StillPictureEvent<br>PauseEvent<br>PlayEvent
    Play --> Buffering: BufferUnderrunEvent

    TrickSpeed --> Buffering: PlayEvent
    TrickSpeed --> Stop: StopEvent<br>DetachEvent
    TrickSpeed --> TrickSpeed: StillPictureEvent<br>PauseEvent<br>TrickSpeedEvent

    classDef stopState fill:#e57373,stroke:#d32f2f,stroke-width:2px,color:#000
    classDef playState fill:#81c784,stroke:#388e3c,stroke-width:2px,color:#000
    classDef trickspeedState fill:#64b5f6,stroke:#1976d2,stroke-width:2px,color:#000
    classDef detachState fill:#fd00fd,stroke:#ca00ca,stroke-width:2px,color:#000
    classDef buffering fill:#fff59d,stroke:#fbc02d,stroke-width:2px,color:#000000

    class Stop stopState
    class Play playState
    class TrickSpeed trickspeedState
    class Detached detachState
    class Buffering buffering

Video Data Flow Call Graph

This section shows the complete data flow of video frames from VDR through the plugin to the display hardware.

Overview

The video pipeline consists of four main threads:

  1. 🔵**VDR Thread** - Receives video data from VDR
  2. 🟣**cDecodingThread** - Decodes video packets using FFmpeg
  3. 🟠**cFilterThread** - Applies filters (deinterlacing, scaling)
  4. 🟢**cDisplayThread** - Syncs with audio and commits frames to DRM/KMS

In addition, the diagram marks:

  • 🔴 Hardware (DRM/KMS display hardware)
  • 🟡 Buffers (frame buffers and queues)

Detailed Call Graph

graph TD
    %% VDR Thread
    VDR[VDR Core] --> |PES Packet|PlayVideo["cSoftHdDevice::PlayVideo"]
    PlayVideo --> |PES Packet|PushPes["cVideoStream::PushPesPacket"]
    PushPes --> |Reassembled Codec Packet|CreatePkt["CreateAvPacket"]
    CreatePkt --> |AVPacket|PktQueue["cVideoStream::m_packets<br/>**192**x AVPacket"]

    %% Decoding Thread
    PktQueue --> DecThread["cDecodingThread::Action"]
    DecThread --> DecInput["cVideoStream::DecodeInput"]
    DecInput --> |AVPacket|SendPkt["cVideoDecoder::SendPacket"]
    SendPkt --> |MPEG2/H.264/HEVC Decoding|RecvFrame["cVideoDecoder::ReceiveFrame"]
    RecvFrame --> |AVFrame|DecInput
    DecInput --> |AVFrame|RenderFrame["cVideoRender::RenderFrame"]

    %% Filter Thread Decision
    RenderFrame --> |AVFrame|FormatCheck{Frame format?}
    FormatCheck -->|YUV420P or<br/>interlaced DRM_PRIME| FilterPush["cFilterThread::PushFrame"]
    FormatCheck -->|Progressive DRM_PRIME| RenderRB["cVideoRender::m_drmBufferQueue"]
    FormatCheck -->|NV12 Format| PushFrame["cVideoRender::PushFrame"]

    %% Filter Thread Path
    FilterPush --> FilterQueue["cFilterThread::m_frames<br/>**3**x AVFrame"]
    FilterQueue --> FilterAction["cFilterThread::Action"]
    FilterAction --> |Interlaced DRM_PRIME|HWDeint["FFMPEG filter:<br/>HW Deinterlacer"]
    FilterAction --> |YUV420P|SWDeint["FFMPEG filter:<br/>Convert to NV12 with optional SW Deinterlacing"]
    HWDeint -->|Progressive DRM_PRIME| RenderRB["cVideoRender::m_drmBufferQueue<br/>Size: **3**x AVFrame"]
    SWDeint --> |NV12 Format|PushFrame["cVideoRender::PushFrame"]

    %% PushFrame Path (Software frames need buffer prep)
    PushFrame --> |Progressive DRM_PRIME|RenderRB

    %% Display Thread
    RenderRB --> |AVFrame|DispThread["cDisplayThread::Action"]
    DispThread --> DisplayFrame["cVideoRender::DisplayFrame<br/>(A/V sync)<br/>cVideoRender::m_drmBufferPool **36**x cDrmBuffer"]
    DisplayFrame --> |AVFrame & cDrmBuffer|PageFlipVid["cVideoRender::PageFlipVideo"]
    PageFlipVid --> |AVFrame & cDrmBuffer|PageFlip["PageFlip"]
    PageFlip --> |cDrmBuffer|CommitBuf["CommitBuffer"]
    CommitBuf --> |cDrmDevice|DrmCommit["drmModeAtomicCommit"]
    DrmCommit --> Hardware["Display Hardware"]

    %% Styling
    classDef vdrThread fill:#90caf9,stroke:#1976d2,stroke-width:2px,color:#000000
    classDef decThread fill:#ce93d8,stroke:#8e24aa,stroke-width:2px,color:#000000
    classDef filterThread fill:#ffb74d,stroke:#f57c00,stroke-width:2px,color:#000000
    classDef displayThread fill:#81c784,stroke:#388e3c,stroke-width:2px,color:#000000
    classDef hardware fill:#e57373,stroke:#d32f2f,stroke-width:3px,color:#000000
    classDef buffer fill:#fff59d,stroke:#fbc02d,stroke-width:2px,color:#000000

    class VDR,PlayVideo,PushPes,CreatePkt vdrThread
    class DecThread,DecInput,SendPkt,RecvFrame,RenderFrame,FilterPush decThread
    class FilterAction,HWDeint,SWDeint filterThread
    class DispThread,DisplayFrame,PageFlipVid,PageFlip,CommitBuf,DrmCommit displayThread
    class Hardware hardware
    class PktQueue,FilterQueue,RenderRB,PushFrame buffer
    class FormatCheck decThread

Frame Routing

The following graph shows the path of the frames from the decoder to the display output device, also considering trick speed mode.

flowchart TD
    FilterThreadForceProgressive["Filter Thread<br/>_only conversion to NV12_"]
    FilterThreadProgressive["Filter Thread<br/>_only conversion to NV12_"]
    FilterThreadHwDeinterlacer["Filter Thread<br/>_deinterlace_v4l2m2m_"]
    FilterThreadSwDeinterlacer["Filter Thread<br/>_bwdif=1:-1:0_"]

    Decoder --> TrickSpeedDecision{Mode?}
    TrickSpeedDecision --> |Trick Speed|TrickSpeedFormat{Format?}
    TrickSpeedFormat --> |__SW decoded__<br/>YUV420P<br/>_progressive or interlaced_<br/>RPI 4&5: 576i MPEG2<br/>RPI4: 1080p HEVC<br/>RPI5: 1080i H.264<br/>RK3399: __None__|FilterThreadForceProgressive
    TrickSpeedFormat --> |__HW decoded__<br/>DRM_PRIME<br/>_progressive or interlaced_<br/>RPI4: 1080i H.264<br/>RPI5: 1080p HEVC<br/>RK3399: all|Display

    PushFrame --> |DRM_PRIME<br/>progressive or interlaced|Display
    FilterThreadForceProgressive --> |NV12<br/>progressive or interlaced|PushFrame

    TrickSpeedDecision --> |Normal Playback|NormalPlaybackFormat{Format?}
    NormalPlaybackFormat --> |__SW decoded__<br/>_YUV420P<br/>interlaced_<br/>RPI 4&5: 576i MPEG2<br/>RPI5: 1080i H.264<br/>RK3399: __None__|FilterThreadSwDeinterlacer
    NormalPlaybackFormat --> |__SW decoded__<br/>_YUV420P<br/>progressive_<br/>RPI5: 720p H.264<br/>RPI4/RK3399: __None__|FilterThreadProgressive
    NormalPlaybackFormat --> |__HW decoded__<br/>_DRM_PRIME<br/>interlaced_<br/>RPI4/RK3399: 1080i H.264<br/>RPI5: __None__|FilterThreadHwDeinterlacer
    NormalPlaybackFormat --> |__HW decoded__<br/>DRM_PRIME<br/>_progressive_<br/>RPI 4&5/RK3399: 1080p HEVC<br/>RPI4/RK3399: 720p H.264|Display

    FilterThreadSwDeinterlacer --> |NV12<br/>progressive|PushFrame
    FilterThreadProgressive --> |NV12<br/>progressive|PushFrame
    FilterThreadHwDeinterlacer --> |DRM_PRIME<br/>progressive|Display

    classDef decThread fill:#ce93d8,stroke:#8e24aa,stroke-width:2px,color:#000000
    classDef filterThread fill:#ffb74d,stroke:#f57c00,stroke-width:2px,color:#000000
    classDef displayThread fill:#81c784,stroke:#388e3c,stroke-width:2px,color:#000000
    classDef buffer fill:#fff59d,stroke:#fbc02d,stroke-width:2px,color:#000000

    class Decoder decThread
    class FilterThreadForceProgressive,FilterThreadProgressive,FilterThreadHwDeinterlacer,FilterThreadSwDeinterlacer filterThread
    class Display displayThread
    class PushFrame buffer

VDR State Management

When managing the VDR states (play/pause/trick speed/...), the following paradigms are followed:

  • On every call of one of the VDR methods above, we wait at a single, central location until the display and decoding threads have finished the packet/frame they are currently processing. Then the threads are locked (halted), and the necessary changes are made in a thread-safe and predictable way before the threads resume their normal work.
  • What should happen in which state is also handled in a single, central location. To this end, VDR's state is tracked in a variable. When one of PlayMode()/Freeze()/Clear()/... is invoked (called "events" here), the call is handled according to the state VDR is currently in (play/stop/trick speed). So you can clearly see in the code what happens in a particular state when a specific event is received. The state transitions are handled in cSoftHdDevice::OnEventReceived(), and what shall be done when entering or leaving a state is handled in cSoftHdDevice::OnEnteringState()/cSoftHdDevice::OnLeavingState(). A sketch of this dispatch follows below.

This was introduced in PR #91.
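
The following C++ sketch illustrates this dispatch pattern. The method names mirror the ones mentioned above, but the enums and transition bodies are simplified assumptions for illustration, not the plugin's actual code:

```cpp
// Simplified sketch of the central event/state dispatch (illustrative only).
enum class eState { Detached, Stop, Buffering, Play, TrickSpeed };
enum class eEvent { Attach, Detach, Play, Stop, Pause, TrickSpeed,
                    StillPicture, BufferUnderrun, BufferingThresholdReached };

class cSoftHdDevice
{
    eState m_state = eState::Detached;

    void OnLeavingState(eState) { /* e.g. flush queues when leaving Play */ }
    void OnEnteringState(eState) { /* e.g. start buffering, resume threads */ }

public:
    // Single, central place deciding all transitions. The decoding and
    // display threads are assumed to be locked by the caller at this point.
    void OnEventReceived(eEvent event)
    {
        eState next = m_state;
        switch (m_state) {
        case eState::Stop:
            if (event == eEvent::Play) next = eState::Buffering;
            else if (event == eEvent::Detach) next = eState::Detached;
            break;
        case eState::Buffering:
            if (event == eEvent::BufferingThresholdReached) next = eState::Play;
            break;
        case eState::Play:
            if (event == eEvent::TrickSpeed) next = eState::TrickSpeed;
            else if (event == eEvent::Stop || event == eEvent::Detach) next = eState::Stop;
            else if (event == eEvent::BufferUnderrun) next = eState::Buffering;
            break;
        default:
            // Remaining states follow the state diagram above.
            break;
        }
        if (next != m_state) {
            OnLeavingState(m_state);
            m_state = next;
            OnEnteringState(m_state);
        }
    }
};
```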

Audio/Video Packet Fragmentation Reassembly

The following facts were observed when playing PES streams (live or recorded):

  • Video (MPEG2, H.264, HEVC)
    • A PES packet contains at most one frame (not necessarily a complete one).
    • A frame can be fragmented across two PES packets (only 1080p HEVC and 720p H.264, not 576i MPEG2).
    • A frame always starts at the beginning of a PES packet.
    • A PES packet has only a PTS value when it starts with a new frame.
  • Audio (MP2, AC-3, E-AC-3, LATM/LOAS, ADTS)
    • A PES packet can contain multiple frames (six were seen with MP2).
    • A frame can be fragmented across two PES packets (seen with LATM/LOAS, not with MP2).
    • An MP2 frame always starts at the beginning of a PES packet.
    • An LATM/LOAS frame normally does not start at the beginning of a PES packet.
    • There is at most one LATM/LOAS frame in a PES packet.

In principle, audio and video could use the same reassembly strategy, but audio needs to determine the frame length by reading the codec header. To avoid this complexity for video frame reassembly, two different approaches are implemented.

Video Packet Reassembly

The payload of video PES packets is appended to a fragmentation buffer as long as the received PES packets carry no PTS value. When a PES packet with a PTS value arrives (meaning a new frame starts at the beginning of that PES packet), the current content of the fragmentation buffer is finalized and sent to the decoder, using the last received PTS value for this frame. After clearing the fragmentation buffer, the new PES packet's payload is stored in the now empty fragmentation buffer, and the cycle continues.
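
A minimal sketch of this cycle, assuming a simplified payload struct (the actual types and buffer handling in cVideoStream differ):

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for a demuxed PES payload (assumption for this sketch).
struct PesPayload {
    const uint8_t *data;
    size_t size;
    bool hasPts;
    int64_t pts;
};

class cVideoReassembler
{
    std::vector<uint8_t> m_frag;  // fragmentation buffer
    int64_t m_pts = -1;           // PTS of the frame currently being assembled

public:
    // Returns true when a complete frame was finalized into outFrame/outPts.
    bool Push(const PesPayload &pes, std::vector<uint8_t> &outFrame, int64_t &outPts)
    {
        bool complete = false;
        if (pes.hasPts && !m_frag.empty()) {
            // A PTS means a new frame starts here: finalize the buffered one
            // with the PTS received when its first fragment arrived.
            outFrame.swap(m_frag);
            outPts = m_pts;
            m_frag.clear();
            complete = true;
        }
        if (pes.hasPts)
            m_pts = pes.pts;  // PTS belongs to the frame starting in this packet
        m_frag.insert(m_frag.end(), pes.data, pes.data + pes.size);
        return complete;
    }
};
```

Note that this works without parsing the codec payload at all: the PTS presence alone signals frame boundaries, which is exactly why video can avoid the header parsing that audio needs.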

Audio Packet Reassembly

Because LATM/LOAS frames (and maybe others) are normally not aligned to PES packet boundaries, the codec's sync word (every codec frame starts with a sync word of usually 11-16 bits) needs to be found in the data.

The challenge is that the codec payload itself can also contain the sync word. Therefore, it's not sufficient to just search for the sync word, but the codec frame structure needs to be parsed to determine whether it's the sync word or just some random data in the middle of a codec payload. Every above-mentioned codec header contains the length of its payload. After the payload, another frame follows immediately in the data stream, again, starting with the sync word. If the second sync word is found at the expected position, we can be pretty sure that the parsed header and its length information are correct. Then, the frame with the first sync word is sent to the decoder, followed by the frame with the second sync word, and so on.

This synchronization mechanism is only performed when a stream starts, or when the data following a frame does not start with the sync word. The latter can happen, for example, on bad reception when garbage is received.
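
The sketch below shows this verification for LATM/LOAS, whose header starts with an 11-bit sync word (0x2B7) followed by a 13-bit length field. The function names and layout are made up for illustration; the plugin's actual parser differs:

```cpp
#include <cstddef>
#include <cstdint>

// LOAS AudioSyncStream: 11-bit sync word 0x2B7, then 13-bit audioMuxLengthBytes.
static bool IsSyncWord(const uint8_t *data, size_t size)
{
    return size >= 2 && data[0] == 0x56 && (data[1] & 0xE0) == 0xE0;
}

// Frame length = 3 header bytes + announced payload length.
static size_t FrameLength(const uint8_t *data, size_t size)
{
    if (size < 3)
        return 0;
    return 3 + (((size_t)(data[1] & 0x1F) << 8) | data[2]);
}

// Find a verified frame start: a sync word whose announced length points
// exactly at the next sync word. Returns -1 if none is found.
static ptrdiff_t FindFrameStart(const uint8_t *buf, size_t size)
{
    for (size_t i = 0; i + 2 <= size; i++) {
        if (!IsSyncWord(buf + i, size - i))
            continue;                       // could be random payload bytes
        size_t len = FrameLength(buf + i, size - i);
        if (len < 3 || i + len + 2 > size)
            continue;                       // header invalid or next header not buffered yet
        if (IsSyncWord(buf + i + len, size - i - len))
            return (ptrdiff_t)i;            // second sync word confirms the header
    }
    return -1;
}
```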

Buffering

Audio and video data is buffered when VDR calls SetPlayMode(pmAudioVideo) or Clear(), or when the buffer underruns during playback.

The first audio PTS value and the first video PTS value that VDR sends (received in PlayVideo()/PlayAudio()) differ in most cases (up to 3.5s were observed). The leading subset containing only video or only audio is dropped, so that playback starts at the first frame where both video and audio are present. However, the first received video frame is not dropped but displayed immediately after Clear() is called. This comes in handy when using SkipSeconds, making seeking in a recording feel responsive.

To determine whether the buffer fill levels are sufficient to start playback, the following algorithm is implemented (see the sketch after this list):

  • Start decoding audio and video as soon as packets arrive, but do not start playback yet.
  • On each Play*() invocation, find the oldest PTS in each buffer (audio and video).
  • Use the buffer with the newer of the two values to calculate its fill level (this is the buffer that contains both audio and video over its whole length).
  • When the fill threshold of that buffer is reached, truncate the above-mentioned subset of the other buffer.
  • Wait until the display output queue is completely filled.
  • Start playback.
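
A sketch of this fill-level check; the 450ms threshold is taken from the example below, and the types are assumptions for illustration:

```cpp
#include <cstdint>

constexpr int64_t kPtsTicksPerSecond = 90000;                    // MPEG PTS clock
constexpr int64_t kThreshold = 450 * kPtsTicksPerSecond / 1000;  // 450 ms

struct cBufferInfo {
    int64_t oldestPts;        // PTS of the oldest buffered frame
    int64_t newestPts;        // PTS of the newest buffered frame
    int64_t lastFrameTicks;   // duration of the newest frame in PTS ticks
};

// The buffer whose oldest PTS is newer covers a span where both streams
// are present, so its fill level is the one that is measured.
static bool ThresholdReached(const cBufferInfo &audio, const cBufferInfo &video)
{
    const cBufferInfo &b = audio.oldestPts > video.oldestPts ? audio : video;
    // PTS values mark frame starts, so the end of the last frame counts, too.
    return (b.newestPts + b.lastFrameTicks) - b.oldestPts >= kThreshold;
}
```

Adding the duration of the last frame matters in practice: as the example below shows, the audio buffer can appear to be below the threshold when measured from frame start to frame start, yet still satisfy it once the end of the last audio frame is taken into account.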

Example: Buffering a H.264 Live Stream

This is a real-life example of switching to the TV station "DMAX" (576i/H.264).

The first video packets (20.890-21.510s) are dropped because they contain no codec information/I-frame. So, the first frame the decoder outputs is the one at 21.510s, which is also the first frame to be displayed.

All audio samples with PTS values before the initially displayed video frame are dropped (21.510-20.302=1.208s) so that audio and video start in sync.

At the point the buffer threshold (450ms) is reached, the audio buffer holds 21.886-21.510=0.376s and the video buffer 22.490-21.510=0.980s.

The audio buffer size is apparently below the threshold in this example. Nevertheless, the buffer threshold is reached, because the PTS values represent the start of a frame, and the threshold calculation takes the end of the last audio frame into account.

The PTS timestamps in the following timeline are truncated to seconds for simplicity, but represent real values.

timeline
title PTS Timeline: Live Stream Buffering

section Dropped
20.302s: First arrived audio packet
20.890s: First arrived video packet
section Displayed
21.510s: First video packet with a codec syncword being detected: First audio/video frame to be output
21.886s: Last arrived audio packet before the buffer threshold is reached
22.490s: Last arrived video packet before the buffer threshold is reached

The time between the first received PES packet of this stream and the first pageflip is 1.8s in this example.

The common case seems to be that the first PTS to be output is determined by the video stream. If the first audio packet arrives after the first decoded video frame, the initial PTS to be output is determined by the audio stream instead.

Side note: For additional responsiveness/optimization, an attempt was made to feed the first video packets (those received before the first sync word) through the decoder (opened with the correct codec ID), but the decoder rejected them as invalid data. This was only tried with H.264.

Misc

  • The audio is initialized lazily (only when playback starts), because when HDMI is used as sound output and the TV is off, there is no sound device yet.