SipXmediaLib Overview




The sipXmediaLib is a robust, full-featured VoIP audio media processing engine (video is in the works; email us to help). sipXmediaLib supports narrowband, wideband and HD audio. It supports RTP streaming [RFC 1889], [RFC 1890], [RFC 3550] and [RFC 3551] over UDP, TCP and multicast, and can easily support other RTP transports abstracted by the sipXportLib socket interface. STUN and TURN are also supported using the sipXportLib socket abstraction. The media processing engine is exposed through an abstract API defined in the sipXmediaAdapterLib. sipXmediaLib can be used independently of the sipX SIP stack (e.g. sipXmediaLib is used with reSIProcate in reCon and in commercial applications). sipXmediaLib can also be used with the sipX SIP stack via the sipXtapi API, providing a complete SIP and media engine for server and client VoIP applications. The sipXmediaLib provides a whole suite of capabilities for client and server based VoIP applications:

  • RTP and RTCP Stacks
  • Narrow and Wideband audio processing support
  • Configurable Frame Size Processing (e.g. 10 and 20 millisecond frames)
  • RTP Codecs:
    • RFC 2833/RFC 4733 DTMF tones
  • Ultra-Wideband Codecs (>8KHz or >16K samples/sec):
    • Opus (RFC 6716)
    • Speex UWB
    • Linear 16 PCM (22.05K, 24K, 32K, 44.1K and 48K samples/sec)
  • Wideband Codecs (3k-8KHz, 8K-16K samples/sec):
    • Opus (RFC 6716)
    • Speex WB
    • G.722
    • G.722.2 (AMR-WB)
    • AAC
    • Linear 16 PCM (11.025K and 16K samples/sec)
  • Narrowband Codecs (300-4KHz or <=8K samples/sec):
  • STUN and TURN support for NAT/firewall traversal
  • Flexible Conference Bridge
    • Run-time configurable mixing
    • Arbitrary participant-to-participant mixing weights
    • Fixed-point or floating-point mixing weights
    • Whisper Functionality
    • Side-bar Conversations
  • DTMF Tone Generator
  • Record Resources (to buffer and to file)
  • Playback from file or buffer
  • Mixer Resources
  • Acoustic Echo Cancellation
  • Dejitter/Jitter Buffer
  • Pluggable Codecs
  • Configurable and Dynamically Editable Resource Topology
  • Many commercially available plugins (e.g. AGC, VAD, Adaptive Dejitter, PLC, Speaker Selection)
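
The "arbitrary participant-to-participant mixing weights" item in the list above can be pictured as a weight matrix applied to one frame from each participant. The following is a minimal sketch of that idea only, assuming floating-point weights and mono 16-bit frames. The names Frame and mixFrames are hypothetical and are not part of the sipXmediaLib API; the actual bridge implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: one mono frame of 16-bit linear PCM.
using Frame = std::vector<std::int16_t>;

// Mixes one frame per input into numOutputs output frames.
// weights[in][out] is the gain applied to input `in` when it is summed
// into output `out`. A weight of 0 means that output does not hear that input.
std::vector<Frame> mixFrames(const std::vector<Frame>& inputs,
                             const std::vector<std::vector<float>>& weights,
                             std::size_t numOutputs)
{
    assert(weights.size() == inputs.size());
    const std::size_t frameLen = inputs.empty() ? 0 : inputs[0].size();
    std::vector<Frame> outputs(numOutputs, Frame(frameLen, 0));

    for (std::size_t out = 0; out < numOutputs; ++out)
    {
        for (std::size_t s = 0; s < frameLen; ++s)
        {
            float acc = 0.0f;
            for (std::size_t in = 0; in < inputs.size(); ++in)
            {
                assert(inputs[in].size() == frameLen);
                assert(weights[in].size() == numOutputs);
                acc += weights[in][out] * static_cast<float>(inputs[in][s]);
            }
            // Saturate to the 16-bit range instead of wrapping around.
            acc = std::max(-32768.0f, std::min(32767.0f, acc));
            outputs[out][s] = static_cast<std::int16_t>(acc);
        }
    }
    return outputs;
}
```

With three participants and a matrix that is 0 on the diagonal and 1 elsewhere, each participant hears the other two but not themselves. Setting a single off-diagonal weight to 0 would give a whisper or side-bar style arrangement in this toy model.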



  • resource (MpResource) - a component with one or more inputs and/or outputs that performs source, sink or filter type processing on media frames. Think of a component stereo system with inputs and outputs connected via patch cords between the components.
  • frame (MpBuf, MpAudioBuf, MpVideoBuf) - a single chunk of audio or video media that gets passed through resources in a flowgraph. An audio frame usually contains 10 or 20 milliseconds of 16-bit linear audio at 8K, 16K, 32K or 48K samples per second.
  • flowgraph (MpFlowGraphBase, MpTopologyGraph, MpCallFlowgraph (deprecated)) - a set of media device inputs and outputs and resources chained together for audio processing for a single call or conference.
  • resource topology (MpResourceTopology) - the order and arrangement in which the resources and their inputs and outputs of a given flowgraph are connected.
  • resource factory (MpResourceFactory) - factory for abstractly constructing named resource types.
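
To make the frame definition above concrete, the size of an audio frame follows directly from the sample rate and the frame duration. The sketch below shows only that arithmetic. The helper names samplesPerFrame and bytesPerFrame are hypothetical and are not sipXmediaLib functions, and the byte count assumes mono 16-bit linear samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of sipXmediaLib): number of samples in one
// frame for a given sample rate (samples/sec) and frame duration (ms).
inline std::size_t samplesPerFrame(std::uint32_t sampleRate,
                                   std::uint32_t frameMs)
{
    assert(sampleRate > 0 && frameMs > 0);
    return static_cast<std::size_t>(sampleRate) * frameMs / 1000;
}

// Size in bytes of one mono frame of 16-bit linear audio.
inline std::size_t bytesPerFrame(std::uint32_t sampleRate,
                                 std::uint32_t frameMs)
{
    return samplesPerFrame(sampleRate, frameMs) * sizeof(std::int16_t);
}
```

For example, a 10 millisecond frame at 8000 samples per second holds 80 samples (160 bytes), and a 20 millisecond frame at 48000 samples per second holds 960 samples (1920 bytes).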

sipXmedia Architecture

The sipXmedia subsystem chains sets of resources together in a flowgraph to perform the media processing for a call or conference. Each resource has zero or more inputs and zero or more outputs. The output(s) of a resource can be connected to the input(s) of one or more other resources. This chain of resources is connected in a defined resource topology, all contained in a single flowgraph. A flowgraph can contain the resources and zero or more RTP streams for either a single simple call or a number of calls connected via a bridge resource to form a conference. A flowgraph can also be used just to control the inputs and outputs of the local devices (e.g. mic and speaker).

The device drivers and device managers provide abstractions to support different operating systems and arbitrary numbers of input or output devices. A single mic can be used as input to multiple flowgraphs, and multiple mics can be used by a single flowgraph. On the output side, multiple flowgraphs can output to a single speaker, or each flowgraph can output to a separate speaker.

Currently there is a single media processing task that processes the flowgraphs one at a time. When a flowgraph is processed, a single frame of media is processed at a time (e.g. in the case of audio, a 10 or 20 millisecond frame). When the topology of a flowgraph is changed (e.g. upon initial construction, or when resources are added or deleted), the flowgraph first calculates the resource processing dependency order, that is, the order in which frames of media must pass through all of the resources. The order is determined by the way in which the inputs and outputs of all the resources in a flowgraph are connected. Once the order of processing is known, the input frames are passed to each resource one at a time; each resource processes its inputs and provides outputs, if any, to be passed on to the resources connected to its outputs.
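
The dependency ordering described above is, in effect, a topological sort of the resources by their connections. The following is a minimal sketch of one standard way to compute such an order (Kahn's algorithm). It is not the flowgraph's actual implementation. ToyFlowgraph and processingOrder are hypothetical names, and treating a cyclic connection as an error is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a flowgraph: resource names plus directed links,
// where a link (a, b) connects an output of resource a to an input of b.
struct ToyFlowgraph
{
    std::vector<std::string> resources;
    std::vector<std::pair<std::size_t, std::size_t>> links;
};

// Returns resource indices ordered so that every resource appears after all
// resources feeding its inputs. Returns an empty vector if the links form a
// cycle (an assumption of this sketch, not documented library behavior).
std::vector<std::size_t> processingOrder(const ToyFlowgraph& graph)
{
    const std::size_t n = graph.resources.size();
    std::vector<std::vector<std::size_t>> downstream(n);
    std::vector<std::size_t> pendingInputs(n, 0);
    for (const auto& link : graph.links)
    {
        assert(link.first < n && link.second < n);
        downstream[link.first].push_back(link.second);
        ++pendingInputs[link.second];
    }

    // Resources with no connected inputs (pure sources) can run first.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (pendingInputs[i] == 0) ready.push(i);
    }

    std::vector<std::size_t> order;
    while (!ready.empty())
    {
        const std::size_t current = ready.front();
        ready.pop();
        order.push_back(current);
        // A downstream resource becomes ready once all its feeders are placed.
        for (std::size_t next : downstream[current])
        {
            if (--pendingInputs[next] == 0) ready.push(next);
        }
    }

    if (order.size() != n) order.clear();  // some resources sit on a cycle
    return order;
}
```

In this toy model the order only needs to be recomputed when links or resources change, which mirrors the point above that the calculation happens on topology changes rather than on every frame.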

The sipXmedia subsystem supports narrowband, wideband and HD audio in its internal handling of audio data. By default a flowgraph runs at 8000 samples per second with 16 bit samples. However, each flowgraph can be set to a specific audio resolution when it is created via the CpTopologyGraphFactoryImpl::createMediaInterface method. Typically sipXmedia is used at 8000, 16,000, 32,000 or 48,000 samples per second with 16 bit samples. sipXmedia will do the appropriate up and down sampling between the flowgraph rate and the audio device drivers (e.g. mic and speaker), as well as the codecs that are used.
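
As a rough illustration of what up and down sampling between two rates involves, the sketch below converts a block of 16-bit samples using plain linear interpolation. This is a deliberately naive stand-in and not the resampler sipXmedia uses. The function name resampleLinear is hypothetical, and a production resampler would also apply a low-pass (anti-aliasing) filter when downsampling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of sipXmediaLib): converts mono 16-bit
// samples from inRate to outRate by linear interpolation between neighbors.
std::vector<std::int16_t> resampleLinear(const std::vector<std::int16_t>& in,
                                         std::uint32_t inRate,
                                         std::uint32_t outRate)
{
    assert(inRate > 0 && outRate > 0);
    if (in.empty()) return {};

    const std::size_t outLen = in.size() * outRate / inRate;
    std::vector<std::int16_t> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i)
    {
        // Position of output sample i on the input time axis.
        const double pos = static_cast<double>(i) * inRate / outRate;
        const std::size_t idx = static_cast<std::size_t>(pos);
        // Clamp at the end of the block: the last sample is held.
        const std::size_t next = std::min(idx + 1, in.size() - 1);
        const double frac = pos - static_cast<double>(idx);
        const double value = in[idx] * (1.0 - frac) + in[next] * frac;
        out[i] = static_cast<std::int16_t>(std::lround(value));
    }
    return out;
}
```

Going from 8000 to 16000 samples per second doubles the number of samples in a frame, and going from 16000 to 8000 halves it, which matches the frame sizes implied by the rates listed above.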

Note: If you are using the SipXtapi SDK, and you would like to use wideband you currently need to change the sample rate in the two calls to createMediaInterface in CallManager.cpp.



  • Operations
  • Notifications

Resource Types

TBD - need to provide brief definitions of these


How do I enable wideband and HD audio quality so that I can take advantage of wideband and HD audio devices or codecs?

Set the desired sample rate in the call to CpTopologyGraphFactoryImpl::createMediaInterface. If you are using the SipXtapi SDK, set the sample rate via sipxInitialize. If you are using the call manager directly, change the sample rate in the two calls to createMediaInterface in CallManager.cpp to your desired sample rate (e.g. 16000, 32000, 48000).

Why is there both a MpCallFlowgraph and a MpTopologyFlowgraph?

MpCallFlowgraph is the old way that flowgraphs were put together, with hardcoded dependencies on the topology and types of the audio processing components in the flowgraph. Flowgraphs were intended to be flexible and customizable, so that an application can use the topology that suits it best. So we wrote the MpTopologyFlowgraph to meet those goals. The MpTopologyFlowgraph allows different topologies and arrangements of the MpResource components, and lets developers use their own proprietary MpResource components without having to contribute them to open source.

Should I use the MpCallFlowgraph or the MpTopologyFlowgraph?

You should always use the MpTopologyFlowgraph unless you have a legacy application that has some hardcoded dependency on the MpCallFlowgraph.

How do I set which flowgraph type gets used?

Under Windows, the Topology flowgraph is selected by default. To switch back to the old CallFlowgraph (not recommended!), disable the DISABLE_DEFAULT_PHONE_MEDIA_INTERFACE_FACTORY and ENABLE_TOPOLOGY_FLOWGRAPH_INTERFACE_FACTORY preprocessor defines in the sipXmediaAdapterLib project.

Under Linux, the CallFlowgraph is selected by default for now, but this will change in the future. To select the Topology flowgraph, pass --enable-topology-graph to ./configure in sipXmediaAdapterLib.
