From bc4650b826b1fcda9d0971010ee692088d65709d Mon Sep 17 00:00:00 2001
From: David Reid <mackron@gmail.com>
Date: Mon, 19 Jul 2021 20:39:37 +1000
Subject: [PATCH] Add documentation for resource management and node graphs.

---
 miniaudio.h                 | 840 ++++++++++++++++++++++++++++++++++--
 research/miniaudio_engine.h | 774 ---------------------------------
 2 files changed, 806 insertions(+), 808 deletions(-)

diff --git a/miniaudio.h b/miniaudio.h
index eb61063c..21059ee1 100644
--- a/miniaudio.h
+++ b/miniaudio.h
@@ -673,14 +673,786 @@ example below:
 Encoders must be uninitialized with `ma_encoder_uninit()`.
 
 
-6. Data Conversion
+
+6. Resource Management
+======================
+Many programs will want to manage sound resources for things such as reference counting and
+streaming. This is supported by miniaudio via the `ma_resource_manager` API.
+
+The resource manager is mainly responsible for the following:
+
+    1) Loading of sound files into memory with reference counting.
+    2) Streaming of sound data
+
+When loading a sound file, the resource manager will give you back a `ma_data_source` compatible
+object called `ma_resource_manager_data_source`. This object can be passed into any
+`ma_data_source` API which is how you can read and seek audio data. When loading a sound file, you
+specify whether or not you want the sound to be fully loaded into memory (and optionally
+pre-decoded) or streamed. When loading into memory, you can also specify whether or not you want
+the data to be loaded asynchronously.
+
+The example below is how you can initialize a resource manager using it's default configuration:
+
+    ```c
+    ma_resource_manager_config config;
+    ma_resource_manager resourceManager;
+
+    config = ma_resource_manager_config_init();
+    result = ma_resource_manager_init(&config, &resourceManager);
+    if (result != MA_SUCCESS) {
+        ma_device_uninit(&device);
+        printf("Failed to initialize the resource manager.");
+        return -1;
+    }
+    ```
+
+You can configure the format, channels and sample rate of the decoded audio data. By default it
+will use the file's native data format, but you can configure it to use a consistent format. This
+is useful for offloading the cost of data conversion to load time rather than dynamically
+converting a mixing time. To do this, you configure the decoded format, channels and sample rate
+like the code below:
+
+    ```c
+    config = ma_resource_manager_config_init();
+    config.decodedFormat     = device.playback.format;
+    config.decodedChannels   = device.playback.channels;
+    config.decodedSampleRate = device.sampleRate;
+    ```
+
+In the code above, the resource manager will be configured so that any decoded audio data will be
+pre-converted at load time to the device's native data format. If instead you used defaults and
+the data format of the file did not match the device's data format, you would need to convert the
+data at mixing time which may be prohibitive in high-performance and large scale scenarios like
+games.
+
+Asynchronicity is achieved via a job system. When an operation needs to be performed, such as the
+decoding of a page, a job will be posted to a queue which will then be processed by a job thread.
+By default there will be only one job thread running, but this can be configured, like so:
+
+    ```c
+    config = ma_resource_manager_config_init();
+    config.jobThreadCount = MY_JOB_THREAD_COUNT;
+    ```
+
+By default job threads are managed internally by the resource manager, however you can also self
+manage your job threads if, for example, you want to integrate the job processing into your
+existing job infrastructure, or if you simply don't like the way the resource manager does it. To
+do this, just set the job thread count to 0 and process jobs manually. To process jobs, you first
+need to retrieve a job using `ma_resource_manager_next_job()` and then process it using
+`ma_resource_manager_process_job()`:
+
+    ```c
+    config = ma_resource_manager_config_init();
+    config.jobThreadCount = 0;                            // Don't manage any job threads internally.
+    config.flags = MA_RESOURCE_MANAGER_FLAG_NON_BLOCKING; // Optional. Makes `ma_resource_manager_next_job()` non-blocking.
+
+    // ... Initialize your custom job threads ...
+
+    void my_custom_job_thread(...)
+    {
+        for (;;) {
+            ma_resource_manager_job job;
+            ma_result result = ma_resource_manager_next_job(pMyResourceManager, &job);
+            if (result != MA_SUCCESS) {
+                if (result == MA_NOT_DATA_AVAILABLE) {
+                    // No jobs are available. Keep going. Will only get this if the resource manager was initialized
+                    // with MA_RESOURCE_MANAGER_FLAG_NON_BLOCKING.
+                    continue;
+                } else if (result == MA_CANCELLED) {
+                    // MA_RESOURCE_MANAGER_JOB_QUIT was posted. Exit.
+                    break;
+                } else {
+                    // Some other error occurred.
+                    break;
+                }
+            }
+
+            ma_resource_manager_process_job(pMyResourceManager, &job);
+        }
+    }
+    ```
+
+In the example above, the `MA_RESOURCE_MANAGER_JOB_QUIT` event is the used as the termination indicator, but you can
+use whatever you would like to terminate the thread. The call to `ma_resource_manager_next_job()`
+is blocking by default, by can be configured to be non-blocking by initializing the resource
+manager with the `MA_RESOURCE_MANAGER_FLAG_NON_BLOCKING` configuration flag.
+
+When loading a file, it's sometimes convenient to be able to customize how files are opened and
+read. This can be done by setting `pVFS` member of the resource manager's config:
+
+    ```c
+    // Initialize your custom VFS object. See documentation for VFS for information on how to do this.
+    my_custom_vfs vfs = my_custom_vfs_init();
+
+    config = ma_resource_manager_config_init();
+    config.pVFS = &vfs;
+    ```
+
+If you do not specify a custom VFS, the resource manager will use the operating system's normal
+file operations. This is default.
+
+To load a sound file and create a data source, call `ma_resource_manager_data_source_init()`. When
+loading a sound you need to specify the file path and options for how the sounds should be loaded.
+By default a sound will be loaded synchronously. The returned data source is owned by the caller
+which means the caller is responsible for the allocation and freeing of the data source. Below is
+an example for initializing a data source:
+
+    ```c
+    ma_resource_manager_data_source dataSource;
+    ma_result result = ma_resource_manager_data_source_init(pResourceManager, pFilePath, flags, &dataSource);
+    if (result != MA_SUCCESS) {
+        // Error.
+    }
+
+    // ...
+
+    // A ma_resource_manager_data_source object is compatible with the `ma_data_source` API. To read data, just call
+    // the `ma_data_source_read_pcm_frames()` like you would with any normal data source.
+    result = ma_data_source_read_pcm_frames(&dataSource, pDecodedData, frameCount, &framesRead);
+    if (result != MA_SUCCESS) {
+        // Failed to read PCM frames.
+    }
+
+    // ...
+
+    ma_resource_manager_data_source_uninit(pResourceManager, &dataSource);
+    ```
+
+The `flags` parameter specifies how you want to perform loading of the sound file. It can be a
+combination of the following flags:
+
+    ```
+    MA_DATA_SOURCE_STREAM
+    MA_DATA_SOURCE_DECODE
+    MA_DATA_SOURCE_ASYNC
+    ```
+
+When no flags are specified (set to 0), the sound will be fully loaded into memory, but not
+decoded, meaning the raw file data will be stored in memory, and then dynamically decoded when
+`ma_data_source_read_pcm_frames()` is called. To instead decode the audio data before storing it in
+memory, use the `MA_DATA_SOURCE_DECODE` flag. By default, the sound file will be loaded
+synchronously, meaning `ma_resource_manager_data_source_init()` will only return after the entire
+file has been loaded. This is good for simplicity, but can be prohibitively slow. You can instead
+load the sound asynchronously using the `MA_DATA_SOURCE_ASYNC` flag. This will result in
+`ma_resource_manager_data_source_init()` returning quickly, but no data will be returned by
+`ma_data_source_read_pcm_frames()` until some data is available. When no data is available because
+the asynchronous decoding hasn't caught up, `MA_BUSY` will be returned by
+`ma_data_source_read_pcm_frames()`.
+
+For large sounds, it's often prohibitive to store the entire file in memory. To mitigate this, you
+can instead stream audio data which you can do by specifying the `MA_DATA_SOURCE_STREAM` flag. When
+streaming, data will be decoded in 1 second pages. When a new page needs to be decoded, a job will
+be posted to the job queue and then subsequently processed in a job thread.
+
+When loading asynchronously, it can be useful to poll whether or not loading has finished. Use
+`ma_resource_manager_data_source_result()` to determine this. For in-memory sounds, this will
+return `MA_SUCCESS` when the file has been *entirely* decoded. If the sound is still being decoded,
+`MA_BUSY` will be returned. Otherwise, some other error code will be returned if the sound failed
+to load. For streaming data sources, `MA_SUCCESS` will be returned when the first page has been
+decoded and the sound is ready to be played. If the first page is still being decoded, `MA_BUSY`
+will be returned. Otherwise, some other error code will be returned if the sound failed to load.
+
+For in-memory sounds, reference counting is used to ensure the data is loaded only once. This means
+multiple calls to `ma_resource_manager_data_source_init()` with the same file path will result in
+the file data only being loaded once. Each call to `ma_resource_manager_data_source_init()` must be
+matched up with a call to `ma_resource_manager_data_source_uninit()`. Sometimes it can be useful
+for a program to register self-managed raw audio data and associate it with a file path. Use the
+`ma_resource_manager_register_*()` and `ma_resource_manager_unregister_*()` APIs to do this.
+`ma_resource_manager_register_decoded_data()` is used to associate a pointer to raw, self-managed
+decoded audio data in the specified data format with the specified name. Likewise,
+`ma_resource_manager_register_encoded_data()` is used to associate a pointer to raw self-managed
+encoded audio data (the raw file data) with the specified name. Note that these names need not be
+actual file paths. When `ma_resource_manager_data_source_init()` is called (without the
+`MA_DATA_SOURCE_STREAM` flag), the resource manager will look for these explicitly registered data
+buffers and, if found, will use it as the backing data for the data source. Note that the resource
+manager does *not* make a copy of this data so it is up to the caller to ensure the pointer stays
+valid for it's lifetime. Use `ma_resource_manager_unregister_data()` to unregister the self-managed
+data. It does not make sense to use the `MA_DATA_SOURCE_STREAM` flag with a self-managed data
+pointer. When `MA_DATA_SOURCE_STREAM` is specified, it will try loading the file data through the
+VFS.
+
+
+6.1. Resource Manager Implementation Details
+--------------------------------------------
+Resources are managed in two main ways:
+
+    1) By storing the entire sound inside an in-memory buffer (referred to as a data buffer)
+    2) By streaming audio data on the fly (referred to as a data stream)
+
+A resource managed data source (`ma_resource_manager_data_source`) encapsulates a data buffer or
+data stream, depending on whether or not the data source was initialized with the
+`MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_STREAM` flag. If so, it will make use of a
+`ma_resource_manager_data_stream` object. Otherwise it will use a `ma_resource_manager_data_buffer`
+object. Both of these objects are data sources which means they can be used with any
+`ma_data_source_*()` API.
+
+Another major feature of the resource manager is the ability to asynchronously decode audio files.
+This relieves the audio thread of time-consuming decoding which can negatively affect scalability
+due to the audio thread needing to complete it's work extremely quickly to avoid glitching.
+Asynchronous decoding is achieved through a job system. There is a central multi-producer,
+multi-consumer, lock-free, fixed-capacity job queue. When some asynchronous work needs to be done,
+a job is posted to the queue which is then read by a job thread. The number of job threads can be
+configured for improved scalability, and job threads can all run in parallel without needing to
+worry about the order of execution (how this is achieved is explained below).
+
+When a sound is being loaded asynchronously, playback can begin before the sound has been fully
+decoded. This enables the application to start playback of the sound quickly, while at the same
+time allowing to resource manager to keep loading in the background. Since there may be less
+threads than the number of sounds being loaded at a given time, a simple scheduling system is used
+to keep decoding time balanced and fair. The resource manager solves this by splitting decoding
+into chunks called pages. By default, each page is 1 second long. When a page has been decoded, a
+new job will be posted to start decoding the next page. By dividing up decoding into pages, an
+individual sound shouldn't ever delay every other sound from having their first page decoded. Of
+course, when loading many sounds at the same time, there will always be an amount of time required
+to process jobs in the queue so in heavy load situations there will still be some delay. To
+determine if a data source is ready to have some frames read, use
+`ma_resource_manager_data_source_get_available_frames()`. This will return the number of frames
+available starting from the current position.
+
+
+6.1.1. Data Buffers
+-------------------
+When the `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_STREAM` flag is excluded at initialization time, the
+resource manager will try to load the data into an in-memory data buffer. Before doing so, however,
+it will first check if the specified file has already been loaded. If so, it will increment a
+reference counter and just use the already loaded data. This saves both time and memory. A binary
+search tree (BST) is used for storing data buffers as it has good balance between efficiency and
+simplicity. The key of the BST is a 64-bit hash of the file path that was passed into
+`ma_resource_manager_data_source_init()`. The advantage of using a hash is that it saves memory
+over storing the entire path, has faster comparisons, and results in a mostly balanced BST due to
+the random nature of the hash. The disadvantage is that file names are case-sensitive. If this is
+an issue, you should normalize your file names to upper- or lower-case before initializing your
+data sources.
+
+When a sound file has not already been loaded and the `MMA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_ASYNC`
+is excluded, the file will be decoded synchronously by the calling thread. There are two options
+for controlling how the audio is stored in the data buffer - encoded or decoded. When the
+`MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_DECODE` option is excluded, the raw file data will be stored
+in memory. Otherwise the sound will be decoded before storing it in memory. Synchronous loading is
+a very simple and standard process of simply adding an item to the BST, allocating a block of
+memory and then decoding (if `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_DECODE` is specified).
+
+When the `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_ASYNC` flag is specified, loading of the data buffer
+is done asynchronously. In this case, a job is posted to the queue to start loading and then the
+function immediately returns, setting an internal result code to `MA_BUSY`. This result code is
+returned when the program calls `ma_resource_manager_data_source_result()`. When decoding has fully
+completed `MA_RESULT` will be returned. This can be used to know if loading has fully completed.
+
+When loading asynchronously, a single job is posted to the queue of the type
+`MA_RESOURCE_MANAGER_JOB_LOAD_DATA_BUFFER_NODE`. This involves making a copy of the file path and
+associating it with job. When the job is processed by the job thread, it will first load the file
+using the VFS associated with the resource manager. When using a custom VFS, it's important that it
+be completely thread-safe because it will be used from one or more job threads at the same time.
+Individual files should only ever be accessed by one thread at a time, however. After opening the
+file via the VFS, the job will determine whether or not the file is being decoded. If not, it
+simply allocates a block of memory and loads the raw file contents into it and returns. On the
+other hand, when the file is being decoded, it will first allocate a decoder on the heap and
+initialize it. Then it will check if the length of the file is known. If so it will allocate a
+block of memory to store the decoded output and initialize it to silence. If the size is unknown,
+it will allocate room for one page. After memory has been allocated, the first page will be
+decoded. If the sound is shorter than a page, the result code will be set to `MA_SUCCESS` and the
+completion event will be signalled and loading is now complete. If, however, there is more to
+decode, a job with the code `MA_RESOURCE_MANAGER_JOB_PAGE_DATA_BUFFER_NODE` is posted. This job
+will decode the next page and perform the same process if it reaches the end. If there is more to
+decode, the job will post another `MA_RESOURCE_MANAGER_JOB_PAGE_DATA_BUFFER_NODE` job which will
+keep on happening until the sound has been fully decoded. For sounds of an unknown length, the
+buffer will be dynamically expanded as necessary, and then shrunk with a final realloc() when the
+end of the file has been reached.
+
+
+6.1.2. Data Streams
+-------------------
+Data streams only ever store two pages worth of data for each instance. They are most useful for
+large sounds like music tracks in games that would consume too much memory if fully decoded in
+memory. After every frame from a page has been read, a job will be posted to load the next page
+which is done from the VFS.
+
+For data streams, the `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_ASYNC` flag will determine whether or
+not initialization of the data source waits until the two pages have been decoded. When unset,
+`ma_resource_manager_data_source_init()` will wait until the two pages have been loaded, otherwise
+it will return immediately.
+
+When frames are read from a data stream using `ma_resource_manager_data_source_read_pcm_frames()`,
+`MA_BUSY` will be returned if there are no frames available. If there are some frames available,
+but less than the number requested, `MA_SUCCESS` will be returned, but the actual number of frames
+read will be less than the number requested. Due to the asymchronous nature of data streams,
+seeking is also asynchronous. If the data stream is in the middle of a seek, `MA_BUSY` will be
+returned when trying to read frames.
+
+When `ma_resource_manager_data_source_read_pcm_frames()` results in a page getting fully consumed
+a job is posted to load the next page. This will be posted from the same thread that called
+`ma_resource_manager_data_source_read_pcm_frames()` which should be lock-free.
+
+Data streams are uninitialized by posting a job to the queue, but the function won't return until
+that job has been processed. The reason for this is that the caller owns the data stream object and
+therefore miniaudio needs to ensure everything completes before handing back control to the caller.
+Also, if the data stream is uninitialized while pages are in the middle of decoding, they must
+complete before destroying any underlying object and the job system handles this cleanly.
+
+
+6.1.3. Job Queue
+----------------
+The resource manager uses a job queue which is multi-producer, multi-consumer, lock-free and
+fixed-capacity. The lock-free property of the queue is achieved using the algorithm described by
+Michael and Scott: Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared
+Memory Multiprocessors. In order for this to work, only a fixed number of jobs can be allocated and
+inserted into the queue which is done through a lock-free data structure for allocating an index
+into a fixed sized array, with reference counting for mitigation of the ABA problem. The reference
+count is 32-bit.
+
+For many types of jobs it's important that they execute in a specific order. In these cases, jobs
+are executed serially. For the resource manager, serial execution of jobs is only required on a
+per-object basis (per data buffer or per data stream). Each of these objects stores an execution
+counter. When a job is posted it is associated with an execution counter. When the job is
+processed, it checks if the execution counter of the job equals the execution counter of the
+owning object and if so, processes the job. If the counters are not equal, the job will be posted
+back onto the job queue for later processing. When the job finishes processing the execution order
+of the main object is incremented. This system means the no matter how many job threads are
+executing, decoding of an individual sound will always get processed serially. The advantage to
+having multiple threads comes into play when loading multiple sounds at the time time.
+
+
+
+7. Node Graph
+=============
+miniaudio's routing infrastructure follows a node graph paradigm. The idea is that you create a
+node whose outputs are attached to inputs of another node, thereby creating a graph. There are
+different types of nodes, with each node in the graph processing input data to produce output,
+which is then fed through the chain. Each node in the graph can apply their own custom effects. At
+the end of the graph is an endpoint which represents the end of the chain and is where the final
+output is ultimately extracted from.
+
+Each node has a number of input buses and a number of output buses. An output bus from a node is
+attached to an input bus of another. Multiple nodes can connect their output buses to another
+node's input bus, in which case their outputs will be mixed before processing by the node.
+
+Each input bus must be configured to accept the same number of channels, but input buses and output
+buses can each have different channel counts, in which case miniaudio will automatically convert
+the input data to the output channel count before processing. The number of channels of an output
+bus of one node must match the channel count of the input bus it's attached to. The channel counts
+cannot be changed after the node has been initialized. If you attempt to attach an output bus to
+an input bus with a different channel count, attachment will fail.
+
+To use a node graph, you first need to initialize a `ma_node_graph` object. This is essentially a
+container around the entire graph. The `ma_node_graph` object is required for some thread-safety
+issues which will be explained later. A `ma_node_graph` object is initialized using miniaudio's
+standard config/init system:
+
+    ```c
+    ma_node_graph_config nodeGraphConfig = ma_node_graph_config_init(myChannelCount);
+
+    result = ma_node_graph_init(&nodeGraphConfig, NULL, &nodeGraph);    // Second parameter is a pointer to allocation callbacks.
+    if (result != MA_SUCCESS) {
+        // Failed to initialize node graph.
+    }
+    ```
+
+When you initialize the node graph, you're specifying the channel count of the endpoint. The
+endpoint is a special node which has one input bus and one output bus, both of which have the
+same channel count, which is specified in the config. Any nodes that connect directly to the
+endpoint must be configured such that their output buses have the same channel count. When you read
+audio data from the node graph, it'll have the channel count you specified in the config. To read
+data from the graph:
+
+    ```c
+    ma_uint32 framesRead;
+    result = ma_node_graph_read_pcm_frames(&nodeGraph, pFramesOut, frameCount, &framesRead);
+    if (result != MA_SUCCESS) {
+        // Failed to read data from the node graph.
+    }
+    ```
+
+When you read audio data, miniaudio starts at the node graph's endpoint node which then pulls in
+data from it's input attachments, which in turn recusively pull in data from their inputs, and so
+on. At the very base of the graph there will be some kind of data source node which will have zero
+inputs and will instead read directly from a data source. The base nodes don't literally need to
+read from a `ma_data_source` object, but they will always have some kind of underlying object that
+sources some kind of audio. The `ma_data_source_node` node can be used to read from a
+`ma_data_source`. Data is always in floating-point format and in the number of channels you
+specified when the graph was initialized. The sample rate is defined by the underlying data sources.
+It's up to you to ensure they use a consistent and appropraite sample rate.
+
+The `ma_node` API is designed to allow custom nodes to be implemented with relative ease, but
+miniaudio includes a few stock nodes for common functionality. This is how you would initialize a
+node which reads directly from a data source (`ma_data_source_node`) which is an example of one
+of the stock nodes that comes with miniaudio:
+
+    ```c
+    ma_data_source_node_config config = ma_data_source_node_config_init(pMyDataSource, isLooping);
+
+    ma_data_source_node dataSourceNode;
+    result = ma_data_source_node_init(&nodeGraph, &config, NULL, &dataSourceNode);
+    if (result != MA_SUCCESS) {
+        // Failed to create data source node.
+    }
+    ```
+
+The data source node will use the output channel count to determine the channel count of the output
+bus. There will be 1 output bus and 0 input buses (data will be drawn directly from the data
+source). The data source must output to floating-point (ma_format_f32) or else an error will be
+returned from `ma_data_source_node_init()`.
+
+By default the node will not be attached to the graph. To do so, use `ma_node_attach_output_bus()`:
+
+    ```c
+    result = ma_node_attach_output_bus(&dataSourceNode, 0, ma_node_graph_get_endpoint(&nodeGraph), 0);
+    if (result != MA_SUCCESS) {
+        // Failed to attach node.
+    }
+    ```
+
+The code above connects the data source node directly to the endpoint. Since the data source node
+has only a single output bus, the index will always be 0. Likewise, the endpoint only has a single
+input bus which means the input bus index will also always be 0.
+
+To detach a specific output bus, use `ma_node_detach_output_bus()`. To detach all output buses, use
+`ma_node_detach_all_output_buses()`. If you want to just move the output bus from one attachment to
+another, you do not need to detach first. You can just call `ma_node_attach_output_bus()` and it'll
+deal with it for you.
+
+Less frequently you may want to create a specialized node. This will be a node where you implement
+your own processing callback to apply a custom effect of some kind. This is similar to initalizing
+one of the stock node types, only this time you need to specify a pointer to a vtable containing a
+pointer to the processing function and the number of input and output buses. Example:
+
+    ```c
+    static void my_custom_node_process_pcm_frames(ma_node* pNode, const float** ppFramesIn, ma_uint32* pFrameCountIn, float** ppFramesOut, ma_uint32* pFrameCountOut)
+    {
+        // Do some processing of ppFramesIn (one stream of audio data per input bus)
+        const float* pFramesIn_0 = ppFramesIn[0]; // Input bus @ index 0.
+        const float* pFramesIn_1 = ppFramesIn[1]; // Input bus @ index 1.
+        float* pFramesOut_0 = ppFramesOut[0];     // Output bus @ index 0.
+
+        // Do some processing. On input, `pFrameCountIn` will be the number of input frames in each
+        // buffer in `ppFramesIn` and `pFrameCountOut` will be the capacity of each of the buffers
+        // in `ppFramesOut`. On output, `pFrameCountIn` should be set to the number of input frames
+        // your node consumed and `pFrameCountOut` should be set the number of output frames that
+        // were produced.
+        //
+        // You should process as many frames as you can. If your effect consumes input frames at the
+        // same rate as output frames (always the case, unless you're doing resampling), you need
+        // only look at `ppFramesOut` and process that exact number of frames. If you're doing
+        // resampling, you'll need to be sure to set both `pFrameCountIn` and `pFrameCountOut`
+        // properly.
+    }
+
+    static ma_node_vtable my_custom_node_vtable = 
+    {
+        my_custom_node_process_pcm_frames, // The function that will be called process your custom node. This is where you'd implement your effect processing.
+        NULL,   // Optional. A callback for calculating the number of input frames that are required to process a specified number of output frames.
+        2,      // 2 input buses.
+        1,      // 1 output bus.
+        0       // Default flags.
+    };
+
+    ...
+
+    // Each bus needs to have a channel count specified. To do this you need to specify the channel
+    // counts in an array and then pass that into the node config.
+    ma_uint32 inputChannels[2];     // Equal in size to the number of input channels specified in the vtable.
+    ma_uint32 outputChannels[1];    // Equal in size to the number of output channels specicied in the vtable.
+
+    inputChannels[0]  = channelsIn;
+    inputChannels[1]  = channelsIn;
+    outputChannels[0] = channelsOut;
+    
+    ma_node_config nodeConfig = ma_node_config_init();
+    nodeConfig.vtable          = &my_custom_node_vtable;
+    nodeConfig.pInputChannels  = inputChannels;
+    nodeConfig.pOutputChannels = outputChannels;
+
+    ma_node_base node;
+    result = ma_node_init(&nodeGraph, &nodeConfig, NULL, &node);
+    if (result != MA_SUCCESS) {
+        // Failed to initialize node.
+    }
+    ```
+
+When initializing a custom node, as in the code above, you'll normally just place your vtable in
+static space. The number of input and output buses are specified as part of the vtable. If you need
+a variable number of buses on a per-node bases, the vtable should have the relevant bus count set
+to `MA_NODE_BUS_COUNT_UNKNOWN`. In this case, the bus count should be set in the node config:
+
+    ```c
+    static ma_node_vtable my_custom_node_vtable = 
+    {
+        my_custom_node_process_pcm_frames, // The function that will be called process your custom node. This is where you'd implement your effect processing.
+        NULL,   // Optional. A callback for calculating the number of input frames that are required to process a specified number of output frames.
+        MA_NODE_BUS_COUNT_UNKNOWN,  // The number of input buses is determined on a per-node basis.
+        1,      // 1 output bus.
+        0       // Default flags.
+    };
+
+    ...
+
+    ma_node_config nodeConfig = ma_node_config_init();
+    nodeConfig.vtable          = &my_custom_node_vtable;
+    nodeConfig.inputBusCount   = myBusCount;        // <-- Since the vtable specifies MA_NODE_BUS_COUNT_UNKNOWN, the input bus count should be set here.
+    nodeConfig.pInputChannels  = inputChannels;     // <-- Make sure there are nodeConfig.inputBusCount elements in this array.
+    nodeConfig.pOutputChannels = outputChannels;    // <-- The vtable specifies 1 output bus, so there must be 1 element in this array.
+    ```
+
+In the above example it's important to never set the `inputBusCount` and `outputBusCount` members
+to anything other than their defaults if the vtable specifies an explicit count. They can only be
+set if the vtable specifies MA_NODE_BUS_COUNT_UNKNOWN in the relevant bus count.
+
+Most often you'll want to create a structure to encapsulate your node with some extra data. You
+need to make sure the `ma_node_base` object is your first member of the structure:
+
+    ```c
+    typedef struct
+    {
+        ma_node_base base; // <-- Make sure this is always the first member.
+        float someCustomData;
+    } my_custom_node;
+    ```
+
+By doing this, your object will be compatible with all `ma_node` APIs and you can attach it to the
+graph just like any other node.
+
+In the custom processing callback (`my_custom_node_process_pcm_frames()` in the example above), the
+number of channels for each bus is what was specified by the config when the node was initialized
+with `ma_node_init()`. In addition, all attachments to each of the input buses will have been
+pre-mixed by miniaudio. The config allows you to specify different channel counts for each
+individual input and output bus. It's up to the effect to handle it appropriate, and if it can't,
+return an error in it's initialization routine.
+
+Custom nodes can be assigned some flags to describe their behaviour. These are set via the vtable
+and include the following:
+
+    +-----------------------------------------+---------------------------------------------------+
+    | Flag Name                               | Description                                       |
+    +-----------------------------------------+---------------------------------------------------+
+    | MA_NODE_FLAG_PASSTHROUGH                | Useful for nodes that do not do any kind of audio |
+    |                                         | processing, but are instead used for tracking     |
+    |                                         | time, handling events, etc. Also used by the      |
+    |                                         | internal endpoint node. It reads directly from    |
+    |                                         | the input bus to the output bus. Nodes with this  |
+    |                                         | flag must have exactly 1 input bus and 1 output   |
+    |                                         | bus, and both buses must have the same channel    |
+    |                                         | counts.                                           |
+    +-----------------------------------------+---------------------------------------------------+
+    | MA_NODE_FLAG_CONTINUOUS_PROCESSING      | Causes the processing callback to be called even  |
+    |                                         | when no data is available to be read from input   |
+    |                                         | attachments. This is useful for effects like      |
+    |                                         | echos where there will be a tail of audio data    |
+    |                                         | that still needs to be processed even when the    |
+    |                                         | original data sources have reached their ends.    |
+    +-----------------------------------------+---------------------------------------------------+
+    | MA_NODE_FLAG_ALLOW_NULL_INPUT           | Used in conjunction with                          |
+    |                                         | `MA_NODE_FLAG_CONTINUOUS_PROCESSING`. When this   |
+    |                                         | is set, the `ppFramesIn` parameter of the         |
+    |                                         | processing callback will be set to NULL when      |
+    |                                         | there are no input frames are available. When     |
+    |                                         | this is unset, silence will be posted to the      |
+    |                                         | processing callback.                              |
+    +-----------------------------------------+---------------------------------------------------+
+    | MA_NODE_FLAG_DIFFERENT_PROCESSING_RATES | Used to tell miniaudio that input and output      |
+    |                                         | frames are processed at different rates. You      |
+    |                                         | should set this for any nodes that perform        |
+    |                                         | resampling.                                       |
+    +-----------------------------------------+---------------------------------------------------+
+
+
+If you need to make a copy of an audio stream for effect processing you can use a splitter node
+called `ma_splitter_node`. This takes has 1 input bus and splits the stream into 2 output buses.
+You can use it like this:
+
+    ```c
+    ma_splitter_node_config splitterNodeConfig = ma_splitter_node_config_init(channelsIn, channelsOut);
+
+    ma_splitter_node splitterNode;
+    result = ma_splitter_node_init(&nodeGraph, &splitterNodeConfig, NULL, &splitterNode);
+    if (result != MA_SUCCESS) {
+        // Failed to create node.
+    }
+
+    // Attach your output buses to two different input buses (can be on two different nodes).
+    ma_node_attach_output_bus(&splitterNode, 0, ma_node_graph_get_endpoint(&nodeGraph), 0); // Attach directly to the endpoint.
+    ma_node_attach_output_bus(&splitterNode, 1, &myEffectNode,                          0); // Attach to input bus 0 of some effect node.
+    ```
+
+The volume of an output bus can be configured on a per-bus basis:
+
+    ```c
+    ma_node_set_output_bus_volume(&splitterNode, 0, 0.5f);
+    ma_node_set_output_bus_volume(&splitterNode, 1, 0.5f);
+    ```
+
+In the code above we're using the splitter node from before and changing the volume of each of the
+copied streams.
+
+You can start, stop and mute a node with the following:
+
+    ```c
+    ma_node_set_state(&splitterNode, ma_node_state_started);    // The default state.
+    ma_node_set_state(&splitterNode, ma_node_state_stopped);
+    ma_node_set_state(&splitterNode, ma_node_state_muted);
+    ```
+
+By default the node is in a started state, but since it won't be connected to anything won't
+actually be invoked by the node graph until it's connected. When you stop a node, data will not be
+read from any of it's input connections. You can use this property to stop a group of sounds
+atomically.
+
+You can configure the initial state of a node in it's config:
+
+    ```c
+    nodeConfig.initialState = ma_node_state_stopped;
+    ```
+
+Note that for the stock specialized nodes, all of their configs will have a `nodeConfig` member
+which is the config to use with the base node. This is where the initial state can be configured
+for specialized nodes:
+
+    ```c
+    dataSourceNodeConfig.nodeConfig.initialState = ma_node_state_stopped;
+    ```
+
+When using a specialized node like `ma_data_source_node` or `ma_splitter_node`, be sure to not
+modify the `vtable` member of the `nodeConfig` object.
+
+
+7.1. Timing
+-----------
+The node graph supports starting and stopping nodes at scheduled times. This is especially useful
+for data source nodes where you want to get the node set up, but only start playback at a specific
+time. There are two clocks: local and global.
+
+A local clock is per-node, whereas the global clock is per graph. Scheduling starts and stops can
+only be done based on the global clock because the local clock will not be running while the node
+is stopped. The global clocks advances whenever `ma_node_graph_read_pcm_frames()` is called. On the
+other hand, the local clock only advances when the node's processing callback is fired, and is
+advanced based on the output frame count.
+
+To retrieve the global time, use `ma_node_graph_get_time()`. The global time can be set with
+`ma_node_graph_set_time()` which might be useful if you want to do seeking on a global timeline.
+Getting and setting the local time is similar. Use `ma_node_get_time()` to retrieve the local time,
+and `ma_node_set_time()` to set the local time. The global and local times will be advanced by the
+audio thread, so care should be taken to avoid data races. Ideally you should avoid calling these
+outside of the node processing callbacks which are always run on the audio thread.
+
+There is basic support for scheduling the starting and stopping of nodes. You can only schedule one
+start and one stop at a time. This is mainly intended for putting nodes into a started or stopped
+state in a frame-exact manner. Without this mechanism, starting and stopping of a node is limited
+to the resolution of a call to `ma_node_graph_read_pcm_frames()` which would typically be in blocks
+of several milliseconds. The following APIs can be used for scheduling node states:
+
+    ```c
+    ma_node_set_state_time()
+    ma_node_get_state_time()
+    ```
+
+The time is absolute and must be based on the global clock. An example is below:
+
+    ```c
+    ma_node_set_state_time(&myNode, ma_node_state_started, sampleRate*1);   // Delay starting to 1 second.
+    ma_node_set_state_time(&myNode, ma_node_state_stopped, sampleRate*5);   // Delay stopping to 5 seconds.
+    ```
+
+An example for changing the state using a relative time.
+
+    ```c
+    ma_node_set_state_time(&myNode, ma_node_state_started, sampleRate*1 + ma_node_graph_get_time(&myNodeGraph));
+    ma_node_set_state_time(&myNode, ma_node_state_stopped, sampleRate*5 + ma_node_graph_get_time(&myNodeGraph));
+    ```
+
+Note that due to the nature of multi-threading the times may not be 100% exact. If this is an
+issue, consider scheduling state changes from within a processing callback. An idea might be to
+have some kind of passthrough trigger node that is used specifically for tracking time and handling
+events.
+
+
+
+7.2. Thread Safety and Locking
+------------------------------
+When processing audio, it's ideal not to have any kind of locking in the audio thread. Since it's
+expected that `ma_node_graph_read_pcm_frames()` would be run on the audio thread, it does so
+without the use of any locks. This section discusses the implementation used by miniaudio and goes
+over some of the compromises employed by miniaudio to achieve this goal. Note that the current
+implementation may not be ideal - feedback and critiques are most welcome.
+
+The node graph API is not *entirely* lock-free. Only `ma_node_graph_read_pcm_frames()` is expected
+to be lock-free. Attachment, detachment and uninitialization of nodes use locks to simplify the
+implementation, but are crafted in a way such that such locking is not required when reading audio
+data from the graph. Locking in these areas are achieved by means of spinlocks.
+
+The main complication with keeping `ma_node_graph_read_pcm_frames()` lock-free stems from the fact
+that a node can be uninitialized, and it's memory potentially freed, while in the middle of being
+processed on the audio thread. There are times when the audio thread will be referencing a node,
+which means the uninitialization process of a node needs to make sure it delays returning until the
+audio thread is finished so that control is not handed back to the caller thereby giving them a
+chance to free the node's memory.
+
+When the audio thread is processing a node, it does so by reading from each of the output buses of
+the node. In order for a node to process data for one of it's output buses, it needs to read from
+each of it's input buses, and so on an so forth. It follows that once all output buses of a node
+are detached, the node as a whole will be disconnected and no further processing will occur unless
+it's output buses are reattached, which won't be happening when the node is being uninitialized.
+By having `ma_node_detach_output_bus()` wait until the audio thread is finished with it, we can
+simplify a few things, at the expense of making `ma_node_detach_output_bus()` a bit slower. By
+doing this, the implementation of `ma_node_uninit()` becomes trivial - just detach all output
+nodes, followed by each of the attachments to each of it's input nodes, and then do any final clean
+up.
+
+With the above design, the worst-case scenario is `ma_node_detach_output_bus()` taking as long as
+it takes to process the output bus being detached. This will happen if it's called at just the
+wrong moment where the audio thread has just iterated it and has just started processing. The
+caller of `ma_node_detach_output_bus()` will stall until the audio thread is finished, which
+includes the cost of recursively processing it's inputs. This is the biggest compromise made with
+the approach taken by miniaudio for it's lock-free processing system. The cost of detaching nodes
+earlier in the pipeline (data sources, for example) will be cheaper than the cost of detaching
+higher level nodes, such as some kind of final post-processing endpoint. If you need to do mass
+detachments, detach starting from the lowest level nodes and work your way towards the final
+endpoint node (but don't try detaching the node graph's endpoint). If the audio thread is not
+running, detachment will be fast and detachment in any order will be the same. The reason nodes
+need to wait for their input attachments to complete is due to the potential for desyncs between
+data sources. If the node was to terminate processing mid way through processing it's inputs,
+there's a chance that some of the underlying data sources will have been read, but then others not.
+That will then result in a potential desynchronization when detaching and reattaching higher-level
+nodes. A possible solution to this is to have an option when detaching to terminate processing
+before processing all input attachments which should be fairly simple.
+
+Another compromise, albeit less significant, is locking when attaching and detaching nodes. This
+locking is achieved by means of a spinlock in order to reduce memory overhead. A lock is present
+for each input bus and output bus. When an output bus is connected to an input bus, both the output
+bus and input bus is locked. This locking is specifically for attaching and detaching across
+different threads and does not affect `ma_node_graph_read_pcm_frames()` in any way. The locking and
+unlocking is mostly self-explanatory, but a slightly less intuitive aspect comes into it when
+considering that iterating over attachments must not break as a result of attaching or detaching a
+node while iteration is occuring.
+
+Attaching and detaching are both quite simple. When an output bus of a node is attached to an input
+bus of another node, it's added to a linked list. Basically, an input bus is a linked list, where
+each item in the list is and output bus. We have some intentional (and convenient) restrictions on
+what can done with the linked list in order to simplify the implementation. First of all, whenever
+something needs to iterate over the list, it must do so in a forward direction. Backwards iteration
+is not supported. Also, items can only be added to the start of the list.
+
+The linked list is a doubly-linked list where each item in the list (an output bus) holds a pointer
+to the next item in the list, and another to the previous item. A pointer to the previous item is
+only required for fast detachment of the node - it is never used in iteration. This is an
+important property because it means from the perspective of iteration, attaching and detaching of
+an item can be done with a single atomic assignment. This is exploited by both the attachment and
+detachment process. When attaching the node, the first thing that is done is the setting of the
+local "next" and "previous" pointers of the node. After that, the item is "attached" to the list
+by simply performing an atomic exchange with the head pointer. After that, the node is "attached"
+to the list from the perspective of iteration. Even though the "previous" pointer of the next item
+hasn't yet been set, from the perspective of iteration it's been attached because iteration will
+only be happening in a forward direction which means the "previous" pointer won't actually ever get
+used. The same general process applies to detachment. See `ma_node_attach_output_bus()` and
+`ma_node_detach_output_bus()` for the implementation of this mechanism.
+
+
+
+8. Data Conversion
 ==================
 A data conversion API is included with miniaudio which supports the majority of data conversion
 requirements. This supports conversion between sample formats, channel counts (with channel
 mapping) and sample rates.
 
 
-6.1. Sample Format Conversion
+8.1. Sample Format Conversion
 -----------------------------
 Conversion between sample formats is achieved with the `ma_pcm_*_to_*()`, `ma_pcm_convert()` and
 `ma_convert_pcm_frames_format()` APIs. Use `ma_pcm_*_to_*()` to convert between two specific
@@ -689,7 +1461,7 @@ formats. Use `ma_pcm_convert()` to convert based on a `ma_format` variable. Use
 and channel count as a variable instead of the total sample count.
 
 
-6.1.1. Dithering
+8.1.1. Dithering
 ----------------
 Dithering can be set using the ditherMode parameter.
 
@@ -722,7 +1494,7 @@ dither is not used. It will just be ignored.
 
 
 
-6.2. Channel Conversion
+8.2. Channel Conversion
 -----------------------
 Channel conversion is used for channel rearrangement and conversion from one channel count to
 another. The `ma_channel_converter` API is used for channel conversion. Below is an example of
@@ -758,7 +1530,7 @@ frames.
 Input and output PCM frames are always interleaved. Deinterleaved layouts are not supported.
 
 
-6.2.1. Channel Mapping
+8.2.1. Channel Mapping
 ----------------------
 In addition to converting from one channel count to another, like the example above, the channel
 converter can also be used to rearrange channels. When initializing the channel converter, you can
@@ -860,7 +1632,7 @@ Below are the channel maps used by default in miniaudio (`ma_standard_channel_ma
 
 
 
-6.3. Resampling
+8.3. Resampling
 ---------------
 Resampling is achieved with the `ma_resampler` object. To create a resampler object, do something
 like the following:
@@ -948,12 +1720,12 @@ retrieved in terms of both the input rate and the output rate with
 `ma_resampler_get_input_latency()` and `ma_resampler_get_output_latency()`.
 
 
-6.3.1. Resampling Algorithms
+8.3.1. Resampling Algorithms
 ----------------------------
 The choice of resampling algorithm depends on your situation and requirements.
 
 
-6.3.1.1. Linear Resampling
+8.3.1.1. Linear Resampling
 --------------------------
 The linear resampler is the fastest, but comes at the expense of poorer quality. There is, however,
 some control over the quality of the linear resampler which may make it a suitable option depending
@@ -973,7 +1745,7 @@ The API for the linear resampler is the same as the main resampler API, only it'
 
 
 
-6.4. General Data Conversion
+8.4. General Data Conversion
 ----------------------------
 The `ma_data_converter` API can be used to wrap sample format conversion, channel conversion and
 resampling into one operation. This is what miniaudio uses internally to convert between the format
@@ -1065,10 +1837,10 @@ is required. This can be retrieved in terms of both the input rate and the outpu
 
 
 
-7. Filtering
+9. Filtering
 ============
 
-7.1. Biquad Filtering
+9.1. Biquad Filtering
 ---------------------
 Biquad filtering is achieved with the `ma_biquad` API. Example:
 
@@ -1109,7 +1881,7 @@ registers to 0. Note that changing the format or channel count after initializat
 will result in an error.
 
 
-7.2. Low-Pass Filtering
+9.2. Low-Pass Filtering
 -----------------------
 Low-pass filtering is achieved with the following APIs:
 
@@ -1168,7 +1940,7 @@ chain. If an odd filter order is specified, a first order filter will be applied
 series of second order filters in a chain.
 
 
-7.3. High-Pass Filtering
+9.3. High-Pass Filtering
 ------------------------
 High-pass filtering is achieved with the following APIs:
 
@@ -1184,7 +1956,7 @@ High-pass filters work exactly the same as low-pass filters, only the APIs are c
 `ma_hpf2` and `ma_hpf`. See example code for low-pass filters for example usage.
 
 
-7.4. Band-Pass Filtering
+9.4. Band-Pass Filtering
 ------------------------
 Band-pass filtering is achieved with the following APIs:
 
@@ -1201,7 +1973,7 @@ band-pass filters must be an even number which means there is no first order ban
 unlike low-pass and high-pass filters.
 
 
-7.5. Notch Filtering
+9.5. Notch Filtering
 --------------------
 Notch filtering is achieved with the following APIs:
 
@@ -1212,7 +1984,7 @@ Notch filtering is achieved with the following APIs:
     +-----------+------------------------------------------+
 
 
-7.6. Peaking EQ Filtering
+9.6. Peaking EQ Filtering
 -------------------------
 Peaking filtering is achieved with the following APIs:
 
@@ -1223,7 +1995,7 @@ Peaking filtering is achieved with the following APIs:
     +----------+------------------------------------------+
 
 
-7.7. Low Shelf Filtering
+9.7. Low Shelf Filtering
 ------------------------
 Low shelf filtering is achieved with the following APIs:
 
@@ -1237,7 +2009,7 @@ Where a high-pass filter is used to eliminate lower frequencies, a low shelf fil
 just turn them down rather than eliminate them entirely.
 
 
-7.8. High Shelf Filtering
+9.8. High Shelf Filtering
 -------------------------
 High shelf filtering is achieved with the following APIs:
 
@@ -1254,11 +2026,11 @@ the high shelf filter does the same thing for high frequencies.
 
 
 
-8. Waveform and Noise Generation
-================================
+10. Waveform and Noise Generation
+=================================
 
-8.1. Waveforms
---------------
+10.1. Waveforms
+---------------
 miniaudio supports generation of sine, square, triangle and sawtooth waveforms. This is achieved
 with the `ma_waveform` API. Example:
 
@@ -1302,8 +2074,8 @@ Below are the supported waveform types:
 
 
 
-8.2. Noise
-----------
+10.2. Noise
+-----------
 miniaudio supports generation of white, pink and Brownian noise via the `ma_noise` API. Example:
 
     ```c
@@ -1352,8 +2124,8 @@ Below are the supported noise types.
 
 
 
-9. Audio Buffers
-================
+11. Audio Buffers
+=================
 miniaudio supports reading from a buffer of raw audio data via the `ma_audio_buffer` API. This can
 read from memory that's managed by the application, but can also handle the memory management for
 you internally. Memory management is flexible and should support most use cases.
@@ -1454,7 +2226,7 @@ as an error when returned by `ma_audio_buffer_unmap()`.
 
 
 
-10. Ring Buffers
+12. Ring Buffers
 ================
 miniaudio supports lock free (single producer, single consumer) ring buffers which are exposed via
 the `ma_rb` and `ma_pcm_rb` APIs. The `ma_rb` API operates on bytes, whereas the `ma_pcm_rb`
@@ -1523,7 +2295,7 @@ producer thread.
 
 
 
-11. Backends
+13. Backends
 ============
 The following backends are supported by miniaudio.
 
@@ -1549,7 +2321,7 @@ The following backends are supported by miniaudio.
 
 Some backends have some nuance details you may want to be aware of.
 
-11.1. WASAPI
+13.1. WASAPI
 ------------
 - Low-latency shared mode will be disabled when using an application-defined sample rate which is
   different to the device's native sample rate. To work around this, set `wasapi.noAutoConvertSRC`
@@ -1558,13 +2330,13 @@ Some backends have some nuance details you may want to be aware of.
   will result in miniaudio's internal resampler being used instead which will in turn enable the
   use of low-latency shared mode.
 
-11.2. PulseAudio
+13.2. PulseAudio
 ----------------
 - If you experience bad glitching/noise on Arch Linux, consider this fix from the Arch wiki:
   https://wiki.archlinux.org/index.php/PulseAudio/Troubleshooting#Glitches,_skips_or_crackling.
   Alternatively, consider using a different backend such as ALSA.
 
-11.3. Android
+13.3. Android
 -------------
 - To capture audio on Android, remember to add the RECORD_AUDIO permission to your manifest:
   `<uses-permission android:name="android.permission.RECORD_AUDIO" />`
@@ -1578,7 +2350,7 @@ Some backends have some nuance details you may want to be aware of.
   miniaudio's built-in resampler is to take advantage of any potential device-specific
   optimizations the driver may implement.
 
-11.4. UWP
+13.4. UWP
 ---------
 - UWP only supports default playback and capture devices.
 - UWP requires the Microphone capability to be enabled in the application's manifest (Package.appxmanifest):
@@ -1592,7 +2364,7 @@ Some backends have some nuance details you may want to be aware of.
     </Package>
     ```
 
-11.5. Web Audio / Emscripten
+13.5. Web Audio / Emscripten
 ----------------------------
 - You cannot use `-std=c*` compiler flags, nor `-ansi`. This only applies to the Emscripten build.
 - The first time a context is initialized it will create a global object called "miniaudio" whose
@@ -1606,7 +2378,7 @@ Some backends have some nuance details you may want to be aware of.
 
 
 
-12. Miscellaneous Notes
+14. Miscellaneous Notes
 =======================
 - Automatic stream routing is enabled on a per-backend basis. Support is explicitly enabled for
   WASAPI and Core Audio, however other backends such as PulseAudio may naturally support it, though
diff --git a/research/miniaudio_engine.h b/research/miniaudio_engine.h
index 0b0d5588..c7207b9b 100644
--- a/research/miniaudio_engine.h
+++ b/research/miniaudio_engine.h
@@ -35,780 +35,6 @@ The best resource to use when understanding the API is the function declarations
 extern "C" {
 #endif
 
-/*
-Resource Management
-===================
-Many programs will want to manage sound resources for things such as reference counting and
-streaming. This is supported by miniaudio via the `ma_resource_manager` API.
-
-The resource manager is mainly responsible for the following:
-
-    1) Loading of sound files into memory with reference counting.
-    2) Streaming of sound data
-
-When loading a sound file, the resource manager will give you back a `ma_data_source` compatible
-object called `ma_resource_manager_data_source`. This object can be passed into any
-`ma_data_source` API which is how you can read and seek audio data. When loading a sound file, you
-specify whether or not you want the sound to be fully loaded into memory (and optionally
-pre-decoded) or streamed. When loading into memory, you can also specify whether or not you want
-the data to be loaded asynchronously.
-
-The example below is how you can initialize a resource manager using it's default configuration:
-
-    ```c
-    ma_resource_manager_config config;
-    ma_resource_manager resourceManager;
-
-    config = ma_resource_manager_config_init();
-    result = ma_resource_manager_init(&config, &resourceManager);
-    if (result != MA_SUCCESS) {
-        ma_device_uninit(&device);
-        printf("Failed to initialize the resource manager.");
-        return -1;
-    }
-    ```
-
-You can configure the format, channels and sample rate of the decoded audio data. By default it
-will use the file's native data format, but you can configure it to use a consistent format. This
-is useful for offloading the cost of data conversion to load time rather than dynamically
-converting a mixing time. To do this, you configure the decoded format, channels and sample rate
-like the code below:
-
-    ```c
-    config = ma_resource_manager_config_init();
-    config.decodedFormat     = device.playback.format;
-    config.decodedChannels   = device.playback.channels;
-    config.decodedSampleRate = device.sampleRate;
-    ```
-
-In the code above, the resource manager will be configured so that any decoded audio data will be
-pre-converted at load time to the device's native data format. If instead you used defaults and
-the data format of the file did not match the device's data format, you would need to convert the
-data at mixing time which may be prohibitive in high-performance and large scale scenarios like
-games.
-
-Asynchronicity is achieved via a job system. When an operation needs to be performed, such as the
-decoding of a page, a job will be posted to a queue which will then be processed by a job thread.
-By default there will be only one job thread running, but this can be configured, like so:
-
-    ```c
-    config = ma_resource_manager_config_init();
-    config.jobThreadCount = MY_JOB_THREAD_COUNT;
-    ```
-
-By default job threads are managed internally by the resource manager, however you can also self
-manage your job threads if, for example, you want to integrate the job processing into your
-existing job infrastructure, or if you simply don't like the way the resource manager does it. To
-do this, just set the job thread count to 0 and process jobs manually. To process jobs, you first
-need to retrieve a job using `ma_resource_manager_next_job()` and then process it using
-`ma_resource_manager_process_job()`:
-
-    ```c
-    config = ma_resource_manager_config_init();
-    config.jobThreadCount = 0;                            // Don't manage any job threads internally.
-    config.flags = MA_RESOURCE_MANAGER_FLAG_NON_BLOCKING; // Optional. Makes `ma_resource_manager_next_job()` non-blocking.
-
-    // ... Initialize your custom job threads ...
-
-    void my_custom_job_thread(...)
-    {
-        for (;;) {
-            ma_resource_manager_job job;
-            ma_result result = ma_resource_manager_next_job(pMyResourceManager, &job);
-            if (result != MA_SUCCESS) {
-                if (result == MA_NOT_DATA_AVAILABLE) {
-                    // No jobs are available. Keep going. Will only get this if the resource manager was initialized
-                    // with MA_RESOURCE_MANAGER_FLAG_NON_BLOCKING.
-                    continue;
-                } else if (result == MA_CANCELLED) {
-                    // MA_RESOURCE_MANAGER_JOB_QUIT was posted. Exit.
-                    break;
-                } else {
-                    // Some other error occurred.
-                    break;
-                }
-            }
-
-            ma_resource_manager_process_job(pMyResourceManager, &job);
-        }
-    }
-    ```
-
-In the example above, the `MA_RESOURCE_MANAGER_JOB_QUIT` event is the used as the termination indicator, but you can
-use whatever you would like to terminate the thread. The call to `ma_resource_manager_next_job()`
-is blocking by default, by can be configured to be non-blocking by initializing the resource
-manager with the `MA_RESOURCE_MANAGER_FLAG_NON_BLOCKING` configuration flag.
-
-When loading a file, it's sometimes convenient to be able to customize how files are opened and
-read. This can be done by setting `pVFS` member of the resource manager's config:
-
-    ```c
-    // Initialize your custom VFS object. See documentation for VFS for information on how to do this.
-    my_custom_vfs vfs = my_custom_vfs_init();
-
-    config = ma_resource_manager_config_init();
-    config.pVFS = &vfs;
-    ```
-
-If you do not specify a custom VFS, the resource manager will use the operating system's normal
-file operations. This is default.
-
-To load a sound file and create a data source, call `ma_resource_manager_data_source_init()`. When
-loading a sound you need to specify the file path and options for how the sounds should be loaded.
-By default a sound will be loaded synchronously. The returned data source is owned by the caller
-which means the caller is responsible for the allocation and freeing of the data source. Below is
-an example for initializing a data source:
-
-    ```c
-    ma_resource_manager_data_source dataSource;
-    ma_result result = ma_resource_manager_data_source_init(pResourceManager, pFilePath, flags, &dataSource);
-    if (result != MA_SUCCESS) {
-        // Error.
-    }
-
-    // ...
-
-    // A ma_resource_manager_data_source object is compatible with the `ma_data_source` API. To read data, just call
-    // the `ma_data_source_read_pcm_frames()` like you would with any normal data source.
-    result = ma_data_source_read_pcm_frames(&dataSource, pDecodedData, frameCount, &framesRead);
-    if (result != MA_SUCCESS) {
-        // Failed to read PCM frames.
-    }
-
-    // ...
-
-    ma_resource_manager_data_source_uninit(pResourceManager, &dataSource);
-    ```
-
-The `flags` parameter specifies how you want to perform loading of the sound file. It can be a
-combination of the following flags:
-
-    ```
-    MA_DATA_SOURCE_STREAM
-    MA_DATA_SOURCE_DECODE
-    MA_DATA_SOURCE_ASYNC
-    ```
-
-When no flags are specified (set to 0), the sound will be fully loaded into memory, but not
-decoded, meaning the raw file data will be stored in memory, and then dynamically decoded when
-`ma_data_source_read_pcm_frames()` is called. To instead decode the audio data before storing it in
-memory, use the `MA_DATA_SOURCE_DECODE` flag. By default, the sound file will be loaded
-synchronously, meaning `ma_resource_manager_data_source_init()` will only return after the entire
-file has been loaded. This is good for simplicity, but can be prohibitively slow. You can instead
-load the sound asynchronously using the `MA_DATA_SOURCE_ASYNC` flag. This will result in
-`ma_resource_manager_data_source_init()` returning quickly, but no data will be returned by
-`ma_data_source_read_pcm_frames()` until some data is available. When no data is available because
-the asynchronous decoding hasn't caught up, `MA_BUSY` will be returned by
-`ma_data_source_read_pcm_frames()`.
-
-For large sounds, it's often prohibitive to store the entire file in memory. To mitigate this, you
-can instead stream audio data which you can do by specifying the `MA_DATA_SOURCE_STREAM` flag. When
-streaming, data will be decoded in 1 second pages. When a new page needs to be decoded, a job will
-be posted to the job queue and then subsequently processed in a job thread.
-
-When loading asynchronously, it can be useful to poll whether or not loading has finished. Use
-`ma_resource_manager_data_source_result()` to determine this. For in-memory sounds, this will
-return `MA_SUCCESS` when the file has been *entirely* decoded. If the sound is still being decoded,
-`MA_BUSY` will be returned. Otherwise, some other error code will be returned if the sound failed
-to load. For streaming data sources, `MA_SUCCESS` will be returned when the first page has been
-decoded and the sound is ready to be played. If the first page is still being decoded, `MA_BUSY`
-will be returned. Otherwise, some other error code will be returned if the sound failed to load.
-
-For in-memory sounds, reference counting is used to ensure the data is loaded only once. This means
-multiple calls to `ma_resource_manager_data_source_init()` with the same file path will result in
-the file data only being loaded once. Each call to `ma_resource_manager_data_source_init()` must be
-matched up with a call to `ma_resource_manager_data_source_uninit()`. Sometimes it can be useful
-for a program to register self-managed raw audio data and associate it with a file path. Use the
-`ma_resource_manager_register_*()` and `ma_resource_manager_unregister_*()` APIs to do this.
-`ma_resource_manager_register_decoded_data()` is used to associate a pointer to raw, self-managed
-decoded audio data in the specified data format with the specified name. Likewise,
-`ma_resource_manager_register_encoded_data()` is used to associate a pointer to raw self-managed
-encoded audio data (the raw file data) with the specified name. Note that these names need not be
-actual file paths. When `ma_resource_manager_data_source_init()` is called (without the
-`MA_DATA_SOURCE_STREAM` flag), the resource manager will look for these explicitly registered data
-buffers and, if found, will use it as the backing data for the data source. Note that the resource
-manager does *not* make a copy of this data so it is up to the caller to ensure the pointer stays
-valid for it's lifetime. Use `ma_resource_manager_unregister_data()` to unregister the self-managed
-data. It does not make sense to use the `MA_DATA_SOURCE_STREAM` flag with a self-managed data
-pointer. When `MA_DATA_SOURCE_STREAM` is specified, it will try loading the file data through the
-VFS.
-
-
-Resource Manager Implementation Details
----------------------------------------
-Resources are managed in two main ways:
-
-    1) By storing the entire sound inside an in-memory buffer (referred to as a data buffer)
-    2) By streaming audio data on the fly (referred to as a data stream)
-
-A resource managed data source (`ma_resource_manager_data_source`) encapsulates a data buffer or
-data stream, depending on whether or not the data source was initialized with the
-`MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_STREAM` flag. If so, it will make use of a
-`ma_resource_manager_data_stream` object. Otherwise it will use a `ma_resource_manager_data_buffer`
-object. Both of these objects are data sources which means they can be used with any
-`ma_data_source_*()` API.
-
-Another major feature of the resource manager is the ability to asynchronously decode audio files.
-This relieves the audio thread of time-consuming decoding which can negatively affect scalability
-due to the audio thread needing to complete it's work extremely quickly to avoid glitching.
-Asynchronous decoding is achieved through a job system. There is a central multi-producer,
-multi-consumer, lock-free, fixed-capacity job queue. When some asynchronous work needs to be done,
-a job is posted to the queue which is then read by a job thread. The number of job threads can be
-configured for improved scalability, and job threads can all run in parallel without needing to
-worry about the order of execution (how this is achieved is explained below).
-
-When a sound is being loaded asynchronously, playback can begin before the sound has been fully
-decoded. This enables the application to start playback of the sound quickly, while at the same
-time allowing to resource manager to keep loading in the background. Since there may be less
-threads than the number of sounds being loaded at a given time, a simple scheduling system is used
-to keep decoding time balanced and fair. The resource manager solves this by splitting decoding
-into chunks called pages. By default, each page is 1 second long. When a page has been decoded, a
-new job will be posted to start decoding the next page. By dividing up decoding into pages, an
-individual sound shouldn't ever delay every other sound from having their first page decoded. Of
-course, when loading many sounds at the same time, there will always be an amount of time required
-to process jobs in the queue so in heavy load situations there will still be some delay. To
-determine if a data source is ready to have some frames read, use
-`ma_resource_manager_data_source_get_available_frames()`. This will return the number of frames
-available starting from the current position.
-
-
-Data Buffers
-------------
-When the `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_STREAM` flag is excluded at initialization time, the
-resource manager will try to load the data into an in-memory data buffer. Before doing so, however,
-it will first check if the specified file has already been loaded. If so, it will increment a
-reference counter and just use the already loaded data. This saves both time and memory. A binary
-search tree (BST) is used for storing data buffers as it has good balance between efficiency and
-simplicity. The key of the BST is a 64-bit hash of the file path that was passed into
-`ma_resource_manager_data_source_init()`. The advantage of using a hash is that it saves memory
-over storing the entire path, has faster comparisons, and results in a mostly balanced BST due to
-the random nature of the hash. The disadvantage is that file names are case-sensitive. If this is
-an issue, you should normalize your file names to upper- or lower-case before initializing your
-data sources.
-
-When a sound file has not already been loaded and the `MMA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_ASYNC`
-is excluded, the file will be decoded synchronously by the calling thread. There are two options
-for controlling how the audio is stored in the data buffer - encoded or decoded. When the
-`MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_DECODE` option is excluded, the raw file data will be stored
-in memory. Otherwise the sound will be decoded before storing it in memory. Synchronous loading is
-a very simple and standard process of simply adding an item to the BST, allocating a block of
-memory and then decoding (if `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_DECODE` is specified).
-
-When the `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_ASYNC` flag is specified, loading of the data buffer
-is done asynchronously. In this case, a job is posted to the queue to start loading and then the
-function immediately returns, setting an internal result code to `MA_BUSY`. This result code is
-returned when the program calls `ma_resource_manager_data_source_result()`. When decoding has fully
-completed `MA_RESULT` will be returned. This can be used to know if loading has fully completed.
-
-When loading asynchronously, a single job is posted to the queue of the type
-`MA_RESOURCE_MANAGER_JOB_LOAD_DATA_BUFFER_NODE`. This involves making a copy of the file path and
-associating it with job. When the job is processed by the job thread, it will first load the file
-using the VFS associated with the resource manager. When using a custom VFS, it's important that it
-be completely thread-safe because it will be used from one or more job threads at the same time.
-Individual files should only ever be accessed by one thread at a time, however. After opening the
-file via the VFS, the job will determine whether or not the file is being decoded. If not, it
-simply allocates a block of memory and loads the raw file contents into it and returns. On the
-other hand, when the file is being decoded, it will first allocate a decoder on the heap and
-initialize it. Then it will check if the length of the file is known. If so it will allocate a
-block of memory to store the decoded output and initialize it to silence. If the size is unknown,
-it will allocate room for one page. After memory has been allocated, the first page will be
-decoded. If the sound is shorter than a page, the result code will be set to `MA_SUCCESS` and the
-completion event will be signalled and loading is now complete. If, however, there is more to
-decode, a job with the code `MA_RESOURCE_MANAGER_JOB_PAGE_DATA_BUFFER_NODE` is posted. This job
-will decode the next page and perform the same process if it reaches the end. If there is more to
-decode, the job will post another `MA_RESOURCE_MANAGER_JOB_PAGE_DATA_BUFFER_NODE` job which will
-keep on happening until the sound has been fully decoded. For sounds of an unknown length, the
-buffer will be dynamically expanded as necessary, and then shrunk with a final realloc() when the
-end of the file has been reached.
-
-
-Data Streams
-------------
-Data streams only ever store two pages worth of data for each instance. They are most useful for
-large sounds like music tracks in games that would consume too much memory if fully decoded in
-memory. After every frame from a page has been read, a job will be posted to load the next page
-which is done from the VFS.
-
-For data streams, the `MA_RESOURCE_MANAGER_DATA_SOURCE_FLAG_ASYNC` flag will determine whether or
-not initialization of the data source waits until the two pages have been decoded. When unset,
-`ma_resource_manager_data_source_init()` will wait until the two pages have been loaded, otherwise
-it will return immediately.
-
-When frames are read from a data stream using `ma_resource_manager_data_source_read_pcm_frames()`,
-`MA_BUSY` will be returned if there are no frames available. If there are some frames available,
-but less than the number requested, `MA_SUCCESS` will be returned, but the actual number of frames
-read will be less than the number requested. Due to the asymchronous nature of data streams,
-seeking is also asynchronous. If the data stream is in the middle of a seek, `MA_BUSY` will be
-returned when trying to read frames.
-
-When `ma_resource_manager_data_source_read_pcm_frames()` results in a page getting fully consumed
-a job is posted to load the next page. This will be posted from the same thread that called
-`ma_resource_manager_data_source_read_pcm_frames()` which should be lock-free.
-
-Data streams are uninitialized by posting a job to the queue, but the function won't return until
-that job has been processed. The reason for this is that the caller owns the data stream object and
-therefore miniaudio needs to ensure everything completes before handing back control to the caller.
-Also, if the data stream is uninitialized while pages are in the middle of decoding, they must
-complete before destroying any underlying object and the job system handles this cleanly.
-
-
-Job Queue
----------
-The resource manager uses a job queue which is multi-producer, multi-consumer, lock-free and
-fixed-capacity. The lock-free property of the queue is achieved using the algorithm described by
-Michael and Scott: Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared
-Memory Multiprocessors. In order for this to work, only a fixed number of jobs can be allocated and
-inserted into the queue which is done through a lock-free data structure for allocating an index
-into a fixed sized array, with reference counting for mitigation of the ABA problem. The reference
-count is 32-bit.
-
-For many types of jobs it's important that they execute in a specific order. In these cases, jobs
-are executed serially. For the resource manager, serial execution of jobs is only required on a
-per-object basis (per data buffer or per data stream). Each of these objects stores an execution
-counter. When a job is posted it is associated with an execution counter. When the job is
-processed, it checks if the execution counter of the job equals the execution counter of the
-owning object and if so, processes the job. If the counters are not equal, the job will be posted
-back onto the job queue for later processing. When the job finishes processing the execution order
-of the main object is incremented. This system means the no matter how many job threads are
-executing, decoding of an individual sound will always get processed serially. The advantage to
-having multiple threads comes into play when loading multiple sounds at the time time.
-*/
-
-
-/*
-Routing Infrastructure
-======================
-miniaudio's routing infrastructure follows a node graph paradigm. The idea is that you create a
-node whose outputs are attached to inputs of another node, thereby creating a graph. There are
-different types of nodes, with each node in the graph processing input data to produce output,
-which is then fed through the chain. Each node in the graph can apply their own custom effects. At
-the end of the graph is an endpoint which represents the end of the chain and is where the final
-output is ultimately extracted from.
-
-Each node has a number of input buses and a number of output buses. An output bus from a node is
-attached to an input bus of another. Multiple nodes can connect their output buses to another
-node's input bus, in which case their outputs will be mixed before processing by the node.
-
-Each input bus must be configured to accept the same number of channels, but input buses and output
-buses can each have different channel counts, in which case miniaudio will automatically convert
-the input data to the output channel count before processing. The number of channels of an output
-bus of one node must match the channel count of the input bus it's attached to. The channel counts
-cannot be changed after the node has been initialized. If you attempt to attach an output bus to
-an input bus with a different channel count, attachment will fail.
-
-To use a node graph, you first need to initialize a `ma_node_graph` object. This is essentially a
-container around the entire graph. The `ma_node_graph` object is required for some thread-safety
-issues which will be explained later. A `ma_node_graph` object is initialized using miniaudio's
-standard config/init system:
-
-    ```c
-    ma_node_graph_config nodeGraphConfig = ma_node_graph_config_init(myChannelCount);
-
-    result = ma_node_graph_init(&nodeGraphConfig, NULL, &nodeGraph);    // Second parameter is a pointer to allocation callbacks.
-    if (result != MA_SUCCESS) {
-        // Failed to initialize node graph.
-    }
-    ```
-
-When you initialize the node graph, you're specifying the channel count of the endpoint. The
-endpoint is a special node which has one input bus and one output bus, both of which have the
-same channel count, which is specified in the config. Any nodes that connect directly to the
-endpoint must be configured such that their output buses have the same channel count. When you read
-audio data from the node graph, it'll have the channel count you specified in the config. To read
-data from the graph:
-
-    ```c
-    ma_uint32 framesRead;
-    result = ma_node_graph_read_pcm_frames(&nodeGraph, pFramesOut, frameCount, &framesRead);
-    if (result != MA_SUCCESS) {
-        // Failed to read data from the node graph.
-    }
-    ```
-
-When you read audio data, miniaudio starts at the node graph's endpoint node which then pulls in
-data from it's input attachments, which in turn recusively pull in data from their inputs, and so
-on. At the very base of the graph there will be some kind of data source node which will have zero
-inputs and will instead read directly from a data source. The base nodes don't literally need to
-read from a `ma_data_source` object, but they will always have some kind of underlying object that
-sources some kind of audio. The `ma_data_source_node` node can be used to read from a
-`ma_data_source`. Data is always in floating-point format and in the number of channels you
-specified when the graph was initialized. The sample rate is defined by the underlying data sources.
-It's up to you to ensure they use a consistent and appropraite sample rate.
-
-The `ma_node` API is designed to allow custom nodes to be implemented with relative ease, but
-miniaudio includes a few stock nodes for common functionality. This is how you would initialize a
-node which reads directly from a data source (`ma_data_source_node`) which is an example of one
-of the stock nodes that comes with miniaudio:
-
-    ```c
-    ma_data_source_node_config config = ma_data_source_node_config_init(pMyDataSource, isLooping);
-
-    ma_data_source_node dataSourceNode;
-    result = ma_data_source_node_init(&nodeGraph, &config, NULL, &dataSourceNode);
-    if (result != MA_SUCCESS) {
-        // Failed to create data source node.
-    }
-    ```
-
-The data source node will use the output channel count to determine the channel count of the output
-bus. There will be 1 output bus and 0 input buses (data will be drawn directly from the data
-source). The data source must output to floating-point (ma_format_f32) or else an error will be
-returned from `ma_data_source_node_init()`.
-
-By default the node will not be attached to the graph. To do so, use `ma_node_attach_output_bus()`:
-
-    ```c
-    result = ma_node_attach_output_bus(&dataSourceNode, 0, ma_node_graph_get_endpoint(&nodeGraph), 0);
-    if (result != MA_SUCCESS) {
-        // Failed to attach node.
-    }
-    ```
-
-The code above connects the data source node directly to the endpoint. Since the data source node
-has only a single output bus, the index will always be 0. Likewise, the endpoint only has a single
-input bus which means the input bus index will also always be 0.
-
-To detach a specific output bus, use `ma_node_detach_output_bus()`. To detach all output buses, use
-`ma_node_detach_all_output_buses()`. If you want to just move the output bus from one attachment to
-another, you do not need to detach first. You can just call `ma_node_attach_output_bus()` and it'll
-deal with it for you.
-
-Less frequently you may want to create a specialized node. This will be a node where you implement
-your own processing callback to apply a custom effect of some kind. This is similar to initalizing
-one of the stock node types, only this time you need to specify a pointer to a vtable containing a
-pointer to the processing function and the number of input and output buses. Example:
-
-    ```c
-    static void my_custom_node_process_pcm_frames(ma_node* pNode, const float** ppFramesIn, ma_uint32* pFrameCountIn, float** ppFramesOut, ma_uint32* pFrameCountOut)
-    {
-        // Do some processing of ppFramesIn (one stream of audio data per input bus)
-        const float* pFramesIn_0 = ppFramesIn[0]; // Input bus @ index 0.
-        const float* pFramesIn_1 = ppFramesIn[1]; // Input bus @ index 1.
-        float* pFramesOut_0 = ppFramesOut[0];     // Output bus @ index 0.
-
-        // Do some processing. On input, `pFrameCountIn` will be the number of input frames in each
-        // buffer in `ppFramesIn` and `pFrameCountOut` will be the capacity of each of the buffers
-        // in `ppFramesOut`. On output, `pFrameCountIn` should be set to the number of input frames
-        // your node consumed and `pFrameCountOut` should be set the number of output frames that
-        // were produced.
-        //
-        // You should process as many frames as you can. If your effect consumes input frames at the
-        // same rate as output frames (always the case, unless you're doing resampling), you need
-        // only look at `ppFramesOut` and process that exact number of frames. If you're doing
-        // resampling, you'll need to be sure to set both `pFrameCountIn` and `pFrameCountOut`
-        // properly.
-    }
-
-    static ma_node_vtable my_custom_node_vtable = 
-    {
-        my_custom_node_process_pcm_frames, // The function that will be called process your custom node. This is where you'd implement your effect processing.
-        NULL,   // Optional. A callback for calculating the number of input frames that are required to process a specified number of output frames.
-        2,      // 2 input buses.
-        1,      // 1 output bus.
-        0       // Default flags.
-    };
-
-    ...
-
-    // Each bus needs to have a channel count specified. To do this you need to specify the channel
-    // counts in an array and then pass that into the node config.
-    ma_uint32 inputChannels[2];     // Equal in size to the number of input channels specified in the vtable.
-    ma_uint32 outputChannels[1];    // Equal in size to the number of output channels specicied in the vtable.
-
-    inputChannels[0]  = channelsIn;
-    inputChannels[1]  = channelsIn;
-    outputChannels[0] = channelsOut;
-    
-    ma_node_config nodeConfig = ma_node_config_init();
-    nodeConfig.vtable          = &my_custom_node_vtable;
-    nodeConfig.pInputChannels  = inputChannels;
-    nodeConfig.pOutputChannels = outputChannels;
-
-    ma_node_base node;
-    result = ma_node_init(&nodeGraph, &nodeConfig, NULL, &node);
-    if (result != MA_SUCCESS) {
-        // Failed to initialize node.
-    }
-    ```
-
-When initializing a custom node, as in the code above, you'll normally just place your vtable in
-static space. The number of input and output buses are specified as part of the vtable. If you need
-a variable number of buses on a per-node bases, the vtable should have the relevant bus count set
-to `MA_NODE_BUS_COUNT_UNKNOWN`. In this case, the bus count should be set in the node config:
-
-    ```c
-    static ma_node_vtable my_custom_node_vtable = 
-    {
-        my_custom_node_process_pcm_frames, // The function that will be called process your custom node. This is where you'd implement your effect processing.
-        NULL,   // Optional. A callback for calculating the number of input frames that are required to process a specified number of output frames.
-        MA_NODE_BUS_COUNT_UNKNOWN,  // The number of input buses is determined on a per-node basis.
-        1,      // 1 output bus.
-        0       // Default flags.
-    };
-
-    ...
-
-    ma_node_config nodeConfig = ma_node_config_init();
-    nodeConfig.vtable          = &my_custom_node_vtable;
-    nodeConfig.inputBusCount   = myBusCount;        // <-- Since the vtable specifies MA_NODE_BUS_COUNT_UNKNOWN, the input bus count should be set here.
-    nodeConfig.pInputChannels  = inputChannels;     // <-- Make sure there are nodeConfig.inputBusCount elements in this array.
-    nodeConfig.pOutputChannels = outputChannels;    // <-- The vtable specifies 1 output bus, so there must be 1 element in this array.
-    ```
-
-In the above example it's important to never set the `inputBusCount` and `outputBusCount` members
-to anything other than their defaults if the vtable specifies an explicit count. They can only be
-set if the vtable specifies MA_NODE_BUS_COUNT_UNKNOWN in the relevant bus count.
-
-Most often you'll want to create a structure to encapsulate your node with some extra data. You
-need to make sure the `ma_node_base` object is your first member of the structure:
-
-    ```c
-    typedef struct
-    {
-        ma_node_base base; // <-- Make sure this is always the first member.
-        float someCustomData;
-    } my_custom_node;
-    ```
-
-By doing this, your object will be compatible with all `ma_node` APIs and you can attach it to the
-graph just like any other node.
-
-In the custom processing callback (`my_custom_node_process_pcm_frames()` in the example above), the
-number of channels for each bus is what was specified by the config when the node was initialized
-with `ma_node_init()`. In addition, all attachments to each of the input buses will have been
-pre-mixed by miniaudio. The config allows you to specify different channel counts for each
-individual input and output bus. It's up to the effect to handle it appropriate, and if it can't,
-return an error in it's initialization routine.
-
-Custom nodes can be assigned some flags to describe their behaviour. These are set via the vtable
-and include the following:
-
-    +-----------------------------------------+---------------------------------------------------+
-    | Flag Name                               | Description                                       |
-    +-----------------------------------------+---------------------------------------------------+
-    | MA_NODE_FLAG_PASSTHROUGH                | Useful for nodes that do not do any kind of audio |
-    |                                         | processing, but are instead used for tracking     |
-    |                                         | time, handling events, etc. Also used by the      |
-    |                                         | internal endpoint node. It reads directly from    |
-    |                                         | the input bus to the output bus. Nodes with this  |
-    |                                         | flag must have exactly 1 input bus and 1 output   |
-    |                                         | bus, and both buses must have the same channel    |
-    |                                         | counts.                                           |
-    +-----------------------------------------+---------------------------------------------------+
-    | MA_NODE_FLAG_CONTINUOUS_PROCESSING      | Causes the processing callback to be called even  |
-    |                                         | when no data is available to be read from input   |
-    |                                         | attachments. This is useful for effects like      |
-    |                                         | echos where there will be a tail of audio data    |
-    |                                         | that still needs to be processed even when the    |
-    |                                         | original data sources have reached their ends.    |
-    +-----------------------------------------+---------------------------------------------------+
-    | MA_NODE_FLAG_ALLOW_NULL_INPUT           | Used in conjunction with                          |
-    |                                         | `MA_NODE_FLAG_CONTINUOUS_PROCESSING`. When this   |
-    |                                         | is set, the `ppFramesIn` parameter of the         |
-    |                                         | processing callback will be set to NULL when      |
-    |                                         | there are no input frames are available. When     |
-    |                                         | this is unset, silence will be posted to the      |
-    |                                         | processing callback.                              |
-    +-----------------------------------------+---------------------------------------------------+
-    | MA_NODE_FLAG_DIFFERENT_PROCESSING_RATES | Used to tell miniaudio that input and output      |
-    |                                         | frames are processed at different rates. You      |
-    |                                         | should set this for any nodes that perform        |
-    |                                         | resampling.                                       |
-    +-----------------------------------------+---------------------------------------------------+
-
-
-If you need to make a copy of an audio stream for effect processing you can use a splitter node
-called `ma_splitter_node`. This takes has 1 input bus and splits the stream into 2 output buses.
-You can use it like this:
-
-    ```c
-    ma_splitter_node_config splitterNodeConfig = ma_splitter_node_config_init(channelsIn, channelsOut);
-
-    ma_splitter_node splitterNode;
-    result = ma_splitter_node_init(&nodeGraph, &splitterNodeConfig, NULL, &splitterNode);
-    if (result != MA_SUCCESS) {
-        // Failed to create node.
-    }
-
-    // Attach your output buses to two different input buses (can be on two different nodes).
-    ma_node_attach_output_bus(&splitterNode, 0, ma_node_graph_get_endpoint(&nodeGraph), 0); // Attach directly to the endpoint.
-    ma_node_attach_output_bus(&splitterNode, 1, &myEffectNode,                          0); // Attach to input bus 0 of some effect node.
-    ```
-
-The volume of an output bus can be configured on a per-bus basis:
-
-    ```c
-    ma_node_set_output_bus_volume(&splitterNode, 0, 0.5f);
-    ma_node_set_output_bus_volume(&splitterNode, 1, 0.5f);
-    ```
-
-In the code above we're using the splitter node from before and changing the volume of each of the
-copied streams.
-
-You can start, stop and mute a node with the following:
-
-    ```c
-    ma_node_set_state(&splitterNode, ma_node_state_started);    // The default state.
-    ma_node_set_state(&splitterNode, ma_node_state_stopped);
-    ma_node_set_state(&splitterNode, ma_node_state_muted);
-    ```
-
-By default the node is in a started state, but since it won't be connected to anything won't
-actually be invoked by the node graph until it's connected. When you stop a node, data will not be
-read from any of it's input connections. You can use this property to stop a group of sounds
-atomically.
-
-You can configure the initial state of a node in it's config:
-
-    ```c
-    nodeConfig.initialState = ma_node_state_stopped;
-    ```
-
-Note that for the stock specialized nodes, all of their configs will have a `nodeConfig` member
-which is the config to use with the base node. This is where the initial state can be configured
-for specialized nodes:
-
-    ```c
-    dataSourceNodeConfig.nodeConfig.initialState = ma_node_state_stopped;
-    ```
-
-When using a specialized node like `ma_data_source_node` or `ma_splitter_node`, be sure to not
-modify the `vtable` member of the `nodeConfig` object.
-
-
-Timing
-------
-The node graph supports starting and stopping nodes at scheduled times. This is especially useful
-for data source nodes where you want to get the node set up, but only start playback at a specific
-time. There are two clocks: local and global.
-
-A local clock is per-node, whereas the global clock is per graph. Scheduling starts and stops can
-only be done based on the global clock because the local clock will not be running while the node
-is stopped. The global clocks advances whenever `ma_node_graph_read_pcm_frames()` is called. On the
-other hand, the local clock only advances when the node's processing callback is fired, and is
-advanced based on the output frame count.
-
-To retrieve the global time, use `ma_node_graph_get_time()`. The global time can be set with
-`ma_node_graph_set_time()` which might be useful if you want to do seeking on a global timeline.
-Getting and setting the local time is similar. Use `ma_node_get_time()` to retrieve the local time,
-and `ma_node_set_time()` to set the local time. The global and local times will be advanced by the
-audio thread, so care should be taken to avoid data races. Ideally you should avoid calling these
-outside of the node processing callbacks which are always run on the audio thread.
-
-There is basic support for scheduling the starting and stopping of nodes. You can only schedule one
-start and one stop at a time. This is mainly intended for putting nodes into a started or stopped
-state in a frame-exact manner. Without this mechanism, starting and stopping of a node is limited
-to the resolution of a call to `ma_node_graph_read_pcm_frames()` which would typically be in blocks
-of several milliseconds. The following APIs can be used for scheduling node states:
-
-    ```c
-    ma_node_set_state_time()
-    ma_node_get_state_time()
-    ```
-
-The time is absolute and must be based on the global clock. An example is below:
-
-    ```c
-    ma_node_set_state_time(&myNode, ma_node_state_started, sampleRate*1);   // Delay starting to 1 second.
-    ma_node_set_state_time(&myNode, ma_node_state_stopped, sampleRate*5);   // Delay stopping to 5 seconds.
-    ```
-
-An example for changing the state using a relative time.
-
-    ```c
-    ma_node_set_state_time(&myNode, ma_node_state_started, sampleRate*1 + ma_node_graph_get_time(&myNodeGraph));
-    ma_node_set_state_time(&myNode, ma_node_state_stopped, sampleRate*5 + ma_node_graph_get_time(&myNodeGraph));
-    ```
-
-Note that due to the nature of multi-threading the times may not be 100% exact. If this is an
-issue, consider scheduling state changes from within a processing callback. An idea might be to
-have some kind of passthrough trigger node that is used specifically for tracking time and handling
-events.
-
-
-
-Thread Safety and Locking
--------------------------
-When processing audio, it's ideal not to have any kind of locking in the audio thread. Since it's
-expected that `ma_node_graph_read_pcm_frames()` would be run on the audio thread, it does so
-without the use of any locks. This section discusses the implementation used by miniaudio and goes
-over some of the compromises employed by miniaudio to achieve this goal. Note that the current
-implementation may not be ideal - feedback and critiques are most welcome.
-
-The node graph API is not *entirely* lock-free. Only `ma_node_graph_read_pcm_frames()` is expected
-to be lock-free. Attachment, detachment and uninitialization of nodes use locks to simplify the
-implementation, but are crafted in a way such that such locking is not required when reading audio
-data from the graph. Locking in these areas are achieved by means of spinlocks.
-
-The main complication with keeping `ma_node_graph_read_pcm_frames()` lock-free stems from the fact
-that a node can be uninitialized, and it's memory potentially freed, while in the middle of being
-processed on the audio thread. There are times when the audio thread will be referencing a node,
-which means the uninitialization process of a node needs to make sure it delays returning until the
-audio thread is finished so that control is not handed back to the caller thereby giving them a
-chance to free the node's memory.
-
-When the audio thread is processing a node, it does so by reading from each of the output buses of
-the node. In order for a node to process data for one of it's output buses, it needs to read from
-each of it's input buses, and so on an so forth. It follows that once all output buses of a node
-are detached, the node as a whole will be disconnected and no further processing will occur unless
-it's output buses are reattached, which won't be happening when the node is being uninitialized.
-By having `ma_node_detach_output_bus()` wait until the audio thread is finished with it, we can
-simplify a few things, at the expense of making `ma_node_detach_output_bus()` a bit slower. By
-doing this, the implementation of `ma_node_uninit()` becomes trivial - just detach all output
-nodes, followed by each of the attachments to each of it's input nodes, and then do any final clean
-up.
-
-With the above design, the worst-case scenario is `ma_node_detach_output_bus()` taking as long as
-it takes to process the output bus being detached. This will happen if it's called at just the
-wrong moment where the audio thread has just iterated it and has just started processing. The
-caller of `ma_node_detach_output_bus()` will stall until the audio thread is finished, which
-includes the cost of recursively processing it's inputs. This is the biggest compromise made with
-the approach taken by miniaudio for it's lock-free processing system. The cost of detaching nodes
-earlier in the pipeline (data sources, for example) will be cheaper than the cost of detaching
-higher level nodes, such as some kind of final post-processing endpoint. If you need to do mass
-detachments, detach starting from the lowest level nodes and work your way towards the final
-endpoint node (but don't try detaching the node graph's endpoint). If the audio thread is not
-running, detachment will be fast and detachment in any order will be the same. The reason nodes
-need to wait for their input attachments to complete is due to the potential for desyncs between
-data sources. If the node was to terminate processing mid way through processing it's inputs,
-there's a chance that some of the underlying data sources will have been read, but then others not.
-That will then result in a potential desynchronization when detaching and reattaching higher-level
-nodes. A possible solution to this is to have an option when detaching to terminate processing
-before processing all input attachments which should be fairly simple.
-
-Another compromise, albeit less significant, is locking when attaching and detaching nodes. This
-locking is achieved by means of a spinlock in order to reduce memory overhead. A lock is present
-for each input bus and output bus. When an output bus is connected to an input bus, both the output
-bus and input bus is locked. This locking is specifically for attaching and detaching across
-different threads and does not affect `ma_node_graph_read_pcm_frames()` in any way. The locking and
-unlocking is mostly self-explanatory, but a slightly less intuitive aspect comes into it when
-considering that iterating over attachments must not break as a result of attaching or detaching a
-node while iteration is occuring.
-
-Attaching and detaching are both quite simple. When an output bus of a node is attached to an input
-bus of another node, it's added to a linked list. Basically, an input bus is a linked list, where
-each item in the list is and output bus. We have some intentional (and convenient) restrictions on
-what can done with the linked list in order to simplify the implementation. First of all, whenever
-something needs to iterate over the list, it must do so in a forward direction. Backwards iteration
-is not supported. Also, items can only be added to the start of the list.
-
-The linked list is a doubly-linked list where each item in the list (an output bus) holds a pointer
-to the next item in the list, and another to the previous item. A pointer to the previous item is
-only required for fast detachment of the node - it is never used in iteration. This is an
-important property because it means from the perspective of iteration, attaching and detaching of
-an item can be done with a single atomic assignment. This is exploited by both the attachment and
-detachment process. When attaching the node, the first thing that is done is the setting of the
-local "next" and "previous" pointers of the node. After that, the item is "attached" to the list
-by simply performing an atomic exchange with the head pointer. After that, the node is "attached"
-to the list from the perspective of iteration. Even though the "previous" pointer of the next item
-hasn't yet been set, from the perspective of iteration it's been attached because iteration will
-only be happening in a forward direction which means the "previous" pointer won't actually ever get
-used. The same general process applies to detachment. See `ma_node_attach_output_bus()` and
-`ma_node_detach_output_bus()` for the implementation of this mechanism.
-*/
-
-
-
 
 /*
 Engine