02-18-2020, 01:25 PM   #21
Human being with feelings
brummbear
Join Date: May 2016
Location: out west
Posts: 124

Hey Philipp,
First, let me make clear that I am not an expert on this topic, so my thoughts may not be overly helpful.

"Next step I want is to integrate the tag information from the preset files and use that information for coloring and also applying existing tags to not tagged sounds (but that implies the use of a neural network)."
I think this is a very interesting (and ambitious) approach. However, I would first hone the feature extraction and dimensionality reduction, followed by optional clustering (with or without machine learning). One problem I anticipate is the quantity of training data, which may be too small even with large tagged preset libraries. High-quality feature extraction and (optional) clustering beforehand may compensate for this. If you decide to add clustering on top of dimensionality reduction, you could also use a classical algorithm to extract the "dominant" tags for each cluster from pre-tagged libraries.
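To make the "dominant tags per cluster" idea concrete, here is a minimal sketch of one classical way to do it: simple counting, no machine learning. The function name `dominant_tags`, the `min_share` cutoff and the toy data are all made up for illustration; the cluster labels would come from whatever clustering step is used.

```python
from collections import Counter, defaultdict

def dominant_tags(cluster_ids, tags_per_sound, min_share=0.5):
    """Per cluster, return the tags carried by at least `min_share`
    of that cluster's tagged sounds (hypothetical helper)."""
    counts = defaultdict(Counter)  # cluster id -> tag counts
    tagged = Counter()             # cluster id -> number of tagged sounds
    for cid, tags in zip(cluster_ids, tags_per_sound):
        if not tags:               # untagged sounds contribute nothing
            continue
        tagged[cid] += 1
        counts[cid].update(set(tags))
    return {
        cid: sorted(t for t, n in c.items() if n / tagged[cid] >= min_share)
        for cid, c in counts.items()
    }

# Toy data (assumed): six sounds in two clusters, one of them untagged.
cluster_ids = [0, 0, 0, 1, 1, 1]
tags_per_sound = [["pad", "airy"], ["pad"], [], ["bass"], ["bass", "dark"], ["lead"]]
result = dominant_tags(cluster_ids, tags_per_sound)
```

The untagged sound in cluster 0 could then inherit that cluster's dominant tags, which is one way to tag untagged sounds without a neural network.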

Either way, in my opinion the most interesting part for practical purposes would be browsing by sonic similarity (= the "humble" approach). Having tags created by a neural network* on top of that would be the "icing on the cake". To get sonic similarity right, the current project phase should leave room to experiment with which features are extracted and used for dimensionality reduction and clustering. As speculated earlier, what works best may depend on the sound material; e.g. percussive sounds would potentially benefit if you included spectral flatness among the extracted features.
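For reference, spectral flatness is the geometric mean of the power spectrum divided by its arithmetic mean: close to 1 for noise-like sounds, close to 0 for tonal ones. The sketch below computes it over a whole signal with plain NumPy on two synthetic test signals (both assumptions for illustration). In a real project, `librosa.feature.spectral_flatness` gives a per-frame version.

```python
import numpy as np

def spectral_flatness(signal, eps=1e-10):
    """Geometric mean / arithmetic mean of the power spectrum (whole signal)."""
    power = np.abs(np.fft.rfft(signal)) ** 2 + eps  # eps avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

sr = 22050                              # assumed sample rate, one second of audio
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
noise = rng.standard_normal(sr)         # noise-like, as in many percussive hits
tone = np.sin(2 * np.pi * 440.0 * t)    # pure tone

flat_noise = spectral_flatness(noise)
flat_tone = spectral_flatness(tone)
```

The noise signal scores far higher than the sine, which is why this feature may help separate noisy percussive material from tonal sounds.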

As you mentioned in the beginning, XO seems to use t-SNE for its percussive sounds, and you wanted to extend the concept to synths. For a universal approach, it might be useful to include a few more features from librosa in the initial extraction, then exclude unhelpful features (which will depend on the sound material) before running t-SNE. Finally, add an optional clustering step.
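The steps above (extract, drop unhelpful features, run t-SNE, optionally cluster) can be sketched with scikit-learn as follows. Everything here is an assumption for illustration: the feature matrix is synthetic and stands in for librosa features, a variance threshold is only a crude stand-in for "exclude unhelpful features", and k-means with three clusters is just one possible clustering choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import VarianceThreshold
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for an (n_sounds, n_features) matrix of audio features:
# three groups of 20 sounds with 5 informative features each.
centers = np.array([[0, 0, 0, 0, 0], [4, 4, 0, 0, 0], [0, 4, 4, 0, 0]], dtype=float)
informative = np.vstack([c + rng.normal(0, 0.5, (20, 5)) for c in centers])
constant = np.full((informative.shape[0], 1), 0.7)   # a feature that never varies
X = np.hstack([informative, constant])               # shape (60, 6)

# 1) Drop features that carry no information (here: near-zero variance).
X_kept = VarianceThreshold(threshold=1e-8).fit_transform(X)

# 2) Put the remaining features on a comparable scale.
X_scaled = StandardScaler().fit_transform(X_kept)

# 3) Reduce to 2-D for browsing; perplexity must be below the sample count.
embedding = TSNE(n_components=2, perplexity=10, init="pca",
                 random_state=0).fit_transform(X_scaled)

# 4) Optional clustering, here on the 2-D map (X_scaled is an alternative input).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
```

Swapping the feature-selection step is where most of the experimenting would happen, since which features help depends on the sound material.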

Manuel Sanchez uses a GMM (Gaussian mixture model) and an SVM (support vector machine) for clustering features and matching them to tags. He uses them on biological data sets, but this could work really well with audio too. Check this out:

* A GMM or an SVM might be more suitable than a neural network for this task. In particular, a GMM could associate multiple characteristics with one sound, e.g. a sound can carry several tags such as "trumpet", "airy", "pad".
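One possible reading of the GMM idea, as a rough sketch only: fit a mixture on the features, work out which share of each component's sounds carries each tag, and suggest every tag whose responsibility-weighted share passes a threshold. The synthetic 2-D features, the tag matrix, the `suggest_tags` helper and the 0.5 threshold are all assumptions for illustration, not the method from the linked work.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 2-D features (assumed): two well-separated groups of 40 sounds.
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(3.0, 0.3, (40, 2))])

# Multi-hot tag matrix: group one is tagged pad + airy, group two trumpet.
tag_names = ["pad", "airy", "trumpet"]
Y = np.zeros((80, 3))
Y[:40, [0, 1]] = 1
Y[40:, 2] = 1

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
resp = gmm.predict_proba(X)                         # soft assignments, shape (80, 2)
profile = (resp.T @ Y) / resp.sum(axis=0)[:, None]  # tag share per component, (2, 3)

def suggest_tags(x, threshold=0.5):
    """Suggest every tag whose responsibility-weighted share is >= threshold."""
    scores = gmm.predict_proba(np.atleast_2d(x)) @ profile
    return [t for t, s in zip(tag_names, scores[0]) if s >= threshold]

tags_a = suggest_tags([0.1, -0.1])   # an untagged sound near the first group
tags_b = suggest_tags([2.9, 3.1])    # an untagged sound near the second group
```

Because the scores are computed per tag, a single sound can end up with several tags at once, which is the property the footnote is after.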

Last edited by brummbear; 02-20-2020 at 05:56 PM.