Ambient music, artificial intelligence, audio reactive, neural network, interactive installation.
Over several weeks, XSICHT was trained to match faces to audio. With a training set of tens of thousands of frames, the AI learned to construct a human face from any given audio input. What happens when we abstract the input? This is the question XSICHT tries to answer.
Since a neural network is nothing more than a complex concatenation of intertwined non-linear functions that amplify or dampen signals, its behavior is often hard to understand, which is why the inner workings of an AI are referred to as hidden layers or a black box.
XSICHT doubles this unpredictability by feeding the network not the voices it was trained on but music, leading to unexpected results when it is confronted with various genres and instruments. Harmonic piano music, for example, more often leads to the recreation of female faces, while bassline-driven techno mostly resembles male speakers.
A brief technical overview can be split into two parts: the data and the network architecture.
The former is given to XSICHT in the form of a 0.2-second-long spectrogram, calculated using the Short-Time Fourier Transform (STFT). To enhance the spatial representation of lower frequencies, the spectrogram is rescaled logarithmically to resemble human sound perception, yielding what is called a mel spectrogram.
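The data pipeline described above can be sketched as follows. This is a minimal illustration of framing a signal, taking an STFT, and warping the magnitudes onto a mel-spaced filter bank; the sample rate, FFT size, hop length, and mel-band count are illustrative assumptions, not XSICHT's actual parameters.

```python
import numpy as np

def mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=64):
    """Log-mel spectrogram sketch (all parameters are illustrative)."""
    # Frame the signal with a Hann window.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Magnitude spectrogram via the real FFT (the STFT).
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft//2 + 1)

    # Mel filter bank: triangular filters whose centers are spaced
    # linearly on the mel scale, i.e. logarithmically in frequency,
    # so lower frequencies get more bands.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)

    # Log-compress the mel energies, as the network sees them.
    return np.log(mag @ fbank.T + 1e-6)  # (n_frames, n_mels)

# 0.2 s of audio at 16 kHz yields a small spectrogram patch.
audio = np.sin(2 * np.pi * 440.0 * np.arange(3200) / 16000)
spec = mel_spectrogram(audio)
print(spec.shape)
```

In practice a library routine such as `librosa.feature.melspectrogram` would replace the hand-rolled filter bank, but the sketch shows the two steps the text names: the STFT and the logarithmic frequency warping.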
The latter takes this input and convolves it down to a 1×1-pixel latent space, from which the compressed information is deconvolved back into an image. This is called a U-shaped architecture or, more commonly, an image-to-image GAN, but here it is used without skip connections between the convolution and deconvolution pipes. During training, the counterpart of this generator, the discriminator, works in a patch-based manner.
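The shape arithmetic behind this encoder-decoder can be worked out with the standard convolution size formulas. The sketch below assumes an input resolution of 128×128 and the common kernel 4 / stride 2 / padding 1 configuration; the actual resolution and layer count of XSICHT are not stated in the text.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of one strided convolution (halves the map)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of one transposed convolution (doubles the map)."""
    return (size - 1) * stride - 2 * pad + kernel

# Encoder: convolve a hypothetical 128x128 input down to a 1x1 latent.
down = [128]
while down[-1] > 1:
    down.append(conv_out(down[-1]))
print(down)  # [128, 64, 32, 16, 8, 4, 2, 1]

# Decoder: deconvolve back up; without skip connections, each stage
# sees only the previous stage's output, so everything the generated
# face contains must pass through the 1x1 bottleneck.
up = [1]
while up[-1] < 128:
    up.append(deconv_out(up[-1]))
print(up)  # [1, 2, 4, 8, 16, 32, 64, 128]
```

Dropping the skip connections is the distinctive choice here: in a standard image-to-image GAN (e.g. pix2pix) they carry fine detail from encoder to decoder, whereas forcing everything through the bottleneck makes the output depend entirely on the compressed audio representation.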
XSICHT receives its input from a live dialog between synthesizers and acoustic instruments performed by Timo Dufner, a voice, or prerecorded sounds that harmonize with the visualization.
- Jens Schindel is a computer scientist based in Tübingen, Germany, born in Karlsruhe in 1991. He studied Media Informatics with a strong focus on Visual Computing, Computer Vision, and Computer Graphics, and later on Machine Learning and Neural Networks. Throughout his studies, his focus lay on visually appealing content generation, driven by the beauty of mathematical concepts. After experimenting with generative visualizations, he quickly shifted his focus to real-time audio-reactive projections in an interdisciplinary context. facebook.com/tschnz
- Timo Dufner is a musician, visual artist, and practitioner in the field of media and information technology. As a VJ, he performs an audio-video live act, and he is also part of various production teams in electronic music. The main focus of his work lies in the exploitation of software failures, so-called glitches, real-time processing, live coding, machine learning/AI, and the direct interaction of sound and image. timodufner.com