RAVE Audio Models: Deep Learning as a Symbolic Form
This project was developed as the primary research component of my MA thesis, “Deep Learning as Symbolic Form,” which applied Erwin Panofsky’s theory of symbolic form to contemporary AI systems. The hands-on methodology reflects my broader approach to critical AI studies: examining computational systems through direct technical engagement rather than purely theoretical analysis.
The project examines how deep learning systems encode and reconstruct meaning by training three RAVE (Realtime Audio Variational autoEncoder) models on distinct audio datasets. Each model was trained for 1,166,800 steps with identical hyperparameters on Google Colab A100 GPUs.
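The training workflow can be sketched with the acids-ircam RAVE package's command-line interface. This is a minimal illustration, not the project's actual invocation: the dataset paths and run name are hypothetical, and the `v2` config stands in for whatever configuration was used.

```shell
# Preprocess raw audio into a training database (paths are hypothetical)
rave preprocess --input_path ./dataset_audio --output_path ./dataset_db

# Train on the preprocessed database; repeating the same invocation per
# dataset is what keeps hyperparameters identical across the three models
rave train --config v2 --db_path ./dataset_db --name model_a

# Export the trained run to a TorchScript artifact for generation
rave export --run ./runs/model_a
```

Holding the configuration fixed across datasets is what isolates the architecture's contribution: any aesthetic regularity shared by the three models cannot be attributed to training settings that differ between them.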
Rather than treating these models as neutral tools for audio synthesis, the project investigates how RAVE’s architecture—its encoding process, latent space construction, and generative decoding—imposes formal logic onto training data. The featured audio samples are 30-second clips decoded from randomly sampled points in each model’s latent space, and they reveal consistent aesthetic patterns of fragmentation and rhythmic incoherence across all three models despite their different source material. These shared qualities suggest that RAVE’s latent space structures outputs independently of dataset content.
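The random-generation procedure can be sketched as sampling latent frames from the prior and decoding them. This is a sketch under assumptions not stated in the source: RAVE v2 defaults of a 44.1 kHz sample rate, 16 latent dimensions, and a compression ratio of 2048 audio samples per latent frame; the model filename is hypothetical.

```python
import numpy as np

# Assumed RAVE v2 defaults (not confirmed by the project description)
SAMPLE_RATE = 44100   # Hz
LATENT_DIM = 16       # latent dimensions per frame
COMPRESSION = 2048    # audio samples represented by one latent frame

def random_latents(seconds: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a batch of latent frames from the standard-normal prior."""
    n_frames = int(seconds * SAMPLE_RATE / COMPRESSION)
    return rng.standard_normal((1, LATENT_DIM, n_frames)).astype(np.float32)

# 30 seconds of latent material, as in the featured samples
z = random_latents(30.0, np.random.default_rng(0))
print(z.shape)  # (1, 16, 645)

# With an exported model, the latents would then be decoded to audio, e.g.:
#   model = torch.jit.load("rave_model.ts")    # hypothetical export name
#   audio = model.decode(torch.from_numpy(z))  # (1, 1, n_frames * COMPRESSION)
```

Because the latents are drawn from the prior rather than encoded from any input, whatever structure appears in the decoded audio comes from the model itself, which is exactly the property the comparison across the three models exploits.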
This work positions deep learning as a symbolic form in which computational architectures mediate reality through processes of statistical abstraction embedded within planetary-scale computational infrastructure. By making visible the interpretive decisions encoded in model training, the project shows that deep learning functions not as transparent reproduction but as a formalized aesthetic and political intervention.