The #! S/Lab is the research department at MASARY Studios, responsible for research and development that contributes to the creative potential of the studio and its members. Its hardware and software R&D is kept independent of any ongoing creative project; instead, the S/Lab contributes to MASARY Studios by developing new software and hardware tools, plugins, and interfaces that broaden the creative palette at the studio's disposal.
Rather than being driven by the needs or agenda of any specific ongoing project, the S/Lab carries out a research agenda aimed further into the future. While its research might inform future project ideas or developments (components, etc.), its immediate goal is to develop ideas to the point that studio members can experiment with them internally, giving us a chance to consider new approaches and ways of creating beautiful, inventive, compelling work. These are still the early days of the S/Lab, and we're still learning how its R&D fits into the studio as a whole, but I'm excited to share our initial undertakings here.
Before sharing some of our early research, I want to explain the name of the S/Lab.
First, the name: #! S/Lab
For those who have done any programming or scripting, the symbol #! might be familiar. #! (pronounced "hash-bang", or alternatively called a "shebang") is an interpreter directive in Unix operating systems. It frequently appears in the first line of a text file or script, followed by the location of the program or interpreter that should be used to execute the script, along with any additional arguments. More simply put, it lets the creator of a script tell the computer how the script should be run (which interpreter to use), without requiring the executing user to manage or know this themselves. Combined with an execute permission, this lets us turn a plain text file into an executable, so it can be double-clicked to run.
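As a minimal illustration (the filename and greeting here are made up), a script's first line names the interpreter the shell should hand the file to:

```python
#!/usr/bin/env python3
# The line above is the interpreter directive ("hash-bang" / "shebang").
# To Python it is just a comment; to the shell it names the interpreter
# to run when the file itself is executed (after `chmod +x`).

greeting = "Hello from the S/Lab!"
print(greeting)
```

Saved as, say, `hello.py` and marked executable, this file can be run directly, with no need to type `python3 hello.py`.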
The S in #! S/Lab stands for a multitude of possibilities. S might be "Super", "Slick", "Surgical", "Something", "Sexy"? The S takes on whatever meaning we require of it when discussing the #! S/Lab. To relate it back to software programming, the S is polymorphic... it inherits the traits of multiple types or entities.
Lab, well... this one is more obvious than the other parts. This is the MASARY Laboratory. Where some new ideas come from. Where we tinker and seed new tools and technologies until they can be used more broadly by the studio. It's the MASARY Studio spawn-point, the place where we can birth new ideas that aren't called for (yet) by any project. It's where we perform surgery on tools of our trade, preparing them for a wider use across the studio.
Bringing it all back together: the #! S/Lab--pronounced "S-Lab" or, as one word, "Slab" (I propose that the hash-bang is silent)--is where MASARY develops new technology, tools, and methods that feed the studio's capabilities, long-term desires, and growth.
A.I. Research
For the past three months, the S/Lab has been developing a toolset for creating Artificial Intelligence (A.I.) driven images and video. This toolset provides a couple of different neural network architectures for us to experiment with (they function in radically different ways, with different processes and mechanisms). Using A.I. and machine learning (M.L.) for artistic/creative purposes is relatively new, as many of these methods and technologies have only become widely available and viable in the last few years. Already, there is a growing number of artists exploring artificial intelligence, in both practical and theoretical ways. This work is incredibly close to my interests: I've been developing M.L./A.I. tools, bots, and agents for the last four years as the primary focus of my dissertation work. That work has led to the creation of A.I. agents capable of live coding musical performances in languages like Haskell; A.I. systems that generate video in real-time in response to performers' movements or audio input; and A.I. agents that write about art, technology, the relationship of the human body to technology, and A.I. as a creatively disruptive set of methodologies and technologies for art making.
My work within the S/Lab at MASARY has, thus far, focused on further developing available tools for generating images and video with A.I. The resultant images and videos can look eerily reminiscent of the original material (the "training" material that the A.I. learns from), while bringing a decidedly "alien" or "other" quality; reinterpreting the material through a non-human lens.
Some of these developments have already been used in MASARY work: we used similar techniques to create some morphing/interpolating iconography for the Harvard iLab President's Innovation Challenge Awards Ceremony this past spring, as well as for creating some of the visuals for the Refractive Choreographies project with Urbanity Dance.
A.I. Overview
Artificial Intelligence has become something of a hot topic and buzzword in recent years, in large part because hardware capabilities have exploded over the past 10 years, led by the realization that GPUs can be repurposed to perform exactly the kind of massive, parallel calculations needed to train neural networks. It's a term that gets used a lot, in reference to many different things, but it is usually not clearly defined in news and media publications. A.I. has also been the subject of all sorts of predictions and doomsday-esque prophecies. Ray Kurzweil popularized the idea of the singularity. The "Roko's basilisk" thought experiment was proposed as the ultimate, inescapable A.I.-centered doomsday prophecy. Philosophers like Nick Bostrom have spent considerable time writing and publishing about A.I. ethics and the future of humanity as A.I. becomes more and more capable and powerful. Australian artist and theorist Simon Penny recently published Making Sense: Cognition, Computing, Art, and Embodiment, investigating the history of A.I. and philosophies of intelligence alongside art making and creative practices. Philosopher Reza Negarestani's book Intelligence and Spirit investigates human intelligence and history through the lens of artificial intelligence (specifically through the idea of developing an Artificial General Intelligence (AGI), which remains beyond our technological capabilities). The work of these theorists and scholars is incredibly interesting and important, as they help frame the debate about humanity and A.I. while joining the already large body of thinking and research in the realms of posthumanism (the work of Donna Haraway, N. Katherine Hayles, and Rosi Braidotti immediately comes to mind) and media theory (Mark B.N. Hansen, Brian Massumi, McKenzie Wark, Alexander Galloway, amongst many others).
The present reality of A.I., though, is a bit different from how it is generally portrayed and sensationalized in the mainstream media. A.I. today is largely dominated by neural networks. A neural network is a particular computational approach to problem solving that makes use of specific mathematical techniques and methods (large-scale linear algebra combined with training techniques like gradient descent through backpropagation) to find very complex patterns within some set of data. Neural networks are most commonly used for two different problem types: classification and regression.
An example of classification might be looking at an image and identifying whether or not a human is present in it.
An example of regression might be trying to predict a stock's price tomorrow, given the stock's value over the past year.
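To make the two problem types concrete, here's a toy sketch (the data and the threshold "detector" are made up for illustration): classification maps an input to a discrete label, while regression maps it to a continuous value.

```python
import numpy as np

# Classification: map an input to a discrete label.
# Toy "detector": a brightness measurement is labeled 1 ("has human")
# or 0 ("no human") by a simple threshold.
def classify(brightness: float) -> int:
    return 1 if brightness > 0.5 else 0

# Regression: map an input to a continuous value.
# Toy "price predictor": fit a line to a year of (day, price) samples
# and extrapolate one day ahead.
days = np.arange(365)
prices = 100 + 0.05 * days + np.random.default_rng(0).normal(0, 1, 365)
slope, intercept = np.polyfit(days, prices, 1)
tomorrow = slope * 365 + intercept

print(classify(0.8))  # a discrete class: 1
print(tomorrow)       # a continuous estimate, roughly 118
```

A real neural network replaces the hand-written threshold and the line fit with a learned function, but the shape of the two problems is the same.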
Additionally, there are two major categories of learning approaches with neural networks: supervised and unsupervised. (There are other approaches, as well as cases where things aren't quite so clear-cut, but this is a good starting point.) Supervised means that we train the system using labeled data (i.e., a collection of images, each of which we've labeled "has human" or "no human"). Unsupervised means that we don't have any such labels, and we're hoping the neural network can help us find some meaning within our samples.
A.I. Work in the S/Lab
My current work in the MASARY S/Lab has focused on developing two neural network architectures that we can use for image and video generation: a feed-forward neural network and a VQVAE (Vector-Quantized Variational Auto Encoder).
Feed Forward Neural Network
The feed-forward network is an older design that's been around for years. With this approach, we treat the problem of generating images as a per-pixel regression problem. A single input contains the coordinates of a pixel along with an indicator of how we want to interpolate between images. The output of the model contains the RGB values (Red, Green, and Blue) for that pixel. Repeating this over many different pixel coordinates generates a complete image.
This is an example of a supervised learning approach.
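A minimal sketch of the idea (with random, untrained weights, so it produces noise rather than a learned image; layer sizes are arbitrary): a small feed-forward network maps a pixel coordinate plus an interpolation value to an RGB triple, and evaluating it once per pixel builds a full image.

```python
import numpy as np

rng = np.random.default_rng(1)

# A tiny two-hidden-layer feed-forward network. In practice these
# weights would be learned by regressing against the RGB values of
# the training images; here they are random stand-ins.
W1 = rng.normal(size=(3, 32));  b1 = np.zeros(32)
W2 = rng.normal(size=(32, 32)); b2 = np.zeros(32)
W3 = rng.normal(size=(32, 3));  b3 = np.zeros(3)

def rgb_at(x: float, y: float, t: float) -> np.ndarray:
    """Map (pixel x, pixel y, interpolation t) -> (r, g, b) in [0, 1]."""
    h = np.tanh(np.array([x, y, t]) @ W1 + b1)
    h = np.tanh(h @ W2 + b2)
    return 1 / (1 + np.exp(-(h @ W3 + b3)))  # sigmoid keeps RGB in range

# One forward pass per pixel: a 64x64 image takes 4096 evaluations,
# which is why generation with this architecture is slow at scale.
size = 64
image = np.array([[rgb_at(x / size, y / size, t=0.5)
                   for x in range(size)] for y in range(size)])
print(image.shape)  # (64, 64, 3)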
This approach can be trained effectively using a small number of images, and it learns quickly: training time is in the range of an hour or two using a set of 100 images. It doesn't work very well for larger sets of images, though, because the network requires so much additional memory. Additionally, once this network has been trained, it's pretty slow to generate new images; it might take 10-30 seconds to generate a single frame. (If we wanted to generate an animation at 60 frames per second, it could take 10-30 minutes to generate a single second's worth of animation.) This might not seem so bad if you're familiar with CGI and 3D modeling/animation, where rendering can take a lot of time and power, but we ultimately want to move towards something faster and more efficient.
VQVAE
The VQVAE (Vector-Quantized Variational Auto Encoder), in contrast to the feed-forward neural network, was only created in 2017 (the original paper can be found here; the sequel paper is here). This network learns to reconstruct the collection of images it is trained on. Unlike other neural network approaches, where the input and output are different, the VQVAE uses the same images as both input and output. The middle part of the network, though, compresses the images down into a much smaller representation, and, as referenced by the "VQ" part of its name, this smaller representation is actually quantized into a set of discrete classifications (integers), which are then used to rebuild the original image.
This is an example of unsupervised learning. We let the model learn its own patterns, groupings, and structures within the sample data.
We might think of this kind of auto encoder architecture as a learned compression algorithm. (This kind of auto encoder is better named a dimensionality reduction auto encoder; there are other types of and uses for auto encoders).
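The "VQ" step at the heart of this architecture can be sketched in isolation (the codebook and latent vectors here are random stand-ins; in the real network both come from training): each latent vector produced by the encoder is snapped to its nearest entry in a learned codebook, so the compressed representation is just a grid of integer indices.

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 8, 4                          # codebook size, latent dimensionality
codebook = rng.normal(size=(K, D))   # learned during training; random here
latents = rng.normal(size=(5, D))    # stand-ins for encoder outputs

# Quantize: replace each latent vector with the index of its nearest
# codebook entry (squared Euclidean distance).
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
indices = dists.argmin(axis=1)       # the discrete representation
quantized = codebook[indices]        # what the decoder actually sees

print(indices.shape, quantized.shape)  # (5,) (5, 4)
```

Storing only `indices` (a handful of small integers) instead of the full latent vectors is what makes the compression view of the VQVAE so apt.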
Additionally, because the network is constructed with convolutional layers, it is actually learning hierarchical features within the image. These features might be edges or corners at a low level, up to larger shapes and arrangements of these edges and corners at a higher level.
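As an illustration of what a low-level convolutional feature might look like, here's a hand-written horizontal-edge filter applied to a tiny synthetic image (a trained convolutional layer discovers filters like this on its own rather than having them written by hand):

```python
import numpy as np

# A 6x6 image: dark on top, bright on the bottom (one horizontal edge).
image = np.zeros((6, 6))
image[3:, :] = 1.0

# A hand-written horizontal-edge kernel. Learned convolutional filters
# in a trained network often resemble edge detectors like this one.
kernel = np.array([[-1, -1, -1],
                   [ 0,  0,  0],
                   [ 1,  1,  1]], dtype=float)

# "Valid" 2D convolution (really cross-correlation, as in most deep
# learning libraries): slide the 3x3 kernel over the image.
out = np.array([[(image[i:i+3, j:j+3] * kernel).sum()
                 for j in range(4)] for i in range(4)])

print(out)  # strong responses only in the rows straddling the edge
```

The output is zero everywhere except where the window straddles the dark-to-bright transition; stacking layers of such filters is what lets the network build up from edges to larger shapes.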
Once it's been trained, we can quickly generate new images using this neural network approach. Remember, the previous model took 10-30 seconds per frame; here, with the VQVAE, we can generate 5-15 frames per second. This means we can quickly realize longer animations and explore the trained network faster. This network also works better with larger numbers of images (10,000+), but it does take longer to train (a couple of days at least).
Conclusion
I'm super excited to be able to share the #! S/Lab with the MASARY community. This has been an exciting time for the studio, as we begin new ventures and developments that look toward the future. The work shared here is just a taste of what is to come.
The S/Lab is continuing to work on A.I. software for generating images and video. At the moment, we're focusing on making this software more easily accessible within the studio, so that members can upload image and video samples to the system for training and make use of trained models across a variety of platforms and applications, both for exploration and for exporting A.I.-created content for use in projects and beyond. We're also continuing to develop these systems to create higher resolution content, to train faster, and to be usable in real-time applications. Finally, we're continuing to track state-of-the-art research and publications to keep the S/Lab's work on the cutting edge.
I look forward to sharing more of the work of the S/Lab in the near future.
DrJ