For a while now, I’ve been considering the potential impact of generative artificial intelligence on various areas of our lives. More recently, I’ve had some very interesting conversations about applying AI to help create manga and anime. I wrote this post to help me think through the challenges, so why not share it?
Having witnessed the transition from analog to digital, and from local to cloud computing, I see generative AI as yet another significant shift.
As an engineer, my focus is on enhancing human creativity and lowering the technical barriers to fun and self-expression. I advocate for a human-centric approach to AI, one that emphasizes assisting the user rather than focusing only on the output. Engaging in R&D is vital, but it should align with addressing real human challenges.
I’m hesitant about the current generative AI approach most companies take. This may stem from my passion for real-time 3D rendering, but I’ve observed a clear trend towards 3D (or 2.5D) in animation and I’m concerned that focusing solely on 2D would be a bad thing for the industry and for storytelling in general.
To help me think through the process, I took [Oshi no Ko] as a recent example of an outstanding manga adaptation.
Current parts of the creative process
- Character Design: Essential for translating manga, encompassing not just size and emotions, but also style elements like outlines, shadows, and blush.
- Storyboarding: Outlines the cuts, camera angles, lens, and motions.
- Environments: Backgrounds and props.
- Art Approach/Compositing: Inklines, outlines, smears, shadows, color grading, particles, and lights; basically, putting everything together.
- Animation: So far I’ve skipped the most time-consuming aspect: traditional animation, which involves “hand-drawing” or mocap, like the dance scenes in this example.
This is the well-known, well-documented process of drawing keyframes and in-betweens. It’s tedious, and many people have been trying to improve it over time. The challenge is to maintain a human touch while also working fast. We increasingly see a mix of 3D rendering with 2D drawing on top.
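To make the keyframe/in-between idea a little more concrete, here is a minimal sketch of the simplest possible automated in-between in a rig-based workflow: linear interpolation of a single animated value between keyframes. The `(frame, value)` track format, the `sample_track` helper, and the elbow example are all assumptions made for illustration; real tools use richer curves (easing, splines) and this is not how hand-drawn in-betweens are produced.

```python
from bisect import bisect_right


def sample_track(keyframes, t):
    """Return the value at time t by linear interpolation between keyframes.

    keyframes: list of (time, value) pairs sorted by time (hypothetical format).
    Times outside the keyed range clamp to the first or last keyframe.
    """
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)  # index of the first keyframe strictly after t
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    alpha = (t - t0) / (t1 - t0)  # 0.0 at the previous key, 1.0 at the next
    return v0 + alpha * (v1 - v0)


# Hypothetical elbow rotation (degrees) keyed at frames 0, 12, and 24.
elbow = [(0, 0.0), (12, 90.0), (24, 45.0)]

# Generate in-betweens every 6 frames.
inbetweens = [sample_track(elbow, frame) for frame in range(0, 25, 6)]
print(inbetweens)
```

Because the result is just a list of keyed values, an artist can still overwrite any generated in-between by hand, which is where the human touch comes back in.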
A significant challenge arises in the later stages when directors often want to modify storyboards or change elements like camera angles and lighting. This leads to expensive revisions. In contrast, the video game industry benefits from real-time rendering, allowing for quicker adjustments in animation, lighting, and camera work.
In [Oshi no Ko], only the backgrounds and dance scenes were 3D modeled and animated; the rest was hand-drawn. While large studios are shifting to 3D, the transition is not trivial for 2D animators.
So, where does generative AI fit in?
- Facilitating the 2D-to-3D Transition: Offering real-time previews and more flexible, faster, accessible workflows.
- Streamlining Processes: Letting artists focus on impactful storytelling elements.
Facilitating the 2D-to-3D Transition
The reality is that newer artists will adapt more readily to 3D, but this transition won’t be instantaneous. AI can greatly simplify the process by automating some of the more intricate and tedious aspects. For instance:
Transforming character art into a comprehensive 3D model, including a single face mesh, blendshapes, animation tracks, face normals, and detailed settings for materials, shaders, and shadows. It also involves overlaying 2D assets like outlines, inklines, and blush on the 3D mesh. While the character may not be perfect initially, this step completes a substantial portion of the work. (IMHO, the key to doing that is to generate code, not 3D assets directly.)
Automating and reusing animations. Animation fundamentally involves keyframes that move specific bones. AI models can rapidly produce high-quality animations and smoothly transition between them. (The results MUST remain manually editable.)
Generating backgrounds, environments, and props. Instead of creating full-fledged 3D environments, various shortcuts like skyboxes can be used for static scenes, while normal maps and PBR textures applied to low-poly meshes (generated by diffusion models or extracted from diffusion outputs) can create dynamic settings. A notable (free) example of a partial approach is https://stableprojectorz.com/. In this specific case, processing manga pages to identify and extract recurring props can be immensely helpful. Training a model to recognize these objects and either find matches in an asset database or generate synthetic data to create 3D models and/or materials/textures can save countless hours, especially for artists less versed in 3D modeling.
Creating rudimentary 3D scenes that can be refined and incorporated into control networks for video-to-video conversions and producing complex animated backgrounds, not limited by current diffusion model constraints.
Producing shaders and materials to ensure consistent and editable colorization.
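Of the items above, blendshapes are probably the easiest to illustrate. The sketch below shows the usual idea: a facial expression is the base mesh plus a weighted sum of per-vertex offsets. The function name `apply_blendshapes`, the data layout, and the two-vertex "mouth" are hypothetical and chosen only for readability. It also hints at why generating code or data rather than baked assets is attractive: the weights stay plain numbers that an artist can edit.

```python
def apply_blendshapes(base, deltas, weights):
    """Blend a base mesh with weighted per-vertex offsets.

    base:    list of (x, y, z) vertex positions
    deltas:  dict of shape name -> list of (dx, dy, dz), same length as base
    weights: dict of shape name -> float, typically in [0, 1];
             shapes not listed contribute nothing
    """
    result = [list(vertex) for vertex in base]
    for name, weight in weights.items():
        for i, (dx, dy, dz) in enumerate(deltas[name]):
            result[i][0] += weight * dx
            result[i][1] += weight * dy
            result[i][2] += weight * dz
    return [tuple(vertex) for vertex in result]


# Toy two-vertex "mouth" with two hypothetical expression shapes.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "smile": [(0.0, 0.5, 0.0), (0.0, 0.5, 0.0)],   # both corners move up
    "open": [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)],   # first vertex drops
}

# Full smile combined with a half-open mouth.
face = apply_blendshapes(base, deltas, {"smile": 1.0, "open": 0.5})
print(face)
```

Animating an expression then reduces to keyframing the weights, so generated output and manual tweaks share the same editable representation.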
These are just a few examples, and there’s much more potential, but time is limited.
Streamlining Processes
The focus should be on crafting a genuine “product” experience, where creators can concentrate on storytelling, liberated from technical complexities. Given that anime often follows recognizable patterns, this could be leveraged to offer immediate creative options or opportunities to deviate from the norm. The goal is for a director or even a mangaka to develop a solid episode prototype, which can then be refined by the team. All the elements from the first point (facilitating the 2D-to-3D transition) will underpin this product-focused approach. This strategic direction will define a startup’s success, distinguishing it from a mere service provider or consultancy. It’s a challenging path, requiring disciplined decision-making and effective success metrics.
In conclusion, employing AI should go beyond mere generation of 2D images or vectors. It’s about reimagining the entire creative process, making each step more adaptable (with a strong emphasis on 3D), automating predictable elements (like materials, shapes, and animations), and providing creators the tools to manually refine and personalize their work. It’s crucial to allow for experimentation and discovery, which will prevent the end result from being just a less expensive version of existing content.