Out of credits and can't create AI images anymore? How to run your own image generator for free
Struggles of Running Image Generators: Limited Use and Frustration
AI image generators are rapidly gaining popularity. With tools like Midjourney and DALL-E 3, people can generate amazing images from nothing but their imagination.
The one thing all of these tools have in common is pricing: if you plan on generating images for free, you will be limited by either a free trial or a credit system.
For example, if you are a designer creating image assets, mockups, or design ideas, you will go through many iterations, and that gets expensive quickly.
Sooner or later, your credits will run out and you will no longer be able to generate images.
What technology does Image Generation use?
The core technology behind these image generators is called diffusion, and Stable Diffusion is its best-known open implementation.
Stable Diffusion requires a lot of computing resources, which makes it expensive for companies to offer to everyone for free.
So I thought: why not run Stable Diffusion using my own resources? That way, I won't need to deal with payments, and I can create as many images as I want.
In this article, I will demonstrate how to run an AI image generator on your own, absolutely free.
Here you can see that I gave a simple prompt and got a high-quality image back.
Let's see how Stable Diffusion works, so we can start running it on our own.
How the Tech Behind AI Image Generators Works
To learn about how AI image generation works, we need to know about Stable Diffusion.
We can think of Stable Diffusion like the physical diffusion we learned about in school.
Take a clear beaker of water and add a few drops of dye; the dye diffuses throughout the liquid until it reaches a state of equilibrium.
Now let's apply the same concept to Stable Diffusion itself.
Training a Stable Diffusion model starts with a process called forward diffusion.
In forward diffusion, we take an image and add noise to it.
For those who don't know, think of noise as the static you see when a TV loses its signal.
The type of noise added here is called Gaussian noise. The process is repeated many times, so layer after layer of noise piles up.
After forward diffusion comes reverse diffusion: the Gaussian noise is removed step by step until the original image is recovered.
The model gradually starts learning how to predict images from noise.
This forward-and-reverse cycle is performed on millions of images to train the model properly.
Once training is done, we can hand the model pure random noise, and it will denoise it into a brand-new image.
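To make this concrete, here is a minimal, framework-free sketch of forward diffusion in Python. It is illustrative only; the noise-schedule values are common DDPM-style defaults I'm assuming here, not the exact configuration of any particular model.

```python
import numpy as np

# Noise schedule: how much Gaussian noise gets mixed in at each step.
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)   # assumed DDPM-style defaults
alphas_cumprod = np.cumprod(1.0 - betas)     # signal surviving up to step t

def noisy_image_at(x0, t):
    """Jump straight to step t with the closed-form DDPM formula:
    x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * noise."""
    noise = np.random.randn(*x0.shape)       # Gaussian noise (the "TV static")
    a_t = alphas_cumprod[t]
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * noise

# Stand-in for a real training image, scaled to [-1, 1].
x0 = np.random.rand(64, 64, 3) * 2 - 1
slightly_noisy = noisy_image_at(x0, 10)      # still recognizable
pure_static = noisy_image_at(x0, 999)        # indistinguishable from noise
```

During training, the model sees these noisy versions and learns to predict the noise that was added; running that prediction in reverse is what generates images.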
A question may arise: how is the model able to generate images from text prompts?
The images used for training have alt text associated with them, describing what each image is about.
This way, each image is linked to a piece of text, and the model gradually learns the relationship between the text and the images.
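As an aside, here is what that text conditioning looks like in code, using Hugging Face's diffusers library. This is not what Fooocus runs internally, just a minimal sketch of the concept; it needs a GPU, and the model ID is one common choice that may have moved since writing.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image pipeline (model ID is an assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The pipeline encodes the prompt with a CLIP text encoder and feeds that
# embedding into every denoising step, steering the noise toward the text.
image = pipe("a photo of the Taj Mahal at sunrise").images[0]
image.save("taj_mahal.png")
```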
That, in a nutshell, is how Stable Diffusion models work. Now let's get Stable Diffusion running on your own.
Let's get Stable Diffusion running
Image generation using basic prompts
Now that we have a brief idea of how stable diffusion works, let's start running one on our own.
For this article, I will be using Fooocus, a tool that is very simple to use and configure.
You can set up Fooocus either on your own machine (verify that you meet the minimum requirements) or through Google Colab if you don't. For this demo, I will show you how to run it via Google Colab.
You can use the following Google Colab file to start your Stable Diffusion.
Press the play button to run the code snippet in the notebook.
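At the time of writing, that cell looks roughly like the snippet below; the pinned versions change over time, so prefer whatever the official notebook currently contains.

```python
# Official Fooocus Colab cell (approximate; check the notebook for the
# current pinned versions and flags).
!pip install pygit2==1.12.2
%cd /content
!git clone https://github.com/lllyasviel/Fooocus.git
%cd /content/Fooocus
!python entry_with_update.py --share
```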
Wait a while, and you will see lines like the following in the logs:
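The exact output varies by version, but near the end Gradio prints a public link, something like:

```
Running on public URL: https://xxxxxxxxxxxxxxxx.gradio.live
```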
Click the gradio.live link; this is the link to the UI for Stable Diffusion.
Once you open the link, you will see a text box; just type out your prompt and click Generate.
For example, I will try to generate an image of the Taj Mahal.
You can see that there are two images generated from the prompt.
There are many other options available in this tool, such as Input Image, Inpaint/Outpaint, and Describe, which you can explore on your own.
For more tweaking capabilities, tick the "Advanced" checkbox.
This gives us options such as performance (how fast you want to see the outputs), the aspect ratio of the image, and the number of images to generate.
Tweaking the Image Generator for better outputs
Sometimes we may not be satisfied with the images that are generated.
For example, I enter the following prompt:
a cute little bird in the tree, singing,cartoon
I get the following image.
I mentioned "cartoon" in the prompt; however, the images I got look realistic rather than like cartoons.
There are two solutions to this problem:
1. Styles
The first option is to change the style.
Styles are basically additional prompt text layered onto yours; you can check the Fooocus style reference to learn more about each style.
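Conceptually, a style is just a prompt template wrapped around your text, as the sketch below shows. This is illustrative only: the template is adapted from the publicly documented SAI style list and may not match Fooocus's internal data byte for byte.

```python
# A style as a prompt template with a "{prompt}" placeholder (illustrative).
style = {
    "name": "SAI Comic Book",
    "prompt": "comic {prompt}. graphic illustration, comic art, "
              "graphic novel art, vibrant, highly detailed",
    "negative_prompt": "photograph, deformed, glitch, noisy, realistic, stock photo",
}

user_prompt = "a cute little bird in the tree, singing"
final_prompt = style["prompt"].replace("{prompt}", user_prompt)
print(final_prompt)
# comic a cute little bird in the tree, singing. graphic illustration, ...
```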
To select a style, tick the Advanced checkbox and click Style.
I'll try the SAI Comic Book style and generate the output.
It gives a good result. However, if you want a specific cartoon style, you can try LoRA models instead.
2. LoRA Models
LoRA models apply small tweaks to the Stable Diffusion model, so we can nudge it toward knowing more about, say, cartoons.
If you want a more detailed explanation of LoRA, you can refer to our article.
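For intuition, here is a framework-free sketch of the LoRA idea; everything in it is a deliberately simplified assumption. Instead of retraining a big weight matrix, LoRA trains two small matrices whose product is a low-rank correction added on top.

```python
import numpy as np

d, r = 1024, 8                       # layer width, LoRA rank (r << d)
W = np.random.randn(d, d) * 0.02     # frozen base-model weight matrix
A = np.random.randn(r, d) * 0.02     # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, starts at zero
alpha = 16                           # scaling factor used by many LoRA trainers

x = np.random.randn(d)
base_out = W @ x                                   # original model behavior
lora_out = W @ x + (alpha / r) * (B @ (A @ x))     # base + low-rank tweak

# Because B starts at zero, the model is unchanged until A and B are trained.
assert np.allclose(base_out, lora_out)
```

Because only A and B are trained (a few million values instead of billions), a LoRA file is small enough to download and swap in, which is exactly what we will do next.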
We can use a website called CivitAI to download LoRA models.
Fooocus uses the SDXL 1.0 model.
- Go to CivitAI and click Models.
- Click Filters and select LoRA and SDXL 1.0.
After applying the filters, I found a LoRA model that can be used.
Click the download icon, and you will get a J_cartoon.safetensors file.
You need to place this safetensors file in the following path inside Fooocus:
Fooocus/models/loras/
Make sure to restart Fooocus after this step.
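If you are running Fooocus on Colab, there is no local folder to drag the file into; you can download it straight into the loras directory from a notebook cell instead. The URL below is a placeholder, so copy the real download link from the model's CivitAI page (some downloads also require an API token).

```python
import urllib.request

# Hypothetical URL: replace XXXXX with the version ID from the CivitAI page.
lora_url = "https://civitai.com/api/download/models/XXXXX"
urllib.request.urlretrieve(
    lora_url, "/content/Fooocus/models/loras/J_cartoon.safetensors"
)
```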
Configuring the Image Generator to use the installed LoRA Model
Follow these steps to use the newly installed LoRA model:
- Click the Advanced Checkbox.
- Click Model.
- You will see the list of LoRAs being used; in the second slot, choose J_cartoon.safetensors.
The LoRA will only take effect if the proper trigger word is used in the prompt.
In the Trigger Words row, we can see the trigger word is j_cartoon.
So, my new prompt will be:
a cute little bird in the tree, singing,cartoon,j_cartoon
Notice that I have included the trigger word in the prompt so that the image comes out cartoony.
Clicking Generate, I get the following image:
The output now looks cartoony, as desired. In a similar way, we can install any LoRA model and tweak Stable Diffusion to our liking.
Conclusion
In this article, we have seen how to get image generation running with little effort and at no cost. The world of Stable Diffusion is constantly growing, and there is plenty left to explore; you can even train your own LoRA model on your own images. Beyond Fooocus, there are tools like Automatic1111 and ComfyUI, which are more advanced than the one we demoed here.