Bot Invasion To Automated Defense: My Journey With ML Deployment

In my previous article, "Bots Invaded My Newsletter. Here's How I Fought Back with ML", I shared my experience building a bot-signup detector using machine learning to tackle a surge of unwanted bot signups on my newsletter.

While it wasn't the most sophisticated solution, it was a valuable learning experience.

A few people on Reddit were curious about deploying their own models.

Since I'm still learning ML deployment myself, I wanted to share what I've found so far about an easy and cost-effective approach, and hopefully hear about more feasible approaches from you guys.

Even though I'm just starting out, hopefully this can be helpful for others who are in the same boat!

Backstory Of Identifying The Enemy

When bots invaded my newsletter, I already knew what their signups (name and email) looked like.

Email: watcher2112@ecocryptolab.com
Name: 🔶 Withdrawing 32 911 Dollars. Gо tо withdrаwаl >>> https://forms.yandex.com/cloud/65e6228102848f1a71edd8c9?hs=0cebe66d8b7ba4d5f0159e88dd472e8b& 🔶

The signup data was already stored in my database, so I used it as the dataset.

For my weapon of choice, I picked a BERT transformer.

I trained it on a small set of emails (144, to be exact) to learn the difference between human and bot names and emails, and it worked most of the time.

So it was all ready; I just needed to deploy it live and use it in the signup process.

Preparing My Weapon To Fire Against The Enemy

Now that I had my trusty bot detector trained, it was time to figure out how to load it into the battlefield (deployment).

Here's what I learned about deploying a machine learning model in a simple and cost-effective way.

Integrating the bot detector with my newsletter signup process was an exciting adventure.

It felt like discovering a whole new system, just like writing the final line of code that unlocks a new functionality!

Previously, I had a transformer that took the name and email as input and returned a boolean value indicating whether the signup was a bot.

For deployment, we didn't want to spin up a new VM and a server just to keep listening for calls, and we also didn't want this to become a piece of our existing services. So we went with a serverless deployment on AWS Lambda.

Can't Use Lambda Straight Away

When I tried to deploy the transformer model, I realized I couldn't use a regular Lambda deployment, because the function needs packages such as transformers, scikit-learn, and many more to be installed.

So the alternative was to run Lambda from a Docker container image.

This was a good thing to explore. Basically, you build a Docker image with all the dependencies you need installed and host it as a Lambda function.

Docker Container Images (But Too Big!)

I loaded my previously built transformer (a 419 MB .bin file) and installed transformers, scikit-learn, and many other packages. By the time I built the image, it was 9.2 GB!

Clearly that was a horrible solution for such a basic problem.

Logistic Regression - Smaller and Faster

I moved on to Logistic Regression, which took less time to train and prepare than the transformer, and the crazy thing was that the binary is only 27 KB :D

I added the dependencies and logic, and voila: an 820 MB Docker image.

So I went ahead and pushed the Docker image to Elastic Container Registry (ECR).

ECR is like Docker Hub: a place where I can store the Docker images I build.

Then I created a Lambda function that uses the Docker image from the ECR repo I created earlier. The cannon was loaded with shot and powder; I just had to figure out the firing mechanism.

Firing The Weapon

The initial plan was to trigger the Lambda function directly using the AWS CLI or Boto3 library.

However, I needed a more user-friendly way to activate the bot detector from the frontend.

This led me to explore API Gateway.

It's a good service that allows you to create a public endpoint (like a trigger point) that accepts requests and forwards them to your Lambda function behind the scenes.

This was exactly what I needed – a way to invoke the Lambda function using a simple API call.

Integrating the API Gateway with my signup form wasn't completely smooth sailing.

I encountered some challenges mapping the data received by the API Gateway to the format expected by the Lambda function.

Luckily, CloudWatch logs came to the rescue.

With its detailed logs, I could easily debug the issue and get everything working seamlessly.
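
One common shape of this mapping problem, sketched here under the assumption that the API uses a Lambda proxy integration: API Gateway then hands the request body to the function as a JSON string under "body" instead of as top-level keys, and expects a response with a "statusCode" and a string "body". The helper names extract_signup and proxy_response are hypothetical and are not taken from my deployed function.

```python
import base64
import json


def extract_signup(event):
    """Return (name, email) from a direct invoke or an API Gateway proxy event."""
    if "body" in event:
        raw = event["body"] or "{}"
        # Proxy events may carry a base64-encoded body.
        if event.get("isBase64Encoded"):
            raw = base64.b64decode(raw).decode("utf-8")
        payload = json.loads(raw)
    else:
        # Direct invocation: the fields sit at the top level of the event.
        payload = event
    return payload["name"], payload["email"]


def proxy_response(is_bot, status=200):
    """Build the response shape a proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"is_bot": bool(is_bot)}),
    }
```

Printing the raw event at the top of the handler makes it show up in CloudWatch, which is usually enough to see which of the two shapes is arriving.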

Killing The Enemy

Now, whenever someone signs up for my newsletter, the API in my frontend form automatically triggers the Lambda function. Here's the magic that happens behind the scenes:

  1. The signup data is sent to the Lambda function.

  2. The function analyzes the data using the trained model to identify potential bots.

  3. If a bot is detected, the function automatically blocks the subscriber using Listmonk's built-in Block API.

  4. Finally, the function sends a notification to my Discord channel, keeping me informed about signup activity (including any blocked bots).
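
The four steps above can be sketched as one small function. This is only an outline of the flow, not my actual Lambda code: predict, block, and notify are hypothetical callables that you would wire up to the model, the Listmonk blocking call, and the Discord notification respectively.

```python
def process_signup(name, email, predict, block, notify):
    """Run one signup through the detect -> block -> notify flow.

    predict, block and notify are injected callables (all hypothetical here):
    predict(name, email) -> bool, block(email) -> None, notify(message) -> None.
    """
    is_bot = bool(predict(name, email))
    if is_bot:
        # Stand-in for the call that blocks the subscriber in the newsletter tool.
        block(email)
        notify(f"Blocked bot signup: {email}")
    else:
        notify(f"New signup: {email}")
    return is_bot
```

Passing the three actions in as arguments keeps the flow easy to test locally with plain lambdas and lists before any network call is involved.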

With this system in place, I've successfully automated bot detection and eliminated the need for manual intervention.

This feels like a victory in the fight against newsletter bot signups.

How to set up a bot detector for yourself

Note: Make sure your ~/.aws/config file has the region specified.

lovestaco@i3nux-mint:~$ cat .aws/config
[default]
region = ap-south-1
output = json
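
If you would rather check this from a script, here is a minimal sketch that reads the region out of the config text with Python's configparser. The function name region_for_profile is my own, and it only covers the simple INI layout shown above.

```python
import configparser


def region_for_profile(config_text, profile="default"):
    """Return the region set for a profile in AWS config text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, non-default profiles use the section name "profile <name>".
    section = profile if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)
```

To run it against the real file, pass in the contents of ~/.aws/config (for example via pathlib.Path.home() / ".aws" / "config" and read_text()); a None result means the region is missing.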

Note: Make sure your ~/.aws/credentials file has the access key and secret key of an IAM user whose policies grant access to ECR, Lambda, CloudWatch, and API Gateway.

lovestaco@i3nux-mint:~$ cat .aws/credentials
[default]
aws_access_key_id = AKIAQTshortkey
aws_secret_access_key = 0Yorm/longkey

Note: Code for the Lambda setup can be found on GitHub.

Creating an Amazon ECR Repository:

First, we need to create an ECR repository to hold the Docker image that we will build later.
The command below creates a repo named bot_detector_repo.

aws ecr create-repository --repository-name bot_detector_repo

Log in to ECR so that you can push Docker images to the repo.

aws ecr get-login-password --region ap-south-1 | docker login --username AWS --password-stdin 042888888888.dkr.ecr.ap-south-1.amazonaws.com

Lambda Execution

This function (named handler) takes the signup data (name and email) as input and loads the pre-trained model (dt_model_file.pkl).

It then transforms the data using the model's tools and makes a prediction on whether it's a bot.

Finally, it returns a response indicating if the signup is likely from a human or a sneaky bot.

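# lambda_function.py
import pickle


def handler(event, context):
    # Join name and email into one string, then run it through the vectorizer.
    data_to_test = event["name"] + event["email"]
    # The pickle holds a (classifier, vectorizer) pair.
    local_file_path = "dt_model_file.pkl"
    with open(local_file_path, "rb") as f:
        model_and_vectorizer = pickle.load(f)
    clf, vectorizer = model_and_vectorizer
    data_to_test_transformed = vectorizer.transform([data_to_test])
    prediction = clf.predict(data_to_test_transformed)
    # predict() returns a NumPy array, which is not JSON serializable;
    # tolist() turns the single result into a plain Python value.
    return {"is_bot": prediction.tolist()[0]}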

Building and Pushing the Docker Image:

To build the Docker image, I am using the AWS Lambda Python 3.10 base image.

# Dockerfile
FROM public.ecr.aws/lambda/python:3.10
COPY requirements.txt ${LAMBDA_TASK_ROOT}
# requirements.txt is expected to list scikit-learn
RUN pip install -r requirements.txt
COPY dt_model_file.pkl ${LAMBDA_TASK_ROOT}
COPY lambda_function.py ${LAMBDA_TASK_ROOT}
CMD [ "lambda_function.handler" ]

Simply build the image

docker build --platform linux/amd64 -t bot_detector_img:latest .

Tag the image with the repo name

 docker tag bot_detector_img:latest 042888888888.dkr.ecr.ap-south-1.amazonaws.com/bot_detector_repo:latest

Push the image to repo

docker push 042888888888.dkr.ecr.ap-south-1.amazonaws.com/bot_detector_repo:latest

Creating and Configuring a Lambda Function:

aws lambda create-function --function-name bot_detector_fn \
--package-type Image \
--code ImageUri=042888888888.dkr.ecr.ap-south-1.amazonaws.com/bot_detector_repo:latest \
--role arn:aws:iam::042888888888:role/service-role/bot_detector_fn-role-1nbzlihb \
--timeout 30 \
--memory-size 300 \
--region ap-south-1 \
--tracing-config Mode=PassThrough

If you are not comfortable with the command line and its arguments, it is easier to create the Lambda function from the console.

  1. Go to AWS Lambda
  2. Select Create function -> Container image
  3. Select the repository and image

That's it, the Lambda function is ready now.

Testing and Updating the Function (Optional):

List all the Lambda Functions in the terminal

aws lambda list-functions

If your function execution takes longer than the default 3-second timeout, you can increase it with the following command.

aws lambda update-function-configuration \
--function-name bot_detector_fn --timeout 20

Similarly, if your function requires more memory to process data, you can adjust its memory allocation according to your need.

aws lambda update-function-configuration \
--function-name bot_detector_fn --memory-size 200

Triggering the Lambda Function:

To trigger the Lambda function, we can use the following command from the terminal.

aws lambda invoke --function-name bot_detector_fn \
--payload '{"name":"athreya", "email":"athreyac4@gmail.com"}' out \
--cli-binary-format raw-in-base64-out --output json
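
The same call can be made from Python with Boto3. This is a minimal sketch rather than code from my repo: invoke_bot_detector is a hypothetical helper, and the client is passed in so that the function stays easy to test without AWS credentials.

```python
import json


def invoke_bot_detector(lambda_client, name, email, function_name="bot_detector_fn"):
    """Invoke the function synchronously and return the decoded JSON result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",
        Payload=json.dumps({"name": name, "email": email}).encode("utf-8"),
    )
    # The response payload is a stream; read it and decode the JSON it contains.
    return json.loads(response["Payload"].read())
```

With Boto3 installed and credentials configured, the client would come from boto3.client("lambda", region_name="ap-south-1").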

Setting up the API Gateway

Since we need to trigger the Lambda from the frontend, we need to create a REST API.

API Gateway

API Gateway acts as a "front door" for Lambda to receive data and return back the results.

aws apigateway create-rest-api --name bot_detector_api

# output
{
    "id": "hnrlovx11h",
    "name": "bot_detector_api",
    "createdDate": "2024-05-11T18:36:29+05:30",
    "apiKeySource": "HEADER",
    "endpointConfiguration": {
        "types": [
            "EDGE"
        ]
    },
    "disableExecuteApiEndpoint": false,
    "rootResourceId": "o9dbg0yw88"
}

API Gateway Resource

Creates a new resource under the specified REST API.

Resources represent the base URL for a collection of related operations.

aws apigateway create-resource --rest-api-id hnrlovx11h \
--parent-id o9dbg0yw88 --path-part bot_detector_endpoint

# output
{
    "id": "yzg7u4",
    "parentId": "o9dbg0yw88",
    "pathPart": "bot_detector_endpoint",
    "path": "/bot_detector_endpoint"
}

API Gateway Method

Let's add an HTTP POST method to the specified resource.

This allows the frontend to send a POST request with the name and email to this resource.

aws apigateway put-method --rest-api-id hnrlovx11h \
--resource-id yzg7u4 --http-method POST \
--authorization-type "NONE"

# output
{
    "httpMethod": "POST",
    "authorizationType": "NONE",
    "apiKeyRequired": false
}

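API Gateway Integration

This tells API Gateway to forward incoming POST requests to the Lambda function.

The type is set to AWS_PROXY, indicating that API Gateway will act as a proxy to forward requests to the Lambda function. With a proxy integration, the function receives the whole HTTP request as its event, with the JSON payload as a string under body.

Note that for a Lambda integration, --uri takes the API Gateway invocation ARN that wraps the function ARN, not the bare function ARN. API Gateway also needs permission to invoke the function (for example, granted with aws lambda add-permission).

aws apigateway put-integration --rest-api-id hnrlovx11h \
--resource-id yzg7u4 --http-method POST \
--type AWS_PROXY --integration-http-method POST \
--uri arn:aws:apigateway:ap-south-1:lambda:path/2015-03-31/functions/<lambda_arn>/invocations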
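
The value for --uri is easy to get wrong, since API Gateway expects its own invocation ARN wrapped around the Lambda function ARN. A tiny helper to build it, as a sketch (the function name is hypothetical, and the ARN in the usage note is only an illustration assembled from the account ID and function name used in this post):

```python
def lambda_integration_uri(region, lambda_arn, api_version="2015-03-31"):
    """Build the API Gateway integration URI that targets a Lambda function."""
    return (
        f"arn:aws:apigateway:{region}:lambda:path/"
        f"{api_version}/functions/{lambda_arn}/invocations"
    )
```

For example, lambda_integration_uri("ap-south-1", "arn:aws:lambda:ap-south-1:042888888888:function:bot_detector_fn") gives the string to pass to --uri.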

API Gateway Stage

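A stage is a named reference to a deployment, which is a snapshot of your API at a particular point in time.
You can have several stages, such as v1, v2, and so on. The command below deploys the API to a stage named prod.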

aws apigateway create-deployment --rest-api-id hnrlovx11h \
--stage-name prod

Execute

Once the stage has been deployed, you can use the endpoint.

curl -X POST https://hnrlovx11h.execute-api.ap-south-1.amazonaws.com/prod/bot_detector_endpoint \
-H "Content-Type: application/json" \
-d '{"email": "athreyac4@gmail.com", "name": "Athreya"}'
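
Here is the same request from Python using only the standard library, as a rough sketch. The helper names build_check_request and send are my own; the URL follows the usual execute-api pattern of API ID, region, stage, and resource path.

```python
import json
import urllib.request


def build_check_request(api_id, region, stage, path, name, email):
    """Build (without sending) the POST request for the bot detector endpoint."""
    url = f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/{path}"
    body = json.dumps({"name": name, "email": email}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Building the request separately from sending it means you can inspect the URL and body first, then call send(...) once the stage is live.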

Final Thoughts

This journey of deploying a machine learning model to fight newsletter bots has been a valuable learning experience.

In my previous article, "Bots Invaded My Newsletter. Here's How I Fought Back with ML ⚔️" I covered building the bot detector model.

Now, we've explored the deployment side – a crucial step for putting your model to practical use.

Here are some resources to help you get started on your own AI/ML adventure:

  • Building a Logistic Regression Model (ipynb file): Logistic_regression.ipynb (This file demonstrates how I built the simpler and more efficient logistic regression model.)
  • Lightweight Model File (23kb): dt_model_file.pkl (Feel free to download and use this pre-trained model for basic bot detection in your own newsletter signup process.)
  • Lambda Function Code Repository: bot_detect_lambda (This repository contains the code for integrating the bot detection model with AWS Lambda for a serverless deployment.)

Spread the Knowledge!

Share this blog post with your friends who are interested in getting started with AI and machine learning.

Want to learn more or connect with me?

Reddit: athreyaaaa
LinkedIn: maneshwar-athreya
