With the abundance of instruction tuning datasets following the release of open source models, many efforts have been made to curate and translate these datasets into Arabic. Unfortunately, the Egyptian Arabic dialect has been almost entirely overlooked in these efforts, and this project aims to change that.
Currently, you can use flagship models from OpenAI or Anthropic to translate from English to Egyptian Arabic with excellent results. However, the high cost makes it impractical for individuals to translate the available open source datasets.
The outcome of this project is a translator built on an open source model, making it practical to translate large amounts of data and to bring the Egyptian Arabic dialect into instruction fine-tuning for open source models.
Outline
This blog post is split into four parts:
Creating finetuning data using GPT-4o
Preparing the dataset for finetuning
Finetuning Llama 3 8B using Axolotl
Comparing the results before and after finetuning
Creating finetuning data using GPT-4o
To fine-tune a translation model, we need translation pairs between English and Egyptian Arabic.
Since my goal was to use this model for translating instruction and conversation datasets, I needed a starting point.
I decided to use the OpenAssistant dataset, selecting a random sample of messages rather than entire conversations. In future iterations of this project, I plan to diversify the sources of translation data, using different datasets like Alpaca or even Wikipedia articles.
```python
from datasets import load_dataset
import pandas as pd

# Load OpenAssistant 2 dataset
ds = load_dataset("OpenAssistant/oasst2")

# Filter for only English messages and take the train split
english_ds = ds.filter(lambda x: x["lang"] == "en")["train"]

# Visual check
print(english_ds["text"][1])
```
Yes, it's possible to fix runny mayonnaise! The most common reason for mayonnaise becoming runny is because the oil was added too quickly or the egg yolk wasn't emulsified properly. Here are some steps you can take to fix it:
1. Separate another egg yolk and place it in a clean, dry bowl.
2. Slowly add the runny mayonnaise to the egg yolk while whisking vigorously.
3. Once all the runny mayonnaise has been added, continue whisking until the mixture has emulsified and thickened.
4. If the mayonnaise is still too runny, you can add another egg yolk and repeat the process.
If the mayonnaise still won't thicken, you can try adding a small amount of dijon mustard or vinegar to the mixture, which can act as emulsifiers and help stabilize the mayonnaise. It's important to add these ingredients slowly and in small amounts to avoid over-thinning the mixture.
While testing GPT-4o for translating from English to Egyptian Arabic, I noticed that direct translations often resulted in poor quality. To mitigate this, I found that translating first into Modern Standard Arabic and then into Egyptian Arabic produced much better results. Therefore, I used GPT-4o to translate each sentence or paragraph into Arabic first, and then into Egyptian Arabic.
Here’s the prompt I used:
```python
SYSTEM_PROMPT = """You are a fluent speaker and expert translator for English, Arabic and Egyptian Arabic. \
Your task is to translate text from English into Egyptian Arabic dialect.

# Steps to Achieve the Best Results:
Step 1: Translate the text from English into Modern Standard Arabic.
Step 2: Translate the text from Modern Standard Arabic into Egyptian dialect.

# Adhere to the Following Instructions:
1. **Always follow the steps presented above.**
2. **Output the two translations as keys in a JSON object:**
  - "ar" for the Modern Standard Arabic translation.
  - "eg" for the Egyptian Arabic dialect translation.
3. **You may change the order of sentences when necessary** to better mimic the style of Arabic and Egyptian dialect.
4. **Your translation should not be literal**; it should capture the essence of the text.
5. **Translate specific English terminologies (e.g., science, computer science, biology) or entities \
(e.g., movies, series, poems, names, programming languages)**, but always keep their original English form within parentheses.
6. **If the text contains code or is entirely code**, do not translate the code part; write it as it is."""
```
And here, you can see a sample of GPT-4o’s translation using that prompt.
```python
from dotenv import load_dotenv
from openai import OpenAI
import os

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Translate the following text:\n{english_ds['text'][1]}"},
    ],
    temperature=1,
    max_tokens=512,
    top_p=1,
)
print(response.choices[0].message.content)
```
```json
{
"ar": "نعم، من الممكن إصلاح المايونيز السائل! السبب الأكثر شيوعًا لأن يصبح المايونيز سائلاً هو أن الزيت أضيف بسرعة كبيرة أو أن صفار البيض لم يتم استحلابه بشكل صحيح. إليك بعض الخطوات التي يمكنك اتباعها لإصلاحه:
1. افصل صفار بيضة أخرى وضعه في وعاء نظيف وجاف.
2. أضف المايونيز السائل تدريجياً إلى صفار البيض مع الخفق بقوة.
3. بمجرد إضافة كل المايونيز السائل، استمر في الخفق حتى يتم استحلاب الخليط ويثخن.
4. إذا كان المايونيز لا يزال سائلاً جدًا، يمكنك إضافة صفار بيضة أخرى وتكرار العملية.
إذا لم يثخن المايونيز بعد، يمكنك محاولة إضافة كمية صغيرة من خردل (dijon) أو خل إلى الخليط، حيث يمكن أن تعمل كمستحلبات وتساعد في تثبيت المايونيز. من المهم إضافة هذه المكونات ببطء وبكميات صغيرة لتجنب تخفيف الخليط بشكل زائد.",
"eg": "أيوه، ممكن تصلح المايونيز السايل! أكتر سبب بيخلي المايونيز يبقى سايل هو إن الزيت أضيف بسرعة أو إن صفار البيض مش متجانس كويس. دي شوية خطوات ممكن تعملها لتصلح المشكلة:
1. اعزل صفار بيضة تانية وحطه في طبق نضيف وجاف.
2. أضف المايونيز السايل تدريجياً لصفار البيض وأنت بتخفق بكل قوة.
3. لما تضيف كل المايونيز السايل، كمل في الخفق لحد ما الخليط يتجانس ويكثف.
4. لو المايونيز لسه سايل جداً، ممكن تضيف صفار بيضة تانية وتكرر العملية.
لو المايونيز لسه مش عايز يثخن، جرب تضيف شوية من خردل (dijon) أو خل للخليط، دول بيساعدوا في تجانس المايونيز. المهم تضيف المكونات دي ببطء وبكميات صغيرة عشان ما تدفيش الخليط."
}
```
After developing the prompt, I utilized the OpenAI batch API to translate a sample of 10K messages from the dataset. The process involved three main steps:
Generating 10 separate JSONL files, each containing prompts for translating different messages.
Submitting these JSONL files to OpenAI’s batch API.
Downloading the results.
Here’s the code snippet that accomplishes this:
```python
import json
from tqdm import tqdm

def generate_jsonl(filename, texts):
    """Generate a JSONL file with the specified filename"""
    # Write jsonl file
    with open(filename, 'w') as file:
        for index in tqdm(range(0, len(texts)), desc="Generating JSONL File"):
            text = texts[index]
            request = {
                "custom_id": str(index),
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": "gpt-4o",
                    "messages": [
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": f"Translate the following text:\n{text}"}
                    ],
                    "temperature": 1,
                    "top_p": 1,
                    "max_tokens": 2048,
                }
            }
            file.write(json.dumps(request) + '\n')

batch_ids = {}
for start in range(0, 10000, 1000):
    end = start + 1000
    # english_ds is already the train split, so no extra ["train"] indexing here
    english_ds_sample = english_ds.shuffle(seed=42).select(range(start, end))
    english_ds_sample_df = english_ds_sample.to_pandas()
    # Skip ranges that already have a non-failed batch (useful when re-running)
    if (start, end - 1) in batch_ids.keys():
        batch_status = client.batches.retrieve(batch_ids[(start, end - 1)]).status
        if batch_status != 'failed':
            continue
    print(f'Creating batch for ({start}, {end - 1})')
    batch_input_fn = f'batch_api_input_{start}_{end - 1}.jsonl'
    generate_jsonl(batch_input_fn, english_ds_sample_df["text"].tolist())
    batch_input_file = client.files.create(
        file=open(batch_input_fn, "rb"),
        purpose="batch"
    )
    batch_input_file_id = batch_input_file.id
    batch = client.batches.create(
        input_file_id=batch_input_file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
        metadata={
            "description": f"OpenAssistant 1K ({start}, {end - 1}) sample translation"
        }
    )
    batch_ids[(start, end - 1)] = batch.id

# To run this part, you have to wait for some time until all batches are completed
for r, batch_id in batch_ids.items():
    start, end = r
    print(f'Downloading ({start}, {end}) batch outputs')
    content = client.files.content(client.batches.retrieve(batch_id).output_file_id)
    content.write_to_file(f"batch_output_{start}_{end}.jsonl")
```
Preparing the dataset for finetuning
The output from the previous step consists of separate JSONL files, each containing the output for a specific batch. To proceed with fine-tuning, we need to process this output into a suitable format.
Here’s a look at the batch inputs and outputs:
```python
with open('data/batch_api_input_0_999.jsonl') as f:
    english = [json.loads(line) for line in f]
with open('data/batch_output_0_999.jsonl') as f:
    translations = [json.loads(line) for line in f]

english[:1], translations[:1]
```
([{'custom_id': '0',
'method': 'POST',
'url': '/v1/chat/completions',
'body': {'model': 'gpt-4o',
'messages': [{'role': 'system',
'content': 'You are a fluent speaker and expert translator for English, Arabic and Egyptian Arabic. Your task is to translate text from English into Egyptian Arabic dialect.\n\n# Steps to Achieve the Best Results:\nStep 1: Translate the text from English into Modern Standard Arabic.\nStep 2: Translate the text from Modern Standard Arabic into Egyptian dialect.\n\n# Adhere to the Following Instructions:\n1. **Always follow the steps presented above.**\n2. **Output the two translations as keys in a JSON object:**\n - "ar" for the Modern Standard Arabic translation.\n - "eg" for the Egyptian Arabic dialect translation.\n3. **You may change the order of sentences when necessary** to better mimic the style of Arabic and Egyptian dialect.\n4. **Your translation should not be literal**; it should capture the essence of the text.\n5. **Translate specific English terminologies (e.g., science, computer science, biology) or entities (e.g., movies, series, poems, names, programming languages)**, but always keep their original English form within parentheses.\n6. **If the text contains code or is entirely code**, do not translate the code part; write it as it is.'},
{'role': 'user',
'content': "Translate the following text:\nThanks for the list! I'm especially interested in how these women overcame obstacles in their lives. Are there any resources you can recommend for learning more about their stories?"}],
'temperature': 1,
'top_p': 1,
'max_tokens': 2048}}],
[{'id': 'batch_req_tuugwkbZhxWZ7fKL5YRQWvHP',
'custom_id': '0',
'response': {'status_code': 200,
'request_id': '161ad42423c0ecf010ce879f31e50e5d',
'body': {'id': 'chatcmpl-9b76MvbpkO4dfI8dQLWOPFEPo1dZC',
'object': 'chat.completion',
'created': 1718632462,
'model': 'gpt-4o-2024-05-13',
'choices': [{'index': 0,
'message': {'role': 'assistant',
'content': '```json\n{\n "msa": "شكرًا على القائمة! أنا مهتم بشكل خاص في كيفية تغلب هؤلاء النساء على العقبات في حياتهن. هل هناك أي موارد يمكنك أن توصي بها لمعرفة المزيد عن قصصهن؟",\n "ea": "شكرًا على القائمة! أنا مهتمة بالذات أعرف إزاي الستات دول قدروا يعدوا العقبات اللي في حياتهم. في مصادر تنصحني بيها عشان أتعرف أكتر على حكاويهم؟"\n}\n```'},
'logprobs': None,
'finish_reason': 'stop'}],
'usage': {'prompt_tokens': 290,
'completion_tokens': 109,
'total_tokens': 399},
'system_fingerprint': 'fp_319be4768e'}},
'error': None}])
For each sample in these batch files, we need to extract the English text used for translation and the JSON output from GPT-4o, then parse out the Arabic and Egyptian Arabic translations.
For each (English, Arabic, Egyptian Arabic) triplet, I create six translation pairs:
Arabic to English
Egyptian Arabic to English
English to Arabic
Egyptian Arabic to Arabic
English to Egyptian Arabic
Arabic to Egyptian Arabic
The main purpose of the model was to translate from English to Egyptian Arabic, but I thought that including the back translation could enhance the model’s capabilities. Additionally, it creates a bridge for translating from English to Arabic and from Arabic to Egyptian Arabic, which proved effective with GPT-4o.
Here is the updated code to handle this process:
````python
import glob

def convert_text_to_dict(text):
    # Remove the '```json' and '```' delimiters around GPT-4o's JSON output
    cleaned_text = text.replace('```json\n', '').replace('```', '').strip()
    # Convert the cleaned text into a dictionary, ensuring it does not error out
    try:
        result_dict = json.loads(cleaned_text, strict=False)
    except json.JSONDecodeError as e:
        print("JSON Decode Error:", e)
        # Attempt to clean the text further or handle specific issues
        cleaned_text = cleaned_text.replace('\n', '\\n').replace('\\"', '"').replace("\\'", "'")
        try:
            result_dict = json.loads(cleaned_text, strict=False)
        except json.JSONDecodeError as e:
            print("JSON Decode Error after further cleaning:", e)
            return None
    return result_dict

input_files = sorted(glob.glob("data/batch_api_input*"))
output_files = sorted(glob.glob("data/batch_output*"))

english = []
translations = []
for input_file in input_files:
    with open(input_file) as f:
        english.extend([json.loads(line) for line in f])
for output_file in output_files:
    with open(output_file) as f:
        translations.extend([json.loads(line) for line in f])
````
This snippet loads the batch inputs and outputs, storing them in separate lists. Now we need to process these inputs and outputs into a format suitable for fine-tuning.
```python
data = []
fail = 0
i = 0
for english_data, arabic_data in tqdm(zip(english, translations)):
    try:
        output_dict = convert_text_to_dict(arabic_data['response']['body']['choices'][0]['message']['content'])
        ar_text = output_dict['ar']
        eg_text = output_dict['eg']
        en_text = english_data['body']['messages'][-1]['content'].split('Translate the following text:\n')[-1]
        data.append({"instruction": "Translate the following text to English.",
                     "input": ar_text, "output": en_text,
                     "input_lang": "ar", "output_lang": "en", "id": i})
        data.append({"instruction": "Translate the following text to English.",
                     "input": eg_text, "output": en_text,
                     "input_lang": "eg", "output_lang": "en", "id": i})
        data.append({"instruction": "Translate the following text to Arabic.",
                     "input": eg_text, "output": ar_text,
                     "input_lang": "eg", "output_lang": "ar", "id": i})
        data.append({"instruction": "Translate the following text to Arabic.",
                     "input": en_text, "output": ar_text,
                     "input_lang": "en", "output_lang": "ar", "id": i})
        data.append({"instruction": "Translate the following text to Egyptian Arabic.",
                     "input": ar_text, "output": eg_text,
                     "input_lang": "ar", "output_lang": "eg", "id": i})
        data.append({"instruction": "Translate the following text to Egyptian Arabic.",
                     "input": en_text, "output": eg_text,
                     "input_lang": "en", "output_lang": "eg", "id": i})
        i += 1
    except Exception:
        fail += 1

# Write jsonl file
with open("data/translation-dataset-openai-10k.jsonl", 'w') as file:
    for index in tqdm(range(0, len(data)), desc="Generating JSONL File"):
        row = data[index]
        file.write(json.dumps(row) + '\n')
```
This code processes each sample, creating six translation pairs for each, and writes them to a JSONL file suitable for fine-tuning.
Due to some JSON parsing errors, the final dataset ended up with around 57K rows instead of 60K.
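You can verify this from the counters kept in the processing loop above; each successfully parsed triplet contributes six rows, so every failed triplet costs six:

```python
# Sanity check using the variables from the processing loop above
print(f"Rows generated: {len(data)}")            # ~57K
print(f"Triplets that failed to parse: {fail}")  # each failure drops 6 rows
```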
The next step is to split this dataset into training and testing sets to validate the performance of the model after fine-tuning.
Here’s how you can split the dataset:
```python
import random
random.seed(42)

# Path to your .jsonl file
dataset_path = 'data/translation-dataset-openai-10k.jsonl'
train_dataset_path = 'data/translation-dataset-openai-10k-train.jsonl'
test_dataset_path = 'data/translation-dataset-openai-10k-test.jsonl'

# Initialize empty lists to store the data
train_data_list = []
test_data_list = []

# Open the file and read line by line
with open(dataset_path, 'r', encoding='utf-8') as file:
    # Sample test ids; splitting on the triplet id keeps all six pairs
    # of a triplet in the same split
    lines = file.readlines()
    test_ids = set(random.sample(list(range(len(lines))), k=len(lines) // 10))
    for line in lines:
        data = json.loads(line.strip())  # Parse JSON from each line
        if data["id"] in test_ids:
            test_data_list.append(line)
        else:
            train_data_list.append(line)

# Deduplicate, then parse the lines back into dicts
train_data_list = [json.loads(l.strip()) for l in set(train_data_list)]
test_data_list = [json.loads(l.strip()) for l in set(test_data_list)]

with open(train_dataset_path, 'w') as file:
    for index in tqdm(range(0, len(train_data_list)), desc="Generating Train JSONL File"):
        row = train_data_list[index]
        file.write(json.dumps(row) + '\n')

with open(test_dataset_path, 'w') as file:
    for index in tqdm(range(0, len(test_data_list)), desc="Generating Test JSONL File"):
        row = test_data_list[index]
        file.write(json.dumps(row) + '\n')
```
After splitting the dataset, you can convert these JSONL files into HuggingFace datasets to make them easier to work with for fine-tuning:
```python
from datasets import Dataset
import pandas as pd

train_dataset_df = pd.read_json(train_dataset_path, lines=True).astype(str)
test_dataset_df = pd.read_json(test_dataset_path, lines=True).astype(str)

train_dataset = Dataset.from_pandas(train_dataset_df)
test_dataset = Dataset.from_pandas(test_dataset_df)

train_dataset.save_to_disk('translation-dataset-v3-train.hf')
test_dataset.save_to_disk('translation-dataset-v3-test.hf')
```
With these steps, you now have the train and test datasets ready for fine-tuning and validating your model.
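If you want to sanity-check the saved splits (or load them in a separate fine-tuning environment), `load_from_disk` brings them back as-is:

```python
from datasets import load_from_disk

# Reload the saved splits and confirm their sizes and columns
train_dataset = load_from_disk('translation-dataset-v3-train.hf')
test_dataset = load_from_disk('translation-dataset-v3-test.hf')
print(train_dataset)
print(test_dataset)
```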
Finetuning Llama 3 8B using Axolotl
To give you a brief intro, Axolotl is a tool designed to streamline LLM fine-tuning. I like it because it lets you focus on the data instead of the fine-tuning code, while incorporating the best fine-tuning practices.
In this project, I used a pretty simple finetuning configuration that I’ll provide below. To summarize what the configuration entails:
It trains the model for 2 epochs, while running 10 evals per epoch
It logs the train and eval loss to Weights & Biases
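Here’s a minimal sketch of that configuration. The epoch count, eval frequency, W&B logging, and the LoRA + sample packing setup match what’s described in this post; the remaining hyperparameters are illustrative values in the style of Axolotl’s Llama 3 LoRA examples, not the exact ones from my run:

```yaml
# Sketch of an Axolotl config: LoRA on Llama 3 8B with sample packing,
# 2 epochs, 10 evals per epoch, and Weights & Biases logging.
# Values not mentioned in the post are typical defaults, not the exact ones used.
base_model: meta-llama/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: true
adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: data/translation-dataset-openai-10k-train.jsonl
    ds_type: json
    type: alpaca        # matches the instruction/input/output keys in our dataset
test_datasets:
  - path: data/translation-dataset-openai-10k-test.jsonl
    ds_type: json
    split: train
    type: alpaca
val_set_size: 0.0       # eval on the explicit test split instead
output_dir: ./outputs/llama3-8b-translator  # hypothetical output path

sequence_len: 2048
sample_packing: true
pad_to_sequence_len: true

wandb_project: egyptian-arabic-translator  # hypothetical project name

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 2
learning_rate: 0.0002
optimizer: adamw_bnb_8bit
lr_scheduler: cosine

bf16: auto
gradient_checkpointing: true
flash_attention: true

warmup_steps: 10
evals_per_epoch: 10
saves_per_epoch: 1
logging_steps: 1

special_tokens:
  pad_token: <|end_of_text|>
```

With Axolotl installed, a config like this is launched with something along the lines of `accelerate launch -m axolotl.cli.train config.yml`.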
The finetuning was carried out on a single A5000 GPU (24 GB VRAM) on Jarvis Labs, which costs $0.49/hr, and took around 10 hours to complete. You can check the Weights & Biases logs here.
For more info about Axolotl, I highly recommend the documentation, as well as this short video guide by Jarvis Labs that shows how to spin up an instance running Axolotl.
Comparing the results before and after finetuning
Of course, the comparison here will go in favor of GPT-4o, but since we were aiming to emulate its performance, let’s make a comparison to see how far off our finetuned model is.
I’ll use three random sample responses from the Alpaca dataset.
```python
alpaca_sample = [
    """I had to make a difficult decision when I was working as a project manager at a construction company. I was in charge of a project that needed to be completed by a certain date in order to meet the client’s expectations. However, due to unexpected delays, we were not able to meet the deadline and so I had to make a difficult decision. I decided to extend the deadline, but I had to stretch the team’s resources even further and increase the budget. Although it was a risky decision, I ultimately decided to go ahead with it to ensure that the project was completed on time and that the client’s expectations were met. The project was eventually successfully completed and this was seen as a testament to my leadership and decision-making abilities.""",
    """There are several factors that contribute to an individual's success, such as hard work and dedication, effective communication skills, positive attitude, good time management, a clear vision and specific goals, problem-solving and decision-making skills, willingness to take risks, resilience and adaptability, prioritization and organization, proactivity, self-motivation, personal growth, and the ability to collaborate with others.""",
    """Cats and dogs are both beloved pets, but they have important differences. Dogs are typically more outgoing and energetic, while cats are considered more independent. Dogs tend to be more social and active, enjoying walks and playing with other animals. Cats, on the other hand, tend to be more solitary, preferring to relax and snuggle up in a warm spot. Dogs typically require more care and attention, while cats are more self-sufficient. Despite these differences, cats and dogs remain popular and loving pets.""",
]
```
```python
ar_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Arabic.

### Input:
{text}

### Response:
"""

eg_template = """<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Translate the following text to Egyptian Arabic.

### Input:
{text}

### Response:
"""

def get_output(prompt, max_new_tokens):
    # Greedy decoding for reproducible comparisons
    out = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=False, temperature=None)
    return out[0]['generated_text'].split("### Response:\n")[-1]

def get_ft_model_translations(text):
    eg_text = get_output(eg_template.format(text=text), 512)
    translations = {"eg": eg_text}
    return translations

def get_gpt4o_translations(text):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Translate the following text:\n{text}"}
        ],
        temperature=1,
        max_tokens=512,
        top_p=1
    )
    translations = convert_text_to_dict(response.choices[0].message.content)
    return translations

def compare_translations(text):
    ft_translations = get_ft_model_translations(text)
    gpt4o_translations = get_gpt4o_translations(text)
    print("Original Text:\n")
    print(text)
    print()
    print('GPT-4o Translation:\n')
    print(gpt4o_translations['eg'])
    print()
    print('Finetuned Model Translation:\n')
    print(ft_translations['eg'])
    print()
```
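Note that `compare_translations` relies on a `pipe` text-generation pipeline for the fine-tuned model, which isn’t shown above. Here is a minimal sketch of how it could be built, assuming the LoRA adapter was saved to a hypothetical output directory (the one named in the config sketch earlier):

```python
# Minimal sketch for constructing `pipe` from the trained LoRA adapter.
# The adapter path is hypothetical; point it at Axolotl's actual output_dir.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline

adapter_dir = "./outputs/llama3-8b-translator"
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_dir, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```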
Sample 1
```python
compare_translations(alpaca_sample[0])
```
Original Text:
I had to make a difficult decision when I was working as a project manager at a construction company. I was in charge of a project that needed to be completed by a certain date in order to meet the client’s expectations. However, due to unexpected delays, we were not able to meet the deadline and so I had to make a difficult decision. I decided to extend the deadline, but I had to stretch the team’s resources even further and increase the budget. Although it was a risky decision, I ultimately decided to go ahead with it to ensure that the project was completed on time and that the client’s expectations were met. The project was eventually successfully completed and this was seen as a testament to my leadership and decision-making abilities.
GPT-4o Translation:
اضطريت آخد قرار صعب وأنا كنت شغال كمدير مشروع في شركة مقاولات. كنت مسؤول عن مشروع لازم يخلص في معاد معين عشان نرضي العميل. بس بسبب تأخيرات غير متوقعة، ماقدرناش نلتزم بالميعاد، فكان لازم آخد قرار صعب. قررت أمد الميعاد، بس كان لازم أضغط على فريق العمل وأزود الميزانية. رغم إن القرار كان محفوف بالمخاطر، قررت أمشي فيه عشان المشروع يخلص في وقته ونرضي العميل. في الآخر، المشروع نجح وتنفذ بنجاح وده كان شهادة على قدراتي في القيادة واتخاذ القرارات.
Finetuned Model Translation:
كان لازم أاخد قرار صعب لما كنت شغال كمدير مشروع في شركة بناء. كنت مسؤول عن مشروع كان محتاج يخلص في موعد معين عشان نوفر توقعات العميل. بس بسبب تأخيرات غير متوقعة، ما قدرتش نتوفر على الميعاد واضطرت أاخد قرار صعب. قررت أطول الميعاد، بس كان لازم أزود موارد الفريق أكتر وأزود الميزانية. رغم إن القرار كان محفوف بالمخاطر، قررت أتابع المهمة عشان أتأكد إن المشروع يخلص في الوقت المناسب وإن توقعات العميل تتحقق. المشروع خلص في النهاية بنجاح وده اتشاف كدليل على قيادتي وقدراتي في اتخاذ القرارات.
Sample 2
```python
compare_translations(alpaca_sample[1])
```
Original Text:
There are several factors that contribute to an individual's success, such as hard work and dedication, effective communication skills, positive attitude, good time management, a clear vision and specific goals, problem-solving and decision-making skills, willingness to take risks, resilience and adaptability, prioritization and organization, proactivity, self-motivation, personal growth, and the ability to collaborate with others.
GPT-4o Translation:
في عوامل كتير بتساهم في نجاح الشخص، زي الشغل الجامد والاجتهاد، مهارات التواصل الفعّالة، النظرة الإيجابية، إدارة الوقت بشكل كويس، رؤية واضحة وأهداف محددة، مهارات حل المشاكل واتخاذ القرار، الرغبة في المخاطرة، المرونة والتكيف، الأولويات والتنظيم، المبادرة، التحفيز الذاتي، النمو الشخصي، والقدرة على التعاون مع الناس التانية.
Finetuned Model Translation:
فيه عوامل كتير بتساهم في نجاح الشخص، زي الشغل الجاد والتفاني، مهارات التواصل الفعّالة، المزاج الإيجابي، إدارة الوقت بشكل كويس، رؤية واضحة وأهداف محددة، مهارات حل المشاكل واتخاذ القرارات، استعداد لتحمل المخاطر، الصمود والقدرة على التكيف، الترتيب والتنظيم، الإقدام، التحفيز الذاتي، النمو الشخصي، والقدرة على التعاون مع الآخرين.
Sample 3
```python
compare_translations(alpaca_sample[2])
```
Original Text:
Cats and dogs are both beloved pets, but they have important differences. Dogs are typically more outgoing and energetic, while cats are considered more independent. Dogs tend to be more social and active, enjoying walks and playing with other animals. Cats, on the other hand, tend to be more solitary, preferring to relax and snuggle up in a warm spot. Dogs typically require more care and attention, while cats are more self-sufficient. Despite these differences, cats and dogs remain popular and loving pets.
GPT-4o Translation:
القطط والكلاب الحيوانات دي الاتنين محبوبين، بس في اختلافات مهمة بينهم. الكلاب عادةً بتكون أكثر انفتاح ونشاط، في حين إن القطط بتحب تستقل. الكلاب بتحب الاختلاط وبتكون نشيطة، بتمبسط من المشي واللعب مع الحيوانات التانية. لكن القطط بتميل للعزلة، وبتحب تسترخى وتتمدد في مكان دافئ. الكلاب بتطلب رعاية واهتمام أكتر، بس القطط بتعتمد على نفسها أكتر. رغم الاختلافات دي، القطط والكلاب لسه حيوانات أليفة محبوبة وشعبية.
Finetuned Model Translation:
القطط والكلاب هما حيوانات أليفة محبوبة، بس عندهم اختلافات مهمة. الكلاب عادةً بتكون أكتر نشاطًا وطاقة، والقطط بتعتبر أكتر استقلالية. الكلاب عادةً بتكون أكتر اجتماعية ونشطة، وبتستمتع بالمشي واللعب مع الحيوانات التانية. أما القطط، بتكون أكتر وحدة، وبتفضل تستريح وتدوس في مكان دافي. الكلاب عادةً محتاجة عناية أكتر، والقطط بتكون أكتر استقلالية. رغم الاختلافات دي، القطط والكلاب لسه حيوانات أليفة مشهورة ومحبوبة.
Conclusion
As you can see, the results are still not perfect. However, the fine-tuned model is starting to catch up with GPT-4o. With some minor adjustments to the fine-tuning methodology, I believe we can achieve performance closer to that of GPT-4o.
This project demonstrates the potential of fine-tuning large language models to translate English to Egyptian Arabic effectively, addressing a significant gap in existing resources. While the fine-tuned model is not yet on par with GPT-4o, it shows promising results and opens up opportunities for further improvement. By refining the fine-tuning process and expanding the dataset, we can continue to enhance the model’s performance. I hope this walkthrough provides valuable insights and inspires others to explore and contribute to this area. Thank you for following along, and I look forward to sharing more updates as this project progresses.
TL;DR
Used GPT-4o to create translation pairs from English to Modern Standard Arabic and then to Egyptian Arabic.
Generated a dataset from the OpenAssistant/oasst2 messages using GPT-4o and OpenAI’s batch API.
Prepared and processed the dataset for fine-tuning.
Fine-tuned Llama-3-8B using Axolotl, focusing on LoRA adapters and sample packing.
Evaluated the fine-tuned model against GPT-4o using sample texts.
Found that while the fine-tuned model isn’t perfect, it’s a significant step towards accessible Egyptian Arabic translations.