
Latest from Queryloop

Stay updated with our latest research findings, product developments, and insights into AI optimization

Zain ul Abideen
March 12, 2025
15 min read
Research

Exploring S1: Experiments and Findings

A detailed analysis of S1, an open-weight language model, with experiments across multiple benchmarks including GPQA, AIME25, and OpenAI Math.

AI
Language Models
SimpleScaling
S1 Model
Benchmarking
Machine Learning
NLP
Queryloop
Zain ul Abideen
January 17, 2025
12 min read
Research

Introduction to SWE-bench & a Patch-Centric Approach

A comprehensive explanation of SWE-bench for evaluating AI coding agents and a patch-centric approach to solving SWE-bench issues.

SWE-bench (the Software Engineering Benchmark) was created to evaluate AI coding agents like Devin, which automate tasks such as bug fixes and code improvements. It provides a dataset of repositories with known issues to test how effectively these tools identify and fix bugs. Agentic workflows are submitted to SWE-bench, run against these repositories, and evaluated on whether their fixes succeed.
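
To make that flow concrete, here is a minimal sketch of what a SWE-bench-style evaluation loop does with an agent's output. It is illustrative rather than the official harness: it assumes the Hugging Face `datasets` package, locally checked-out repositories, and a hypothetical `generate_patch` function standing in for the agent.

```python
# Illustrative SWE-bench-style evaluation loop (not the official harness).
# Assumes: `datasets` installed, repos checked out locally, and a
# hypothetical generate_patch(issue_text, repo_dir) agent function.
import subprocess
from datasets import load_dataset

dataset = load_dataset("princeton-nlp/SWE-bench", split="test")

def evaluate_instance(instance, repo_dir, generate_patch):
    # Pin the repo to the commit the issue was filed against.
    subprocess.run(["git", "checkout", instance["base_commit"]],
                   cwd=repo_dir, check=True)
    # The agent reads the issue text and proposes a unified diff.
    patch = generate_patch(instance["problem_statement"], repo_dir)
    # Apply the proposed fix; a patch that fails to apply scores zero.
    applied = subprocess.run(["git", "apply", "-"], input=patch.encode(),
                             cwd=repo_dir)
    if applied.returncode != 0:
        return False
    # Run the repo's tests; a passing suite counts as a resolved issue.
    tests = subprocess.run(["python", "-m", "pytest", "-x"], cwd=repo_dir)
    return tests.returncode == 0
```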

AI
Software Engineering
SWE-bench
Patch-Centric
LLM
LangChain
Bug Fixing
Queryloop
Zain ul Abideen
January 17, 2025
8 min read
Research

Building a Coding Agent to Solve SWE-Bench

Learn how we improved our approach to solving SWE-bench problems by flipping the process—making code changes first and then generating patches.
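
The gist of that flipped flow, sketched below under the assumption that the agent edits files in a checked-out working tree: rather than asking the model to emit a diff directly, let it rewrite whole files and recover the patch with `git diff` afterward. The `rewrite_file` call is a hypothetical stand-in for the model.

```python
# Sketch of the "edit first, diff later" idea: the agent overwrites files,
# then the patch is derived from the working tree. `rewrite_file` is a
# hypothetical model call that returns the full corrected file text.
import pathlib
import subprocess

def edit_then_patch(repo_dir, files_to_fix, rewrite_file):
    for path in files_to_fix:
        source = pathlib.Path(repo_dir, path).read_text()
        fixed = rewrite_file(path, source)
        pathlib.Path(repo_dir, path).write_text(fixed)
    # git computes the unified diff for us, avoiding fragile patch generation.
    diff = subprocess.run(["git", "diff"], cwd=repo_dir,
                          capture_output=True, text=True, check=True)
    return diff.stdout
```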

AI
Software Engineering
SWE-bench
Code Editing
LLM
Automation
Bug Fixing
Queryloop
Zain ul Abideen
July 13, 2024
17 min read
Research

MHA vs MQA vs GQA vs MLA

A comparison of DeepSeek's new Multi-head Latent Attention (MLA) with MHA, MQA, and GQA.

In Transformer decoders, each token attends to all preceding tokens, so instead of recomputing the previous context at every step, its Keys and Values are cached. This significantly speeds up inference but imposes an expensive memory overhead as the sequence length and model dimensions grow. To address this trade-off, multiple attention mechanisms have been introduced: Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped-Query Attention (GQA), and Multi-Head Latent Attention (MLA).
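
As a rough back-of-the-envelope illustration (my numbers, not the post's): the KV cache stores a Key and a Value vector per token, per layer, per KV head, so shrinking the KV-head count, as MQA and GQA do, shrinks the cache proportionally, while MLA instead caches a single compressed latent per token.

```python
# Rough KV-cache size comparison across attention variants, using
# illustrative Llama-7B-like shapes in fp16 (2 bytes per element).
def kv_cache_gib(n_layers, n_kv_heads, head_dim, seq_len, batch=1):
    # 2 tensors (K and V) * elements * 2 bytes (fp16)
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * 2 / 2**30

L, H, D, S = 32, 32, 128, 4096
print(f"MHA (32 KV heads): {kv_cache_gib(L, H, D, S):.3f} GiB")   # 2.000
print(f"GQA ( 8 KV heads): {kv_cache_gib(L, 8, D, S):.3f} GiB")   # 0.500
print(f"MQA ( 1 KV head ): {kv_cache_gib(L, 1, D, S):.3f} GiB")   # 0.063
# MLA caches one compressed latent per token per layer instead of
# per-head K/V; with a 512-d latent the cache is smaller still.
print(f"MLA (512-d latent): {L * 512 * S * 2 / 2**30:.3f} GiB")   # 0.125
```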

AI
LLM
NLP
Machine Learning
Deep Learning
Zain ul Abideen
July 7, 2024
6 min read
Research

Align Phi3 with CPO-SimPO

Align your LLM with an approach that is more memory- and speed-efficient than DPO

Aligning LLMs for optimal performance typically starts with Supervised Fine-Tuning (SFT). The standard practice is to load the model in 4-bit mode and apply a configuration for LoRA (Low-Rank Adaptation) training. Direct Preference Optimization (DPO) is another prominent technique for optimizing models at lower cost. SFT and DPO are commonly coupled to further improve model performance, but the combination can be costly. Odds Ratio Preference Optimization (ORPO) replaces SFT+DPO with a single step and delivers stronger performance by adding an odds-ratio-based penalty to the conventional negative log-likelihood (NLL) loss, differentiating between favored and disfavored generation styles. CPO-SimPO is another technique aimed at more stable training and improved performance. It seeks to counter SFT's dependence on training-data quality, DPO's memory and speed inefficiency (it must hold both a parametrized policy and a reference policy), and the generation of long but low-quality sequences. In this blog, I will introduce this technique in detail and train Phi3-Mini-4K-Instruct with CPO-SimPO.
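
For intuition, here is a minimal PyTorch sketch of a CPO-SimPO-style objective as described above: SimPO's length-normalized, reference-free preference term with a target margin gamma, plus CPO's NLL term on the chosen response. The hyperparameter values are placeholders, and the exact formulation in the post may differ.

```python
import torch
import torch.nn.functional as F

def cpo_simpo_loss(chosen_logps, chosen_lens, rejected_logps, rejected_lens,
                   chosen_nll, beta=2.0, gamma=0.5, lam=1.0):
    """Sketch of a CPO-SimPO-style objective.

    chosen_logps / rejected_logps: summed log-probs of each response
    chosen_lens / rejected_lens:   token counts for SimPO's length norm
    chosen_nll:                    mean NLL of the chosen response (CPO term)
    """
    # SimPO: length-normalized implicit reward, no reference model needed.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Prefer the chosen response by at least a margin of gamma.
    preference = -F.logsigmoid(r_chosen - r_rejected - gamma)
    # CPO adds a plain NLL term to keep generations on-distribution.
    return (preference + lam * chosen_nll).mean()
```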

AI
Machine Learning
Deep Learning
Optimization
CPO
SimPO
Zain ul Abideen
July 7, 2024
15 min read
Research

Best LLM Inference Engine? TensorRT vs vLLM vs LMDeploy vs MLC-LLM

Benchmarking various LLM Inference Engines.

LLMs excel at text-generation applications such as chat and code completion, showing high understanding and fluency. However, their large size also creates challenges for inference. Basic inference is slow because LLMs generate text token by token, requiring a new forward pass for each next token, and processing time grows as the input sequence lengthens. Additionally, LLMs have billions of parameters, making it difficult to store and manage all those weights in memory. Multiple frameworks and packages have emerged to optimize LLM inference and serving, and in this blog I'll use and compare the following inference engines: TensorRT-LLM, vLLM, LMDeploy, and MLC-LLM.
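
As a taste of what using one of these engines looks like, here is the basic vLLM offline-generation pattern (the model id and prompt are just illustrative; the other engines expose broadly similar APIs).

```python
# Minimal vLLM offline inference example; the model id is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=256)

outputs = llm.generate(["Explain KV caching in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```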

AI
Machine Learning
Deep Learning
LLM
Inference Engine
TensorRT
vLLM
LMDeploy
MLC-LLM
Zain ul Abideen
April 19, 2024
6 min read
Research

Schedule-Free Learning — A New Way to Train Models

Training three Llama models to compare a cosine-scheduled optimizer with a schedule-free one.

In machine learning, we continually rely on intricate algorithms and techniques to train our models effectively.
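
For readers who want to try schedule-free training directly, the `schedulefree` package swaps in for a scheduled optimizer roughly as below. This is a sketch: `model` and `train_loader` are placeholders, and note the optimizer-level train/eval switches that schedule-free methods require.

```python
# Sketch: replacing a cosine-scheduled AdamW with a schedule-free one.
# `model` and `train_loader` are placeholders for your own setup.
import schedulefree

optimizer = schedulefree.AdamWScheduleFree(model.parameters(), lr=2.5e-3)

optimizer.train()            # schedule-free optimizers track train/eval mode
for batch in train_loader:
    optimizer.zero_grad()
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()         # no learning-rate scheduler to step

optimizer.eval()             # switch before validation or checkpointing
```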

AI
LLM
Machine Learning
NLP
Queryloop
Auzair
April 19, 2024
23 min read
Research

The Future of Database Queries: Evaluating Text-to-SQL and Text-to-NoSQL with AI

Text-to-SQL and Text-to-NoSQL with AI

In the wake of ChatGPT and other large language models (LLMs) gaining prominence, the fascination with Retrieval Augmented Generation (RAG) — essentially conversing directly with your data — has skyrocketed.
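
To ground the idea, a minimal text-to-SQL loop looks something like the sketch below, using the OpenAI Python client. The schema, model id, and question are illustrative, and real systems should validate the generated SQL before executing it.

```python
# Minimal text-to-SQL sketch: prompt an LLM with the schema, run the result.
# Schema, model id, and question are illustrative only.
import sqlite3
from openai import OpenAI

client = OpenAI()
schema = "CREATE TABLE orders (id INTEGER, customer TEXT, total REAL);"
question = "What is the total revenue per customer?"

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system",
         "content": f"Write a single SQLite query for this schema:\n{schema}"},
        {"role": "user", "content": question},
    ],
)
sql = resp.choices[0].message.content.strip()

conn = sqlite3.connect("shop.db")
print(conn.execute(sql).fetchall())  # validate sql first in real systems
```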

LLM
Text to SQL
Text to NoSQL
Large Language Models
OpenAI
AI
Queryloop
Zain ul Abideen
April 4, 2024
7 min read
Research

Llama-Bitnet | Training a 1.58-bit LLM

What is a 1-bit LLM, and how do you train a 70M Llama-Bitnet?

Vanilla LLMs built on the Transformer architecture typically operate in 16-bit precision (FP16 or BF16), so the major computational cost comes from floating-point matrix addition and multiplication operations...
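
The core trick behind 1.58-bit models can be sketched in a few lines: BitNet b1.58 quantizes each weight to the ternary set {-1, 0, 1} with absmean scaling. This is my paraphrase of the paper's RoundClip formulation, not code from the post.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize weights to {-1, 0, +1} using BitNet b1.58's absmean scheme."""
    scale = w.abs().mean().clamp(min=eps)      # gamma = mean |W|
    w_q = (w / scale).round().clamp(-1, 1)     # RoundClip(W / gamma, -1, 1)
    return w_q, scale                          # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary_quantize(w)
print(w_q)   # ternary matrix: matmuls reduce to additions and subtractions
```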

AI
NLP
Computer Vision
Machine Learning
Deep Learning
Queryloop
Zain ul Abideen
March 22, 2024
5 min read
Research

ORPO Outperforms SFT+DPO | Train Phi-2 with ORPO

Train Phi-2 with ORPO using LazyOrpo

Before jumping into ORPO, I am going to assume that you are well acquainted with the process of fine-tuning LLMs for optimal performance. One of the most common techniques used for fine-tuning is Supervised Fine-Tuning (SFT)...
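
As a quick reference, ORPO's single-step objective can be sketched as the usual SFT loss plus an odds-ratio penalty. This is a simplified PyTorch rendering; the log-probability inputs and the lambda weight are schematic.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logp, rejected_logp, chosen_nll, lam=0.1):
    """Schematic ORPO objective: SFT NLL + odds-ratio preference penalty.

    chosen_logp / rejected_logp: mean per-token log-probs of each response.
    chosen_nll: the standard SFT loss on the chosen response.
    """
    # odds(y) = p / (1 - p); log-odds computed stably from log p.
    log_odds_chosen = chosen_logp - torch.log1p(-torch.exp(chosen_logp))
    log_odds_rejected = rejected_logp - torch.log1p(-torch.exp(rejected_logp))
    # Penalize the model when the rejected response has comparable odds.
    penalty = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return (chosen_nll + lam * penalty).mean()
```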

AI
LLM
Machine Learning
NLP
Deep Learning
Queryloop
Zain ul Abideen
March 14, 2024
12 min read
Research

Multi-GPU Training of 70B LLMs with DeepSpeed and FSDP+QLoRA

Train 70–120B LLMs on 4x A100s and 2x RTX 3090s (consumer-grade GPUs)

I have been working with bigger models like Mixtral 8x7B, Qwen-120B, and Miqu-70B recently. But the most important thing when playing with bigger models is the amount of compute resources they require during training...
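
The QLoRA half of such a setup looks roughly like the sketch below, using transformers, peft, and bitsandbytes. The model id and LoRA hyperparameters are illustrative, and the FSDP or DeepSpeed wrapping is configured separately (for example through accelerate).

```python
# Sketch: load a big model in 4-bit and attach LoRA adapters (QLoRA).
# Model id and hyperparameters are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1", quantization_config=bnb
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora)  # only adapter weights stay trainable
model.print_trainable_parameters()
```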

AI
LLM
NLP
Machine Learning
Deep Learning
Queryloop
Zain ul Abideen
February 29, 2024
6 min read
Research

Everything you need to know about Google's new Gemma 7B and 2B Models

Also releasing Gemma-7B-Openhermes and Gemma-2B-Openhermes

Google has been in the LLM space for quite some time, yet Gemma is their first open LLM. Its release has stirred the community, and everyone is excited to try it out. I am no exception...

AI
Google
Gemma Model
Machine Learning
Deep Learning
Queryloop
Zain ul Abideen
February 17, 2024
11 min read
Research

Best SLM? Stable LM vs TinyLlama vs MiniCPM vs Qwen 1.5 | War of SLMs

Benchmarking emotional intelligence, code generation, text summarization, and narrative composition.

Small Language Models (SLMs) have been the talk of the town for some time now. New models are released almost every day, aiming to achieve results on par with Large Language Models (LLMs). In terms of computational and memory cost, however, SLMs are already ahead...

AI
Machine Learning
Deep Learning
Language Model
LLM
Queryloop