SQL Interview Questions & Answers

SQL Basics

SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. It enables users to perform operations like querying, updating, and managing data efficiently.

Introduction to SQL

SQL provides a powerful way to interact with databases. It includes commands for creating structures (DDL), modifying data (DML), controlling access (DCL), and handling transactions (TCL). Understanding its fundamentals is crucial for database management.

SQL Data Types

SQL data types define the kind of data a column can hold. Common types include:

Numeric: INT, FLOAT, DECIMAL
Character/String: CHAR, VARCHAR, TEXT
Date/Time: DATE, TIMESTAMP
Boolean: BOOLEAN
Choosing the right data type optimizes storage and query performance.

DDL (Data Definition Language)

DDL commands define and modify database structures.

CREATE – Used to create tables, indexes, and schemas.
ALTER – Modifies an existing table structure (e.g., add/drop columns).
DROP – Deletes tables, views, or databases permanently.
These commands directly impact database design and integrity.

DML (Data Manipulation Language)

DML commands handle data operations within tables.

SELECT – Retrieves data from tables based on conditions.
INSERT – Adds new records to a table.
UPDATE – Modifies existing records.
DELETE – Removes specific records from a table.
Efficient use of DML ensures effective data retrieval and manipulation.

DCL (Data Control Language)

DCL controls user permissions in databases.

GRANT – Provides access rights to users or roles.
REVOKE – Removes previously granted permissions.
This ensures database security by restricting unauthorized access.

TCL (Transaction Control Language)

TCL commands manage database transactions, ensuring consistency.

COMMIT – Saves all pending changes permanently.
ROLLBACK – Reverts the database to a previous state in case of failure.
SAVEPOINT – Creates checkpoints within a transaction for partial rollbacks.
Proper transaction handling maintains data integrity and prevents corruption.

SQL Queries:

SELECT Statements

The SELECT statement retrieves data from one or more tables. It is the most fundamental query in SQL.
Example:

SELECT first_name, last_name FROM employees;

Best Impression Tip: Explain how SELECT can be combined with filtering (WHERE), sorting (ORDER BY), and aggregation (GROUP BY) for powerful data retrieval.

WHERE Clause and Filtering Data

The WHERE clause filters records based on specified conditions.
Example:

SELECT * FROM orders WHERE status = 'Shipped';

Best Impression Tip: Mention that WHERE supports operators like =, >, <, LIKE, IN, and BETWEEN, making filtering flexible and efficient.

ORDER BY Clause

Used to sort query results in ascending (ASC, default) or descending (DESC) order.
Example:

SELECT name, price FROM products ORDER BY price DESC;

Best Impression Tip: Highlight performance optimization by ensuring indexes exist on columns used for sorting.

DISTINCT Keyword

Eliminates duplicate values from query results.
Example:

SELECT DISTINCT category FROM products;

Best Impression Tip: Explain that DISTINCT applies to the entire row unless specified for a single column.

LIMIT/OFFSET

Controls the number of rows returned, useful for pagination.
Example:

SELECT * FROM customers LIMIT 10 OFFSET 20;

Best Impression Tip: Explain how LIMIT optimizes performance by reducing data retrieval and how OFFSET is used for pagination in large datasets.

Practical SQL Queries

Below are the practical scenarios and optimization techniques for each SQL query concept to help you stand out in an interview :

SELECT Statements

🔹 Scenario: Retrieve a list of employees' names and salaries from the HR database.
🔹 Optimization Tip: Avoid using SELECT * unless necessary. Instead, specify only the required columns to improve query performance.

SELECT first_name, last_name, salary FROM employees;

WHERE Clause and Filtering Data

🔹 Scenario: Fetch all orders placed in the last 30 days.
🔹 Optimization Tip: Use indexed columns in the WHERE clause for faster searches. Avoid functions on columns (WHERE DATE(order_date) = …) as they can prevent index usage.

SELECT * FROM orders WHERE order_date >= NOW() - INTERVAL 30 DAY;

ORDER BY Clause

🔹 Scenario: Get the top 5 highest-paid employees.
🔹 Optimization Tip: Ensure an index exists on the column being sorted (salary in this case). Sorting large datasets without indexing can cause performance issues.

SELECT first_name, last_name, salary FROM employees ORDER BY salary DESC LIMIT 5;

DISTINCT Keyword

🔹 Scenario: Find all unique product categories in an inventory.
🔹 Optimization Tip: Instead of DISTINCT, consider using GROUP BY when you need additional aggregations (e.g., count of items per category).

SELECT DISTINCT category FROM products;

SELECT category, COUNT(*) FROM products GROUP BY category;

LIMIT/OFFSET

🔹 Scenario: Implement pagination in a web application showing 10 users per page.
🔹 Optimization Tip: Using large OFFSET values can slow down performance. Instead, use keyset pagination (e.g., filtering on indexed columns like id for better performance).

-- Inefficient: (OFFSET scans skipped rows)

SELECT * FROM users ORDER BY user_id ASC LIMIT 10 OFFSET 100;

-- Efficient: (Keyset pagination using WHERE clause)

SELECT * FROM users WHERE user_id > 100 ORDER BY user_id ASC LIMIT 10;

Interview Tip:

💡 Emphasize Performance Optimization – Always mention indexing, query execution plans (EXPLAIN keyword), and avoiding unnecessary computations to show a deeper understanding of SQL.

Advanced Indexing Techniques & Real-World Case Studies to Impress in an Interview

To stand out in an SQL interview, showcasing knowledge of indexing, query optimization, and real-world problem-solving is crucial. Below are advanced techniques with practical use cases:

Using Indexes to Optimize Queries

🔹 Problem: Slow Query Performance

A company’s e-commerce website has a slow-loading products page because of frequent queries on the products table.

🔹 Solution: Create an Index on Frequently Searched Columns

Instead of scanning the entire table, an index on the category column can speed up retrieval.

CREATE INDEX idx_category ON products(category);

SELECT * FROM products WHERE category = 'Electronics';

✅ Result: Query time is reduced significantly, improving website performance.

🔹 Interview Tip: Mention Covering Indexes

A covering index stores all needed data inside the index, eliminating additional disk lookups.
Example: If a query only fetches name and price, create an index that includes both columns:

CREATE INDEX idx_product ON products(category, name, price);

Avoiding Full Table Scans in WHERE Clause

🔹 Problem: Using Functions on Indexed Columns

Consider a banking system that finds transactions in the last 30 days:

SELECT * FROM transactions WHERE DATE(transaction_date) >= CURDATE() - INTERVAL 30 DAY;

⚠️ Issue: Applying DATE() to transaction_date prevents the database from using an index.

🔹 Solution: Rewrite the Query for Index Usage

SELECT * FROM transactions WHERE transaction_date >= CURDATE() - INTERVAL 30 DAY;

✅ Result: The query is now index-friendly and runs faster.

Using Composite Indexes for Multi-Column Searches

🔹 Problem: Searching with Multiple Conditions

A social media app often runs this query:

SELECT * FROM posts WHERE user_id = 123 AND created_at > NOW() - INTERVAL 7 DAY;

🔹 Solution: Add a Composite Index

CREATE INDEX idx_posts ON posts(user_id, created_at);

✅ Result: The database can efficiently fetch recent posts for a user without scanning the entire table.

Pagination Optimization with Indexing

🔹 Problem: Slow Pagination for Large Datasets

An online forum paginates discussion posts but sees performance degradation due to OFFSET:

SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET 10000;

⚠️ Issue: The database scans and skips 10,000 rows before fetching results.

🔹 Solution: Use Keyset Pagination

Instead of OFFSET, fetch the next batch based on indexed values:

SELECT * FROM posts WHERE created_at < (SELECT created_at FROM posts ORDER BY created_at DESC LIMIT 1 OFFSET 100)

ORDER BY created_at DESC LIMIT 10;

✅ Result: Faster pagination without performance issues.

Real-World Case Study: Optimizing Queries for an E-Commerce Website

Scenario:

An online marketplace faced slow page loads because of this query:

SELECT * FROM orders WHERE status = 'Shipped' ORDER BY order_date DESC LIMIT 50;

Optimizations Applied:

✅ Step 1: Index the status and order_date Columns

CREATE INDEX idx_orders ON orders(status, order_date DESC);

✅ Step 2: Optimize the Query Execution Plan

Run EXPLAIN to check if indexes are used efficiently.

EXPLAIN SELECT * FROM orders WHERE status = 'Shipped' ORDER BY order_date DESC LIMIT 50;

✅ Step 3: Use a Covering Index for Faster Lookups

CREATE INDEX idx_orders_cover ON orders(status, order_date DESC) INCLUDE (customer_id, total_price);

✅ Result: Queries now run 5x faster, leading to better user experience.

💡 Final Interview Tips for Standing Out

🔹 Showcase Real-World Thinking – Don't just explain syntax, discuss performance considerations.
🔹 Mention Query Optimization Techniques – Explain the use of EXPLAIN, ANALYZE, and indexing strategies.
🔹 Demonstrate Understanding of Big Data Challenges – Talk about handling millions of records efficiently.

Joins:

SQL Joins – Best Answers for Interviews with Optimization Tips 🚀

Joins allow us to combine data from multiple tables based on a related column. Impress interviewers by explaining use cases, performance optimization, and best practices.

INNER JOIN

🔹 Definition: Returns only matching rows from both tables.
🔹 Use Case: Get a list of customers who have placed an order.

SELECT customers.customer_id, customers.name, orders.order_id

FROM customers

INNER JOIN orders ON customers.customer_id = orders.customer_id;

✅ Optimization Tip:

Index foreign keys (orders.customer_id) to speed up joins.
Use EXISTS instead of INNER JOIN for large datasets when checking existence.

LEFT JOIN (LEFT OUTER JOIN)

🔹 Definition: Returns all rows from the left table and matching rows from the right table. Missing matches return NULL.
🔹 Use Case: Get a list of all employees and their assigned departments (even if some have no department).

SELECT employees.employee_id, employees.name, departments.department_name

FROM employees

LEFT JOIN departments ON employees.department_id = departments.department_id;

✅ Optimization Tip:

Avoid filtering with WHERE right_table.column IS NULL if performance is an issue; instead, use NOT EXISTS for better execution.
Indexing on departments.department_id helps improve performance.

RIGHT JOIN (RIGHT OUTER JOIN)

🔹 Definition: Returns all rows from the right table and matching rows from the left table. Missing matches return NULL.
🔹 Use Case: Get all departments, including those without assigned employees.

SELECT employees.employee_id, employees.name, departments.department_name

FROM employees

RIGHT JOIN departments ON employees.department_id = departments.department_id;

✅ Optimization Tip:

Right joins are less common—try reversing the table order and using LEFT JOIN instead for better readability.

FULL JOIN (FULL OUTER JOIN)

🔹 Definition: Returns all rows from both tables. If there's no match, NULL appears in missing columns.
🔹 Use Case: Get all customers and all orders (including customers with no orders and orders with no customers).

SELECT customers.customer_id, customers.name, orders.order_id

FROM customers

FULL JOIN orders ON customers.customer_id = orders.customer_id;

✅ Optimization Tip:

Some databases like MySQL don’t support FULL JOIN directly. Instead, use UNION:

SELECT customers.customer_id, customers.name, orders.order_id

FROM customers

LEFT JOIN orders ON customers.customer_id = orders.customer_id

UNION

SELECT customers.customer_id, customers.name, orders.order_id

FROM customers

RIGHT JOIN orders ON customers.customer_id = orders.customer_id;

SELF JOIN

🔹 Definition: A table joins itself, useful for hierarchical relationships.
🔹 Use Case: Find employees and their managers (both stored in the same table).

SELECT e1.name AS employee, e2.name AS manager

FROM employees e1

LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;

✅ Optimization Tip:

Ensure an index on manager_id for fast lookups.

CROSS JOIN

🔹 Definition: Returns the Cartesian product (every row from Table A joins with every row from Table B).
🔹 Use Case: Generate all product-category combinations.

SELECT products.name, categories.category_name

FROM products

CROSS JOIN categories;

✅ Optimization Tip:

Avoid CROSS JOIN unless needed, as it can create large result sets.
Use INNER JOIN with conditions when applicable to limit rows.

🔥 Advanced Interview Insights & Performance Tips

💡 Use EXPLAIN to analyze joins

EXPLAIN SELECT * FROM employees INNER JOIN departments ON employees.department_id = departments.department_id;

💡 Index foreign keys in both tables to speed up joins.
💡 Use LEFT JOIN instead of FULL JOIN when possible for better performance.
💡 For large datasets, consider denormalization to reduce joins in performance-critical queries.

Real-World Case Study: Optimizing SQL Joins for Large Databases

Scenario:
A large e-commerce company has two key tables:

customers (10 million records)
orders (100 million records)

They need to generate a daily report of all customers along with their most recent order.

❌ Initial Query (Slow & Inefficient)

SELECT c.customer_id, c.name, o.order_id, o.order_date

FROM customers c

LEFT JOIN orders o ON c.customer_id = o.customer_id

WHERE o.order_date = (SELECT MAX(order_date) FROM orders WHERE customer_id = c.customer_id);

Problems: 1. Subquery in WHERE clause executes for every customer, making it extremely slow.
2️. No index usage on order_date, leading to full table scans.
3️. Performance worsens as the dataset grows.

✅ Optimized Query (Using JOIN + Indexing)

SELECT c.customer_id, c.name, o.order_id, o.order_date

FROM customers c

LEFT JOIN (

SELECT customer_id, order_id, order_date

FROM orders

WHERE order_date >= NOW() - INTERVAL 1 DAY

) o ON c.customer_id = o.customer_id;

Why This is Faster:
✅ Uses a derived table (orders filtered first) before the JOIN – reducing rows early.
✅ Indexing on order_date improves performance.
✅ Avoids correlated subqueries, which execute per row.

⚡️ Advanced Optimization: Using Window Functions (Best for PostgreSQL, MySQL 8+, SQL Server)

SELECT customer_id, name, order_id, order_date

FROM (

SELECT c.customer_id, c.name, o.order_id, o.order_date,

ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY o.order_date DESC) AS rn

FROM customers c

LEFT JOIN orders o ON c.customer_id = o.customer_id

) t

WHERE rn = 1;

✅ Uses ROW_NUMBER() instead of a subquery – making it much faster.
✅ Indexes on order_date and customer_id reduce scan time.

📌 Final Takeaways

💡 Always filter early – use derived tables or CTEs.
💡 Index wisely – ensure indexes exist on JOIN keys (customer_id) and filtered columns (order_date).
💡 Use EXPLAIN ANALYZE to detect slow queries before deploying in production.

🚀 Advanced SQL Clauses & Functions – Best Interview Answers & Optimization Tips

Advanced SQL techniques help optimize queries, analyze data efficiently, and solve complex problems. Here’s how to impress interviewers with practical examples, performance tips, and real-world scenarios.

1️. GROUP BY and HAVING Clauses

🔹 Definition:

GROUP BY aggregates data based on a column.
HAVING filters grouped data (like WHERE, but for groups).

🔹 Use Case: Get the total sales per customer, but only for those with sales above $1000.

SELECT customer_id, SUM(total_amount) AS total_sales

FROM orders

GROUP BY customer_id

HAVING SUM(total_amount) > 1000;

✅ Optimization Tip:

Use indexes on grouped columns (customer_id) for faster aggregation.
Avoid HAVING with non-aggregated columns – filter early using WHERE instead.

2️. Aggregate Functions (COUNT, SUM, AVG, MIN, MAX)

🔹 Definition: Used for summarizing data.
🔹 Use Case: Find the average order value for each customer.

SELECT customer_id, AVG(total_amount) AS avg_order_value

FROM orders

GROUP BY customer_id;

✅ Optimization Tip:

Use COUNT(column_name) instead of COUNT(*) when possible for better performance.
For big datasets, partition data first (e.g., PARTITION BY in window functions).

3️. Subqueries and Nested Queries

🔹 Definition: A query inside another query, often used for filtering and aggregation.
🔹 Use Case: Find customers who placed at least one order over $500.

SELECT customer_id, name

FROM customers

WHERE customer_id IN (

SELECT DISTINCT customer_id FROM orders WHERE total_amount > 500

);

✅ Optimization Tip:

Use EXISTS instead of IN for better performance on large datasets.

SELECT customer_id, name

FROM customers c

WHERE EXISTS (

SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.total_amount > 500

);

4️. CASE Statements

🔹 Definition: Conditional logic inside SQL queries.
🔹 Use Case: Categorize orders as "Low", "Medium", or "High" based on total amount.

SELECT order_id, total_amount,

CASE

WHEN total_amount < 100 THEN 'Low'

WHEN total_amount BETWEEN 100 AND 500 THEN 'Medium'

ELSE 'High'

END AS order_category

FROM orders;

✅ Optimization Tip:

Ensure CASE logic aligns with indexed columns to avoid full scans.
Use CASE inside aggregates for conditional calculations.

5️. COALESCE Function

🔹 Definition: Replaces NULL with a default value.
🔹 Use Case: Show customer emails, but if NULL, display "Not Provided" instead.

SELECT customer_id, COALESCE(email, 'Not Provided') AS email_address

FROM customers;

✅ Optimization Tip:

Use COALESCE instead of IFNULL in multi-database environments for compatibility.
Use COALESCE(col1, col2, col3, 'Default') to check multiple fallback columns.

6️. String Functions (CONCAT, LENGTH, SUBSTRING)

🔹 Definition: Used for manipulating text data.

🔹 Use Cases:
📌 Concatenate first and last names:

SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;

📌 Find customers with short names:

SELECT name FROM customers WHERE LENGTH(name) < 5;

📌 Extract area code from phone numbers:

SELECT SUBSTRING(phone, 1, 3) AS area_code FROM customers;

✅ Optimization Tip:

Avoid functions on indexed columns in WHERE clauses, as they prevent index usage.

7️. Date and Time Functions (NOW, DATEADD, DATEDIFF)

🔹 Definition: Handle date calculations.

🔹 Use Cases:
📌 Get today’s date:

SELECT NOW();

📌 Find orders placed in the last 7 days:

SELECT * FROM orders WHERE order_date >= NOW() - INTERVAL 7 DAY;

📌 Calculate customer age from birthdate:

SELECT name, DATEDIFF(NOW(), birth_date) / 365 AS age FROM customers;

✅ Optimization Tip:

Use indexed timestamp columns to avoid full table scans.
Store dates in UTC format for consistency across time zones.

🔥 Final Interview Tips

💡 Explain when to use subqueries vs. joins (Joins are usually faster).
💡 Demonstrate performance tuning (Indexing, EXPLAIN, EXISTS vs. IN).
💡 Give real-world examples (e.g., e-commerce reports, financial calculations).

Dev - GuruHub

Pages

Search