SQL Basics
SQL (Structured Query Language) is the standard language for
managing and manipulating relational databases. It enables users to perform
operations like querying, updating, and managing data efficiently.
Introduction to SQL
SQL provides a powerful way to interact with databases. It
includes commands for creating structures (DDL), modifying data (DML),
controlling access (DCL), and handling transactions (TCL). Understanding its
fundamentals is crucial for database management.
SQL Data Types
SQL data types define the kind of data a column can hold.
Common types include:
- Numeric:
INT, FLOAT, DECIMAL
- Character/String:
CHAR, VARCHAR, TEXT
- Date/Time:
DATE, TIMESTAMP
- Boolean:
BOOLEAN
Choosing the right data type optimizes storage and query performance.
DDL (Data Definition Language)
DDL commands define and modify database structures.
- CREATE
– Used to create tables, indexes, and schemas.
- ALTER
– Modifies an existing table structure (e.g., add/drop columns).
- DROP
– Deletes tables, views, or databases permanently.
These commands directly impact database design and integrity.
DML (Data Manipulation Language)
DML commands handle data operations within tables.
- SELECT
– Retrieves data from tables based on conditions.
- INSERT
– Adds new records to a table.
- UPDATE
– Modifies existing records.
- DELETE
– Removes specific records from a table.
Efficient use of DML ensures effective data retrieval and manipulation.
DCL (Data Control Language)
DCL controls user permissions in databases.
- GRANT
– Provides access rights to users or roles.
- REVOKE
– Removes previously granted permissions.
This ensures database security by restricting unauthorized access.
TCL (Transaction Control Language)
TCL commands manage database transactions, ensuring
consistency.
- COMMIT
– Saves all pending changes permanently.
- ROLLBACK
– Reverts the database to a previous state in case of failure.
- SAVEPOINT
– Creates checkpoints within a transaction for partial rollbacks.
Proper transaction handling maintains data integrity and prevents corruption.
SQL Queries:
SELECT Statements
The SELECT statement retrieves data from one or more tables.
It is the most fundamental query in SQL.
Example:
SELECT first_name, last_name FROM employees;
Best Impression Tip: Explain how SELECT can be
combined with filtering (WHERE), sorting (ORDER BY), and aggregation (GROUP BY)
for powerful data retrieval.
WHERE Clause and Filtering Data
The WHERE clause filters records based on specified
conditions.
Example:
SELECT * FROM orders WHERE status = 'Shipped';
Best Impression Tip: Mention that WHERE supports
operators like =, >, <, LIKE, IN, and BETWEEN, making filtering flexible
and efficient.
ORDER BY Clause
Used to sort query results in ascending (ASC, default) or
descending (DESC) order.
Example:
SELECT name, price FROM products ORDER BY price DESC;
Best Impression Tip: Highlight performance
optimization by ensuring indexes exist on columns used for sorting.
DISTINCT Keyword
Eliminates duplicate values from query results.
Example:
SELECT DISTINCT category FROM products;
Best Impression Tip: Explain that DISTINCT applies to
the entire row unless specified for a single column.
LIMIT/OFFSET
Controls the number of rows returned, useful for pagination.
Example:
SELECT * FROM customers LIMIT 10 OFFSET 20;
Best Impression Tip: Explain how LIMIT optimizes
performance by reducing data retrieval and how OFFSET is used for pagination in
large datasets.
Practical SQL Queries
Below are the practical scenarios and optimization
techniques for each SQL query concept to help you stand out in an
interview :
SELECT Statements
🔹 Scenario:
Retrieve a list of employees' names and salaries from the HR database.
🔹
Optimization Tip: Avoid using SELECT * unless necessary. Instead,
specify only the required columns to improve query performance.
SELECT first_name, last_name, salary FROM employees;
WHERE Clause and Filtering Data
🔹 Scenario: Fetch
all orders placed in the last 30 days.
🔹
Optimization Tip: Use indexed columns in the WHERE clause for
faster searches. Avoid functions on columns (WHERE DATE(order_date) = …) as
they can prevent index usage.
SELECT * FROM orders WHERE order_date >= NOW() - INTERVAL
30 DAY;
ORDER BY Clause
🔹 Scenario: Get
the top 5 highest-paid employees.
🔹
Optimization Tip: Ensure an index exists on the column being
sorted (salary in this case). Sorting large datasets without indexing can cause
performance issues.
SELECT first_name, last_name, salary FROM employees ORDER BY
salary DESC LIMIT 5;
DISTINCT Keyword
🔹 Scenario: Find
all unique product categories in an inventory.
🔹
Optimization Tip: Instead of DISTINCT, consider using GROUP BY when
you need additional aggregations (e.g., count of items per category).
SELECT DISTINCT category FROM products;
or
SELECT category, COUNT(*) FROM products GROUP BY category;
LIMIT/OFFSET
🔹 Scenario:
Implement pagination in a web application showing 10 users per page.
🔹
Optimization Tip: Using large OFFSET values can slow down performance.
Instead, use keyset pagination (e.g., filtering on indexed columns like id
for better performance).
-- Inefficient: (OFFSET scans skipped rows)
SELECT * FROM users ORDER BY user_id ASC LIMIT 10 OFFSET
100;
-- Efficient: (Keyset pagination using WHERE clause)
SELECT * FROM users WHERE user_id > 100 ORDER BY user_id
ASC LIMIT 10;
Interview Tip:
💡 Emphasize
Performance Optimization – Always mention indexing, query execution plans (EXPLAIN
keyword), and avoiding unnecessary computations to show a deeper understanding
of SQL.
Advanced Indexing Techniques & Real-World Case Studies to Impress in an
Interview
To stand out in an SQL interview, showcasing knowledge of indexing,
query optimization, and real-world problem-solving is crucial. Below are advanced
techniques with practical use cases:
Using Indexes to Optimize Queries
🔹 Problem: Slow Query
Performance
A company’s e-commerce website has a slow-loading
products page because of frequent queries on the products table.
🔹 Solution: Create an
Index on Frequently Searched Columns
Instead of scanning the entire table, an index on the category
column can speed up retrieval.
CREATE INDEX idx_category ON products(category);
SELECT * FROM products WHERE category = 'Electronics';
✅ Result: Query time is
reduced significantly, improving website performance.
🔹 Interview Tip: Mention Covering
Indexes
- A covering
index stores all needed data inside the index, eliminating additional
disk lookups.
- Example:
If a query only fetches name and price, create an index that includes both
columns:
CREATE INDEX idx_product ON products(category, name, price);
Avoiding Full Table Scans in WHERE Clause
🔹 Problem: Using
Functions on Indexed Columns
Consider a banking system that finds transactions in the last
30 days:
SELECT * FROM transactions WHERE DATE(transaction_date)
>= CURDATE() - INTERVAL 30 DAY;
⚠️ Issue: Applying DATE()
to transaction_date prevents the database from using an index.
🔹 Solution: Rewrite the
Query for Index Usage
SELECT * FROM transactions WHERE transaction_date >=
CURDATE() - INTERVAL 30 DAY;
✅ Result: The query is now
index-friendly and runs faster.
Using Composite Indexes for Multi-Column Searches
🔹 Problem: Searching with
Multiple Conditions
A social media app often runs this query:
SELECT * FROM posts WHERE user_id = 123 AND created_at >
NOW() - INTERVAL 7 DAY;
🔹 Solution: Add a
Composite Index
CREATE INDEX idx_posts ON posts(user_id, created_at);
✅ Result: The database can
efficiently fetch recent posts for a user without scanning the entire table.
Pagination Optimization with Indexing
🔹 Problem: Slow
Pagination for Large Datasets
An online forum paginates discussion posts but sees performance
degradation due to OFFSET:
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10 OFFSET
10000;
⚠️ Issue: The database scans
and skips 10,000 rows before fetching results.
🔹 Solution: Use Keyset
Pagination
Instead of OFFSET, fetch the next batch based on indexed
values:
SELECT * FROM posts WHERE created_at < (SELECT created_at
FROM posts ORDER BY created_at DESC LIMIT 1 OFFSET 100)
ORDER BY created_at DESC LIMIT 10;
✅ Result: Faster
pagination without performance issues.
Real-World Case Study: Optimizing Queries for an E-Commerce Website
Scenario:
An online marketplace faced slow page loads
because of this query:
SELECT * FROM orders WHERE status = 'Shipped' ORDER BY
order_date DESC LIMIT 50;
Optimizations Applied:
✅ Step 1: Index the status and
order_date Columns
CREATE INDEX idx_orders ON orders(status, order_date DESC);
✅ Step 2: Optimize the Query
Execution Plan
- Run
EXPLAIN to check if indexes are used efficiently.
EXPLAIN SELECT * FROM orders WHERE status = 'Shipped' ORDER
BY order_date DESC LIMIT 50;
✅ Step 3: Use a Covering Index
for Faster Lookups
CREATE INDEX idx_orders_cover ON orders(status, order_date
DESC) INCLUDE (customer_id, total_price);
✅ Result: Queries now run 5x
faster, leading to better user experience.
💡 Final Interview Tips for Standing Out
🔹 Showcase Real-World
Thinking – Don't just explain syntax, discuss performance considerations.
🔹
Mention Query Optimization Techniques – Explain the use of EXPLAIN, ANALYZE,
and indexing strategies.
🔹
Demonstrate Understanding of Big Data Challenges – Talk about handling millions
of records efficiently.
Joins:
SQL Joins – Best Answers for Interviews with Optimization
Tips 🚀
Joins allow us to combine data from multiple tables
based on a related column. Impress interviewers by explaining use cases,
performance optimization, and best practices.
INNER JOIN
🔹 Definition:
Returns only matching rows from both tables.
🔹
Use Case: Get a list of customers who have placed an order.
SELECT customers.customer_id, customers.name,
orders.order_id
FROM customers
INNER JOIN orders ON customers.customer_id =
orders.customer_id;
✅ Optimization Tip:
- Index
foreign keys (orders.customer_id) to speed up joins.
- Use EXISTS
instead of INNER JOIN for large datasets when checking existence.
LEFT JOIN (LEFT OUTER JOIN)
🔹 Definition:
Returns all rows from the left table and matching rows from the right
table. Missing matches return NULL.
🔹
Use Case: Get a list of all employees and their assigned departments
(even if some have no department).
SELECT employees.employee_id, employees.name,
departments.department_name
FROM employees
LEFT JOIN departments ON employees.department_id =
departments.department_id;
✅ Optimization Tip:
- Avoid
filtering with WHERE right_table.column IS NULL if performance is an
issue; instead, use NOT EXISTS for better execution.
- Indexing
on departments.department_id helps improve performance.
RIGHT JOIN (RIGHT OUTER JOIN)
🔹 Definition:
Returns all rows from the right table and matching rows from the left
table. Missing matches return NULL.
🔹
Use Case: Get all departments, including those without assigned
employees.
SELECT employees.employee_id, employees.name,
departments.department_name
FROM employees
RIGHT JOIN departments ON employees.department_id =
departments.department_id;
✅ Optimization Tip:
- Right
joins are less common—try reversing the table order and
using LEFT JOIN instead for better readability.
FULL JOIN (FULL OUTER JOIN)
🔹 Definition:
Returns all rows from both tables. If there's no match, NULL appears in
missing columns.
🔹
Use Case: Get all customers and all orders (including customers with
no orders and orders with no customers).
SELECT customers.customer_id, customers.name,
orders.order_id
FROM customers
FULL JOIN orders ON customers.customer_id =
orders.customer_id;
✅ Optimization Tip:
- Some
databases like MySQL don’t support FULL JOIN directly. Instead, use
UNION:
SELECT customers.customer_id, customers.name,
orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id =
orders.customer_id
UNION
SELECT customers.customer_id, customers.name,
orders.order_id
FROM customers
RIGHT JOIN orders ON customers.customer_id =
orders.customer_id;
SELF JOIN
🔹 Definition: A
table joins itself, useful for hierarchical relationships.
🔹
Use Case: Find employees and their managers (both stored in the same
table).
SELECT e1.name AS employee, e2.name AS manager
FROM employees e1
LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;
✅ Optimization Tip:
- Ensure
an index on manager_id for fast lookups.
CROSS JOIN
🔹 Definition:
Returns the Cartesian product (every row from Table A joins with every
row from Table B).
🔹
Use Case: Generate all product-category combinations.
SELECT products.name, categories.category_name
FROM products
CROSS JOIN categories;
✅ Optimization Tip:
- Avoid
CROSS JOIN unless needed, as it can create large result sets.
- Use
INNER JOIN with conditions when applicable to limit rows.
🔥 Advanced Interview Insights &
Performance Tips
💡 Use EXPLAIN to
analyze joins
EXPLAIN SELECT * FROM employees INNER JOIN departments ON
employees.department_id = departments.department_id;
💡 Index foreign keys
in both tables to speed up joins.
💡
Use LEFT JOIN instead of FULL JOIN when possible for better performance.
💡
For large datasets, consider denormalization to reduce joins in
performance-critical queries.
Real-World Case Study: Optimizing SQL Joins for Large Databases
Scenario:
A large e-commerce company has two key tables:
- customers
(10 million records)
- orders
(100 million records)
They need to generate a daily report of all customers
along with their most recent order.
❌ Initial Query (Slow & Inefficient)
SELECT c.customer_id, c.name, o.order_id, o.order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date = (SELECT MAX(order_date) FROM orders
WHERE customer_id = c.customer_id);
Problems: 1. Subquery in WHERE clause executes
for every customer, making it extremely slow.
2️. No index usage on order_date, leading to
full table scans.
3️. Performance worsens as the dataset grows.
✅ Optimized Query (Using JOIN + Indexing)
SELECT c.customer_id, c.name, o.order_id, o.order_date
FROM customers c
LEFT JOIN (
SELECT
customer_id, order_id, order_date
FROM orders
WHERE order_date
>= NOW() - INTERVAL 1 DAY
) o ON c.customer_id = o.customer_id;
Why This is Faster:
✅
Uses a derived table (orders filtered first) before the JOIN – reducing
rows early.
✅
Indexing on order_date improves performance.
✅
Avoids correlated subqueries, which execute per row.
⚡️ Advanced Optimization: Using Window Functions (Best
for PostgreSQL, MySQL 8+, SQL Server)
SELECT customer_id, name, order_id, order_date
FROM (
SELECT
c.customer_id, c.name, o.order_id, o.order_date,
ROW_NUMBER() OVER (PARTITION BY c.customer_id ORDER BY o.order_date
DESC) AS rn
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
) t
WHERE rn = 1;
✅ Uses ROW_NUMBER()
instead of a subquery – making it much faster.
✅
Indexes on order_date and customer_id reduce scan time.
📌 Final Takeaways
💡 Always filter early
– use derived tables or CTEs.
💡
Index wisely – ensure indexes exist on JOIN keys (customer_id) and
filtered columns (order_date).
💡
Use EXPLAIN ANALYZE to detect slow queries before deploying in
production.
🚀 Advanced SQL Clauses & Functions – Best
Interview Answers & Optimization Tips
Advanced SQL techniques help optimize queries, analyze
data efficiently, and solve complex problems. Here’s how to impress
interviewers with practical examples, performance tips, and real-world
scenarios.
1️. GROUP BY and HAVING Clauses
🔹 Definition:
- GROUP
BY aggregates data based on a column.
- HAVING
filters grouped data (like WHERE, but for groups).
🔹 Use Case: Get
the total sales per customer, but only for those with sales above
$1000.
SELECT customer_id, SUM(total_amount) AS total_sales
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 1000;
✅ Optimization Tip:
- Use
indexes on grouped columns (customer_id) for faster aggregation.
- Avoid
HAVING with non-aggregated columns – filter early using WHERE instead.
2️. Aggregate Functions (COUNT, SUM, AVG, MIN, MAX)
🔹 Definition: Used
for summarizing data.
🔹
Use Case: Find the average order value for each customer.
SELECT customer_id, AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_id;
✅ Optimization Tip:
- Use
COUNT(column_name) instead of COUNT(*) when possible for better
performance.
- For
big datasets, partition data first (e.g., PARTITION BY in window
functions).
3️. Subqueries and Nested Queries
🔹 Definition: A
query inside another query, often used for filtering and aggregation.
🔹
Use Case: Find customers who placed at least one order over $500.
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT
customer_id FROM orders WHERE total_amount > 500
);
✅ Optimization Tip:
- Use
EXISTS instead of IN for better performance on large datasets.
SELECT customer_id, name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM
orders o WHERE o.customer_id = c.customer_id AND o.total_amount > 500
);
4️. CASE Statements
🔹 Definition:
Conditional logic inside SQL queries.
🔹
Use Case: Categorize orders as "Low", "Medium",
or "High" based on total amount.
SELECT order_id, total_amount,
CASE
WHEN
total_amount < 100 THEN 'Low'
WHEN
total_amount BETWEEN 100 AND 500 THEN 'Medium'
ELSE 'High'
END AS
order_category
FROM orders;
✅ Optimization Tip:
- Ensure
CASE logic aligns with indexed columns to avoid full scans.
- Use
CASE inside aggregates for conditional calculations.
5️. COALESCE Function
🔹 Definition:
Replaces NULL with a default value.
🔹
Use Case: Show customer emails, but if NULL, display "Not
Provided" instead.
SELECT customer_id, COALESCE(email, 'Not Provided') AS
email_address
FROM customers;
✅ Optimization Tip:
- Use
COALESCE instead of IFNULL in multi-database environments for
compatibility.
- Use
COALESCE(col1, col2, col3, 'Default') to check multiple fallback
columns.
6️. String Functions (CONCAT, LENGTH, SUBSTRING)
🔹 Definition: Used
for manipulating text data.
🔹 Use Cases:
📌
Concatenate first and last names:
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM
employees;
📌 Find customers with
short names:
SELECT name FROM customers WHERE LENGTH(name) < 5;
📌 Extract area code
from phone numbers:
SELECT SUBSTRING(phone, 1, 3) AS area_code FROM customers;
✅ Optimization Tip:
- Avoid
functions on indexed columns in WHERE clauses, as they prevent index
usage.
7️. Date and Time Functions (NOW, DATEADD, DATEDIFF)
🔹 Definition:
Handle date calculations.
🔹 Use Cases:
📌
Get today’s date:
SELECT NOW();
📌 Find orders placed
in the last 7 days:
SELECT * FROM orders WHERE order_date >= NOW() - INTERVAL
7 DAY;
📌 Calculate customer
age from birthdate:
SELECT name, DATEDIFF(NOW(), birth_date) / 365 AS age FROM
customers;
✅ Optimization Tip:
- Use
indexed timestamp columns to avoid full table scans.
- Store
dates in UTC format for consistency across time zones.
🔥 Final Interview Tips
💡 Explain when to use
subqueries vs. joins (Joins are usually faster).
💡
Demonstrate performance tuning (Indexing, EXPLAIN, EXISTS vs. IN).
💡
Give real-world examples (e.g., e-commerce reports, financial
calculations).