SQL Vs Pythons

The document compares how to perform various types of joins in SQL and Python Pandas. It shows examples of inner, left, right, full, union, intersect, except, semi, anti, and cross joins displayed in tables and explains how to write the equivalent SQL and Pandas code to perform each type of join.

Uploaded by

Abhijeet Patil

SQL Joins v/s Python Pandas
@vimanyuchaturvedi
INNER JOIN
LEFT_TABLE          RIGHT_TABLE
ID  LEFT_VALUE      ID  RIGHT_VALUE
1   LEFT 1          1   RIGHT 1
2   LEFT 2          4   RIGHT 2
3   LEFT 3          5   RIGHT 3
4   LEFT 4          6   RIGHT 4

RESULT
ID  LEFT_VALUE  RIGHT_VALUE
1   LEFT 1      RIGHT 1
4   LEFT 4      RIGHT 2

SQL
SELECT * FROM LEFT_TABLE AS LT INNER JOIN RIGHT_TABLE AS RT
ON LT.ID = RT.ID

pandas
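A minimal sketch of the pandas equivalent, assuming the two tables are loaded into DataFrames named `left` and `right` (illustrative names, not from the original slides). Note that `merge` with `on="ID"` keeps a single `ID` column, whereas `SELECT *` returns the key from both tables.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "LEFT_VALUE": ["LEFT 1", "LEFT 2", "LEFT 3", "LEFT 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "RIGHT_VALUE": ["RIGHT 1", "RIGHT 2", "RIGHT 3", "RIGHT 4"]})

# how="inner" keeps only the IDs present in both DataFrames.
inner = left.merge(right, on="ID", how="inner")
print(inner)
```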
SELF JOIN
LEFT_TABLE          LEFT_TABLE
ID  LEFT_VALUE      ID  LEFT_VALUE
1   LEFT 1          1   LEFT 1
2   LEFT 2          2   LEFT 2
3   LEFT 3          3   LEFT 3
4   LEFT 4          4   LEFT 4

RESULT
ID  LEFT_VALUE  LEFT_VALUE2
1   LEFT 1      LEFT 1
2   LEFT 2      LEFT 2
3   LEFT 3      LEFT 3
4   LEFT 4      LEFT 4

SQL
SELECT * FROM LEFT_TABLE AS LT INNER JOIN LEFT_TABLE AS LT2
ON LT.ID = LT2.ID

pandas
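A minimal sketch of a self join in pandas, assuming the table is held in a DataFrame named `left` (an illustrative name). The `suffixes` argument is used here to reproduce the `LEFT_VALUE2` column name shown in the result table; that choice is an assumption about how the original result was produced.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "LEFT_VALUE": ["LEFT 1", "LEFT 2", "LEFT 3", "LEFT 4"]})

# Merge the DataFrame with itself on ID; the overlapping column from the
# second copy gets the suffix "2", the first copy keeps its name.
self_join = left.merge(left, on="ID", suffixes=("", "2"))
print(self_join)
```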
LEFT JOIN

LEFT_TABLE          RIGHT_TABLE
ID  LEFT_VALUE      ID  RIGHT_VALUE
1   LEFT 1          1   RIGHT 1
2   LEFT 2          4   RIGHT 2
3   LEFT 3          5   RIGHT 3
4   LEFT 4          6   RIGHT 4

RESULT
ID  LEFT_VALUE  RIGHT_VALUE
1   LEFT 1      RIGHT 1
2   LEFT 2      NULL
3   LEFT 3      NULL
4   LEFT 4      RIGHT 2

SQL
SELECT * FROM LEFT_TABLE AS LT LEFT JOIN RIGHT_TABLE AS RT
ON LT.ID = RT.ID

pandas
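A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). Where SQL shows NULL, pandas shows `NaN`.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "LEFT_VALUE": ["LEFT 1", "LEFT 2", "LEFT 3", "LEFT 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "RIGHT_VALUE": ["RIGHT 1", "RIGHT 2", "RIGHT 3", "RIGHT 4"]})

# how="left" keeps every row of `left`; unmatched rows get NaN in RIGHT_VALUE.
left_join = left.merge(right, on="ID", how="left")
print(left_join)
```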
RIGHT JOIN

LEFT_TABLE          RIGHT_TABLE
ID  LEFT_VALUE      ID  RIGHT_VALUE
1   LEFT 1          1   RIGHT 1
2   LEFT 2          4   RIGHT 2
3   LEFT 3          5   RIGHT 3
4   LEFT 4          6   RIGHT 4

RESULT
ID  LEFT_VALUE  RIGHT_VALUE
1   LEFT 1      RIGHT 1
4   LEFT 4      RIGHT 2
5   NULL        RIGHT 3
6   NULL        RIGHT 4

SQL
SELECT * FROM LEFT_TABLE AS LT RIGHT JOIN RIGHT_TABLE AS RT
ON LT.ID = RT.ID

pandas
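A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names).

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "LEFT_VALUE": ["LEFT 1", "LEFT 2", "LEFT 3", "LEFT 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "RIGHT_VALUE": ["RIGHT 1", "RIGHT 2", "RIGHT 3", "RIGHT 4"]})

# how="right" keeps every row of `right`; unmatched rows get NaN in LEFT_VALUE.
right_join = left.merge(right, on="ID", how="right")
print(right_join)
```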
FULL JOIN
LEFT_TABLE          RIGHT_TABLE
ID  LEFT_VALUE      ID  RIGHT_VALUE
1   LEFT 1          1   RIGHT 1
2   LEFT 2          4   RIGHT 2
3   LEFT 3          5   RIGHT 3
4   LEFT 4          6   RIGHT 4

RESULT
ID  LEFT_VALUE  RIGHT_VALUE
1   LEFT 1      RIGHT 1
2   LEFT 2      NULL
3   LEFT 3      NULL
4   LEFT 4      RIGHT 2
5   NULL        RIGHT 3
6   NULL        RIGHT 4

SQL
SELECT * FROM LEFT_TABLE AS LT FULL OUTER JOIN RIGHT_TABLE
AS RT ON LT.ID = RT.ID

pandas
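A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). In pandas a full join is requested with `how="outer"`.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "LEFT_VALUE": ["LEFT 1", "LEFT 2", "LEFT 3", "LEFT 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "RIGHT_VALUE": ["RIGHT 1", "RIGHT 2", "RIGHT 3", "RIGHT 4"]})

# how="outer" keeps every ID from either side; the missing side is NaN.
full_join = left.merge(right, on="ID", how="outer")
print(full_join)
```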
UNION ALL
LEFT_TABLE      RIGHT_TABLE
ID  VALUE       ID  VALUE
1   VALUE 1     1   VALUE 1
2   VALUE 2     4   VALUE 2
3   VALUE 3     5   VALUE 3
4   VALUE 4     6   VALUE 4

RESULT
ID  VALUE
1   VALUE 1
2   VALUE 2
3   VALUE 3
4   VALUE 4
1   VALUE 1
4   VALUE 2
5   VALUE 3
6   VALUE 4

SQL
SELECT * FROM LEFT_TABLE UNION ALL SELECT * FROM RIGHT_TABLE

pandas
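A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). `pd.concat` stacks the rows and keeps duplicates, which matches UNION ALL.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})

# Stack both DataFrames; ignore_index=True renumbers the rows 0..n-1.
union_all = pd.concat([left, right], ignore_index=True)
print(union_all)
```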
UNION
LEFT_TABLE      RIGHT_TABLE
ID  VALUE       ID  VALUE
1   VALUE 1     1   VALUE 1
2   VALUE 2     4   VALUE 2
3   VALUE 3     5   VALUE 3
4   VALUE 4     6   VALUE 4

RESULT
ID  VALUE
1   VALUE 1
2   VALUE 2
3   VALUE 3
4   VALUE 4
4   VALUE 2
5   VALUE 3
6   VALUE 4

SQL
SELECT * FROM LEFT_TABLE UNION SELECT * FROM RIGHT_TABLE

pandas
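A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). UNION is modeled here as a concatenation followed by `drop_duplicates`, which compares whole rows.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})

# Stack the rows, then drop rows that are identical in every column.
union = (pd.concat([left, right], ignore_index=True)
           .drop_duplicates()
           .reset_index(drop=True))
print(union)
```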
INTERSECT
LEFT_TABLE      RIGHT_TABLE
ID  VALUE       ID  VALUE
1   VALUE 1     1   VALUE 1
2   VALUE 2     4   VALUE 2
3   VALUE 3     5   VALUE 3
4   VALUE 4     6   VALUE 4

RESULT
ID  VALUE
1   VALUE 1

SQL
SELECT * FROM LEFT_TABLE INTERSECT SELECT * FROM RIGHT_TABLE

pandas
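A minimal sketch of one way to express INTERSECT in pandas, assuming DataFrames named `left` and `right` (illustrative names). pandas has no dedicated intersect method for DataFrames, so this uses an inner merge on all shared columns followed by `drop_duplicates`.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})

# With no `on` argument, merge joins on every column the two frames share
# (here ID and VALUE), so only rows identical in both frames survive.
intersect = left.merge(right, how="inner").drop_duplicates().reset_index(drop=True)
print(intersect)
```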
EXCEPT
LEFT_TABLE      RIGHT_TABLE
ID  VALUE       ID  VALUE
1   VALUE 1     1   VALUE 1
2   VALUE 2     4   VALUE 2
3   VALUE 3     5   VALUE 3
4   VALUE 4     6   VALUE 4

RESULT
ID  VALUE
2   VALUE 2
3   VALUE 3
4   VALUE 4

SQL
SELECT * FROM LEFT_TABLE EXCEPT SELECT * FROM RIGHT_TABLE

pandas
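A minimal sketch of one way to express EXCEPT in pandas, assuming DataFrames named `left` and `right` (illustrative names). It relies on the `indicator=True` option of `merge`, which adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})

# Left-merge on all shared columns and tag each row with its origin.
merged = left.merge(right.drop_duplicates(), how="left", indicator=True)

# Keep rows found only in `left`, then drop the helper column and duplicates.
except_rows = (merged.loc[merged["_merge"] == "left_only", ["ID", "VALUE"]]
                     .drop_duplicates()
                     .reset_index(drop=True))
print(except_rows)
```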
SEMI JOIN
LEFT_TABLE      RIGHT_TABLE
ID  VALUE       VALUE
1   VALUE 1     VALUE 2
2   VALUE 2     VALUE 3
3   VALUE 3
4   VALUE 4

RESULT
ID  VALUE
2   VALUE 2
3   VALUE 3

SQL
SELECT * FROM LEFT_TABLE WHERE VALUE IN (SELECT VALUE FROM
RIGHT_TABLE)

pandas
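A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). `Series.isin` plays the role of the `IN (subquery)` filter, so left rows are never duplicated.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})
right = pd.DataFrame({"VALUE": ["VALUE 2", "VALUE 3"]})

# Keep the left rows whose VALUE appears anywhere in right["VALUE"].
semi = left[left["VALUE"].isin(right["VALUE"])]
print(semi)
```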
ANTI JOIN
LEFT_TABLE      RIGHT_TABLE
ID  VALUE       VALUE
1   VALUE 1     VALUE 2
2   VALUE 2     VALUE 3
3   VALUE 3
4   VALUE 4

RESULT
ID  VALUE
1   VALUE 1
4   VALUE 4

SQL
SELECT * FROM LEFT_TABLE WHERE VALUE NOT IN (SELECT VALUE
FROM RIGHT_TABLE)

pandas
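A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). It negates the `isin` mask used for a semi join. One difference to keep in mind: if the subquery column contains NULLs, SQL's `NOT IN` returns no rows, while this pandas filter does not behave that way.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "VALUE": ["VALUE 1", "VALUE 2", "VALUE 3", "VALUE 4"]})
right = pd.DataFrame({"VALUE": ["VALUE 2", "VALUE 3"]})

# Keep the left rows whose VALUE does NOT appear in right["VALUE"].
anti = left[~left["VALUE"].isin(right["VALUE"])]
print(anti)
```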
CROSS JOIN

LEFT_TABLE      RIGHT_TABLE
ID              ID
1               1
2               2
3

RESULT
ID1  ID2
1    1
1    2
2    1
2    2
3    1
3    2

SQL
SELECT * FROM LEFT_TABLE CROSS JOIN RIGHT_TABLE

pandas
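A minimal sketch of the pandas equivalent, assuming DataFrames named `left` and `right` (illustrative names). It uses `how="cross"`, which is available in pandas 1.2 and later; the `suffixes` values are chosen here only to reproduce the `ID1` and `ID2` column names in the result table.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE above.
left = pd.DataFrame({"ID": [1, 2, 3]})
right = pd.DataFrame({"ID": [1, 2]})

# how="cross" pairs every row of `left` with every row of `right`.
# No `on` key is given; the clashing ID columns receive the suffixes.
cross = left.merge(right, how="cross", suffixes=("1", "2"))
print(cross)
```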

Common questions

A SEMI JOIN returns each row of the left table that has at least one match in the right table, without duplicating left rows when several right rows match. It is useful for restricting a table to rows that have related entries in another table, and is typically written in SQL with a WHERE EXISTS or WHERE ... IN subquery.

INTERSECT in SQL returns the rows that appear in the results of both SELECT statements, with duplicates removed. It is useful for finding common records across datasets, such as identifying shared members between groups or common attributes from different sources.

An ANTI JOIN returns rows from the left table that have no match in the right table on the join column, making it the complement of a SEMI JOIN. EXCEPT, by contrast, compares entire rows: it returns the distinct rows of the first query that do not appear in the second, so duplicates are removed as well.

An INNER JOIN in both SQL and Pandas returns only the rows with matching keys in both tables. A FULL OUTER JOIN returns all rows from both tables, pairing rows where the keys match and filling in NULLs where no match is found. It is achieved in Pandas by setting the 'how' parameter to 'outer' in the 'merge' function.

A RIGHT JOIN returns all rows from the right table and the matched rows from the left table, adding NULLs for non-matches. To find the rows of the right table that have no match, filter for NULLs in the left table's columns after the join. In Pandas, perform a merge with 'how' set to 'right' and filter using isna (or its alias isnull).
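The filter described above can be sketched as follows, reusing the sample tables from the RIGHT JOIN slide. The names `left`, `right`, `joined`, and `unmatched` are illustrative assumptions.

```python
import pandas as pd

# Sample data mirroring LEFT_TABLE and RIGHT_TABLE from the RIGHT JOIN slide.
left = pd.DataFrame({"ID": [1, 2, 3, 4],
                     "LEFT_VALUE": ["LEFT 1", "LEFT 2", "LEFT 3", "LEFT 4"]})
right = pd.DataFrame({"ID": [1, 4, 5, 6],
                      "RIGHT_VALUE": ["RIGHT 1", "RIGHT 2", "RIGHT 3", "RIGHT 4"]})

# Right join, then keep only rows where the left side is missing (NaN).
joined = left.merge(right, on="ID", how="right")
unmatched = joined[joined["LEFT_VALUE"].isna()]
print(unmatched)
```

This assumes `LEFT_VALUE` itself never holds a genuine missing value; if it can, passing `indicator=True` to `merge` and filtering on the resulting `_merge` column is a safer test.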

A SELF JOIN joins a table with itself. It is used to query hierarchical data or to compare rows within the same table, for example to find duplicates or to calculate successive differences. In SQL, it involves joining a table with itself using an alias; in Pandas, it can be done using the merge method with the same DataFrame as both inputs.

UNION ALL is chosen over UNION when duplicates need to be preserved: UNION ALL keeps every row, while UNION removes duplicate rows by default. Because that de-duplication step takes extra work, UNION can be less efficient on large datasets.

A CROSS JOIN in SQL returns the Cartesian product of the two tables, meaning every row in the first table is combined with every row in the second table. This join is useful when all possible combinations of two datasets are needed, as opposed to INNER or OUTER JOINs, which combine datasets based on matching keys.

SQL databases are designed to handle large datasets efficiently, using indexes and optimized query plans. Pandas works in memory, so it is constrained by available RAM and becomes less efficient on very large datasets unless the data is processed in chunks. SQL databases also handle concurrent requests and distributed data better.

A LEFT JOIN in SQL returns all rows from the left table and the matched rows from the right table, with NULLs in the right table's columns where no match exists. In Python Pandas, this is implemented using the 'merge' function with the parameter 'how' set to 'left'.