Detailed Guide for String Wrangling in SQL | MySQL | SQL Analysis

Extracting information from string columns is almost a repetitive necessity in Data Engineers, Data Scientists, and Business Analysts day to day tasks, and this task can be done using a programming language such as Python, or by SQL depends on your application and on the task required. In this tutorial, we will discover together how to extract and to manipulate strings using standard SQL, and we will use MySQL database as our database engine, most of the functions here are common in all database vendors, the difference maybe will be in syntax or keywords but the same concepts will exist in all database vendors.

Following is the content of our article with main topics we will discuss and each topic will be explained with examples

Change String Format

To change string format to Lower, Upper, or first character initial we will use the following functions:

UPPER Case

to change string format to Upper case we can use UPPER() or UCASE() functions as following

/* Upper Case */
SELECT EMP_NO, FIRST_NAME, UPPER(FIRST_NAME) 'UPPER_FIRST_NAME', 
       LAST_NAME, UPPER(LAST_NAME) 'UPPER_LAST_NAME' FROM EMPLOYEES;
SELECT EMP_NO, FIRST_NAME, UCASE(FIRST_NAME)  'UPPER_FIRST_NAME',
       LAST_NAME, UCASE(LAST_NAME) 'UPPER_LAST_NAME' FROM EMPLOYEES;

lower Case

to change string format to Upper case we can use LOWER() or LCASE() functions as following

/* Lower Case */
SELECT EMP_NO, FIRST_NAME, LOWER(FIRST_NAME) 'LOWER_FIRST_NAME', LAST_NAME, LOWER(LAST_NAME) 'LOWER_LAST_NAME' FROM EMPLOYEES_ALL_UPPER;
SELECT EMP_NO, FIRST_NAME, LCASE(FIRST_NAME)'LOWER_FIRST_NAME', LAST_NAME, LCASE(LAST_NAME) 'LOWER_LAST_NAME' FROM EMPLOYEES_ALL_UPPER;

Note: We had created EMPLOYEES_ALL_UPPER from the main EMPLOYEES table but with all fields in upper case, to demonstrate the lower functionality

Initial Capital Case

In this case, the first letter of the string will be capital, and to apply this we will need to implement this logic using MySQL functions as there is no direct function to achieve this, in this function we take the first letter of the string then apply Upper method and concatenate the converted first letter with the remaining of the word as follows

/* Initcap */
SELECT EMP_NO, FIRST_NAME,
               CONCAT(UCASE(LEFT(first_name, 1)), SUBSTRING(first_name, 2)) 'INIT_FIRST_NAME'
               FROM EMPLOYEES;

We can use Upper, Lower not only in the SELECT statement of a query but we can use also in WHERE conditions or HAVING conditions to match certain values, example as following

SELECT EMP_NO, FIRST_NAME, LAST_NAME, HIRE_DATE
FROM EMPLOYEES
WHERE UPPER(FIRST_NAME)='FLORINA';

Merge or Slice String

One of the needs we can have with strings is that we need to merge two strings together, or we need to cut certain pieces of a string to extract information

Concatenate Strings

To concatenate strings we will use CONCAT or by CONCAT_WS functions, we will know the difference between both in the following examples, let’s start by CONCAT which has the following syntax

CONCAT

CONCAT function accepts columns names or strings and converts all to one string together

/* Concatenation */
SELECT EMP_NO, FIRST_NAME, LAST_NAME, CONCAT(FIRST_NAME,' ',LAST_NAME)  AS 'Full Name' 
FROM EMPLOYEES

Example on Concatenating columns with string will be as following, we will add title initials before the full name

SELECT EMP_NO, FIRST_NAME, LAST_NAME, CONCAT('Mr/Ms. ',FIRST_NAME,' ',LAST_NAME)  AS 'Full Name' 
FROM EMPLOYEES

Take Care: if any of the concatenated strings are Null, the output string will be Null also.

The solution to this problem is to use a CASE expression to check null values or update the table’s data with standard values for null columns

/* Handle Null Values */
SELECT EMP_NO, FIRST_NAME, LAST_NAME,
       CASE 
        WHEN LAST_NAME IS NULL THEN CONCAT(FIRST_NAME,' ')
        ELSE CONCAT(FIRST_NAME,' ',LAST_NAME) END AS 'Full Name' 
FROM EMPLOYEES;

CONCAT_WS

in this function we have the same inputs except that for all string will be merged we will use the same separator to merge all of them as following

SELECT EMP_NO, FIRST_NAME, LAST_NAME, CONCAT_WS('-',FIRST_NAME,LAST_NAME)  AS 'Full Name' 
FROM EMPLOYEES

Take Care: Only if the first string or column in the CONCAT_WS is null, the output string will be null, otherwise, the output will be a string and intermediate null values will be ignored

Substring Strings

In some we cases we need to extract information from a string, and we don’t need the full string in our task, to do that we will use SUBSTR function, with any of the following forms of the function

SELECT EMP_NO, SUBSTR(FIRST_NAME,1,1) AS 'Initial', FIRST_NAME FROM EMPLOYEES; 
SELECT EMP_NO, SUBSTR(FIRST_NAME FROM 1 FOR 2) AS 'Initial', FIRST_NAME FROM EMPLOYEES; 

In This example, we are substring starting from index number 1 for the length of 1 character to get the name initial

Indexing starts from left to write with the first character as number 1, and with negative numbers from the last character of the string with -1, the example below for string “DATAVALLEY” if we need to substring the indexes will be as shown below

For example to get the last character of a string we will use the negative indexing as following

SELECT EMP_NO, SUBSTR(FIRST_NAME,-1,1) AS 'Last-Character', FIRST_NAME FROM EMPLOYEES; 
SELECT EMP_NO, SUBSTR(FIRST_NAME FROM -1 FOR 1) AS 'Last-Character', FIRST_NAME FROM EMPLOYEES; 

If we removed the substring length, it will be considered to the end of the string

SELECT EMP_NO, SUBSTR(FIRST_NAME,3) AS 'Initial', FIRST_NAME FROM EMPLOYEES; 
SELECT EMP_NO, SUBSTR(FIRST_NAME FROM -2) AS 'Initial', FIRST_NAME FROM EMPLOYEES; 

Substring also can be used in WHERE and HAVING statements to filter specific records as following example

SELECT DISTINCT TITLE 
FROM TITLES
WHERE SUBSTR(TITLE,1,6)='Senior';

Search String

It is very common to search for a value in a text to see if it exists or not, or in another case to get the position of this value in the text, to search for a value in the string we will use INSTR or POSITION functions, both serve the same purpose

SELECT EMP_NO,INSTR(FIRST_NAME,'A') 'A Position' ,FIRST_NAME
FROM employees;

SELECT EMP_NO, POSITION('A' IN FIRST_NAME) 'A Position', FIRST_NAME 
FROM EMPLOYEES;

In this example we search for the position of character ‘A’ in the FIRST_NAME field using both functions for demonstration

Value of Zero means the value we are searching for is not available in the String, we can use this functions also in the WHERE and HAVING statements to filter in the existence of a value in our string or in our column


SELECT DISTINCT TITLE 
FROM TITLES
WHERE INSTR(TITLE,'Assistant')>0;

Clean Spaces from String

To remove spaces from a string value we can use TRIM, LTRIM, RTRIM functions based on our needs, as following

/****** Remove Spaces *************/
SELECT  EMP_NO, CONCAT('     ',FIRST_NAME,'           ') 'Name with Space', 
                TRIM(CONCAT('     ',FIRST_NAME,'           ')) 'Name without Space'
FROM EMPLOYEES;

Replace Value in a String

Replacing a value in a string is a straight forward using REPLACE function as following

/****** Replace String ***********/
SELECT EMP_NO, title,  REPLACE(TITLE,' ','-') FROM TITLES;

Get Length of a String

To get the length of a string column, output string from expression it is very straight forward using LENGTH function

SELECT EMP_NO, FIRST_NAME,LENGTH(FIRST_NAME), LAST_NAME, LENGTH(LAST_NAME) FROM EMPLOYEES;

Check our Latest Articles

Ahmed Ibrahem

Ahmed Ibrahem is working as Data Engineering Team lead, with a wide experience in data management projects and technical implementation using different technologies, and delivering end to end projects starting from business analysis to Data Warehouse modeling and implementation to BI design to customers in different industries.

guest
0 Comments
Inline Feedbacks
View all comments