When working with data frames in R, you will often need to manipulate a column of text. You may want to separate one column into multiple columns, or split a column of text and keep only part of it.
tidyr’s separate() function is a convenient way to split a column of text the way you want. Let us see some simple examples of using it.
Let us first load the R packages needed for the examples: tidyr for separate() and unite(), and dplyr for select().
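Loading the packages might look like the sketch below. It assumes both packages are already installed, for example with install.packages().

```r
# tidyr provides separate() and unite(); dplyr provides select()
library(tidyr)
library(dplyr)
```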
Let us create a small data frame with a column of text whose parts are separated by underscores and a dot.
The data frame contains just a single column of file names.
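A data frame of this shape could be built as follows. The file names are hypothetical values made up for illustration; only the column name file_name comes from the text.

```r
# Hypothetical file names: day, month and year joined by underscores,
# followed by a dot and the file extension
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

print(df)
```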
Let us use the separate() function from tidyr to split the “file_name” column into multiple columns with specific names. Here, we specify the new column names in a character vector.
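A minimal sketch of this call is shown here. The data values and the four output column names (Day, Month, Year, file_type) are assumptions chosen for illustration.

```r
library(tidyr)
library(dplyr)

# Hypothetical file names, made up for illustration
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

# Split file_name into four new columns named by the `into` vector.
# No `sep` is given, so the default separator pattern is used.
df_split <- df %>%
  separate(file_name, into = c("Day", "Month", "Year", "file_type"))

print(df_split)
```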
By default, separate() uses a regular expression that matches any sequence of non-alphanumeric characters as the delimiter.
In this example, the default pattern matched both the underscores and the dot, so separate() split the single column into four columns with the names we specified.
Often you want only part of the text in a column. Let us see another example of a data frame with a column containing text, but this time we specify only three columns for our output.
Note that we provide just three column names to separate().
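As a sketch, the three-column call could look like this. The data and column names are again hypothetical. Each value splits into four pieces, so the fourth piece has no column to go into and separate() emits a warning.

```r
library(tidyr)
library(dplyr)

# Hypothetical file names, made up for illustration
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

# Only three names are supplied although each value has four pieces;
# the extra piece is discarded and a warning is raised
df_three <- df %>%
  separate(file_name, into = c("Day", "Month", "Year"))

print(df_three)
```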
The output of separate() in this example contains only three columns, as we specified. We also see a warning, because the extra element left over after splitting the text was discarded.
We can use the argument extra = "drop" to tell separate() to drop any extra pieces without warning us.
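A short sketch of the same split with extra = "drop", using the same hypothetical data and column names as an assumption:

```r
library(tidyr)
library(dplyr)

# Hypothetical file names, made up for illustration
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

# extra = "drop" silently discards pieces beyond the three named columns
df_drop <- df %>%
  separate(file_name, into = c("Day", "Month", "Year"), extra = "drop")

print(df_drop)
```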
Similarly, if we want only the first element after splitting, we can specify just one column name for our output.
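Keeping only the first piece might be sketched as below. The data and the column name Day are hypothetical, and extra = "drop" is included here so the discarded pieces do not trigger a warning.

```r
library(tidyr)
library(dplyr)

# Hypothetical file names, made up for illustration
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

# A single name in `into` keeps just the first piece of each value
df_first <- df %>%
  separate(file_name, into = c("Day"), extra = "drop")

print(df_first)
```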
If you want an element from the middle after splitting, you can use dplyr’s select() function to pick the column you need. For example, if we need the second element, ‘Month’, we can combine tidyr’s separate() with dplyr’s select().
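One way to sketch this combination, assuming the same hypothetical data and column names as before:

```r
library(tidyr)
library(dplyr)

# Hypothetical file names, made up for illustration
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

# Split into named pieces first, then keep only the Month column
df_month <- df %>%
  separate(file_name, into = c("Day", "Month", "Year"), extra = "drop") %>%
  select(Month)

print(df_month)
```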
Sometimes you may want to do the opposite of what separate() does, i.e. combine multiple columns into a single column. You guessed it right, tidyr has a function for that too. tidyr’s unite() complements separate() and combines multiple columns into a single column.
Let us see an example of unite() combining two columns created by separate(). Here, we first separate a column into three columns and then use unite() to combine the first two columns into a single column.
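The separate-then-unite pipeline could be sketched as follows. The data, the column names, and the name DayMonth for the combined column are all assumptions for illustration; the separator "_" is passed explicitly and is also unite()'s default.

```r
library(tidyr)
library(dplyr)

# Hypothetical file names, made up for illustration
df <- data.frame(
  file_name = c("1_jan_2018.csv", "2_feb_2018.csv", "3_mar_2018.csv"),
  stringsAsFactors = FALSE
)

# Split into three columns, then paste Day and Month back together
# into one column; the two input columns are removed by default
df_united <- df %>%
  separate(file_name, into = c("Day", "Month", "Year"), extra = "drop") %>%
  unite("DayMonth", Day, Month, sep = "_")

print(df_united)
```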
The output is a data frame with two columns, where the first column is the result of unite().