This function finds non-ASCII characters in a given text column of a data frame, using quick_conc() to locate them. The results can optionally be sorted by the non-ASCII characters found.
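
To make the idea of "non-ASCII" concrete, here is a minimal base R sketch that lists every character with a code point above 127 in each string. It does not reproduce find_non_ascii() or quick_conc(); the helper name non_ascii_chars is hypothetical and used only for illustration.

```r
# Hypothetical helper, not part of the package: list the non-ASCII
# characters in each element of a character vector, using base R only.
non_ascii_chars <- function(x) {
  lapply(enc2utf8(x), function(s) {
    codes <- utf8ToInt(s)  # Unicode code points of the string
    # Keep only code points outside the 7-bit ASCII range
    intToUtf8(codes[codes > 127L], multiple = TRUE)
  })
}

texts <- c(
  "This is a text with a non-ASCII character: \u00e9.",
  "Another text without."
)
res <- non_ascii_chars(texts)
```

Here res is a list with one element per input string; a string without non-ASCII characters yields an empty character vector.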

Usage

find_non_ascii(tbl, id = NULL, text = "text", sort_by_chr = FALSE, ...)

Arguments

tbl

A data frame that contains the text data.

id

A character string indicating the name of the identifier column in tbl (default is NULL).

text

A character string indicating the name of the text column in tbl (default is "text").

sort_by_chr

A logical indicating whether to sort the results by the non-ASCII characters (default is FALSE).

...

Arguments to pass on to quick_conc(). For example, you can extend the resulting search window with the argument n = 10.

Value

A tibble with identified non-ASCII characters. Each row represents an instance of a non-ASCII character. If id is not NULL, the tibble also includes the identifier for each instance. If sort_by_chr is TRUE, the tibble is sorted by the non-ASCII characters.
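
The shape described above (one row per instance, an identifier column, optional sorting by character) can be sketched in base R as follows. This is an illustration under assumptions, not the function's actual output: the column names id and chr are made up, the real result is a tibble that may hold other columns, and sorting by code point is only one plausible ordering.

```r
# Illustrative input; the column names mirror the defaults described above.
data <- data.frame(
  id = 1:2,
  text = c("Caf\u00e9 na\u00efve", "\u00c5ngstr\u00f6m"),
  stringsAsFactors = FALSE
)

# Build one row per non-ASCII character, carrying the identifier along.
rows <- lapply(seq_len(nrow(data)), function(i) {
  codes <- utf8ToInt(enc2utf8(data$text[i]))
  chr <- intToUtf8(codes[codes > 127L], multiple = TRUE)
  data.frame(
    id = rep(data$id[i], length(chr)),
    chr = chr,  # hypothetical column name
    stringsAsFactors = FALSE
  )
})
instances <- do.call(rbind, rows)

# Rough analogue of sort_by_chr = TRUE: order rows by code point (an assumption).
sorted <- instances[order(vapply(instances$chr, utf8ToInt, integer(1))), ]
```

Without sorting, rows follow the order of the input texts; with sorting, instances of the same character end up next to each other regardless of which text they came from.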

Examples

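if (FALSE) {
  library(tibble)  # provides tibble()

  # Two documents; only the first contains a non-ASCII character
  data <- tibble(
    id = 1:2,
    text = c("This is a text with a non-ASCII character: é.", "Another text without.")
  )
  find_non_ascii(data, id = "id")
}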