Home:ALL Converter>UTF-8 / Unicode Text Encoding with RPostgreSQL

UTF-8 / Unicode Text Encoding with RPostgreSQL

Ask Time:2014-01-28T06:12:50         Author:David L

Json Formatter

I'm running R on a Windows machine which is directly linked to a PostgreSQL database. I'm not using RODBC. My database is encoded in UTF-8 as confirmed by the following R command:

dbGetQuery(con, "SHOW CLIENT_ENCODING")
#   client_encoding
# 1            UTF8

However, when some text is read into R, it displays as strange text in R.

For example, the following text is shown in my PostgreSQL database: "Stéphane"

After exporting to R it's shown as: "Stéphane" (the é is encoded as é)

When importing to R I use the dbConnect command to establish a connection and the dbGetQuery command to query data using SQL. I do not specify any text encoding anywhere when connecting to the database or when running a query.

I've searched online and can't find a direct resolution to my issue. I found this link, but their issue is with RODBC, which I'm not using.

This link is helpful in identifying the symbols, but I don't just want to do a find & replace in R... way too much data.

I did try running the following commands below and I arrived at a warning.

Sys.setlocale("LC_ALL", "en_US.UTF-8")
# [1] ""
# Warning message:
# In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
#   OS reports request to set locale to "en_US.UTF-8" cannot be honored
Sys.setenv(LANG="en_US.UTF-8")
Sys.setenv(LC_CTYPE="UTF-8")

The warning occurs on the Sys.setlocale("LC_ALL", "en_US.UTF-8") command. My intuition is that this is a Windows specific issue and doesn't occur with Mac/Linux/Unix.

Author:David L,eproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/21392786/utf-8-unicode-text-encoding-with-rpostgresql
Scarabee :

As Craig Ringer said, setting client_encoding to windows-1252 is probably not the best thing to do. Indeed, if the data you're retrieving contains a single exotic character, you're in trouble:\n\n\n Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI\n driver: (could not Retrieve the result : ERROR: character 0xcca7 of\n encoding \"UTF8\" has no equivalent in \"WIN1252\" )\n\n\nOn the other hand, getting your R environment to use Unicode could be impossible (I have the same problem as you with Sys.setlocale... Same in this question too.).\n\nA workaround is to manually declare UTF-8 encoding on all your data, using a function like this one:\n\nset_utf8 <- function(x) {\n # Declare UTF-8 encoding on all character columns:\n chr <- sapply(x, is.character)\n x[, chr] <- lapply(x[, chr, drop = FALSE], `Encoding<-`, \"UTF-8\")\n # Same on column names:\n Encoding(names(x)) <- \"UTF-8\"\n x\n}\n\n\nAnd you have to use this function in all your queries:\n\nset_utf8(dbGetQuery(con, \"SELECT myvar FROM mytable\"))\n\n\n\n\nEDIT: Another possibility is to use RPostgres unstead of RPostgreSQL. I tested it (with the same config as in your question), and as far as I can see all declared encodings are automatically set to UTF-8.",
2017-01-27T18:37:34
Craig Ringer :

\n After exporting to R it's shown as: \"Stéphane\" (the é is encoded as é)\n\n\nYour R environment is using a 1-byte non-composed encoding like latin-1 or windows-1252. Witness this test in Python, demonstrating that the utf-8 bytes for é, decoded as if they were latin-1, produce the text you see:\n\n>>> print u\"é\".encode(\"utf-8\").decode(\"latin-1\")\né\n\n\nEither SET client_encoding = 'windows-1252' or fix the encoding your R environment uses. If it's running in a cmd.exe console you'll need to mess with the chcp console command; otherwise it's specific to whatever your R runtime is.",
2014-01-28T05:32:46
Guillaume Doucet :

If you use RPostgres::Postgres() as the first parameter of dbConnect() normally you will not have problem with encoding.\n\nI tried this script where I had the same problem and now my accented characters are ok.\n\ndbConnect(RPostgres::Postgres(),user=\"user\",password=\"psw\",host=\"host\",port=5432,dbname=\"db_name\")\n",
2020-03-30T12:40:52
yy