Replacing multiple characters in a column of emails in Oracle

Ask Time：2012-08-14T21:35:26 Author：BasicHorizon

So basically I have a column of multiple emails and some of them are invalid and contain different characters/carriage returns that are not allowed.

Below is how I find the invalid emails with a SELECT statement, but I have no clue how to replace the bad characters individually. For example, if a carriage return is found, I know I'd use a REPLACE statement; same with any special characters. But would that involve writing a separate query for each possible case?

Basically, what I'm asking for is the most efficient way to go through my table and replace any characters in an email address that match one of the conditions below.

select /*+ parallel(a,12) full(a) */
       a.row_id, a.par_row_id, a.attrib_01, a.created_by, a.last_upd_by
from s_contact_xm a
where a.type = 'Email' and (a.attrib_01 IS NULL
or a.attrib_01 like '% %'
or a.attrib_01 like '%@%@%'
or a.attrib_01 like '%..%'
or a.attrib_01 like '%;%'
or a.attrib_01 like '%:%'
or a.attrib_01 not like '%@%'
or a.attrib_01 like '%/%'
or a.attrib_01 like '%\%'
or a.attrib_01 like '%|%'
or a.attrib_01 like '%@.%'
or a.attrib_01 like '%@'
or a.attrib_01 like '%.'
or a.attrib_01 like '%(%'
or a.attrib_01 like '%)%'
or a.attrib_01 like '%<%'
or a.attrib_01 like '%>%'
or a.attrib_01 like '%#%'
or a.attrib_01 like '%"%'
or a.attrib_01 like '%.@%'
or a.attrib_01 like '.%'
or INSTR(a.attrib_01, CHR(13)) > 0
or INSTR(a.attrib_01, CHR(10)) > 0) and a.created_by = '1-XAAX5P'

Author: BasicHorizon. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/11953462/replacing-multiple-characters-in-a-column-of-emails-in-oracle

APC :

The thing is, you've got several different categories of potential error. Some are fixable typos; some are unfixable typos; and some are just wrong. Now, is it possible to come up with some bulletproof rules for determining the category of any given error?

Perhaps.

For instance, you could convert every occurrence of '..' to '.'. Likewise you could replace the carriage returns with null. Those are fixable typos.

But if somebody has included " in an email address, there's no way you can be sure what they actually meant to type: do you assume they typed 2 and didn't notice they were also pressing [shift], or do you replace it with null (i.e. remove it)? That is not a fixable typo (but you might decide a guess is good enough).

If the email address doesn't contain an @ then it's not a valid email address and there's no way to fix it.

So you probably need several separate UPDATE statements. You can use TRANSLATE() on the strings where you're going to attempt a one-for-one replacement. This is also the technique for the things you want to replace with null, such as those carriage returns.

    translate(attrib_01, '()"'||chr(13), '902')

You'll need several passes to transform multi-character strings, e.g.

    replace(attrib_01, '..', '.')

Then you'll probably want to trim leading or trailing dots:

    trim(both '.' from attrib_01)

Finally, you'll need to report on all those addresses you cannot fix, such as values with no (or several) strudels (@ signs).

You may be able to compress some of these rules into fewer steps using REGEXP_REPLACE, but the regular expressions can get complicated. It will be easier to make things correct using the old skool Oracle replace functions. I suggest you only use regex if you really need it. Even then you may well still need more than one pass through the data.

    "'()"' does this mean nulls and parenthesis?"

The Oracle documentation is comprehensive, free and online. You can read all about REPLACE(), TRANSLATE() and TRIM() there.

But I'll explain the TRANSLATE() call a bit more. This function substitutes each character of the from-string (the second argument) with the character at the same position in the to-string (the third argument). Any characters which lack a match are discarded. Hence ( is replaced with 9, ) is replaced with 0 and " is replaced with 2 (look at a QWERTY keyboard to understand why). chr(13) (carriage return) has no match and so is discarded (or replaced with NULL if you prefer to think of it that way).

Thinking about it, you could deploy a CASE statement in the UPDATE set clause, to apply different REPLACE(), TRIM() and TRANSLATE() calls in one execution. It depends on how impenetrable you want your code to be :)
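As a rough sketch only, the passes described above might look like the following against the table and columns from the question (s_contact_xm, type, attrib_01, row_id). Removing line feeds (chr(10)) as well as carriage returns is an assumption based on the question's own checks, and the decision to map ( ) " to 9 0 2 is the guess discussed earlier, not a rule. Nothing here commits; check the results before doing so.

```sql
-- Pass 1: one-for-one substitutions and removals in a single TRANSLATE().
-- '(' -> '9', ')' -> '0', '"' -> '2'; CR and LF have no counterpart in the
-- third argument, so they are removed. (Including LF is an assumption.)
UPDATE s_contact_xm a
   SET a.attrib_01 = TRANSLATE(a.attrib_01,
                               '()"' || CHR(13) || CHR(10),
                               '902')
 WHERE a.type = 'Email'
   AND (   INSTR(a.attrib_01, '(') > 0
        OR INSTR(a.attrib_01, ')') > 0
        OR INSTR(a.attrib_01, '"') > 0
        OR INSTR(a.attrib_01, CHR(13)) > 0
        OR INSTR(a.attrib_01, CHR(10)) > 0);

-- Pass 2: collapse runs of dots. One REPLACE() turns '...' into '..',
-- so repeat until no row still contains '..'.
BEGIN
  LOOP
    UPDATE s_contact_xm a
       SET a.attrib_01 = REPLACE(a.attrib_01, '..', '.')
     WHERE a.type = 'Email'
       AND a.attrib_01 LIKE '%..%';
    EXIT WHEN SQL%ROWCOUNT = 0;
  END LOOP;
END;
/

-- Pass 3: strip leading and trailing dots.
UPDATE s_contact_xm a
   SET a.attrib_01 = TRIM(BOTH '.' FROM a.attrib_01)
 WHERE a.type = 'Email'
   AND (a.attrib_01 LIKE '.%' OR a.attrib_01 LIKE '%.');

-- Report: addresses that cannot be fixed mechanically
-- (missing, no @, or more than one @).
SELECT a.row_id, a.attrib_01
  FROM s_contact_xm a
 WHERE a.type = 'Email'
   AND (   a.attrib_01 IS NULL
        OR a.attrib_01 NOT LIKE '%@%'
        OR a.attrib_01 LIKE '%@%@%');
```

The lone / after the PL/SQL block is the SQL*Plus terminator; other clients may not need it. If the loop in pass 2 feels heavy, REGEXP_REPLACE(a.attrib_01, '\.{2,}', '.') should collapse any run of dots in a single statement, at the cost of bringing regular expressions into the mix.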

2012-08-14T15:25:36
