05 May How To Replace Accented Characters With Regular latin Characters ?
We got a request from a client, to convert the accented characters of multiple Excel files to plain Latin.
Excel VBA is great and really handy but there is no method to perform such a task using VB code. On the other hand C# inherits all beautiful .Net libraries, and it’s so easy to solve the above problem. Why not bring the .Net world close to the “vintage” world of VBA?
So here is what we did. We prepared a method in C# (added also other languages samples) that fixes the problem and then made it public to the VBA IDE:
C#
------------------------
string accentedStr = "Staré Město";
byte[] tempBytes;
tempBytes = Encoding.GetEncoding("ISO-8859-8").GetBytes(accentedStr);
string asciiStr = Encoding.UTF8.GetString(tempBytes);
PHP
------------------------
$accentedStr = "Staré Město";
$tempBytes = iconv("UTF-8", "ISO-8859-8", $accentedStr);
$asciiStr = iconv("ISO-8859-8", "UTF-8", $tempBytes);
echo $asciiStr;
Javascript#1
------------------------
const accentedStr = "Staré Město";
// Encoding the accented string to ISO-8859-8
const utf8Encoder = new TextEncoder();
const utf8Buffer = utf8Encoder.encode(accentedStr);
const latin1Decoder = new TextDecoder("ISO-8859-8", {fatal: true});
const latin1Str = latin1Decoder.decode(utf8Buffer);
// Decoding the ISO-8859-8 string back to UTF-8
const latin1Encoder = new TextEncoder("ISO-8859-8", {NONSTANDARD_allowLegacyEncoding: true});
const latin1Buffer = latin1Encoder.encode(latin1Str);
const utf8Decoder = new TextDecoder();
const asciiStr = utf8Decoder.decode(latin1Buffer);
console.log(asciiStr);
Javascript#2
------------------------
function removeAccents(str) {
return str.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}
python
------------------------
accented_str = "Staré Město"
temp_bytes = accented_str.encode("ISO-8859-8", errors="replace")
ascii_str = temp_bytes.decode("ISO-8859-8")
print(ascii_str)
Powershell
------------------------
$accentedStr = "Staré Město"
$sourceEncoding = [System.Text.Encoding]::UTF8
$targetEncoding = [System.Text.Encoding]::GetEncoding("ISO-8859-8")
$byteArray = $sourceEncoding.GetBytes($accentedStr)
$tempBytes = $targetEncoding.GetString($byteArray)
$asciiStr = $sourceEncoding.GetString($targetEncoding.GetBytes($tempBytes))
Write-Host $asciiStr
MySQL
------------------------
-- Create a table to store the original and converted strings
CREATE TABLE string_conversion (
id INT AUTO_INCREMENT PRIMARY KEY,
original_str VARCHAR(255),
converted_str VARCHAR(255)
);
-- Insert the accented string into the table
INSERT INTO string_conversion (original_str)
VALUES ('Staré Město');
-- Update the table, converting the original string from utf8mb4 (default charset in MySQL) to latin1 (ISO-8859-1)
-- Note that ISO-8859-8 is not supported in MySQL, so we'll use ISO-8859-1 as an example
UPDATE string_conversion
SET converted_str = CONVERT(CONVERT(original_str USING latin1) USING utf8mb4)
WHERE id = 1;
-- Select the converted string from the table
SELECT converted_str FROM string_conversion WHERE id = 1;
Java
------------------------
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
public class Main {
public static void main(String[] args) {
String accentedStr = "Staré Město";
byte[] tempBytes;
try {
tempBytes = accentedStr.getBytes("ISO-8859-8");
String asciiStr = new String(tempBytes, StandardCharsets.UTF_8);
System.out.println(asciiStr);
} catch (java.io.UnsupportedEncodingException e) {
System.err.println("Unsupported encoding: " + e.getMessage());
}
}
}
And from the Excel VBA IDE:
We registered the C# Library from Tools | References, and we added the following method:
'
' VBA
'
Public Sub Convert()
Application.Cursor = xlWait
'Create Object from C#
Dim obj As CSharpTools.CSharpTools
Set obj = New CSharpTools.CSharpTools
Dim CountRows As Long
Dim ActiveCol
ActiveCol = ActiveCell.Column
Cells(1, ActiveCol).Select
'Get the last row in the current region
CountRows = Cells(1, ActiveCol).End(xlDown).Row
Dim i As Long
For i = 1 To CountRows
'Convert each cell to Latin characters
Cells(i, ActiveCol).Value = obj.ConvertFromUTF8ToLatin(Cells(i, ActiveCol).Value)
Next i
Application.Cursor = xlDefault
End Sub
'''''
''''' With the above code we can convert a whole Excel column at no time at all !
'''''
The code below is a complete tested C# console app using .Net Framewok 4.8.
You can run this app from the command line.
Give it an input text file with your own text. Leave a space and add the name of the output file and it will auto convert to latin for you at no time at all.
using System;
using System.IO;
using System.Text;
namespace AccentedStringConverter
{
internal class Program
{
private static void Main(string[] args)
{
if (args.Length != 2)
{
Console.WriteLine("Usage: AccentedStringConverter.exe ");
return;
}
string inputFilePath = args[0];
string outputFilePath = args[1];
if (!File.Exists(inputFilePath))
{
Console.WriteLine("Error: Input file does not exist.");
return;
}
string inputString = File.ReadAllText(inputFilePath, Encoding.UTF8);
Console.WriteLine("Original string: " + inputString);
string asciiStr = ConvertToASCII(inputString);
Console.WriteLine("Converted string: " + asciiStr);
File.WriteAllText(outputFilePath, asciiStr, Encoding.UTF8);
Console.WriteLine("Converted string saved to: " + outputFilePath);
}
private static string ConvertToASCII(string accentedStr)
{
byte[] tempBytes;
tempBytes = Encoding.GetEncoding("ISO-8859-8").GetBytes(accentedStr);
string asciiStr = Encoding.UTF8.GetString(tempBytes);
return asciiStr;
}
}
}
replace your accented characters using the online form below:
WE also BUILT A STAND ALONE QUICK TOOL TO LET YOU CONVERT THOSE ACCENTED CHARS:
Download your free CONVERSION TOOL
It’s a windows standalone tool, just Unzip and run.
The application was checked by Virus Total and found clean from malware, click here for more info.