Electronic Design
Text Encoding Simplifies Microcontroller Command Parsing

Parsing text, such as SCPI commands, on a PIC microcontroller consumes considerable processor time and resources. By encoding each letter as a four-bit value with a simple lookup table and packing the first four characters of each command word into a 16-bit number, a much more efficient parsing scheme can be implemented.

While working on a Microchip PIC project, I created a set of SCPI-style (Standard Commands for Programmable Instruments) commands to control the PIC. Each command is made up of text words separated by colons, and only the first four characters of each word are significant.

In previous projects, I found that parsing text consumes significant computing time and code space. Typically, text parsing is handled by string comparisons or by building a parse tree. Neither of these techniques is simple to design and implement on a microcontroller.

I knew that it would be faster to parse commands if I could convert the text into 16-bit numbers. So, I developed a method that converts the first four characters of each command to upper case and then encodes them as a 16-bit number. Each character is translated into a four-bit representation and then packed into a 16-bit number. 

But don’t you need 5 bits to represent 26 letters? Yes, if each letter is treated uniquely. To reduce the letters to four bits, I analyzed two-letter pairs and grouped the letters based on how often they are used. This encoding worked out well for the 25 or so commands I needed. (More extensive command sets may need to be checked for duplication and the encoding changed accordingly.)
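A duplication check of this kind could be sketched in C as follows. This is only an illustration: it assumes the command words have already been encoded into 16-bit numbers, and the function name find_duplicate_code is hypothetical, not part of the original project. As an example of what it would catch, the lookup table given later in this article assigns N and R the same code, so SOURce and a hypothetical SOUNd command would both encode to 0x7459.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the two matching indices if any two entries of
 * codes[] are equal; returns 0 if every entry is distinct.
 * A simple pairwise comparison is enough for a few dozen commands. */
int find_duplicate_code(const uint16_t *codes, size_t count,
                        size_t *first, size_t *second)
{
    size_t i, j;

    for (i = 0; i < count; i++) {
        for (j = i + 1; j < count; j++) {
            if (codes[i] == codes[j]) {
                *first = i;
                *second = j;
                return 1;
            }
        }
    }
    return 0;
}
```

If a duplicate is reported, one of the two command words can be renamed, or the letter grouping can be adjusted so that the clashing letters no longer share a code.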

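The encoding gives the space character, A, E, I, O, U, Y, and S their own codes, since they are very common. The remaining consonants are grouped into sets that share a code. The table shows the encoding for the letters (the letter values match the C lookup table that follows, and the 0 for the space matches the CLS example).

Code   Letters
0x0    space
0x1    A
0x2    E
0x3    I
0x4    O
0x5    U
0x6    Y
0x7    S
0x8    L, T
0x9    N, R
0xA    B, D
0xB    C, G
0xC    K, P
0xD    F, H
0xE    M, V, W
0xF    J, Q, X, Z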

One implementation in C with the space character handled separately is:

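/* 4-bit codes for the letters A through Z, indexed by (letter - 'A') */
const unsigned char LookUpTable[] = {
    0x1, 0xA, 0xB, 0xA, 0x2, 0xD, 0xB, 0xD,  /* A-H */
    0x3, 0xF, 0xC, 0x8, 0xE, 0x9, 0x4, 0xC,  /* I-P */
    0xF, 0x9, 0x7, 0x8, 0x5, 0xE, 0xE, 0xF,  /* Q-X */
    0x6, 0xF                                 /* Y-Z */
};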

These two examples show the encoding of SCPI commands:

CLS translates to 0xB870 (C = 0xB, L = 0x8, S = 0x7, and the empty fourth position = 0x0)

CALCulate:AVERage:COUNt translates to 0xB18B,0x1E29,0xB459
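The two examples above can be reproduced with a short C sketch. It is not the original project code: it assumes ASCII input, treats any non-letter (including a space or a missing character) as 0, and splits words at colons. The function names encode_char, encode_word, and encode_command are illustrative only.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* 4-bit codes for 'A'..'Z', copied from the article's lookup table. */
static const unsigned char LookUpTable[26] = {
    0x1, 0xA, 0xB, 0xA, 0x2, 0xD, 0xB, 0xD,  /* A-H */
    0x3, 0xF, 0xC, 0x8, 0xE, 0x9, 0x4, 0xC,  /* I-P */
    0xF, 0x9, 0x7, 0x8, 0x5, 0xE, 0xE, 0xF,  /* Q-X */
    0x6, 0xF                                 /* Y-Z */
};

/* Assumption: anything that is not a letter encodes as 0. */
static unsigned char encode_char(char c)
{
    int u = toupper((unsigned char)c);
    return (u >= 'A' && u <= 'Z') ? LookUpTable[u - 'A'] : 0x0;
}

/* Pack the first four characters of one word into a 16-bit code.
 * Encoding stops at the end of the string or at a ':'; any remaining
 * positions are padded with 0. */
uint16_t encode_word(const char *word)
{
    uint16_t code = 0;
    int ended = 0;
    int i;

    for (i = 0; i < 4; i++) {
        if (!ended && (word[i] == '\0' || word[i] == ':'))
            ended = 1;
        code = (uint16_t)((code << 4) | (ended ? 0x0 : encode_char(word[i])));
    }
    return code;
}

/* Encode a colon-separated command into out[].
 * Returns the number of words written (at most max_words). */
size_t encode_command(const char *cmd, uint16_t *out, size_t max_words)
{
    size_t n = 0;

    while (*cmd != '\0' && n < max_words) {
        out[n++] = encode_word(cmd);
        while (*cmd != '\0' && *cmd != ':')
            cmd++;                /* skip the rest of this word */
        if (*cmd == ':')
            cmd++;                /* step over the separator */
    }
    return n;
}
```

With this sketch, encode_word("CLS") gives 0xB870, and encode_command("CALCulate:AVERage:COUNt", out, 4) writes 0xB18B, 0x1E29, and 0xB459 into out and returns 3. Because the input is converted to upper case first, "cls" produces the same code as "CLS".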

After encoding the incoming text, parsing is just a matter of checking 16-bit numbers rather than text strings. This can be done with a switch (CASE) statement or a series of IF statements, either of which is much simpler (and usually faster) than handling text strings in a microcontroller. Using this approach greatly reduced the amount of code needed.
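As a rough illustration of that kind of parsing, the sketch below dispatches on already-encoded words using the codes from the two examples. The command_id enum, the CODE_ macros, and the parse_command function are hypothetical names chosen for this example, and a real command set would have many more cases.

```c
#include <stdint.h>

/* 16-bit codes taken from the article's examples. */
#define CODE_CLS   0xB870u  /* CLS       */
#define CODE_CALC  0xB18Bu  /* CALCulate */
#define CODE_AVER  0x1E29u  /* AVERage   */
#define CODE_COUN  0xB459u  /* COUNt     */

/* Hypothetical command identifiers, for illustration only. */
typedef enum {
    CMD_UNKNOWN = 0,
    CMD_CLS,
    CMD_CALC_AVER_COUNT
} command_id;

/* Identify a command from its encoded words.
 * codes[] holds one 16-bit code per colon-separated word. */
command_id parse_command(const uint16_t *codes, unsigned count)
{
    if (count == 0)
        return CMD_UNKNOWN;

    switch (codes[0]) {
    case CODE_CLS:
        return (count == 1) ? CMD_CLS : CMD_UNKNOWN;

    case CODE_CALC:
        if (count == 3 && codes[1] == CODE_AVER && codes[2] == CODE_COUN)
            return CMD_CALC_AVER_COUNT;
        return CMD_UNKNOWN;

    default:
        return CMD_UNKNOWN;
    }
}
```

One side benefit of the switch form is that C does not allow two case labels with the same value in one switch, so two first-level command words that happen to encode to the same number would be rejected by the compiler rather than silently misparsed.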

David Hunter is an electrical engineer with First Consulting Inc. in Rochester, N.Y. He has a BSEE and an MSEE from the Rochester Institute of Technology and has worked for more than 25 years as a design engineer in embedded-systems software, digital, analog, and RF circuit hardware design.
