Parsing of Hex bytes in Perl - Perl Programming

Posted by dharshana.ve (dharshana.ve), 25 October 2006

Hi all,

I am a student working on a project using Perl.

My work is...

I have a few hex bytes as input for my program. It's a series of hex bytes, e.g. 01 0a ab 2a 1c etc...

Here each byte stands for some representation. I also have a data file in which all the representations are stored. For example:
01 => "abcd"
02 => "ertg"
1c => "srfrf"
etc...

I have to parse the input hex bytes using this data file and the output should show the parsed input series.

The problem I face with a linear search is that there are a few hex bytes in which the digits are the same but the representations are different. For example, 01 stands for "abcd", and in some other place it stands for "cdfv".

My program follows the ASN.1 format. I mean the bytes come as a TYPE, then a LENGTH, and then a VALUE.

So I have to parse in the above-mentioned manner.

I am unable to come up with an algorithm for this, as I am very new to Perl programming.

Also, the data files are huge, and the input is also a huge file containing many hex bytes.

So please, can anyone help me with this?

The raw files (given as input) look like the following:

0d a1 0b 02 01 02 02 01 0e 30 03 04 01 93 7f 01 00 8b 2a 1c 0d a2 0b 02 01 03 30 06 02 01 0e 80 01 04

The output should look like the following:

02 OpCode Tag
01 OpCode Length
26 Foron
0A Reg1
0B Era1
0C Act1
0D De1
0E Int1
11 Reg4
12 Get5
13 Pro6
3B Pro7
3C Uns7
3D Uns
10 Not8

Here the first byte, 02, stands for a TYPE.

The second byte, 01, stands for its LENGTH.

The bytes following that stand for the possible occurrences of the VALUES.

Here, since the length is 01, only one of the possible values can occur each time.

So like this there are many TYPE LENGTH VALUE blocks in the data files, so a hex value in one TLV can have one description and the same hex value in another can have a different description.

If I can get some logic to search within each TLV as well, that would be good.

Please help me with this.

Posted by admin (Graham Ellis), 25 October 2006

I can't see how you got from the raw data to the result set in your example ... so I can't make any suggestions as to how to do it in a program. Yes, I know you've explained it but, sorry, I'm not getting it. Can anyone else see what's probably staring me in the face?

Posted by dharshana.ve (dharshana.ve), 26 October 2006

Hmm.. I understand. First of all, thanks a lot for the reply, Graham.

Let me modify it a little more.... The explanation can be like this....

The raw files (given as input) look like the following:

0d a1 0b 02 01 02 02 01 0e 30 03 04 01 93 7f 01 00 8b 2a 1c 0d a2 0b 02 01 03 30 06 02 01 0e 80 01 04

The data files, in which all possible hex combinations are stored, look like this:

.......
02 OpCode Tag
01 OpCode Length
26 Foron
0A Reg1
0B Era1
0C Act1
0D De1
0E Int1
11 Reg4
12 Get5
13 Pro6
3B Pro7
3C Uns7
3D Uns
10 Not8
......

The above is just a small block of the data file... Here the byte 02 stands for a TYPE.

The byte 01 stands for its LENGTH.

The bytes following that stand for the possible occurrences of the VALUES.

But in the input string, the value part occurs according to the LENGTH. For example, if LENGTH is 01, only one possible VALUE follows it; if LENGTH is 02, then 2 VALUES follow, etc...
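As a rough illustration of that rule, the sketch below splits a hex string into TYPE/LENGTH/VALUE records. It assumes exactly one byte of TYPE and one byte of LENGTH, which is a simplification (real ASN.1 encodings also allow multi-byte tags and lengths), and the name `split_tlv` is made up for this example.

```perl
use strict;
use warnings;

# Split a whitespace-separated hex string into TYPE/LENGTH/VALUE records.
# Assumption: one byte of TYPE and one byte of LENGTH per record.
sub split_tlv {
    my ($hex_string) = @_;
    my @bytes = split ' ', lc $hex_string;
    my @records;
    while (@bytes) {
        my $type = shift @bytes;
        die "Missing LENGTH after TYPE $type\n" unless @bytes;
        my $length = hex(shift @bytes);
        die "TYPE $type wants $length value byte(s), only "
            . scalar(@bytes) . " left\n" if $length > @bytes;
        # splice removes and returns the next $length bytes
        my @value = splice @bytes, 0, $length;
        push @records, { type => $type, length => $length, value => \@value };
    }
    return @records;
}

# A short run of bytes taken from the sample input in this thread.
for my $rec (split_tlv('02 01 02 02 01 0e')) {
    print "type=$rec->{type} length=$rec->{length} value=@{ $rec->{value} }\n";
}
```

Each record keeps its VALUE bytes together with the TYPE they appeared under, which is what a per-TYPE lookup would need later on.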

The output for the input shown here should look like the following:

0d -> De1
a1 -> Comp1
0b -> Era1
02 -> Opcode tag
01 -> Opcode length*
02 -> cdff

etc.....

Here, since the length is 01 (where it is starred), only one of the possible values can occur each time.

So like this there are many TYPE LENGTH VALUE blocks in the data files, so a hex value in one TLV can have one description and the same hex value in another can have a different description.

If I can get some logic to search within each TLV as well, that would be good.

I hope this explains it better....

Posted by admin (Graham Ellis), 26 October 2006

OK - that's a bit clearer now that I've seen the sample output. You want to read the hex code listing file in line by line, split each line and make a hash of each possible hex code. Then you can parse the incoming hex sequence and look up the relevant item in the hash.
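A minimal sketch of that approach might look like the following. It assumes each data line is a two-digit hex code, some whitespace, and then the description, as in the sample posted above; the sub names `load_codes` and `describe_bytes` are invented for the illustration, and the lines are passed in as a list rather than read from a real file.

```perl
use strict;
use warnings;

# Build a lookup hash from lines such as "0A Reg1" or "02 OpCode Tag".
# Assumption: hex code first, then whitespace, then the description.
sub load_codes {
    my @lines = @_;
    my %name_for;
    for my $line (@lines) {
        # Skip filler lines such as "......" that carry no hex code.
        next unless $line =~ /^\s*([0-9A-Fa-f]{2})\s+(.*?)\s*$/;
        $name_for{ lc $1 } = $2;    # normalise keys to lower case
    }
    return \%name_for;
}

# Turn a whitespace-separated hex string into "code -> description" lines.
sub describe_bytes {
    my ($name_for, $hex_string) = @_;
    my @out;
    for my $byte (split ' ', lc $hex_string) {
        my $name = exists $name_for->{$byte} ? $name_for->{$byte} : '(unknown)';
        push @out, "$byte -> $name";
    }
    return @out;
}

my $codes = load_codes('02 OpCode Tag', '01 OpCode Length', '0D De1', '0B Era1');
print "$_\n" for describe_bytes($codes, '0d a1 0b 02 01');
```

With a real data file, the lines would come from reading a filehandle instead of a literal list. A single flat hash like this cannot give one code two meanings, so it only covers the simple case.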

Of course, it's not going to be quite that simple, I don't think, as you've got a data file with hex capitals and raw data with lower case, but a few lc() functions will sort that out. And I think that some of the hex codes might be position dependent (I think that's what you're saying), in which case you'll need to add a bit more logic.
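One possible shape for that extra logic is a hash of hashes keyed first by TYPE and then by VALUE, so the same byte can carry a different description under each TYPE. This is only a sketch: the TYPE codes used as outer keys are placeholders, the descriptions "abcd" and "cdfv" are borrowed from the first post, and `describe_value` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Descriptions are stored per TYPE, so one byte can have several meanings.
# The TYPE codes here are placeholders, not the poster's real data.
my %value_name = (
    '02' => { '01' => 'abcd', '0e' => 'Int1' },
    '04' => { '01' => 'cdfv' },
);

# Look a VALUE byte up in the table for the TYPE it appeared under.
sub describe_value {
    my ($table, $type, $value) = @_;
    ($type, $value) = (lc $type, lc $value);    # data file may use capitals
    my $names = $table->{$type} or return "(unknown type $type)";
    return exists $names->{$value} ? $names->{$value} : "(unknown value $value)";
}

print describe_value(\%value_name, '02', '01'), "\n";    # abcd
print describe_value(\%value_name, '04', '01'), "\n";    # cdfv
```

The lc() calls mean the table and the raw data do not have to agree on upper or lower case, as long as the table keys are stored in lower case.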

Posted by dharshana.ve (dharshana.ve), 27 October 2006

You are right, Graham. You got my point. The lower case and upper case is not a problem because that was my mistake while posting it here. Lowercase is used here, on both sides.

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.