Thursday, September 06, 2007

A Simple Lexical Analyzer .... Enjoy.

I got bored and wrote a simple lexical analyzer and thought about walking through explaining it but in all honesty its rather self explanatory especially with the inclusion of the diagrams I made and the BNF grammar I am supplying to go with the code snippet.

The grammar simply parses words or phrases in a file with white space as a delimiter and requiring that all words or phrases start with a letter and are followed by any combination of letters and digits.

BNF:


G[<Word>]

<Word> ::= <Letter> | <Letter> <LetterDigit>
<LetterDigit> ::= <Letter> | <Digit> | <LetterDigit> <Letter> | <LetterDigit> <Digit>
<Letter> ::= a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
<Digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

A graphical representation of the grammar:


And the finite state automaton:

Finally, the code ... in python of course :)

Note: If you copy and paste this code it will not work because for one reason or another I can't get blogger to format the tabs correctly, please download the code from the link provided at the bottom.


UPDATE: Now featuring code indentation goodness.

###############################

#!/usr/bin/python
import sys

class SimpleLex:
"""@Author: Adam Miller - Simple lexical anyzer"""

#allowed alphabetic characters in grammar
alpha = 'abcdefghijklmnopqrstuvwxyz'

#allowed digits in grammar
digit = '0123456789'

#white space symbolic representation
wSpace = ' \n \t'

#grammar table of relations
table = [
[1,3,0],
[1,1,2],
[2,2,2],
[3,3,3],
]

#relations of type and their state index
relations = {
"alpha":0,
"digit":1,
"wSpace":2
}


def __init__(self, file_name):
"""open a file, extract contents, close file,
initialize character list, and state variable"""
self.f = open(file_name, 'r')
self.lines = self.f.readlines()
self.f.close()
self.chars = []
self.state = 0

def scan(self):
"""process the file's contents one character at a time"""
for line in self.lines:
for c in line:
# 3 is the error state
if self.state == 3:
print "Error"
self.state = 0
self.chars = []
# 2 is the success state
elif self.state == 2:
print ''.join(self.chars)
self.state = 0
self.chars = []

#let table drive state transitions
self.state = self.table[self.state][self.lex(c)]

"""print final character buffer because the loop will end
execution and not let the last success state be checked"""
print ''.join(self.chars)

def lex(self, c):
"""return the correct relational index to the type of c"""
self.chars.append(c)
if c in self.alpha:
return self.relations["alpha"]
elif c in self.digit:
return self.relations["digit"]
elif c in self.wSpace:
return self.relations["wSpace"]
else:
return 0

#use the code i just wrote
if len(sys.argv) < 2 or len(sys.argv) > 2:
sys.stderr.write("Usage: sampleLex \n")
elif len(sys.argv) == 2:
lexer = SimpleLex(sys.argv[1])
lexer.scan()
###############################

For your viewing pleasure, a small example of the "little lexer" in action:


###############################
adam@pseudogen:~$ ./simpleLex.py lexFile.txt
hello
world
from
my
ub3r
l33t
lexical
analyzer

###############################

The code, the diagrams and the file used in this example are all available here.

Wednesday, September 05, 2007

Computer Science: Through the eye of those on the other side of the looking glass

So I sit there in front of my faithful computer hacking away at what I am pretending to be homework so that I think I am actually being scholastically productive. As I sit, I receive and instant message from a good friend of mine who is a Computer Science Major at Rice university and he has a question about some Mac software's auto backup capabilities, and I answer to the best of my knowledge and the conversation comes to a halt. I simply assume he is troubleshooting something for someone and more likely than not it is for his girlfriend for she is the proud owner of an Intel MacBook, but never the less I go back to my procrastination like a good student.

A few more minutes pass by (could have been over an hour for all I know, procrastination isn't really the best time keeping activity) and I get an instant message from my friend's girlfriend stating that she has been given the task of writing an essay about "What Computer Science is" for her introductory computer science course at Rice (we will assume for all practical purposes that this course is for non-comp sci majors). And she exclaimed to me about the daunting tale that was writing said essay along with the irony that went with the fact that the word processing software she was using had crashed and she lost everything, but being the good student she is she trudged on and rewrote it from scratch. Though, this time, with a different attitude and a delightful spin on it all. Her essay was far too priceless not to publish somewhere and I got to it first! So, without further a due I present to you "What is Computer Science?" by Rachel Gittleman:

" Earlier this evening I wrote a short essay on the definition of computer science, a discipline I am familiar with only through the influence of a computer scientist I have been dating for nearly three years. I did not let him read it because I was embarrassed by how little I actually knew about the subject, and as I was in the process of saving and finally being done with the humiliating exercise, my word processor crashed and took my hard-won essay along with it. Even with the help of said computer scientist boy-friend, the essay proved to be irrecoverable.

So what is computer science? A couple of hours ago my answer was as optimistic and technical as possible for someone who really has no idea what she is talking about. I stressed the dichotomy of the discipline as both a study of computation and computational machines--theory and practical programming, math and engineering. I mentioned the relative newness of the discipline compared to others in academia (I even dropped famous names), and conjectured that views on computer science must be very different now than a mere decades ago, but that at its essence, the field is about what programmable machines can do and how to make them do it. Having my word processor crash was disheartening and, after fruitlessly trying to recover the document, I was convinced to rewrite the essay in its current form.

Computer science is new, rapidly evolving, and incredibly broad. It has the power to drastically improve the quality of our lives over incredibly short spans of time, and in the past few decades it has provided us with earth-shatteringly new tools that quickly have become integral parts of the daily routines of even the most computer- illiterate (myself sadly counted among them). Although these qualities make computer science fascinatingly current and applicable in a way that the older sciences and humanities are not, there are also inherent downsides, particularly when it comes to accountability. The constant demand for new programs and applications and the inability of most of the public (again, including myself) to understand what goes into making them naturally results in buggy programs that can crash without notice, taking hard-working students’ single-spaced essays on computer science with them. I think the public would do much more than sigh and send an error report if a mechanical engineer built something so easily broken.

I still think computer science is the study of computation and computational machines, and I still think that it is an exciting new discipline that I want to know more about (that is why I am taking an introductory course after all), but I also want to make sure to express my hope that one day all that theory and engineering and programming will eventually be accountable for making me a word processor that will not crash."