Symbol table for the C-Compiler

In this assignment we learnt how a symbol table is implemented in a complier. Each new symbol is pushed into the hashtable (implemented as a array of pointers of elements)
of the Symbol table stack. We sum up the ascii values of the characters of the symbol and take the modulo with the MAXHASHSIZE to determine the array index for a particular entry. The problem with this implementation seems to be the constant size of the hashtable we are allocating on entering each new scope. If a scope has lesser number of variables compared to one that has close to MAXHASHSIZE, we’ll unnecessarily waste memory; Check the size of the memory using gdb by

print sizeof(symbolStackTop->symbolTablePtr->hashTable)

and it comes to 128. We defined MAXHASHSIZE as 32, and size of each integer is 4 bytes in my machine (AMD Athlon(tm) Processor L110). So whatever be the number of elements in the block – we always allocate 128 bytes.

Now this brings us to the question – symbolStackTop->symbolTablePtr just assumes different address depending on the order of and number of times enterscope() and leavescope() gets called. So do we free() the stack top (to avoid memory leak), each time we leaveScope()? Then we’ll loose the entires for that particular scope and would not be able to use it later on (it appears we’ll be using it in some other module of the project). If we don’t free(), where do we store the reference for the entry?

Another issue was the data structure Element has a data member key in it. But why do we need it? Our hashing function selects a block in the array, and on collision we resort to chaining. So does it really need a key as a data member?

Lets see what people have to say about this …

CS473 – Writing a compiler for a simple language

This semester one of neatest course I have taken is CS 473: Compiler Design. Professor V. N. Venkatakrishnan is teaching the course. We will be writing a complier for a language called C-, so lots of coding and we really get to understand how the abstract concepts are implemented. It will be a series of six homework projects.

Last semester I planed to note down stuff while doing CS 450 Introduction to Networking – another programming intensive course. And due to many other things – that did not materialize. But this time, we have to submit a short essay on what we learnt in each project which gets graded. Well quite a reason to blog. Here is the first essay.

Implementation of Scanner for C– with (f)lex

What I learnt:
The basic structure of a lex program.

Still not feeling confident:
Using lex api results gives optimized results. How to make efficient usage of it in rules and user submitted routine
Apart form a few documentation, there is hardly any that’s contrite but makes you feel confident

References used:
While none were fully read, peeped into following
[1] http://dinosaur.compilertools.net/flex/
[2] http://www.cs.manchester.ac.uk/~pjj/cs5031/ho/
[3] http://docs.sun.com/app/docs/doc/802-1952/6i5uu4c0f?a=view
[4] http://www.ibm.com/developerworks/library/l-lex.html
[5] http://stackoverflow.com/questions/641701/excellent-online-tutorial-for-lex-and-yacc

Further reading planned from:
[1] lex & yacc, Second Edition – ByDoug Brown, John Levine, Tony MasonPublisher:O’Reilly Media

Beisdes:
Just a few lines of bash scripting to do the test cases at once (assuming the test directory is in pwd).


rm -f testresult;
for i in `ls test/*.c`;do
echo $i>> testresult; cat -n $i >> testresult; echo “———–“>> testresult; ./cmlexer $i >> testresult;
done