[Digital Circuit Experiments Final Report] Microprocessor Implementation: The Micro-Processor HCL 2516

 

Digital Circuit Experiments

Final Project - The Micro-Processor HCL 2516


Murphy Chen, Lan Chang, and Ing-Jei Huang


National Taiwan University

Department of Electrical Engineering

 

0 Abstract

-- Written by Murphy Chen <B82503131>

This report describes the 16-bit micro-processor HCL-2516 that we built. We describe the design and function of its registers, flip-flops, bus system, control unit, and arithmetic and logic unit. We also describe the assembler, the download kit, and the applications we made, and we present the experience and lessons we gained along the way.


1 System Overview

-- Written by Murphy Chen <B82503131>

What we have built is primarily a 16-bit micro-processor. It has 9 registers, 7 flip-flops, and 25 instructions, together with a memory unit of 4096 words of 16 bits each, an arithmetic and logic unit, and a control unit. In addition, we have developed an assembler for this micro-processor so that we can write assembly programs conveniently. Finally, we have written several application programs to demonstrate the power of this micro-processor.

To communicate with the real world, the micro-processor has two 16-bit registers, OUTR and INPR, which are used together with the two flip-flops FGO and FGI to transfer 16-bit data to or from other devices.

To perform useful tasks, the micro-processor has the following instructions: And, Add, Load Word, Store Word, Branch Unconditionally, Branch and Save Return Address, Increment and Skip if Zero, Clear Register, Complement Register, Shift Register, Increment Register, Skip Next Instruction Conditionally, Halt Computer, Input Character, Output Character, Enable Interrupt, Disable Interrupt.

Application programs are written in assembly language and translated by the assembler, which outputs the resulting machine code for programming into EPROMs. To make developing application programs easier, we also tried to build a download kit that transfers the machine code from a PC to SRAMs through a parallel printer port, but we failed.

The block diagram of the system is depicted as follows:

IMG00001

Fig. 1 The System Overview



2 Introduction and Explanation of Each Subsystem

The system is divided into nine subsystems: memory unit, registers, flip-flops, bus and decoders, control unit, arithmetic and logic unit, assembler, download kit, and applications. They are described in the following sections.



2.1 Memory Unit

-- Written by Murphy Chen <B82503131>

The memory unit is used to provide a memory with 4096 words of 16 bits each. We can store data and instructions in it so as to execute user programs.

There are many practical ways to arrange the memory unit, and users can choose whichever they like. For example, during system development, a user may wish to use four 2K*8bit SRAMs as the memory unit, because it is convenient to download programs into SRAMs and test them over and over again. After the programs have been fully tested, the user may want to use two 2K*8bit EPROMs and two 2K*8bit SRAMs instead, programming the EPROMs with read-only instructions and static data and using the SRAMs for storing and retrieving dynamic data.

In our applications, we arrange the memory unit in this way: address 0x000 to 0x7FF belongs to EPROMs, and address 0x800 to 0xFFF belongs to SRAMs.
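The address split above can be sketched in a few lines of Python. The function name `select_device` is made up for illustration; the only facts taken from the report are the two address ranges.

```python
def select_device(address):
    """Return the memory bank that a 12-bit word address falls into.

    0x000-0x7FF belongs to the EPROMs and 0x800-0xFFF to the SRAMs,
    so address bit 11 alone separates the two 2K-word halves.
    """
    if not 0 <= address <= 0xFFF:
        raise ValueError("address must fit in 12 bits")
    return "SRAM" if address & 0x800 else "EPROM"
```

Because only bit 11 matters, a single inverted copy of that bit is enough to enable one bank or the other.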

The SRAMs we use are 6116s. The 6116 is a 16,384-bit high-speed static RAM organized as 2K*8bit. We cascade two 6116s to provide an addressing space of 2K words of 16 bits each. The EPROMs we use are 27C256s. We chose the 27C256 because of its low price, but we soon found that it provides another benefit as well. It is a 256K-bit erasable programmable ROM organized as 32K*8bit. We cascade two 27C256s to provide an addressing space of 2K words of 16 bits each. In fact, two 27C256s can provide a 32K*16bit addressing space, so we can regard them as eight memory blocks, each providing a 2K*16bit addressing space. We can therefore write up to eight programs into two 27C256s at one time; to execute one of them, we only need to set the corresponding most significant address bits of the 27C256s. This means that when a program has bugs, we need not wait 30 minutes to erase the EPROMs; we simply write the new program into the next memory block of the same EPROMs.

For the data sheet of SRAMs and EPROMs we used, please refer to the appendix.

2.2 Registers

-- Written by Murphy Chen <B82503131>

There are 9 registers in our micro-processor. The registers are AR, PC, DR, AC, IR, TR, OUTR, INPR, and SC.

AR stands for Address Register. It tells the memory unit where to retrieve a word from or where to store a value. PC stands for Program Counter. It always points to the next instruction to be executed and tells the control unit where to find that instruction. DR stands for Data Register. It receives data from memory and provides operands for the adder and logic circuit. AC stands for Accumulator. It holds the results calculated by the adder and logic circuit or loaded from memory. IR stands for Instruction Register. It holds the current instruction fetched from memory and tells the control unit which operations are to be done. TR stands for Temporary Register. It temporarily holds the value of PC during the interrupt cycle. OUTR stands for OUTput Register. It sends data to the peripheral devices. INPR stands for INPut Register. It receives data from the peripheral devices. SC stands for Sequential Counter. It provides timing signals for the control unit.

The text design files corresponding to these registers are listed in the appendix: REG1.TDF for PC, REG2.TDF for TR, REG4.TDF for SC, REGAR.TDF for AR, REGDR.TDF for DR, REGIR.TDF for IR, and REGOUTR.TDF for OUTR. INPR and AC are combined with the adder and logic unit.

There are so many text design files because every register has its own particular function: some can be incremented, some can be cleared, some can be loaded, and so on.

The detailed description and the simulated results of them are shown as follows.



2.2.1 REG1 (Program Counter)

| Input: CLR | Input: LD | Input: INR | Output: a[].d is connected to |
|---|---|---|---|
| H | X | X | 0 |
| L | H | X | i[] |
| L | L | H | a[].q+1 |
| L | L | L | a[].q |

Function Table of REG1

IMG00002

The Simulated Result of REG1 from MAXPLUSII

REG1.TDF functions as a 12-bit program counter. It can be loaded with a new 12-bit address value, cleared when the user presses the reset button to restart the micro-processor, and incremented during the fetch phase of every instruction and during some conditional branch instructions.

The input and output widths differ because the input of the Program Counter comes from the bus, which is 16 bits wide, whereas the Program Counter itself is only 12 bits wide, since the addressing space is 4K words. The Program Counter therefore discards the four most significant bits coming from the bus. Under normal conditions, these four bits should always be zero whenever the Program Counter is ready to receive data from the bus.
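As a rough behavioural model (not the AHDL in REG1.TDF), the function table and the width handling of REG1 can be sketched as follows. The name `reg1_next` is hypothetical, and the wrap-around at 4096 is an assumption that simply follows from a 12-bit counter.

```python
def reg1_next(q, clr, ld, inr, bus):
    """Next state of the 12-bit program counter after one clock edge.

    Priority follows the function table: CLR, then LD, then INR.
    """
    if clr:
        return 0
    if ld:
        return bus & 0xFFF       # drop the four most significant bus bits
    if inr:
        return (q + 1) & 0xFFF   # assumed: a 12-bit counter wraps at 4096
    return q                     # hold the current value
```

For example, `reg1_next(5, 0, 1, 1, 0xF123)` loads `0x123`, because LD takes priority over INR and only the low 12 bits of the bus are kept.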



2.2.2 REG2 (Temporary Register)

| Input: CLR | Input: LD | Input: INR | Output: a[].d is connected to |
|---|---|---|---|
| H | X | X | 0 |
| L | H | X | i[] |
| L | L | H | a[].q+1 |
| L | L | L | a[].q |

Function Table of REG2

IMG00003

The Simulated Result of REG2 from MAXPLUSII

REG2 functions as a 16-bit temporary register. It is similar to REG1, except that its input and output are both 16 bits wide. It can be loaded with a new 16-bit value, cleared, and incremented.

2.2.3 REG4 (Sequential Counter)

| Input: CLR | Input: INR | Output: a[].d is connected to |
|---|---|---|
| H | X | 0 |
| L | H | a[].q+1 |
| L | L | a[].q |

Function Table of REG4

IMG00004

The Simulated Result of REG4 from MAXPLUSII

REG4 functions as a 3-bit sequential counter. It is cleared when the user restarts the micro-processor, at the end of an instruction cycle, or after the HLT instruction is executed, and it is incremented at each phase of an instruction cycle.

2.2.4 REGAR (Address Register)

| Input: CLR | Input: LD | Input: INR | Output: a[].d is connected to |
|---|---|---|---|
| H | X | X | 0 |
| L | H | X | i[] |
| L | L | H | a[].q+1 |
| L | L | L | a[].q |

Function Table of REGAR

IMG00005

The Simulated Result of REGAR from MAXPLUSII

REGAR functions as a 12-bit address register. It is similar to REG1, except that it has one more output, a11_not, which gives the inverted most significant bit of a[] to the memory unit; this makes it convenient to cascade four 2K*8bit SRAMs or EPROMs as the memory unit. It can be loaded with a new address value from the 16-bit bus, cleared, and incremented. The input and output widths differ for the same reason as in REG1.



2.2.5 REGDR (Data Register)

| Input: CLR | Input: LD | Input: INR | Output: a[].d is connected to |
|---|---|---|---|
| H | X | X | 0 |
| L | H | X | i[] |
| L | L | H | a[].q+1 |
| L | L | L | a[].q |

Function Table of REGDR

IMG00006

The Simulated Result of REGDR from MAXPLUSII

REGDR functions as a 16-bit data register. It is similar to REG2, except that it has one more output, DR_ZERO, which tells the control unit whether DR equals zero, in order to facilitate the implementation of the ISZ instruction. It can be loaded with a new 16-bit value, cleared, and incremented.

2.2.6 REGIR (Instruction Register)

| Input: LD | Output: a[].d is connected to |
|---|---|
| H | i[] |
| L | a[].q |

Function Table of REGIR

IMG00007

The Simulated Result of REGIR from MAXPLUSII

REGIR functions as a 16-bit instruction register. It is loaded with a 16-bit instruction during the fetch phase of an instruction cycle. Its outputs are IR_15, OP[2..0], and b[11..0].

IR_15 comes from the most significant bit of the instruction register. For a memory-reference instruction, it indicates whether the instruction uses direct or indirect addressing; otherwise, it indicates whether the instruction is a register-reference or an input-output instruction. OP[2..0] comes from the 12th to the 14th bit of the instruction register and indicates which instruction it is. If OP[2..0] equals b"111" and IR_15 equals zero, the current instruction is a register-reference instruction. If OP[2..0] equals b"111" and IR_15 equals one, the current instruction is an input-output instruction. b[11..0] comes from the 0th to the 11th bit of the instruction register. It contains an address value in a memory-reference instruction, and it indicates which instruction it is in a register-reference or input-output instruction. For more information about the instructions of the micro-processor, please refer to the instruction table in the appendix.
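The field layout just described can be illustrated with a small Python sketch. The function name `decode_ir` and the category strings are made up for this example; the bit positions and the meaning of IR_15 and OP[2..0] are the ones given in the text.

```python
def decode_ir(word):
    """Split a 16-bit instruction word into IR_15, OP[2..0] and b[11..0]."""
    ir_15 = (word >> 15) & 1      # bit 15
    op = (word >> 12) & 0b111     # bits 12..14
    b = word & 0xFFF              # bits 0..11
    if op == 0b111:
        kind = "input-output" if ir_15 else "register-reference"
    else:
        kind = ("indirect memory-reference" if ir_15
                else "direct memory-reference")
    return ir_15, op, b, kind
```

For instance, the word `0x9234` has IR_15 = 1 and OP = 001, so it is classified as an indirect memory-reference instruction with address field `0x234`.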

2.2.7 REGOUTR (Output Register)

| Input: LD | Output: a[].d is connected to |
|---|---|
| H | i[] |
| L | a[].q |

Function Table of REGOUTR

IMG00008

The Simulated Result of REGOUTR from MAXPLUSII

REGOUTR functions as a 16-bit output register. It is loaded with 16-bit data from AC by the OUT instruction. Its output is connected directly to the I/O pins of the micro-processor, so it can transfer data from inside the micro-processor to the outside world.


2.3 Flip-Flops

-- Written by Murphy Chen <B82503131>

There are 7 flip-flops in our micro-processor. The flip-flops are I, S, E, R, IEN, FGI, and FGO.

I indicates whether the current instruction uses direct or indirect memory addressing. S indicates whether or not to halt the computer. E receives the LSB of AC when a circulate right is performed on AC, or the MSB of AC when a circulate left is performed, and the status of E can be used to determine whether or not to skip the next instruction. R is used to enter the interrupt cycle when an interrupt occurs. IEN disables or enables the interrupt. FGI indicates whether the input device can continue to send data. FGO indicates whether the output device can continue to receive data.

The text design files corresponding to these flip-flops are FF.TDF and JK.TDF, listed in the appendix. Their detailed description and simulated results are shown as follows.




2.3.1 FF & JK

| Input: CLR | Input: SET | Output: a.d is connected to |
|---|---|---|
| H | X | 0 |
| L | H | 1 |
| L | L | a.q |

Function Table of FF & JK

IMG00009

The simulated result of FF & JK

In fact, FF and JK have the same function, but for some reason we created both of them. They serve as I, R, IEN, FGI, FGO, and S. Flip-flop E is closely tied to AC and the ALU, so it is put together with them in one text design file.

2.4 Bus & Decoders

2.4.1 Bus

-- Written by Murphy Chen <B82503131>

The bus connects the memory and the registers. It is 16 bits wide and is implemented as a multiplexer with a 3-bit select input and eight data inputs. The control unit can choose one of the six registers AR, PC, DR, AC, IR, and TR, or the memory unit, to output to the bus. The bus is connected back to the inputs of all of them.

For example, to store the value of DR into memory, the control unit sends a select signal s[] to the bus to choose DR for output. The control unit also sends a LOAD signal to the memory (in fact, it sends both Output Disable and Write Enable). After the next positive edge of the clock, the memory writes the data coming from DR into itself. The other registers do not change, because the control unit does not send a LOAD signal to them.

The text design file corresponding to the bus is BUS.TDF, listed in the appendix. Its function table and simulated result are shown as follows.

| Input: S[2..0] | Output: o[15..0] |
|---|---|
| 1 | AR[11..0] |
| 2 | PC[11..0] |
| 3 | DR[15..0] |
| 4 | AC[15..0] |
| 5 | IR[15..0] |
| 6 | TR[15..0] |
| 7 | MEM[15..0] |

Function Table of Bus

IMG00010

The Simulated Result of Bus

Note that some inputs are 12 bits wide while the output is 16 bits wide. For these inputs, the bus pads zeros into the four most significant bits of the bus output, from the 12th bit to the 15th bit.
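The selection and zero padding performed by the bus can be modelled roughly as below. This is a Python sketch rather than the contents of BUS.TDF; the function name `bus_output` is hypothetical, and returning 0 for select code 0 is an assumption, since the function table does not list that case.

```python
def bus_output(s, ar=0, pc=0, dr=0, ac=0, ir=0, tr=0, mem=0):
    """Return the 16-bit bus value for select code s (S[2..0])."""
    sources = {
        1: ar & 0x0FFF,   # 12-bit source, upper four bits padded with zeros
        2: pc & 0x0FFF,   # 12-bit source, upper four bits padded with zeros
        3: dr & 0xFFFF,
        4: ac & 0xFFFF,
        5: ir & 0xFFFF,
        6: tr & 0xFFFF,
        7: mem & 0xFFFF,
    }
    return sources.get(s, 0)  # assumed: nothing selected drives zeros
```

With `s = 2` and `pc = 0xABC`, the bus carries `0x0ABC`, which is what a 16-bit register loading from the bus would see.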




2.4.2 Decoders

-- Written by Murphy Chen <B82503131>

Decoders are used to decode operation codes of instructions, and to decode the timing signals generated from the sequential counter.

The text design file corresponding to the decoder is DECODER.TDF, listed in the appendix. Its function table and simulated result are shown as follows.

| Input: code[2..0] | Output: out[7..0] |
|---|---|
| 0 | B"00000001" |
| 1 | B"00000010" |
| 2 | B"00000100" |
| 3 | B"00001000" |
| 4 | B"00010000" |
| 5 | B"00100000" |
| 6 | B"01000000" |
| 7 | B"10000000" |

Function Table of Decoder

IMG00011

The Simulated Result of Decoder
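The decoder's function table amounts to a one-hot encoding of the 3-bit input. A minimal Python equivalent, with the hypothetical name `decode3to8`, is:

```python
def decode3to8(code):
    """Return the 8-bit one-hot output for a 3-bit input code."""
    if not 0 <= code <= 7:
        raise ValueError("code must fit in 3 bits")
    return 1 << code   # exactly one output bit is set


# Print the function table in the same B"..." notation as above.
for code in range(8):
    print(code, 'B"{:08b}"'.format(decode3to8(code)))
```

The same function serves both uses described in this section: decoding the opcode bits into D0 to D7 and decoding SC into the timing signals T0 to T7.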

2.4.3 Bidirectional I/O

-- Written by Lan Chang<B82503081>

We want to use RAM, so we must read data from it and write data to it. Because the RAM uses the same data lines for both reading and writing, bidirectional I/O is necessary.

We successfully implemented this with tri-state buffers written in TDF. Luckily, because we used TDF rather than GDF, the problem the TA had warned about did not occur at all.

2.5 Control Unit

-- Written by Lan Chang<B82503081>

The control unit is an important part of the micro-processor. Its job is to receive the input from the input devices, the output of the ALU, the contents of the registers, and the content of the sequence counter, to process them, and to send the resulting signals to the registers, the output devices, the ALU, and the bus. Without it, the micro-processor cannot function correctly.

The control unit uses purely combinational logic to process its input signals. Through its outputs, it directly controls the bus, the registers, and the output devices. It also tells the ALU what to do.

The control unit controls the registers mainly through three kinds of signals:

  1. Load (LD): tells the register to load the content on the bus.
  2. Increase (INR): tells the register to increase its content by one.
  3. Clear (CLR): tells the register to clear its content to zero.

The control unit must meet the requirements of our S language instructions, so it must handle every clock cycle of every instruction precisely. As an example, suppose we want to execute the following instruction:

ADD NUMBER

The CPU executes the instruction over several clock cycles. To know which clock cycle of the sequence it is in, the control unit (CU) reads the sequence counter (SC). The CU first clears SC; when SC is 0, we call the cycle T0. At T0, according to the function table (see appendix), we must execute the following microoperation:

Fetch R'T0: AR<- PC

Notice that R' represents the complement of the R flip-flop. The line means that when R is zero and T0 is one (T0 is the LSB of the decoded content of SC), the content of PC (program counter) is sent to AR (address register). To complete this function, we must put the content of PC on the bus and activate the load line of AR (AR_LD); the content of PC is then sent to AR. So we added the following lines to control.tdf (its complete content is in the appendix):

AR_LD = R'T[0] # …….

x2_PC= R'T[0] # …….

In the first line, the "……" stands for the other situations in which AR loads the content of the bus. If any situation in this line is matched, AR_LD goes high, and AR loads the content of the bus.

In the second line, x2_PC is a variable in the TDF file that represents the situations in which PC is put on the bus. Through an encoder, x1_AR, x2_PC, x3_DR, x4_AC, x5_IR, x6_TR, and x7_Mem are encoded into a 3-bit signal called S0, S1, S2. The signal S[2..0] controls the multiplexer, which then puts the required content on the bus.

Then the following microoperation is executed:

R'T1: IR<- M[AR], PC<- PC+1

For the same reason, we have the following three lines in control.tdf:

IR_LD = R'T1 # ……

x7_MEM = R'T1 # ……

PC_INR = R'T1 # ……

The two cycles R'T0 and R'T1 together fetch an instruction into IR and increment PC by one. This is the basic fetch cycle with which every instruction begins.

After this cycle, IR[14..12] is automatically decoded into D[7..0], because it is wired to a combinational decoder. The following microoperations are then executed. (Notice that SC increases by one automatically while the S flip-flop is high, because S is wired to SC_INR; the control unit keeps S high while the program is executing.)

R'T2: AR<- IR(0-11), I<- IR(15)

Accordingly, we have the following two lines in control.tdf:

AR_LD = R'T0 # R'T2 # ……

x5_IR = R'T2 # ……

Because the SET pin of the I flip-flop is connected to IR(15), we do not need to add any further line to control.tdf.

In T3, there is only one microoperation:

D7'IT3: AR<-M[AR]

Because our example is not an indirect instruction, I = 0, and this microoperation is not executed. Since the condition decides whether the microoperation is executed, in control.tdf we write:

AR_LD = R'T0 # R'T2 # D7'IT3 # ……

x7_Mem = D7'IT3 # ……

The next step prepares the ADD action:

D1T4: DR<- M[AR]

so we have:

DR_LD = D1T4 # ……

x7_Mem = D7'IT3 # D1T4 # ……

(Notice that the output of AR is connected directly to the address pins of the memory, so the CU does not need to do anything to send it out.)

Next comes the main function of the ADD instruction:

D1T5: AC<- AC+DR, E<-Cout, SC<- 0

So in the CU we have:

AC_LD = D1T5

ADD = D1T5

SC_CLR = D1T5

(Notice that ADD is an output that controls the ALU, and the operation of the E flip-flop is handled by the ALU.)

So far we have seen that an ADD operation is completed by the CU, the ALU, the bus, and the registers. The other operations are implemented in the CU by the same method, and control.tdf was completed in this way. (Please see the details of control.tdf in the appendix at the end of the report.)
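To make the sequence T0 to T5 easier to follow, the microoperations of one ADD instruction can be replayed in software. The Python sketch below is only a behavioural illustration, not a translation of control.tdf: the function name `run_add_cycle` is hypothetical, R is assumed to be 0 (no interrupt pending), and the memory is modelled as a plain list of 4096 words.

```python
def run_add_cycle(mem, pc, ac):
    """Replay the microoperations of one ADD instruction (R = 0 assumed)."""
    ar = pc                      # R'T0:   AR <- PC
    ir = mem[ar]                 # R'T1:   IR <- M[AR],
    pc = (pc + 1) & 0xFFF        #         PC <- PC + 1
    ar = ir & 0xFFF              # R'T2:   AR <- IR(0-11),
    i = (ir >> 15) & 1           #         I  <- IR(15)
    opcode = (ir >> 12) & 0b111  # IR[14..12], decoded into D0..D7
    if opcode != 1:
        raise ValueError("this sketch only handles ADD (D1)")
    if i:                        # D7'IT3: AR <- M[AR] (indirect only)
        ar = mem[ar] & 0xFFF
    dr = mem[ar]                 # D1T4:   DR <- M[AR]
    total = ac + dr              # D1T5:   AC <- AC + DR, E <- Cout, SC <- 0
    return {"PC": pc, "AC": total & 0xFFFF, "E": total >> 16, "SC": 0}
```

For example, with `mem[0] = 0x1010` (ADD, direct, address 0x010) and `mem[0x10] = 5`, calling `run_add_cycle(mem, 0, 3)` leaves AC = 8, E = 0, and PC = 1.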

The simulation of the control unit is listed in the appendix (up_ledt.scf) and is described as follows.

We implemented an S program in MEM.TDF. The program repeatedly increments AC by one and sends AC out to OUTR. Its function can be seen clearly in the simulation. When reset is pressed, a low pulse makes the computer start to work and execute the program. Because the program does not use INPR, INPR remains zero throughout. Most importantly, we see that SC (S[2..0]) works properly; only when SC works properly is anything else possible. We can also see that IR changes every time a new instruction is read in, which shows that PC and IR work together properly.

We can see that when AC.INR is high, AC is immediately incremented by one, which demonstrates that AC_INR functions properly. We also see that when indirect addressing is used, I.D (the output of the I flip-flop) is high, so the test of I succeeds. Finally, OUTR increases by one following AC. We get the correct result, and we are sure that there is no bug in the functions we used. We repeated this process three times and caught 7 bugs.



2.6 Arithmetic and Logic Circuit

-- Written by Ing-Jye Huang <B82503007>

This unit implements the arithmetic and logic functions that a basic computer should contain. Its inputs are AC[15..0], which comes from the output of this unit; INP[15..0], which comes from the input of the whole CPU; DR[15..0]; and the control signals from the control unit. Its output is AC[15..0], because this unit implements the instructions that operate on AC.

The control signals from the control unit are as follows:

| Symbol | Action | Description |
|---|---|---|
| AND | AC←AC ^ DR | AND AC with DR |
| ADD | AC←AC + DR | Add AC with DR |
| DR | AC←DR | Transfer from DR |
| INPR | AC←INPR | Transfer from INPR |
| COM | AC←AC' | Complement |
| SHR | AC←shr AC, AC(15)←E | Shift right |
| SHL | AC←shl AC, AC(0)←E | Shift left |
| CLR | AC←0 | Clear |
| INC | AC←AC+1 | Increment |

The logic circuit of the ALU is as follows.

IMG00013

The Logic Circuit of ALU

These operations fall into two groups. AND, ADD, DR, INPR, COM, SHR, and SHL belong to the LD group, which the control unit indicates with a load signal. CLR and INC are the general operations that every kind of register has.

To implement AND, we use an AND gate to AND each bit of AC[] with the corresponding bit of DR[] and with the AND control signal.

The ADD operation is achieved with full adders. We AND the SUM output of each full adder with the ADD control signal and send the carry to the next full adder.

The DR operation is obtained by ANDing DR[] with the DR control signal.

The INPR operation is similar to DR, except that INPR[] is ANDed with the INPR control signal.

The COM operation is achieved by ANDing the complement of AC[] with the COM control signal.

The SHR operation is obtained by ANDing AC[i+1] with the SHR control signal. Similarly, the SHL operation ANDs AC[i-1] with the SHL control signal.

The INC operation works in the same way as ADD, except that we add 1 to AC. Because only one of the control signals is HIGH at any time, ORing the outputs of the AND gates mentioned above gives the data that should be transferred into AC. The logic circuit is as shown in the figure above. Owing to the repetitive structure of the unit, we use TDF to implement it.

Another part of this unit deals with the instructions that involve the E flip-flop. In an ADD instruction, the carry out of AC is transferred to E. The CME operation complements E. In the SHR instruction, the data stored in E is sent to AC[15], and AC[0] is sent to E. Similarly, in the SHL instruction, the data stored in E is sent to AC[0], and AC[15] is sent to E. These operations can be implemented in a similar way to those mentioned above.
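The combined behaviour of AC and E under each control signal can be summarised in a short Python model. It is a sketch of the behaviour described above, not of the gate-level circuit: the name `alu` is hypothetical, inputs are assumed to be already within 16 bits (and E within one bit), and leaving E unchanged for every operation other than ADD, SHR, and SHL is an assumption.

```python
MASK = 0xFFFF  # AC is 16 bits wide


def alu(op, ac, e, dr=0, inpr=0):
    """Return the new (AC, E) pair for one active control signal."""
    if op == "AND":
        return ac & dr, e
    if op == "ADD":
        total = ac + dr
        return total & MASK, total >> 16        # carry out goes to E
    if op == "DR":
        return dr & MASK, e
    if op == "INPR":
        return inpr & MASK, e
    if op == "COM":
        return ~ac & MASK, e
    if op == "SHR":
        return (e << 15) | (ac >> 1), ac & 1    # E -> AC(15), AC(0) -> E
    if op == "SHL":
        return ((ac << 1) & MASK) | e, ac >> 15 # E -> AC(0), AC(15) -> E
    if op == "CLR":
        return 0, e
    if op == "INC":
        return (ac + 1) & MASK, e               # assumed: E is not affected
    raise ValueError("unknown control signal: " + op)
```

Applying `alu("SHR", 0x0001, 1)` gives `(0x8000, 1)`: the old E enters at bit 15 and the old bit 0 moves into E, so AC and E together behave as a 17-bit circular shift register.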

IMG00014

The Simulated Result of ALU from MAXPLUSII

The simulation result of this unit is shown above. We set the control signals HIGH one at a time, in order. Because the AND and OR gates needed to implement these instructions introduce some delay, if the CLK to the registers and the control signals go HIGH at the same time, the desired result is obtained at the next clock. This does not break correct operation, because all instructions are delayed by one clock cycle.

2.7 Assembler

-- Written by Murphy Chen <B82503131>

The assembler translates assembly programs into machine instructions. It runs on a PC, and its output file can be used to program EPROMs or to download into SRAMs, which are connected to the micro-processor to execute application programs.

The steps for developing application programs are as follows. First, users write their application program in assembly language and save the file with the extension '.s'. Next, they run the assembler with the filename as a parameter; the assembler prints some useful information to the screen and writes the translated binary code to a file with the extension '.lst'. Users can then use this binary code to program EPROMs or download it into SRAMs. Finally, users turn on the power of the micro-processor to run the resulting program.

The source code of the assembler I created is listed in the appendix; its filename is AS.CC. The operation of the assembler is described as follows.

The assembler uses several tables to identify keywords and user-defined symbols. PseudoTable[] contains the pseudo instructions of the assembler. There are four pseudo instructions: ORG, END, DEC, and HEX. ORG defines the new location of the next instruction. END indicates the end of the program. DEC indicates that the line is not an instruction but a decimal datum. Similarly, HEX indicates that the line contains a hexadecimal datum. MRITable[] contains the 7 memory-reference instructions and the corresponding machine codes. nMRITable[] contains the 18 non-memory-reference instructions and the corresponding machine codes. UATSymbol[] stores the user-defined symbols. UATaddr[] stores the address of each user-defined symbol. UATlen[] stores the length of each user-defined symbol.

Other variables include Code[], LSCode[], MSCode[], and buffer. Code[] stores the resulting machine codes. LSCode[] stores the 8 least significant bits of the resulting machine codes. Similarly, MSCode[] stores the 8 most significant bits. Why use LSCode[] and MSCode[]? Because most EPROMs and SRAMs are 8 bits wide, whereas the instructions and data are 16 bits wide. To program them into EPROMs/SRAMs, we have to split the 16-bit machine codes into two 8-bit parts and program these into the EPROMs/SRAMs separately. buffer stores the content of the input assembly file.
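The split into low and high bytes is simple to express in code. The Python sketch below borrows the array names from the text for readability, but it is not taken from AS.CC; the function name `split_code` is made up.

```python
def split_code(code):
    """Split 16-bit machine words into low-byte and high-byte lists."""
    ls_code = [word & 0xFF for word in code]         # bits 0..7
    ms_code = [(word >> 8) & 0xFF for word in code]  # bits 8..15
    return ls_code, ms_code
```

Each list would then go into its own 8-bit-wide chip, so that the two chips together present a full 16-bit word at every address.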

First, the assembler reads the content of the input assembly file into buffer. Then it scans the entire buffer twice. During the first pass, the assembler only cares about the two pseudo instructions ORG and END and about user-defined symbols. Every time it encounters a user-defined symbol, it first checks whether the symbol has already been defined. If it has, the assembler outputs an error message and exits. If it has not, the assembler adds the new symbol to UATSymbol[], stores its address in UATaddr[], and stores its length in UATlen[].

During the second pass, the assembler translates every instruction it encounters into machine code according to the two tables MRITable[] and nMRITable[]. When the assembler encounters a non-memory-reference instruction, it directly generates the corresponding machine code from nMRITable[]. When the assembler encounters a memory-reference instruction, it checks the address part of the instruction, which is a user-defined symbol, and searches UATSymbol[] to see whether this symbol has been defined in the file. If the symbol has not been defined, the assembler outputs an error message and exits. If it has been defined, the assembler generates the address value corresponding to that symbol. Then the assembler checks whether the instruction ends with an 'I', which indicates an indirect memory reference. After these checks, the assembler generates the corresponding machine code according to MRITable[] and UATaddr[].
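The two passes can be illustrated with a much-reduced Python sketch. It is not a port of AS.CC. The opcode values in `MRI` and `NON_MRI` are assumptions (only a few instructions are shown, and the real tables are in the appendix), and so are the syntax details: labels end with a comma as in `SINi,`, the ORG operand is read as hexadecimal, and every non-blank line carries an operation.

```python
MRI = {"AND": 0x0, "ADD": 0x1, "LDA": 0x2, "STA": 0x3}    # assumed opcodes
NON_MRI = {"CLA": 0x7800, "HLT": 0x7001, "OUT": 0xF400}   # assumed encodings


def parse(line):
    """Split a line into (label, fields); a label ends with a comma."""
    label, fields = None, line.split()
    if fields[0].endswith(","):
        label, fields = fields[0][:-1], fields[1:]
    return label, fields


def assemble(source):
    """Return {address: 16-bit word} for a small assembly program."""
    lines = [parse(l) for l in source.splitlines() if l.strip()]
    symbols, loc = {}, 0
    for label, fields in lines:        # first pass: ORG, END and labels only
        if fields[0] == "ORG":
            loc = int(fields[1], 16)
            continue
        if fields[0] == "END":
            break
        if label is not None:
            if label in symbols:
                raise ValueError("symbol defined twice: " + label)
            symbols[label] = loc
        loc += 1
    code, loc = {}, 0
    for label, fields in lines:        # second pass: generate machine words
        op = fields[0]
        if op == "ORG":
            loc = int(fields[1], 16)
            continue
        if op == "END":
            break
        if op == "DEC":
            word = int(fields[1]) & 0xFFFF
        elif op == "HEX":
            word = int(fields[1], 16) & 0xFFFF
        elif op in NON_MRI:
            word = NON_MRI[op]
        elif op in MRI:
            if fields[1] not in symbols:
                raise ValueError("undefined symbol: " + fields[1])
            indirect = len(fields) > 2 and fields[2] == "I"
            word = (indirect << 15) | (MRI[op] << 12) | symbols[fields[1]]
        else:
            raise ValueError("unknown instruction: " + op)
        code[loc] = word
        loc += 1
    return code
```

The first pass is what allows a forward reference such as `LDA A` to appear before the line `A, DEC 5`: by the time the second pass runs, every label already has an address.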

After translating all the assembly instructions into machine code and storing it in Code[], the assembler writes Code[] to a file with the extension '.bin' for downloading into SRAMs. The assembler then splits Code[] into the two parts LSCode[] and MSCode[], as described before, and writes them to a file with the extension '.lst'. Users can copy the content of this file and paste it into EXPRO for programming EPROMs.


2.8 Download Kit

-- Written by Ing-Jye Huang <B82503007>

Our project is to make a CPU, and without applications we cannot demonstrate its ability. Applications can be written in assembly language and then converted into machine code; the problem is how to transfer the programs to the SRAMs. So Murphy Chen and I decided to build a download kit for transferring our programs. Our first idea was to download the programs via the printer port. Because we have 12-bit address inputs, 16-bit data inputs, and some control signals including WE, OE, …, and the printer port cannot carry so many signals simultaneously, we use shift registers.

We send one bit of data per clock, which means the data is transferred serially, not in parallel. For instance, the address inputs have 12 bits. We send the bits one by one, and after 12 clocks the complete 12-bit address is stored in the registers. The 16-bit data is then transferred in the same way. After the address and data are ready, we send the write enable signal to tell the SRAM to write the data. To check whether the data has been downloaded correctly, we give the SRAM an address and read the data out bit by bit (again using shift registers). We can then compare it with what we sent.
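The serial transfer can be mimicked in software as follows. This Python sketch only illustrates the idea: the function names are hypothetical, the SRAM is modelled as a dictionary, and sending the most significant bit first is an assumption, since the bit order is not stated above.

```python
def to_bits(value, width):
    """List the bits of value, most significant first (assumed order)."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def shift_in(bits, width):
    """Clock bits one at a time into a shift register of the given width."""
    reg = 0
    for bit in bits:
        reg = ((reg << 1) | (bit & 1)) & ((1 << width) - 1)
    return reg


def download_word(sram, address, data):
    """Shift in an address and a data word, write them, then verify."""
    addr_reg = shift_in(to_bits(address, 12), 12)   # 12 address clocks
    data_reg = shift_in(to_bits(data, 16), 16)      # 16 data clocks
    sram[addr_reg] = data_reg                       # write enable pulse
    return sram[addr_reg] == data                   # read back and compare
```

One word therefore costs 28 shift clocks plus a write pulse, which is the price paid for needing only a handful of printer-port pins.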

The block diagram is as follows:

IMG00015

The Circuit of Download Kit


CLK1 : the clock of shifting data

CLK2 : the clock of shifting address

CLK3 : the clock of shifting din

LOAD : load DIN[0..15] to registers

OE1 : output enable of DATA[0..15]

OE : output enable of SRAM

WE : write enable of SRAM

IMG00016

The Simulated Result of Shift Register from MAXPLUSII

We connect DIN[0..15] and DATA[0..15] together. After downloading programs into the SRAMs, we want to read the data back to check whether it is correct. At that time, the output of DATA[0..15] must be high impedance, or it will affect DIN[0..15] and we will not get the desired data. This is why we need OE1, and it was the first problem we encountered.

After finding this problem, we almost downloaded the program into the SRAMs correctly. (Noted by Murphy Chen: "We had nearly successfully downloaded the data into the SRAMs, but when we read the SRAMs back to check, there was a strange bug: the most significant bit of the data at every address was always one. I could not figure out why that happened.") The next time we came back to the lab to try to fix this strange bug, we could not even reproduce the previous result.

We used the LA to observe the data arriving at the ALTERA input, and it was correct. But the output of the ALTERA was wrong, even though the simulation result was right. We tried again and again, even rerouting the circuit, but still could not reproduce the earlier correct result. We spent almost two whole days on this problem. Finally, to finish the experiment before the deadline, we were forced to give up this idea and use another method: storing the programs in EPROM.

2.9 Applications

2.9.1 The Sine Wave Generator (SIN.S)

-- Written by Murphy Chen <B82503131>

In order to demonstrate the power of our micro-processor HCL2516, we needed to write applications. My first idea was a digital voltmeter and a digital function generator, two in one, and I asked Lan to do that. I wrote a C program to generate three tables: a sine wave, a triangle wave, and a square wave. It is listed in the appendix as FUN.CC. That program generates tables that the assembler can understand, and it is suitable for writing the digital function generator application.

But because there was something wrong with the timing of our micro-processor (we have since figured out why; this is discussed in the next chapter), we were forced to demonstrate using only EPROM, without SRAMs. This means we could not write to memory at all, which is a serious limitation!

So I came up with the idea of displaying a sine wave for the demonstration. It is simple. I wrote a C program called SIN.CC (listed in the appendix) to generate an assembly program. There are two loops in the program. The first loop generates these two instructions:

LDA SINi

OUT

where i runs from 0 to 255. These two instructions load one sample of the sine wave from memory at address SINi and output it to a DAC.

The second loop generates the following line:

SINi, HEX (sin(i*2*3.14159/256)+1)*128

where SINi is a label for the assembler, and the word at that address stores one sample value of the sine wave.
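The two loops can be sketched as follows. This is not the original SIN.CC; it is a minimal reconstruction in Python, assuming that each sample is truncated to an integer and written as bare hexadecimal digits after HEX. The function name is hypothetical.

```python
import math


def generate_sine_program(samples=256):
    """Return the lines of an assembly program that plays one sine period."""
    lines = []
    # First loop: one LDA/OUT pair per sample.
    for i in range(samples):
        lines.append(f"LDA SIN{i}")
        lines.append("OUT")
    # Second loop: one labelled data word per sample, using the report's formula.
    for i in range(samples):
        value = int((math.sin(i * 2 * 3.14159 / samples) + 1) * 128)  # truncation assumed
        lines.append(f"SIN{i}, HEX {value:X}")
    return lines


program = generate_sine_program()
print(program[0])     # first instruction
print(program[1])     # second instruction
print(program[512])   # first data word
```

With truncation every sample stays within 0 to 255. Rounding instead would give 256 at the peak (i = 64), which does not fit in 8 bits, so the choice of conversion matters when the low byte is sent to the DAC.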

The circuit was wired by Ing-Jye. The result of the application, hard-copied from a digital oscilloscope, is as follows.

IMG00017

Fig 2 The Sine Wave Generated By Our Micro-Processor

We can see that there is a strange peak value. This is caused by the quality of the EPROM we used: at some address in the EPROM corresponding to an entry of the sine wave table, the data is corrupted, producing the dramatic spike.

The source code of the sine wave generator is not listed in this report, because it consists purely of repetitive instructions and data.

2.9.2 ZERO.S

-- Written by Lan Chang<B82503081>

This program is very simple: it only displays a zero on the 7-segment display. We used it to test the whole system first while we were looking for the bug.

2.9.3 7_SEG_T.S

-- Written by Lan Chang<B82503081>

This program displays a 2 on the 7-segment display. Its purpose is to test the 7-segment display and RAM.

2.9.4 DICE.S

-- Written by Lan Chang<B82503081>

This program rapidly displays the numbers 0-9 in sequence, over and over, on the 7-segment display. When a button is pushed, the cycling stops and the final number is displayed. The program runs from ROM because we could not use RAM at that time, so we wrote it for the demo.

2.9.5 7_SEG_1.S

-- Written by Lan Chang<B82503081>

This program counts from 0 to 9999 and displays the number on the 7-segment display. Between two numbers there is a 10000-iteration delay loop so that we can see each number clearly. Unfortunately, the program cannot run properly because we failed to get RAM working.

2.9.6 0-9.S

-- Written by Lan Chang<B82503081>

This program displays 0-9 in sequence on the 7-segment display. Unfortunately, it is also unusable because of the RAM problem.

2.9.7 V_METER.S

-- Written by Lan Chang<B82503081>

This program functions as a voltmeter. It reads data from the output of the ADC, whose input is the voltage we want to measure. It converts the binary data to BCD, converts the BCD data to 7-segment data, and finally displays the result on the 7-segment display. This was the target we originally planned, but because of RAM...
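The conversion chain (binary to BCD to 7-segment) can be sketched as below. The report does not say which conversion method V_METER.S uses; the shift-and-add-3 ("double dabble") method shown here is one common choice for processors without a divide instruction. The 8-bit ADC width, the three-digit display, and the segment patterns (bit order gfedcba, 1 = segment lit) are assumptions for illustration.

```python
# Hypothetical 7-segment patterns for digits 0-9, bit order gfedcba.
SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]


def binary_to_bcd(value, bits=8, digits=3):
    """Convert a binary value to packed BCD with shift-and-add-3."""
    bcd = 0
    for position in range(bits - 1, -1, -1):
        # Before each shift, add 3 to every BCD digit that is 5 or more.
        for d in range(digits):
            if (bcd >> (4 * d)) & 0xF >= 5:
                bcd += 3 << (4 * d)
        # Shift left and bring in the next binary bit, MSB first.
        bcd = (bcd << 1) | ((value >> position) & 1)
    return bcd


def bcd_to_segments(bcd, digits=3):
    """Return the segment pattern of each BCD digit, leftmost digit first."""
    return [SEGMENTS[(bcd >> (4 * d)) & 0xF] for d in range(digits - 1, -1, -1)]


reading = 255                          # example ADC output
bcd = binary_to_bcd(reading)
print(hex(bcd))                        # packed BCD digits
print([hex(p) for p in bcd_to_segments(bcd)])
```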

3 Discussions

3.1 Murphy Chen <B82503131> says

3.1.1 The Bidirectional Bus in MAXPLUSII

We heard from classmates that the T.A. had said there was something wrong with the bidirectional bus in the version of MAXPLUSII we used. But our design used a bidirectional bus to connect to memory, so Ing-Jye and I began to test the functionality of the bidirectional bus of the Altera 8636 device.

To test the bidirectional bus, Ing-Jye and I first built the following circuit, using the tri-state buffer that MAXPLUSII provides in the Graphic Design File:

IMG00018

When we compiled that graphic design file in MAXPLUSII, there was always an error message which we could not understand. So we decided to make our own tri-state buffer and try again. After we built our own tri-state buffer in a text design file and compiled again, there were no error messages. We downloaded the resulting ttf file and tested the circuit by providing inputs and observing outputs manually, and found that it works! So we could finally be sure that the bidirectional bus in the Altera device is okay. (Refer to BIDIR.GDF and TRI2.TDF in the appendix.)


3.1.2 EPROM programming algorithms in EXPRO

When we programmed EPROMs using EXPRO, we once ran into a problem: the EPROMs passed the blank check but could not be programmed successfully. I checked the settings in EXPRO and found something suspicious: the EPROM programming algorithm was set to 'fast'. After resetting the programming algorithm to 'normal', the EPROMs could finally be programmed successfully. So I think it is important to note that when programming EPROMs, besides selecting the right brand and the right device, we should always check that the programming algorithm is set appropriately.

3.1.3 Binary Format Files in EXPRO

There is something strange about the binary format in EXPRO. When I compiled assembly programs, generated the binary files, and read them into EXPRO, any 0x0a byte in the original binary file was always changed to the two bytes 0x0a 0x0d, so binary files could not be used. I found an alternative: copy and paste. The assembler generates a text file consisting of hexadecimal characters instead of a binary file, and we copy these characters and paste them into the buffer edit mode in EXPRO. This solved the problem!
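The workaround amounts to writing each machine word as hexadecimal characters, so that no raw 0x0a byte from the program ever appears in the file. A minimal sketch follows; the layout (four hex digits per 16-bit word, space separated, eight words per line) and the function name are assumptions, since the report does not give the exact text format pasted into EXPRO.

```python
def words_to_hex_text(words, per_line=8):
    """Render 16-bit words as hexadecimal characters, four digits per word."""
    lines = []
    for start in range(0, len(words), per_line):
        chunk = words[start:start + per_line]
        lines.append(" ".join(f"{word & 0xFFFF:04X}" for word in chunk))
    return "\n".join(lines)


# Example words only; the last one contains the troublesome 0x0A byte.
machine_code = [0x200A, 0x7020, 0x000A]
text = words_to_hex_text(machine_code)
print(text)
```

In the text form the byte 0x0A is carried by the two characters "0" and "A", so a tool that rewrites line-feed bytes leaves the program data untouched.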

3.1.4 The Download Kit

Ing-Jye and I cooperated to build the download kit. Ing-Jye has described what we did; I will add more detail.

The first idea for transferring data from the PC to the SRAMs was to use a shift register. We succeeded once, but then something went wrong. I talked about it with some classmates, who told me to try the shift register provided by MAX+PLUSII in graphic design instead of building our own. I tried that, hoping it would work, but in vain. Please refer to the appendix for SHIFT3.GDF, which I tried.

Then I came up with another idea: instead of sending the address to a shift register, build a counter that can be cleared to zero, corresponding to address 0, and incremented to step through successive addresses. Every time the counter is incremented, the data can be written to the SRAMs at the address given by the counter value. But this still did not work. Please refer to the appendix for DOWNLOAD.GDF and ADDR.TDF, which I tried.
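The counter-based scheme can be modelled in a few lines. This is a behavioural sketch only, not a translation of DOWNLOAD.GDF or ADDR.TDF; the class and function names are hypothetical, and the 12-bit width follows the 4096-word memory.

```python
class AddressCounter:
    """Address counter that can be cleared and incremented."""

    def __init__(self, width=12):
        self.mask = (1 << width) - 1
        self.value = 0

    def clear(self):
        self.value = 0

    def increment(self):
        # Wraps around after the last address.
        self.value = (self.value + 1) & self.mask


def download_program(words, size=4096):
    """Write words to consecutive addresses, starting from address 0."""
    sram = [0] * size
    counter = AddressCounter()
    counter.clear()
    for word in words:
        sram[counter.value] = word & 0xFFFF   # write at the current address
        counter.increment()                   # then advance to the next one
    return sram


sram = download_program([0x2003, 0x7001, 0x0000, 0x00FF])
print([hex(word) for word in sram[:4]])
```

Compared with shifting in a 12-bit address for every word, only the data has to be sent for each write, which is the saving this idea was aiming at.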

I think something goes wrong when the PC side sends a signal to act as the positive edge of a clock pulse. But this still does not completely explain why our circuit did not work! We really hated this bug; it delayed our project.

3.1.5 The Assembler

The assembler was not built once and then left alone; it grew. Because its user, Lan Chang, had many requirements and ran into many bugs in the assembler, I continuously improved its performance and capabilities.

For example, the assembler could originally handle only three-character labels, but following Lan Chang's suggestions I improved it to handle labels of up to 256 characters.

One thing to note: all the C/C++ programs I wrote were compiled with DJGPP. It is a protected-mode DOS version of GNU C/C++, and it's free.

3.1.6 Acknowledgement

It was very nice to cooperate with my partners. Ing-Jye did a lot of work wiring the circuit and also had a lot of experience with AHDL and pin assignment in MAX+PLUSII. Besides building the arithmetic and logic circuits, he also worked with me to design the download kit and the sine wave generator application.

Chang Lan did a lot of work designing the control unit of our micro-processor. Simulating his circuits was really a big job, but he made it. He also wrote a lot of assembly applications and caught many bugs in my assembler.

Thanks to my partners for agreeing to my crazy idea of building a computer! Without you guys, I could not have fulfilled the dream.

Finally, thanks to the T.A. Without you, we would not have had the chance to use the most advanced instruments and to touch the most advanced technology in the field of digital circuits.

3.2 Ing-Jye Huang <B82503007> says

To finish this experiment we really encountered a great deal of frustration, even though we started the project quite early and worked on it regularly each week. The main part of the project, the CPU, was finished very quickly, but the problem of downloading programs via the printer port to SRAM could not be solved and cost us a lot of time. One reason we did not give up on this part is that we had already spent some time writing the downloading program (this is all due to Chen's hard work), and another is that downloading applications from the PC to SRAMs is much more convenient than writing them to EPROMs. Unfortunately, in the end we still had to give up and use EPROMs to store our applications. By the way, we learned a valuable lesson from using EPROMs: don't try to save money by buying cheaper ones. Cheaper ROMs are much less reliable than high-quality ones, even though the latter cost more.

It was really a nice experience to cooperate with my two partners. I learned very much from them. I have to admit that my contribution to this experiment was less than my partners', but they never complained. This is also why I felt very good working with them. So, at last, I want to say it again: "Thank you, my partners."

3.3 Lan Chang <B82503081> says

It's like a novel, I think, because we really made a computer! It was like a dream, but we really did spend our time and energy on it. Yes, we have done it.

The control unit is designed properly, I think. At first we decided to refer to Ch. 5 and Ch. 6 of《Computer System Architecture》, and then I designed the control unit according to the operation function table. Fortunately, I could use the function table alone to design a completely combinational TDF file, which is easy to design and read.

But the trouble began when I wanted to simulate the control unit. I had to draw every input in the SCF by myself, and I had to inspect it carefully. Finally, catching three bugs made it perfect. (Maybe not perfect; perhaps we just did not touch the wrong part.)

After the CPU and bus were combined with the ALU and registers, we faced a bigger problem: we had to simulate the whole system! It was not simple work, for there were so many inputs and outputs, and the ROM and RAM were not yet prepared. We had to make a "false" memory to simulate it in the MAX+PLUS II software. Finally we did it: I wrote the hex machine code directly into MEM.tdf and then included it in the system. Then there was a long bug-hunting journey. I tried three MEMs (MEM, MEM1, and MEM2, which hold different programs) and fixed about 7 bugs. The system seemed okay, though there may still have been hidden bugs.

During the same period, Murphy Chen completed the assembler and began to try downloading from the PC through the printer port to RAM directly, and Mr. Huang completed his ALU and wired the circuit we needed.

It was about the end of December, and I began to write applications. In the DC lab, everyone was using MAX+PLUS II on the computers except me: I was using Notepad to write our S programs! It was interesting work because I felt that I was different (ha...). And when MAX+PLUS II had any problem, only I could keep using the computer, because NOTEPAD never fails! (HA HA).

Were it not for the RAM problem, we would have gained a greater sense of achievement from the work. It's a pity, but we gained more achievement, learning, and happiness than regret and tiredness. Thank you, TA, and thank you to every classmate who helped us.





4 References


1. M. MORRIS MANO, Computer System Architecture, Chapter 5 Basic Computer Organization and Design, Prentice-Hall.

2. M. MORRIS MANO, Computer System Architecture, Chapter 6 Programming the Basic Computer, Prentice-Hall.

3. JOSEPH D. GREENFIELD, Practical Digital Design Using ICs, Chapter 13 Memories, Prentice-Hall.

4. JOSEPH D. GREENFIELD, Practical Digital Design Using ICs, Chapter 18 Analog to Digital Conversion, Prentice-Hall.

5. MAX+PLUS II On-Line Help.

6. DATA SHEET of IDT6116SA, Integrated Device Technology, Inc.

7. DATA SHEET of M27C256B, SGS-THOMSON Microelectronics.

8. Information about printer ports, http://www.centronics.com.




5 Appendix


See the following pages.

Fetch
R'T0: AR ← PC
R'T1: IR ← M[AR], PC ← PC+1
Decode
R'T2: D0, ..., D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
Indirect
D7'IT3: AR ← M[AR]
Interrupt
T0'T1'T2'(IEN)(FGI+FGO): R ← 1
RT0: M[AR] ← TR, PC ← 0
RT1: M[AR] ← TR, PC ← 0
RT2: PC ← PC+1, R ← 0, SC ← 0
Memory-reference instructions
AND
D0T4: DR ← M[AR]
D0T5: AC ← AC ∧ DR, SC ← 0
ADD
D1T4: DR ← M[AR]
D1T5: AC ← AC+DR, E ← Cout, SC ← 0
LDA
D2T4: DR ← M[AR]
D2T5: AC ← DR, SC ← 0
STA
D3T4: M[AR] ← AC, SC ← 0
BUN
D4T4: PC ← AR, SC ← 0
BSA
D5T4: M[AR] ← PC, AR ← AR+1
D5T5: PC ← AR, SC ← 0
ISZ
D6T4: DR ← M[AR]
D6T5: DR ← DR+1
D6T6: M[AR] ← DR, if (DR=0) then (PC ← PC+1), SC ← 0
Register-reference instructions
D7I'T3 = r (common to all register-reference instructions)
IR(i) = Bi (i = 0, 1, 2, ..., 11)
r: SC ← 0
CLA
rB11: AC ← 0
CLE
rB10: E ← 0
CMA
rB9: AC ← !AC
CME
rB8: E ← !E
CIR
rB7: AC ← shr AC, AC(15) ← E, E ← AC(0)
CIL
rB6: AC ← shl AC, AC(0) ← E, E ← AC(15)
INC
rB5: AC ← AC+1
SPA
rB4: if (AC(15)=0) then (PC ← PC+1)
SNA
rB3: if (AC(15)=1) then (PC ← PC+1)
SZA
rB2: if (AC=0) then (PC ← PC+1)
SZE
rB1: if (E=0) then (PC ← PC+1)
HLT
rB0: S ← 0
Input-output instructions
D7IT3 = p (common to all input-output instructions)
IR(i) = Bi (i = 6, 7, 8, 9, 10, 11)
p: SC ← 0
INP
pB11: AC(0-7) ← INPR, FGI ← 0
OUT
pB10: OUTR ← AC(0-7), FGO ← 0
SKI
pB9: if (FGI=1) then (PC ← PC+1)
SKO
pB8: if (FGO=1) then (PC ← PC+1)

Control Functions and Microoperations for our Micro-processor HCL-2516
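
A few of the microoperations in the table can be checked with a small software model. The sketch below covers only the decode step and the ADD, CIR, and CIL operations on a 16-bit AC with the E flip-flop; the function names are hypothetical, and it models the register transfers only, not the timing signals or the actual AHDL design.

```python
MASK16 = 0xFFFF


def decode(ir):
    """Split IR into (I, opcode, address): IR(15), IR(12-14), IR(0-11)."""
    return (ir >> 15) & 1, (ir >> 12) & 0x7, ir & 0x0FFF


def add(ac, dr):
    """ADD: AC <- AC + DR, E <- Cout. Returns (AC, E)."""
    total = ac + dr
    return total & MASK16, (total >> 16) & 1


def cir(ac, e):
    """CIR: AC <- shr AC, AC(15) <- E, E <- AC(0). Returns (AC, E)."""
    return (ac >> 1) | (e << 15), ac & 1


def cil(ac, e):
    """CIL: AC <- shl AC, AC(0) <- E, E <- AC(15). Returns (AC, E)."""
    return ((ac << 1) & MASK16) | e, (ac >> 15) & 1


print(decode(0xA123))    # indirect bit set, opcode 2 (LDA), address 0x123
print(add(0xFFFF, 1))    # sum wraps to 0 with carry out into E
print(cir(0x0001, 0))    # bit 0 moves into E
print(cil(0x8000, 1))    # bit 15 moves into E, old E enters bit 0
```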