ARASP: An ASIP Processor for Automated Reversible Logic Synthesis
Subject Areas : IT Strategy
Zeinab
Kalantari
^{
1
}
(Department of Computer Engineering, Rafsanjan Branch, Islamic Azad University, Rafsanjan, Iran)
Marzieh
Gerami
^{
2
}
(Department of Computer Engineering, Shahrekord Branch, Islamic Azad University, Shahrekord, Iran)
Mohammad
eshghi
^{
3
}
(Faculty of Electrical and Computer Engineering, Shahid Beheshti University, Tehran, Iran)
Keywords: Reversible logic, Optimization Algorithms, Application Specific Instruction Set Processors, ASIP, RTL.,
Abstract :
Reversible logic has been emerged as a promising computing paradigm to design low power circuits in recent years. The synthesis of reversible circuits is very different from that of nonreversible circuits. Many researchers are studying methods for synthesizing reversible combinational logic. Some automated reversible logic synthesis methods use optimization algorithms Optimization algorithms are used in some automated reversible logic synthesis techniques. In these methods, the process of finding a circuit for a given function is a very timeconsuming task, so it’s better to design a processor which speeds up the process of synthesis. Application specific instruction set processors (ASIP) can benefit the advantages of both custom ASIC chips and general DSP chips. In this paper, a new architecture for automatic reversible logic synthesis based on an Application Specific Instruction set Processors is presented. The essential purpose of the design was to provide the programmability with the specific necessary instructions for automated synthesis reversible. Our proposed processor that we referred to as ARASP is a 16bit processor with a total of 47 instructions, which some specific instruction has been set for automated synthesis reversible circuits. ARASP is specialized for automated synthesis of reversible circuits using Genetic optimization algorithms. All major components of the design are comprehensively discussed within the processor core. The set of instructions is provided in the Register Transform Language completely. Afterward, the VHDL code is used to test the proposed architecture.
[1] K. Kucukcakar, "An ASIP design methodology for embedded systems," in Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99)(IEEE Cat. No. 99TH8450), 1999: IEEE, pp. 1721.
[2] M. Gries and K. Keutzer, Building ASIPs: The Mescal Methodology. Springer Science & Business Media, 2006.
[3] R. F. Mirzaee and M. Eshghi, "Design of an ASIP IDEA crypto processor," in 2011 IEEE 2nd International Conference on Networked Embedded Systems for Enterprise Applications, 2011: IEEE, pp. 17.
[4] K. Shahbazi, M. Eshghi, and R. F. Mirzaee, "Design and implementation of an ASIPbased cryptography processor for AES, IDEA, and MD5," Engineering science and technology, an international journal, vol. 20, no. 4, pp. 13081317, 2017.
[5] M. Venkanna, R. Rao, and P. C. Sekhar, "An Efficient Design of ASIP Using Pipelining Architecture," in International Conference on Intelligent Computing and Applications, 2019: Springer, pp. 117128.
[6] D. Große, X. Chen, G. W. Dueck, and R. Drechsler, "Exact SATbased Toffoli network synthesis," in Proceedings of the 17th ACM Great Lakes symposium on VLSI, 2007, pp. 96101.
[7] D. Bu and P. Wang, "An improved KFDD based reversible circuit synthesis method," Integration, vol. 69, pp. 251265, 2019.
[8] T. Ahmed, A. Younes, and A. Elsayed, "Improving the quantum cost of reversible Boolean functions using reorder algorithm," Quantum Information Processing, vol. 17, no. 5, pp. 116, 2018.
[9] A. Basak, A. Sadhu, K. Das, and K. K. Sharma, "Cost Optimization Technique for Quantum Circuits," International Journal of Theoretical Physics, vol. 58, no. 9, pp. 31583179, 2019.
[10] Z. Kalantari, M. Eshghi, M. Mohammadi, and S. Jassbi, "Lowcost and compact design method for reversible sequential circuits," The Journal of Supercomputing, vol. 75, no. 11, pp. 74977519, 2019.
[11] M. Lukac, M. Perkowski, and M. Pivtoraiko, “Evolutionary approach to quantum and reversible circuits synthesis,” Artificial Intelligence Review Journal, vol. 20, no. 3–4, pp. 361–417, 2003.
[12] M. Lukac, M. Pivtoraiko, A. Mishchenko, and M. Perkowski, "Automated synthesis of generalized reversible cascades using genetic algorithms," 2002.
[13] M. Haghparast, M. Mohammadi, K. Navi, and M. Eshghi, "Optimized reversible multiplier circuit," Journal of Circuits, Systems, and Computers, vol. 18, no. 02, pp. 311323, 2009.
[14] M. Mohammadi and M. Eshghi, "Heuristic methods to use don’t care in automated design of reversible and quantum logic circuits," Quantum Information Processing, vol. 7, no. 4, pp.175192, 2008.
[15] M. Y. Abubakar and L. T. Jung, "Synthesis of Reversible Logic Using Enhanced Genetic Programming Approach," in 2018 4th International Conference on Computer and Information Sciences (ICCOINS), 2018: IEEE, pp. 15.
[16] T. Atkinson, A. Karsa, J. Drake, and J. Swan, "Quantum program synthesis: Swarm algorithms and benchmarks," in European Conference on Genetic Programming, 2019: Springer, pp. 1934.
[17] R. Landauer, "Irreversibility and heat generation in the computing process," IBM journal of research and development, vol. 5, no. 3, pp. 183191, 1961.
[18] C. H. Bennett, "Logical reversibility of computation," IBM Journal of Research and Development, vol. 17, no. 6, pp. 525532, 1973.
[19] R. P. Feynman, "Quantum mechanical computers," Foundations of Physics, pp. 507531, 1986.
[20] M. P. Frank, "Introduction to reversible computing: motivation, progress, and challenges," in Proceedings of the 2nd Conference on Computing Frontiers, 2005, pp. 385390.
[21] E. Fredkin and T. Toffoli, "Conservative logic," International Journal of theoretical physics, vol. 21, no. 3, pp. 219253, 1982.
http://jist.acecr.org ISSN 23221437 / EISSN:23452773 
Journal of Information Systems and Telecommunication

ARASP: An ASIP Processor for Automated Reversible Logic Synthesis 
Zeinab Kalantari 1*, Marzieh Gerami2 , Mohammad Eshghi3

1. Department of Computer Engineering, Rafsanjan Branch, Islamic Azad University, Rafsanjan, Iran 2. Department of Computer Engineering, ShahreKord Branch, Islamic Azad University, ShahreKord, Iran 3. Faculty of Electrical and Computer Engineering, Shahid Beheshti University, Tehran, Iran 
Received: 04 Jun 2021/ Revised: 04 Jan 2022/ Accepted: 02 Feb 2022 

Application specific instruction set processors (ASIP) can compromise the advantages of custom ASIC chips and general DSP chips. In other words, ASIP chips utilize high performance and low power of ASIC chips and flexibility of DSP chips [1][2][3][4][5].
There is a tradeoff between cost and speed in ASIPs.
Programmability is the main advantage of ASIPs, which gives more flexibility to software developers. Other advantages are more convenient in the design and debugging process, predictability, and shorter time to market. Hardware and software are two aspects of ASIPs rather than one aspect of the task being dominant. Besides, compared to generalpurpose processors, ASIP benefits from having specific instructions to perform a specific task faster and reduce programmer errors. Accordingly, efficiency and programmability are both advantages of ASIPs compared to generalpurpose processors.
In this paper, a novel ASIPbased processor for the synthesis of reversible circuits is proposed. This processor is used to synthesize reversible circuits using optimization algorithms. VHDL code is used to simulate and test the proposed architecture. The main objective of the proposed design is programmability as it is the major concept of ASIP. In addition, the suggested structure reduces hardware complexity.
The organization of the rest of the paper is as follows. The next section and subsequent sections present a background on the synthesis of reversible circuits. Section 3 details the proposed ASIP architecture model for the synthesis of reversible circuits. The testing process describes in section 4. Finally, section 5 concludes the paper.
2 Background
Reversible logic has applications in low power computing, quantum computing, nanotechnology, optical computing, and DNA computing. The design of the reversible circuits is quietly different from the design of conventional irreversible logic circuits [6] because of the different gates that are available in reversible logic.
The synthesis of reversible circuits differs significantly from synthesis using traditional irreversible gates. Many algorithms have been proposed for the synthesis of reversible circuits [7][8][9][10]. Dengli et al. proposed an improved KFDD based reversible circuit synthesis method [7]. Ahmed et al. suggested a synthesis approach using reorder algorithm [8]. Basak et al. presented an algorithm using the ESOP expressions [9]. Some automated reversible logic synthesis methods, such as genetic algorithms (GAs) are also presented [11][12][13][14][15][16]. These algorithms use optimization algorithms.
In optimization algorithms, the main process is generating some random circuits and then computing the output truth table of generated circuits. Then the hamming distance between the truth table of generated random circuits and the truth table of the given function is calculated. After that, according to the optimization algorithm the best circuit is selected. These operations are repeated while desired hamming distance is reached.
To implement the automated reversible synthesis algorithm, a background on reversible gates is needed. The next section is illustrated to introduce reversible logic gates. After that, a general algorithm for the synthesis of all reversible circuits is presented.
21 Reversible Gates
Since a serious problem in modern VLSI designs is power consumption, Low power circuit design is one of the most attractive subjects for hardware designers. Landauer has shown that for irreversible logic computations, each bit of information lost, generates kTln2 joules of heat energy, where k is Boltzmann’s constant and T is the absolute temperature at which computation is performed [17]. Bennett showed that kTln2 energy dissipation would not occur if a computation is carried out reversibly [18]. This part of energy dissipation is independent of what the underlying technology is.
In reversible circuits, no bit of information is lost, and reversible computation in a system can be performed only when the system comprises reversible gates.
In a reversible gate, there is a onetoone correspondence between its inputs and outputs. As a result, the number of outputs of a reversible gate is the same as the number of inputs, and for each input vector, there is a unique output vector and vice versa.
Some more common gates to design reversible logic circuits are Feynman Gate, FG [19], Toffoli Gate, TG [20], Fredkin Gate, FRG [21] are more common gates to design reversible circuits.
A 2*2 Feynman Gate, also known as controlledNOT (CNOT), is depicted in Fig.1.a. It implements the logic functions: P = A and Q = A⊗B.
A 3*3 Toffoli Gate has 3 inputs: 2 control inputs, that are copied to the first 2 outputs and one other input that is complemented if all control inputs are 1s and are directly copied to the last output otherwise [20]. A 3 input, 3output Toffoli Gate is shown in Fig.1.b. The inputs ‘A’ and ‘B’ are passed as first and second outputs, respectively. The third output is controlled by ‘A’ and ‘B’ to invert ‘C’.
A 3*3 Fredkin Gate is depicted in Fig.1.c. Here the input ‘A’ is passed as the first output. Inputs ‘B’ and ‘C’ are swapped to get the second and third outputs, which are controlled by ‘A’. If A = 0, then the outputs are simply duplicating of the inputs; otherwise, if A = 1, then the two input lines (B and C) are swapped.

Fig. 1 (a) Feynman gate, (b) Toffoli gate, and (c) Fredkin gate
22 Synthesis of Reversible Circuits
Synthesizing a reversible circuit using a searching algorithm is a complex problem with a large amount of searching space. The number of combinations for placing one Toffoli r×r gate in an n×n circuit (0<r<n1) to synthesize a reversible n×n circuit is expressed by )1(:
ρ=n. =n.2n1  (1) 
If the number of required gates to design a circuit is m, then the number of possible circuits is ρm. If Fredkin and Press are added to the set of Toffoli r×r gates, then the number of possible circuits is (3ρ)m. So optimization algorithms, especially GA, are used to find the global minimum or maximum of a function, in an extensive searching space [11][12][13][14]. In the next subsection, a review of automated synthesis is presented.
23 Automated Reversible Logic Synthesis
The general algorithm for automated reversible logic synthesis is shown in Fig.2. This processor is used to synthesize reversible circuits.
The algorithm starts by generating a random configuration (a random state) of a circuit with one gate. We consider the hamming distance between the truth table of this circuit and the truth table of a given function as the cost function. Then for each new configuration, it’s necessary to compute the truth table. So the main operations in such algorithms are computing the truth table of each circuit and then comparing its truth table with the destination truth table to select the best circuit, based on the desired algorithm. So in a given iteration, the algorithm generates n new circuits at a time. Each new circuit is derived from the old configuration. The hamming distances of the two circuits are then compared.
After the process of finding a new circuit, comparing it to the current configuration, and either accepting or rejecting, it is done n times.
After all, if the stopping criteria of the algorithm, zero hamming distance, is not reached, the number of gates will increase and the algorithm is repeated for a new circuit with extra gates.
Set NoG=1 //the number of gates being used for the synthesis Set S=S0 // random initial state Loop1: Initiate a random circuit S using NoG While (up to maxiteration) { While(required number of circuits not generated) { Generate new circuit S’ by perturbing S; For (all rows of truthtable) E=E+HD(des[i]&mask , Syn[i]&mask); ΔE=E(S’)E(S) If (ΔE<=0) S=S’; } If (HD!=0) { NoG=NoG+1; Goto Loop1; } else Print circuit; } } 
Fig. 2 Algorithm description
3 Architecture Overview
Fig. 3. ARASP processor
This paper introduces an ASIP processor which is useful in the application domain of reversible circuits. This processor is called ARASP, an ASIP processor for automated reversible logic synthesis. The schematic of the ARASP processor is illustrated in Figure 2. The explanation of the proposed register configuration is in Figure 3.
Fig. 4 ARASP register configuration
We proposed a 16bit wide ASIP architecture with 16bit integer operations for common arithmetic operations and a specific function unit for the automated synthesis of a reversible circuit.
31 Global View on ARASP
Registers have clock inputs that are all connected to the main system clock. Each AC, DR, TR1, and TR2 are 16bit registers that provide operands of ALU. The output of ALU is connected only to the AC. The instruction register, IR, provides the instruction bits for the controller. The 12bit program counter register is called PC, this register provides an address for current instruction through the memory address register, AR. This register also is a 12bit binary up counter. The arithmetic logic unit, ALU, is a combinational logic unit with two 16bit inputs, four flag inputs, and control inputs that specify the integer operations. The output of this unit is connected to the input of AC.
The next part is a function unit for the automated synthesis of a reversible circuit. This unit works as follows. At first, a random circuit with some gates that the NOG (Number OF Gates) register specifies is generated. The SyC (Synthesis Circuit) register bank maintains this random gate. The SyCP (Synthesis Circuit Pointer) register which is an 8bit upcounter is used to specify each register of this register bank (each reversible gate) in each step. Same as SyC, NeC register bank is used to hold the neighboring circuit of synthesis circuit in each step, the size of this register is as SyC, also NeCP (Neighboring Circuit Pointer) is used to define a register of NeC register bank in each step of the synthesis.
DTT (Destination Truth Table) consists of 256 8bit registers to maintain the truth table of a given function. As the algorithm generates a reversible circuit in each step, we need a register bank to maintain the truth table of the generated circuit. So STT (Synthesis Truth Table) is used for this aim.
The size of this register bank is the same as DTT. TTP is an 8bit upcounter that refers to each row of DTT or STT in each step of the algorithm.
As said before the major timeconsuming operation in the synthesis of a reversible circuit is calculating the output of the circuit for all combinations of inputs. So the synthesis function unit that is a combinational circuit calculates each row of synthesis truth table of desired circuit, SyC or NeC.
After synthesis of the desired circuit, we need to compare this truth table with the destination truth table, DTT. So we have to calculate the hamming distance between synthesis truth table STT and destination truth table DTT, as the cost function. After that, the calculated hamming distance will set the SHD or NHD depending on the circuit that is synthesized.
The final step in the algorithm is selecting the best circuit. So we need to compare SHD and NHD registers and set HD register with one of these registers.
4 Proposed Architecture
In this paper, the proposed CPU is referred to as ARASP. The proposed processor employs a reduced hardware requirement and application specific instruction set. Due to the size of its data register and buses, ARASP is considered to be a 16bit processor. It has direct and indirect addressing modes. ARASP also has specific instructions and inputoutput interrupts.
41 Main Memory Organization
The ARASP is capable of addressing 4096 bytes of memory through its 12bit address lines. This memory is addressed by a register called AR.
42 Register Configurations
The main data register of ARASP is AC, which is used in conjunction with most general instructions. This processor has overflow, carry, zero, and sign flags (o, c, z, and s). These flags may be modified by arithmetic operations.
ARASP consists of two parts, global unit, and specific unit. The major components of the global unit are AR, PC, IR, DR, TR1, TR2, AC, LFSR, and ALU. Also, the components of the specific part are SyC and NeC register bank that consists of 256 9 bit registers. These register banks hold synthesis and neighboring circuits each consisting of at most 8 gates. DTT and STT register banks hold destination and synthesis truth tables respectively. According to the size of these register banks and hardware restrictions. The desired circuit can have at most 8 inputs (256 8bit registers). HD, SHD, and HD registers that are 8bit registers are used for holding hamming distance of the circuit throughout the running synthesis algorithm.
43 Instruction Types
The ARASP has a total of 47 instructions totally, and the specific instructions are summarized in Table 3. The Proposed processor has two different types of instruction sets (Table 1). The Memory reference instructions need the main memory address to do their operations and the Nonmemory reference instruction set, which needs no memory for their operands. The ARASP’S memory instruction set can be used by direct and indirect addressing modes.
Table 1. Instruction Types and Addressing Modes of ARASP
Instruction Type  M  I  Address  Addressing Mode 
Memory  1  0  No  Direct 
Memory  1  1  No  Indirect 
Others  0  ×  Yes   
As presented in Fig. 5, in memory reference instructions most significant bit of instruction (bit 15) is set, to specify the memory reference instruction type. Bit 14 called I, specifies direct or indirect addressing mode (0 for direct and 1 for indirect). The next 4 bits (bits 1013) specify the operation of a memory reference instruction. As this type of instruction need a memory word for holding one of the operands, in these types of instructions we should refer to the main memory to read the operand. If it is set to 1 the operand’s address is indirect and if it is set to 0 the operand’s address is direct.
Fig. 5 Memory reference instruction format
These types of instructions occupy a byte whose most significant bit (bit 15) is 0. In this type of instruction, bit 14 specifies output and register instructions or specific instructions (0 for output and register instructions and 1 for specific instructions). The other 4 bits specify operations of instructions (Fig. 6).
Fig. 6. Non Memory reference instruction format
The fetch, decode and calculation of effective address phases of the instruction cycle could be as follow:
Interrupt :
IEN(FGI+FGO) : R ß 1
RT0 : AR ß 0
RT1 : M[AR] ß PC , PC ß 0
RT2 : PC ß PC+1, IEN ß 0 , R ß 0 , SC ß 0
Fetch :
T0 : AR ß PC , PC ß PC+1
T1 : IR ß M[AR]
Decode :
T2 : D31 … D0 ß IR[1014] , AR ß IR[08] , F ß IR(9) , M ß IR(15)
Address Fetch :
MT3 : AR ß PC, PC ß PC+1
MT4 : AR ß M[AR]
M T5 : nothing
M I T5 : AR ß M[AR]
44 Arithmetic Logic and Shift Unit
Fig. 7 Arithmetic Logic and Shift Unit
The presented processor supports basic arithmetic, logic, and shift units which are presented in
Table 2. ALU Operators
S4  S3  S2  S1  S0  Unit  Operation  Function 
0  0  0  0  0  AU  ADD  A+B 
0  0  0  0  1  SUB  AB  
0  0  0  1  0  DEC  A1  
0  0  0  1  1  INC  A+1  
0  0  1  ×  ×  MUL  A×B  
0  1  0  0  0  LU  AND  AÙB 
0  1  0  0  1  OR  AÚB  
0  1  1  0  0  XOR  AÅB  
0  1  1  1  0  NOT  ØB  
0  1  1  1  1  PASS  A  
1  ×  ×  0  0  SU  SHL  SHL(B) 
1  ×  ×  0  1  SHR  SHR(B)  
1  ×  ×  1  0  ROL  ROL(B)  
1  ×  ×  1  1  ROR  ROR(B) 
The 5bit opcode (S0 to S4) hierarchically selects the proper operation. Besides the main results, five arithmetic flags (Carry, Overflow, Zero, and Sign) are set by the ALU. Each flag obtains the proper value by the Eq.s 2 to 5 considering that the input values are unsigned integer. AC, DR, TR1, and TR2 can be considered as both first and second operands.
CFlag = Cout when (Op.=ADDSUBDECINCSHLSHR)  (2) 
OFlag = ‘1’ when (Op.=ADDINCSHL & Cout=‘1’)  (Op.= SUBDEC & Cout=‘0’)  (Op.=MUL & 16bit MSB≠0)  (3) 
ZFlag = ‘1’ when (16bit LSB=0)  (4) 
SFlag = ‘1’ when (Op.=SUBDEC & Cout=‘0’)  (5) 
45 Instruction Set
The instruction set for the ARASP is depicted in Table3.
The objective of synthesizing a reversible circuit is to compute a circuit for a given function. So we need to have a destination truth table of a given function. LDTT instruction reads the truth table of the desired function from the main memory to the DTT register bank. This instruction is a specific instruction that needs to refer to the main memory for its operation. Because of the hardware restrictions, it assumes that each function could have at most eight inputs. So the process of reading the rows of the destination truth table from memory reads some rows with the number that NoTR defines. The value of this register is set by SNOTTR instruction according to the value of the NOG register.
The STRC instruction stores the generated circuit of the synthesis process of a given function in the main memory.
The RAND instruction generates a 16bit random number.
The CLRNOG instruction clears 8bits of the NOG register, while INCNOG instruction increments the value of this register.
The SNOI instruction initializes the NOI register by an immediate number.
The GRNDC instruction generates a random circuit with some gates that are determined by the value of the NOG register. This instruction gets the value of the NOG register and the number of gates and sets a random value to some registers of SyC register bank according to the specified value of NOG.
The GNBRCROS instruction performs the crossover operation on a circuit to generate a new circuit called neighboring circuit from synthesis circuit. The instruction selects two random gates from the synthesis circuit that are in the SyC register bank. After that, it exchanges the position of these two gates to generate a new circuit called a neighboring circuit that is placed on the NeC register bank.
The other instruction that performs mutation operation is GNBRMUT. This instruction also generates a random new circuit from the existing circuit by mutation operation. The instruction selects a random gate from the synthesis circuit that is in the SyC register bank. After that, it exchanges the position of control and main inputs to generate a new circuit called a neighboring circuit that is also placed on the NeC register bank.
To compute the truth table for each circuit it is necessary to set each row of the truth table with initial values 0 to 2n1. To achieve this goal, SIVDTT is used.
MASK instruction generates a mask pattern for output.
CALCSTT instruction computes the truth table of each generated circuit, SyC or NeC. In other words, this instruction, compute the value of the output for each combination of inputs in a circuit with some gate. By feeding the initial value of a given row of the truth table to the circuit as the first stage and synthesizing them the computation operation starts and then synthesis ALU calculates the output of this gate. The next gate gets the calculated output of the preceding gate and calculates the output. These operations continue while the output of the last gate is computed. This output will be replaced by the value of the given row. CALSTT repeats these operations for all combinations of inputs (all rows of truth table).
To calculate the hamming distance between syntheses or neighboring truth tables and destination truth tables, CALCHD instruction is used.
The SBC instruction selects the best circuits, a circuit with less hamming distance, between a circuit and a neighboring circuit.
Finally, SIZHD is used to determine the zero value of the HD register.
NOGTAC instruction is used to transfer the value of the NOG register to the AC.
Table 2. ALU Operators
S4  S3  S2  S1  S0  Unit  Operation  Function 
0  0  0  0  0  AU  ADD  A+B 
0  0  0  0  1  SUB  AB  
0  0  0  1  0  DEC  A1  
0  0  0  1  1  INC  A+1  
0  0  1  ×  ×  MUL  A×B  
0  1  0  0  0  LU  AND  AÙB 
0  1  0  0  1  OR  AÚB  
0  1  1  0  0  XOR  AÅB  
0  1  1  1  0  NOT  ØB  
0  1  1  1  1  PASS  A  
1  ×  ×  0  0  SU  SHL  SHL(B) 
1  ×  ×  0  1  SHR  SHR(B)  
1  ×  ×  1  0  ROL  ROL(B)  
1  ×  ×  1  1  ROR  ROR(B) 
5 Testing Process
In this paper, the proposed CPU is referred to as ARASP. The proposed
A structural VHDL code in Fig. 8 is used to test and verify the functionality of the given structure. Using the instruction set of the presented ARASP processor, the following code has to be programmed to generate a random circuit for the desired N×N function which is stored from the memory address Addr1.
Fig. 8 VHDL code
6 Conclusion
We showed that synthesizing a reversible circuit using a search algorithm is a complex task with a large number of search spaces. So optimization algorithms, especially Genetic Algorithm, GA, are used to find the global minimum or maximum of a function, in an extensive searching space. As in such algorithms, the process of calculating values of outputs is a timeconsuming operation. So, we need a processor to speed up the process of synthesis. As a result, application specific flexibility is mandatory to meet the performance requirements of synthesis reversible circuits.
In this paper, we presented a novel design of the family of ASIP processors in the application domain of reversible circuits. The Providing programmability together with required specific instructions has been the main purpose of the automated synthesis of reversible circuits. The proposed processor that we referred to as ARASP is a 16bit processor with a total of 47 instructions totally, which some specific instruction has set for automated synthesis reversible circuits. ARASP is specialized for automated synthesis of reversible circuits using optimization algorithms such as GA or simulated annealing.
The design steps of all the main components inside the processor core have been described in detail. Maximum specific instruction, GNBRMUT, needs 29 clock cycles for execution. Structural VHDL code has been used to test the proposed architecture. A pipeline technique could be used to enhance the speed and achieve a high throughput rate as future work.
As future work, the processer can be comprehensively implemented of this processor that will specialize in simulated annealing algorithm. It is suggested that the proposed work will provide a new focus in the reversible field making hardware more specific for such applications.
Appendix
Table 3. ARASP instruction set
Ii  Instruction  Name  Description  Ins. Reference  IR(9)  OpCode 

I0  INP  Input  AC(L)ßINPR  I/O 
 000000 
I1  OUT  Output  OUTRßAC(L)  I/O 
 000001 
I2  SKI  Skip if FGI  FGI: PCßPC+1  I/O 
 000010 
I3  SKO  Skip if FGO  FGO: PCßPC+1  I/O 
 000011 
I4  ION  IEN On  IENß1  I/O 
 000100 
I5  IOF  IEN Off  IENß0  I/O 
 000101 
I6  CLA  Clear Accumulator  ACß0  Register 
 000110 
I7  CLE  Clear E  Eß0  Register 
 000111 
I8  CMA  Complement Accumulator  ACß  Register 
 001000 
I9  CME  Complement E  Eß  Register 
 001001 
I10  INC  Increment Accumulator  ACßAC+1  Register 
 001010 
I11  ROL  Rotate Left Accumulator  ACßROL AC  Register 
 001011 
I12  ROR  Rotate Right Accumulator  ACßROR AC  Register 
 001100 
I13  SPA  Skip if Positive Accumulator  :: PCßPC+1  Register 
 001101 
I14  SZA  Skip if Zero Accumulator  Z: PCßPC+1  Register 
 001110 
I15  SZE  Skip if Zero E  : PCßPC+1  Register 
 001111 
I16  HLT  Halt  SCßDisable  Register 
 010000 
I17  AND  AND  ACß M[AR]ÙAC  Memory 
 100000 
I18  OR  OR  ACß M[AR]ÚAC  Memory 
 100001 
I19  XOR  XOR  AC ßM[AR]ÅAC  Memory 
 100010 
I20  ADD  Addition  AC(L)ßM[AR]+AC  Memory  Int/Real  100011 
I21  SUB  Subtraction  AC(L)ßM[AR]AC  Memory  Int/Real  100100 
I22  MUL  Multiplication  AC(L)ßM[AR]×AC  Memory  Int/Real  100101 
i23  DIV  Division  ACßAC/DR  Memory  Int/Real  100110 
I24  MOD 
 ACßAC % DR  Memory  Int  100111 
I25  POW  Power  ACßACDR  Memory  Int/Real  101000 
I26  EXP  ex  ACßeDR  Memory  Real  101001 
I27  LDA  Load Accumulator  AC(L)ßM[AR]  Memory 
 101010 
I28  STA  Store Accumulator  M[AR]ßAC  Memory 
 101011 
I29  JMP  Jump  PCßAR  Memory 
 101100 
I30  BSR  Branch and Save Returnaddress  M[AR]ßPC, PC 