

### NATIONAL UNIVERSITY OF SCIENCE AND TECHNOLOGY POLITEHNICA BUCHAREST



### Doctoral School of Electronics, Telecommunications and Information Technology

Decision No. 13 from 08-03-2024

### Ph.D. THESIS

### Andrei GAITA

### METODE DE MACHINE LEARNING PENTRU SUPORTUL VERIFICARII CIRCUITELOR INTEGRATE ANALOGICE

### MACHINE LEARNING METHODS FOR SUPPORTING VERIFICATION OF ANALOG INTEGRATED CIRCUITS

### THESIS COMMITTEE

| Committee member | Role |
|------------------|------|
| <b>Prof. Dr. Ing. Mihai CIUC</b><br>Universitatea Națională de Știință și Tehnologie POLITEHNICA Bucuresti | President |
| <b>Prof. Dr. Ing. Corneliu BURILEANU</b><br>Universitatea Națională de Știință și Tehnologie POLITEHNICA Bucuresti | PhD Supervisor |
| <b>Prof. Dr. rer. nat. Georg PELZ</b><br>Infineon Technologies AG & Univ. Duisburg-Essen | Referee |
| <b>Prof. Dr. Ing. Marina TOPA</b><br>Universitatea Tehnică din Cluj-Napoca | Referee |
| <b>Prof. Dr. Ing. Cosmin POPA</b><br>Universitatea Națională de Știință și Tehnologie POLITEHNICA Bucuresti | Referee |

### **BUCHAREST 2024**

### **Table of contents**

- List of tables
- List of figures
- 1 Introduction
  - 1.1 Aiding analog IC verification and wafer production testing
  - 1.2 Scope of the Research
  - 1.3 Motivation
  - 1.4 Thesis Structure
- 2 Related Work and Theoretical Fundamentals
  - 2.1 Analog IC verification methodology
  - 2.2 Optimize verification using machine learning
  - 2.3 Feature Extraction for Time-Series
  - 2.4 Waveform Clustering
    - 2.4.1 K-means
    - 2.4.2 Hierarchical clustering
    - 2.4.3 DBSCAN clustering
  - 2.5 Feature Space Dimensionality Reduction
  - 2.6 Clustering Performance Evaluation
    - 2.6.1 External Evaluation
    - 2.6.2 Internal Evaluation
  - 2.7 IC fabrication verification
- 3 Assisting pre-silicon analog IC verification through a SIFT-based algorithm
  - 3.1 Introduction
  - 3.2 Scale Invariant Feature Transform Descriptor
    - 3.2.1 Overview
    - 3.2.2 Detection of Critical Keypoints in Analog IC Time-series
    - 3.2.3 Descriptor Aimed to Describe Significant Events for Analog IC Signals
    - 3.2.4 Bag-of-Words optimal representation
    - 3.2.5 Clustering Block
    - 3.2.6 Feature Space Visualisation
  - 3.3 Experimental Results
    - 3.3.1 Dataset
    - 3.3.2 Experimental Use-Cases of Clustering Signals
  - 3.4 Summary and Conclusions
- 4 Clustering Techniques for Post-Silicon Analog IC Verification
  - 4.1 Introduction
  - 4.2 Dynamic Time Warping Feature Extraction
    - 4.2.1 Overview
    - 4.2.2 Density-based spatial clustering for the selection of representative signals
    - 4.2.3 Constructing the Analog Signal Feature Space
  - 4.3 Neural Network Model for Analog Signals
    - 4.3.1 Overview
    - 4.3.2 Convolutional Neural Network Model
    - 4.3.3 Autoencoder hyperparameters optimization
  - 4.4 Clustering Metrics Analysis
    - 4.4.1 Overview
    - 4.4.2 Sensitivity Analysis of Internal Evaluation Metrics
  - 4.5 Dataset
  - 4.6 Experimental Use-Cases of Clustering Signals
  - 4.7 Summary and Conclusions
- 5 Support verification of wafer fabrication
  - 5.1 Overview
  - 5.2 Dataset
  - 5.3 Wafer failure detection
  - 5.4 Experimental Results
  - 5.5 Summary and Conclusions
- 6 General Conclusions
  - 6.1 General Objectives and Results
  - 6.2 Original contributions
  - 6.3 List of Original Publications
  - 6.4 List of Technical Reports
  - 6.5 Future Work
- References

### **List of tables**

- 4.3 CNN-AE model hyperparameters found after Bayesian optimization [GDB<sup>+</sup>23]
- 4.4 Comparison between several clustering metrics
- 4.5 Comparison between SIFT, CNN-AE and DTW Results
- 5.1 Top 20 Sensor Ranking based on SVM Accuracy and Davies-Bouldin metric

### **List of figures**

- 3.1 Scale Invariant Feature Transform feature extraction block [GND<sup>+</sup>20]
- 3.8 Clustering results using purity metric

### **Chapter 1**

### Introduction

Analog integrated circuit (IC) verification is a major component in the development of analog circuits because it ensures that the circuits will work as intended. It is also an essential inspection step because it helps identify and rectify any defects or errors introduced during the design process.

Analog IC verification consists of two parts. In the initial phase, compliance with all requirements is examined. The functionality of these circuits is evaluated based on a variety of operational parameters, followed by a comparative analysis with respect to predetermined performance benchmarks. Various methods, such as simulation in pre-silicon and measurement in post-silicon, can be used to accomplish this goal. These methods will generate a series of signals that will be saved for subsequent examination.

In the second part of the analog IC verification, the signals need to be verified visually by experts. This is necessary because even if for a certain product all the requirements have been met, there is a possibility that it still does not work as expected. The challenge at this stage of verification is that we must manually verify a vast number of signals, as we must take into consideration signals from many possible combinations of operating conditions.

In the current work, we propose a method to make the process of manual verification of signals more efficient. This is accomplished by the process of signal clustering, which enables the analysis of a vast volume of data without the need to individually examine each signal. The purpose of this clustering procedure is not to verify the requirements, but rather to visually inspect the signal in order to confirm the absence of unexpected oscillations, overshoots, undershoots, or other forms of glitches.

The main objective of this optimization is to reduce the time required to visually verify analog IC signals. A second objective is to support IC fabrication by streamlining how the production test sensor signals are analyzed.

### 1.1 Aiding analog IC verification and wafer production testing

During the pre-silicon verification, simulators are used to ensure that the design meets all requirements. It is also necessary to evaluate as many operating conditions as possible in order to reduce the possibility of design flaws. Following these simulations, a very large number of signals are generated, which need to be visually inspected by an expert. However, due to the fact that scenarios may be created automatically within the simulation, the verification becomes more challenging because of the increase in data that requires visual inspection. Due to the large number of signals to be analysed, signal clustering can have a significant impact at this stage.

In the post-silicon verification stage, the IC has already been validated in simulation and must now also be validated under laboratory conditions. Measurement is another crucial step in analog IC verification, since it enables designers to measure the actual parameters of the circuits and compare them with the values predicted by simulation. Signals collected in the laboratory are more challenging to evaluate than those from simulation environments because of noise and the possibility of glitches such as oscillations, overshoots, and undershoots.

By the time of wafer fabrication, the design has been validated both pre-silicon and post-silicon; however, the circuits printed on wafers must still be tested against specific parameters to ensure that production worked correctly. Production is therefore monitored with a series of sensors whose signals are subsequently analysed to determine whether the fabrication process has functioned correctly. If a particular problem occurs in production, several wafers will be affected. The initial step in identifying the problem is to analyse the sensors and find those that are correlated with the production error. Because this process requires a laborious analysis of the signals, we propose to support IC manufacturing with an automatic method for identifying the sensors that are correlated with the manufacturing error.

Supporting analog IC verification and wafer fabrication testing is essential for reducing the manual work involved. Therefore, we propose a new method for analysing these signals that provides a more compact way of visualising them as clusters and reduces the number of signals that need to be assessed.

### **1.2** Scope of the Research

The objective of this work is to aid the process of analog IC verification and production line testing by reducing manual work with the use of machine learning techniques. Implementing such methods in verification processes may present certain challenges:

- For machine learning algorithms to undergo training and evaluation, a substantial amount of data is required. The task at hand may involve the acquisition and labeling of extensive amounts of data from diverse ICs and operational scenarios within the framework of analog verification. A portion of the effort expended on these projects was devoted to the signal labelling process. This procedure had as its objective the grouping of signals based on their waveforms.
- The evaluation accuracy of the findings is directly correlated to the quality of the data that is used for training and testing the algorithms. The current objective might require the meticulous selection and processing of data to effectively mitigate the impact of unnecessary factors such as noise and other sources of imprecision.
- The process of feature selection may pose challenges, as it can be difficult to ascertain in advance which features are relevant to the specific application. As a result, identifying the optimal combination of features might require extensive examination and experimentation. To accomplish this, it was necessary to evaluate a variety of feature extraction techniques to determine which of them extract valuable information from the analog IC signals for subsequent use in the clustering process.
- The task of selecting and adapting a suitable machine learning algorithm for this particular application can pose a significant challenge. This is due to the fact that there are multiple techniques and procedures that could potentially be optimized, each possessing specific characteristics. Several processes that have the potential for optimization include the identification of unexpected signal events, the identification of sensors reflecting chip failure, and support of IC fabrication.

Employing machine learning techniques to lower the amount of manual inspection required in the process of analog verification can be difficult, but it also has the potential to offer considerable benefits in regard to coverage and efficiency of the analog IC verification.

### **1.3** Motivation

The use of machine learning methodologies to aid analog verification processes has the potential to reduce the amount of manual work while facilitating the analysis of large quantities of data related to complex operational scenarios. This may help accelerate the entire IC design process and improve the efficiency of verification. Consequently, there are multiple justifications for using machine learning techniques to optimize the verification process.

The objective of this study is to improve the entire development procedure of analog ICs by integrating and adapting machine learning methodologies into IC verification processes, with the aim of achieving the greatest possible impact. The use of machine learning techniques to enhance and automate the analog IC verification process has the potential to yield significant benefits.

### 1.4 Thesis Structure

In Chapter 2, we present fundamental elements of machine learning, including the clustering techniques used in the verification optimization process, and the current state of machine-learning-based optimization techniques for analog IC verification.

In Chapter 3, we present a method for enhancing the pre-silicon verification process by designing an algorithm to cluster signals based on well-known occurrences identified by verification engineers. This clustering and optimization technique consists of a time-invariant feature extraction algorithm, a clustering algorithm, and an algorithm for signal visualization in the feature space.

The emphasis of Chapter 4 is on events present in post-silicon signals that the algorithm presented in Chapter 3 is unable to characterize. In this framework, we have developed two methods for automated feature extraction: one based on neural networks and the other on a signal-processing approach called Dynamic Time Warping. This chapter also includes a comparative examination of several clustering metrics in relation to our use case of analog IC verification.

In Chapter 5 we focus on supporting IC fabrication of wafers. This is done using the most effective feature extraction technique presented in Chapter 4 in combination with several classification algorithms and optimal clustering metrics for the specific use-case of analog IC signals.

In Chapter 6, we draw general conclusions about the proposed approaches, discuss their impact on the problem addressed, and highlight the original contributions of the author. The thesis closes with several suggestions for future research on improving the efficiency of analog IC verification.

### Chapter 2

# **Related Work and Theoretical Fundamentals**

The focus of this study is on assisting the verification of analog ICs in order to reduce the manual effort involved in this process. As the semiconductor industry has evolved, the technology has come to support an ever larger set of requirements, and the verification procedure has become increasingly labor-intensive [GS19]. This is mostly due to the human component of the verification system, which cannot keep up with the enormous amount of data that must be visually evaluated [CK07].

Manual verification refers to the procedure of assessing the operational effectiveness of analog circuits by means of human proficiency in the form of visual examination. In the pre-silicon stage, which comes before the IC is actually built, manual verification is frequently used to detect unexpected behaviours under various operating situations [GXGM19]. The analog circuits can be simulated with software applications, and the results can be compared to the intended performance parameters.

Enhancing and facilitating manual verification can significantly influence the analog IC verification process in both the pre-silicon and post-silicon phases. The reason is that manual examination conducted by humans can require a significant investment of both time and resources. Hence, by optimizing this particular aspect of the verification process, it is possible to enhance the effectiveness of the overall design process of analog ICs.

### 2.1 Analog IC verification methodology

The primary objective of the analog IC verification methodology is the examination of the simulated or measured signals of an analog IC. The signals under consideration are primarily one-dimensional time-series that must adhere to specific criteria, including settling time, overshoot or undershoot levels, and signal values. Furthermore, unusual phenomena such as oscillatory patterns or atypical signal morphology are deliberately searched for in these signals.

Using automated tools that are able to assess the simulated data and identify any deviations from the set performance standards can be beneficial in making the process of manual inspection more efficient [MGK<sup>+</sup>05]. Improving the quality of data visualization serves as a supplementary measure to automation. In situations where a large volume of data necessitates manual evaluation, engineers may employ a data visualization technique that enables the simultaneous viewing and comparison of hundreds of signals in a clear and understandable manner.
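As an illustration of how such criteria could be checked automatically, the following minimal sketch computes the overshoot and settling time of a sampled step response. The function names, the 2% tolerance band, and the sample values are assumptions made for this example only; they are not taken from the tools cited above.

```python
def overshoot_percent(samples, final_value):
    """Peak excursion above the final value, as a percentage of the final value."""
    peak = max(samples)
    return max(0.0, (peak - final_value) / abs(final_value) * 100.0)


def settling_time(times, samples, final_value, tolerance=0.02):
    """Earliest time after which the signal stays within the tolerance band.

    Returns None if the last sample is still outside the band.
    """
    band = tolerance * abs(final_value)
    settled_at = times[0]
    for t, v in zip(times, samples):
        if abs(v - final_value) > band:
            settled_at = None      # left the band, so not settled yet
        elif settled_at is None:
            settled_at = t         # first sample back inside the band
    return settled_at


# Hypothetical step response sampled at six time points.
times = [0, 1, 2, 3, 4, 5]
samples = [0.0, 1.2, 0.9, 1.01, 1.0, 1.0]
print(overshoot_percent(samples, final_value=1.0))
print(settling_time(times, samples, final_value=1.0))
```

Checks of this kind cover the requirements that can be formulated in advance; the clustering approach discussed in this work targets the behaviours for which no such checker exists.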

### 2.2 Optimize verification using machine learning

The analog IC verification procedure can be made more efficient by employing various machine learning strategies. These methods involve training and fitting algorithms to extract relevant features from analog signals. The first step of using machine learning techniques for the verification of analog ICs is to gather and annotate vast volumes of data from a variety of products and operating situations. This data may then be used to develop machine learning techniques to uncover specific patterns in the analog signals [MCC<sup>+</sup>22].

Researchers employed unsupervised machine learning approaches in [YWCW21]. The goal of the study was to evaluate signals and find faulty behaviours directly from the time series without any preprocessing steps. The findings demonstrated that the machine learning approaches were able to effectively identify faulty behaviours within the signals, which resulted in an increase in the speed and accuracy of the visual inspection process.

### 2.3 Feature Extraction for Time-Series

Feature extraction is the process of obtaining relevant features or traits from a dataset in order to feed them into a neural network model or use them in an algorithm, and it is one of the possible first steps in machine learning [KD16]. The purpose of feature extraction is to locate those aspects of the data that are the most significant and pertinent to the problem at hand, thereby assisting the model in correctly classifying the data.

In signal processing, the term "feature extraction" may refer to the process of extracting aspects of a signal such as the frequency spectrum, the shape of the waveform, or the statistical properties of the signal [RHW<sup>+</sup>16]. These features can then be fed as input to a machine learning model, which can be trained to categorize or predict the signal based on them.
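A minimal sketch of statistical feature extraction is given below. The chosen features (mean, standard deviation, peak-to-peak amplitude, RMS, and number of mean crossings) and the function name `waveform_features` are illustrative assumptions, not the feature set used later in this thesis.

```python
import math


def waveform_features(samples):
    """Return a small dictionary of statistical features for one waveform."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((v - mean) ** 2 for v in samples) / n
    rms = math.sqrt(sum(v * v for v in samples) / n)
    # Count sign changes of the mean-removed signal (a crude oscillation indicator).
    centered = [v - mean for v in samples]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return {
        "mean": mean,
        "std": math.sqrt(variance),
        "peak_to_peak": max(samples) - min(samples),
        "rms": rms,
        "mean_crossings": crossings,
    }


# Hypothetical oscillating waveform.
print(waveform_features([1.0, -1.0, 1.0, -1.0]))
```

Each waveform is thereby mapped to a fixed-length feature vector, which is the form expected by the clustering algorithms described in the next section.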

### 2.4 Waveform Clustering

Clustering is a machine learning technique that entails the formation of clusters from sets of data points that exhibit similarities. The primary objective of clustering is to identify recurrent patterns and interrelationships within the data. This process enables analysts to gain novel insights and achieve a more comprehensive understanding of data distribution, even facilitating visualization [XW10].

Clustering is a valuable method for gaining knowledge of the underlying distribution of data as well as locating patterns and correlations that are present within it. Due to these factors, we believe clustering can have a significant impact on the optimization of the analog IC verification process, given the vast quantity of data involved.

#### 2.4.1 K-means

K-means clustering is an algorithm that divides a dataset into a predetermined number of groups depending on the distance between data points. The objective of k-means is to minimize the within-cluster sum of squares, which measures the squared distance between the data points of each cluster and the cluster centroid [JJJ<sup>+</sup>20].

One of the strengths of k-means clustering is that it can be implemented quickly and with little effort. It is also an efficient method for locating clusters that have fairly consistent densities and shapes. On the other hand, it can be sensitive to the initially selected centroids, and it may not perform well on datasets that have non-uniform densities or shapes. K-means remains a popular method: it assigns each data point to the cluster with the nearest centroid and then recomputes each centroid as the mean of its assigned points. It is effective for finding patterns and correlations in the data and for gaining insight into the underlying structure of the dataset.
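The assignment and update steps of k-means can be sketched as follows. This is a plain illustration of Lloyd's algorithm using only the Python standard library; the function name `kmeans`, the explicit initial centroids, and the toy data are assumptions for the example.

```python
import math


def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm on tuples of equal length.

    `centroids` holds the initial centroids; the result depends on this choice.
    Returns (labels, final_centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Two well-separated groups of hypothetical 2-D feature vectors.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centroids)
```

Because the outcome depends on the initial centroids, practical implementations usually repeat the procedure with several initializations and keep the result with the lowest within-cluster sum of squares.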

#### 2.4.2 Hierarchical clustering

Hierarchical clustering is a technique that groups data points based on a feature space representation and creates a layered tree-like organization of the data. This type of clustering may be broken down into two primary categories: agglomerative (bottom-up) and divisive (top-down) [PSJ15]. The most popular variant is agglomerative hierarchical clustering, which begins with a large number of tiny clusters and then merges them based on the similarities between them until all of the data points are included in a single cluster. In contrast, divisive hierarchical clustering begins with a single cluster containing all data points and recursively splits it into smaller clusters, depending on the similarity between the data points, until each data point is in its own cluster.

Hierarchical clustering generates a visual representation of the clusters in the form of a dendrogram, which provides a reasonably comprehensive interpretation, especially for the analog IC verification data.
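The bottom-up merging procedure can be sketched as below. The example assumes single linkage (the distance between two clusters is the distance between their closest members) and stops at a requested number of clusters instead of building the full dendrogram; both choices, and the function name `agglomerative`, are assumptions made for illustration.

```python
import math


def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters


# Hypothetical one-dimensional feature values.
points = [(0,), (1,), (10,), (11,), (50,)]
print(agglomerative(points, n_clusters=3))
print(agglomerative(points, n_clusters=2))
```

Recording the linkage distance at each merge would give the heights needed to draw the dendrogram.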

#### 2.4.3 DBSCAN clustering

The approach known as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), described in [Den20], is used to cluster individual data points within a dataset. It is predicated on the hypothesis that clusters in a dataset are formed by high-density regions of points, which are separated by low-density regions. DBSCAN requires the definition of two hyperparameters: the first is Epsilon (Eps), the maximum distance at which two points are considered neighbours, and the second is MinPts, the minimum number of points needed to form a dense region [BY21].

Two strengths of the algorithm are its ability to determine the number of clusters present in the dataset automatically and its capacity to single out individual points that do not belong to any cluster. Additionally, it is relatively efficient because it only needs to consider points that are within Eps of one another.
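The roles of Eps and MinPts can be seen in the following minimal sketch of DBSCAN. It uses a brute-force neighbour search for clarity (real implementations rely on spatial indexing), and the names `dbscan`, `eps`, `min_pts`, and `NOISE` are assumptions chosen for this example.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force region query; a point counts as its own neighbour.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: expand through it
                queue.extend(reachable)
        cluster += 1
    return labels


# Two dense groups and one isolated point (hypothetical 2-D features).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The number of clusters is never passed in: it follows from the density structure of the data, and the isolated point is reported as noise.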

### 2.5 Feature Space Dimensionality Reduction

The visualization of signals in a concise and comprehensible manner through the utilization of algorithmically extracted features is a crucial element of this study. Given that the feature space is characterized by multiple dimensions, it is necessary to employ dimensionality reduction techniques in order to facilitate the visualization of signals as points within a two-dimensional space.

Dimensionality reduction, also referred to as feature space dimensionality reduction, is a technique employed in machine learning to decrease the number of features or dimensions in a given dataset while retaining the maximum amount of pertinent information. This provides several benefits, including the simpler visualization mentioned above.

The process of reducing dimensionality requires careful consideration of its tradeoffs, as it can potentially reduce the amount of information available to the model, which may compromise its performance.

### 2.6 Clustering Performance Evaluation

Evaluating clustering performance is very important. The available clustering evaluation metrics can be split into two broad categories: metrics that need knowledge of the ground truth, called external evaluation metrics [dSCF<sup>+</sup>12], and metrics that evaluate the clustering result itself, called internal evaluation metrics [LLX<sup>+</sup>13]. Each type has benefits and drawbacks, which are described below for each category.

### 2.6.1 External Evaluation

#### **Purity Metric**

Purity is one of the measures used to validate the results; it indicates the degree to which each cluster contains just one signal type [RJ18]. This measure is substantially more informative when the number of clusters is small, since it is much easier to reach a high purity when there are numerous small clusters. In the extreme case where each point is assigned to its own cluster, the purity score is one hundred percent, rendering the statistic meaningless. In selecting this metric, we therefore took into account the need to restrict the number of clusters and to have balanced data, as purity does not produce relevant results for imbalanced data.
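A minimal sketch of the purity computation follows, using the common definition in which each cluster is credited with its most frequent ground-truth label. The function name `purity` and the toy labels are assumptions for illustration.

```python
from collections import Counter


def purity(cluster_labels, true_labels):
    """Fraction of points that carry the majority ground-truth label of their cluster."""
    by_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        by_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(true_labels)


truth = ["a", "a", "b", "b", "b", "b"]
print(purity([0, 0, 0, 1, 1, 1], truth))   # two clusters, one misplaced signal
print(purity([0, 1, 2, 3, 4, 5], truth))   # every signal in its own cluster
```

The second call shows the degenerate case: one cluster per signal always yields a purity of 1, which is why the number of clusters has to be restricted for the score to be meaningful.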

#### Fowlkes-Mallows index

The Fowlkes-Mallows (FM) external evaluation metric is used to measure the ability of the clustering process to replicate the manual annotation distribution [ISAS21]. The value of the FM metric indicates how well the results of the clustering match the manual annotations. The Fowlkes-Mallows index, proposed in [FM83], is the geometric mean of the pairwise precision and recall of the clustering with respect to the annotations.

This statistic is useful because it produces meaningful results for both imbalanced and outlier-containing datasets, as both precision and recall are included. Given that it is a more exhaustive external assessment metric than purity, it is also well suited for comparing clustering outcomes with manual annotations of signals acquired from analog ICs.
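In its usual pair-counting form, the index is $FM = \sqrt{\frac{TP}{TP+FP} \cdot \frac{TP}{TP+FN}}$, where $TP$ counts pairs of signals that share both a cluster and an annotation, $FP$ pairs that share only a cluster, and $FN$ pairs that share only an annotation. The sketch below implements this definition directly; the function name and the toy labels are assumptions for illustration.

```python
import math
from itertools import combinations


def fowlkes_mallows(cluster_labels, true_labels):
    """Pair-counting Fowlkes-Mallows index between a clustering and annotations."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_cluster = cluster_labels[i] == cluster_labels[j]
        same_class = true_labels[i] == true_labels[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    if tp == 0:
        return 0.0
    # Geometric mean of precision tp/(tp+fp) and recall tp/(tp+fn).
    return tp / math.sqrt((tp + fp) * (tp + fn))


truth = ["a", "a", "b", "b", "b", "b"]
print(fowlkes_mallows([0, 0, 0, 1, 1, 1], truth))
print(fowlkes_mallows([0, 1, 2, 3, 4, 5], truth))  # one cluster per signal
```

Unlike purity, placing every signal in its own cluster gives a score of 0 here, because no pair of signals is grouped together and recall collapses.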

### 2.6.2 Internal Evaluation

The aim of internal evaluation metrics is to determine the degree to which the signals can be differentiated from one another in the feature space. This aspect is crucial because it is the means by which we can assert that one algorithm extracts more relevant features than another [LLX<sup>+</sup>10]. Although these metrics assess essentially the same aspects, such as the degree of cluster compactness and the distance between clusters [VK16], each has unique characteristics that make it more or less suitable for analog signal clustering. In light of this, an objective comparison of these measures is necessary, taking into account what we wish to emphasize in the optimization of signal verification and flaw detection.

#### Davies-Bouldin Metric

The Davies-Bouldin metric is a common clustering performance indicator used in the scientific literature to assess the efficacy of clustering approaches [SMMS20]. This metric examines the overall quality of the clustering, including the degree to which points are compactly clustered as well as the distance between groups of points. Therefore, clusters that are denser and farther apart from one another will have a lower score, indicating superior performance [KMH17].
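A minimal sketch of the metric is shown below, assuming the standard formulation: for each cluster, the worst ratio of summed within-cluster scatter to centroid distance is taken, and these worst-case ratios are averaged. The function name and the toy data are assumptions; the sketch requires at least two clusters with distinct centroids.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin score (lower is better) for tuples of equal length."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    centroids = {
        label: tuple(sum(c) / len(members) for c in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance of the members of a cluster to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


labels = [0, 0, 1, 1]
near = [(0, 0), (0, 2), (10, 0), (10, 2)]
far = [(0, 0), (0, 2), (20, 0), (20, 2)]
print(davies_bouldin(near, labels), davies_bouldin(far, labels))
```

Moving the two groups farther apart while keeping their scatter unchanged lowers the score, in line with the interpretation given above.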

#### Silhouette Coefficient

The silhouette coefficient compares, for each point, the mean distance to the other points of its own cluster with the mean distance to the points of the nearest neighbouring cluster; values close to 1 indicate compact, well-separated clusters. Because the silhouette coefficient is sensitive to the number of clusters, it must be compared across varying numbers of clusters in order to find the optimal number. The silhouette coefficient is also sensitive to the data scale, so the data should be normalized before it is computed [SN20].
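The following minimal sketch computes the mean silhouette coefficient using the standard per-point definition $s = (b - a) / \max(a, b)$, where $a$ is the mean distance to the other members of the same cluster and $b$ the smallest mean distance to the members of any other cluster. The function name and the toy data are assumptions; the sketch assigns a score of 0 to singleton clusters and assumes at least two clusters without duplicate points.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0,), (1,), (10,), (11,)]
print(silhouette_score(points, [0, 0, 1, 1]))  # natural grouping
print(silhouette_score(points, [0, 1, 0, 1]))  # deliberately poor grouping
```

Since the score is built entirely from distances, rescaling one feature changes the result, which is why normalization is applied beforehand.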

#### Calinski-Harabasz Index

The Calinski-Harabasz index (CH index) presented in [CH74], also referred to as the Variance Ratio Criterion, is a metric that quantifies the degree of compactness and separation of clusters within a given dataset. This criterion can be used to determine the most suitable number of clusters for a clustering algorithm.
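As a sketch of the variance ratio, the index can be computed as the between-cluster dispersion divided by the within-cluster dispersion, each normalized by its degrees of freedom. The function name and the toy data below are illustrative assumptions.

```python
import math

def calinski_harabasz(points, labels):
    """Variance Ratio Criterion: [B / (k - 1)] / [W / (n - k)], where B is
    the between-cluster and W the within-cluster sum of squared distances."""
    n = len(points)
    dims = range(len(points[0]))
    overall = tuple(sum(p[d] for p in points) / n for d in dims)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    between = 0.0
    within = 0.0
    for pts in clusters.values():
        cent = tuple(sum(p[d] for p in pts) / len(pts) for d in dims)
        between += len(pts) * math.dist(cent, overall) ** 2
        within += sum(math.dist(p, cent) ** 2 for p in pts)
    return (between / (k - 1)) / (within / (n - k))

# Two unit squares of points, placed near and far apart (hypothetical data).
square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
near = square + [(3.0, 0.0), (3.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
far = square + [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
labels = [0] * 4 + [1] * 4

ch_near = calinski_harabasz(near, labels)
ch_far = calinski_harabasz(far, labels)
```

Unlike the Davies-Bouldin metric, a higher CH value indicates better-separated and more compact clusters, and the value is unbounded, which should be kept in mind when comparing it across datasets.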

### 2.7 IC fabrication verification

During wafer fabrication, numerous sensors monitor the production process. In the event of a production problem, the data from these sensors must be analysed both automatically and manually to determine the potential cause. The process of analysing wafer sensor data in order to identify problems can be enhanced in a variety of ways by using machine learning [CTGK22]. One application of machine learning is the automation of data collection and analysis, which can enhance the efficiency of these procedures. Machine learning algorithms can be used to identify the root causes of a problem by analyzing extensive amounts of data related to the IC design, manufacturing process, and test outcomes [RZWD15]. This can accelerate the root-cause investigation, thereby reducing the time and resources required to determine the source of the issue.

### Chapter 3

# Assisting pre-silicon analog IC verification through a SIFT-based algorithm

### 3.1 Introduction

In the process of pre-silicon verification of analog ICs, it is necessary to visually verify the signals generated after the simulations. The step of manual verification of the signals is necessary because there is a possibility of unexpected behavior occurring for which checkers cannot be written in advance. This process is extremely labor intensive as explained in the article [KC06]; therefore, we propose multiple techniques for signal clustering to facilitate signal assessment as a solution to this issue.

Multiple experiments are conducted on each type of IC, after which signals are generated and analysed. This strategy is very beneficial because, for each individual test, the signals will be clustered, allowing engineers to concentrate on a few groups of signals rather than inspecting each signal individually.

This enhanced automated technique for clustering the vast quantities of data makes the verification process significantly faster and more reliable because if any of the measured or simulated signals present unexpected events, they will be clearly separated from the rest of the signals.

### **3.2 Scale Invariant Feature Transform Descriptor**

#### 3.2.1 Overview

One of the main challenges of waveform clustering is the variable length of the signals. This issue often occurs in signals originating from simulations. Another challenge is the high-frequency signals generated by the simulation during the pre-silicon phase, which make it difficult to identify the appropriate similarity metric for the clustering algorithm. Hence, time-scale and signal length invariance is a basic requirement for waveform clustering of verification signals. This can be achieved by choosing appropriate feature extraction methods or, more generally, appropriate algorithms.

The feature extraction process examines the analog IC signal in three successive steps to obtain an accurate multidimensional feature representation, as shown in Figure 3.1, which depicts the functional block diagram of the full feature extraction algorithm.

The objective of the SIFT-based algorithm is to extract relevant characteristics from signals, based on waveform similarity. In order to extract these features, the algorithm identifies keypoints in the signals to locate significant features for extraction. A descriptor is then used to characterize these regions as accurately as possible, using as few coefficients as possible. Descriptors are obtained by performing a continuous wavelet transformation (CWT) followed by a 2D Discrete Cosine Transform (DCT) on the resulting scalogram of the CWT. One of the benefits of utilizing CWT is its ability to effectively depict the time-frequency spectrum of the neighborhood of keypoints in a resilient manner, as stated in [LO12]. The reason why we applied CWT to analog signals was to accurately characterise short-transient behaviours, as some signal errors can have a high frequency and short duration. In addition, DCT was used to compress the features due to the large number of coefficients in the resultant scalogram [ANR74].

#### **3.2.2** Detection of Critical Keypoints in Analog IC Time-series

The first step of feature extraction is the identification of keypoints, which are defined as points of interest based on the SIFT approach. To identify these points of interest, a Gaussian filter is employed. This type of linear filter is frequently used in image processing to mitigate noise or blur images, as explained in Lowe's work [Low04] on SIFT. The image is convolved with Gaussian filters at several scales, and the difference between each pair of successive Gaussian-blurred images is then computed. The maxima and minima of the Difference of Gaussians (DoG) are then used to determine the keypoints, which can occur across many scales.
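For a one-dimensional signal, the DoG keypoint search can be sketched as follows. This is a simplified illustration and not the implementation used in this work: extrema are searched only along the time axis within each DoG level (the full SIFT procedure also compares adjacent scales), and the scale values and the contrast threshold are assumed for the example.

```python
import math

def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, sigma):
    """Convolve the signal with a Gaussian, replicating the edge samples."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_keypoints(signal, sigmas=(1.0, 2.0, 4.0), threshold=0.05):
    """Indices of local extrema of the Difference of Gaussians whose
    magnitude exceeds the threshold, collected over all scale pairs."""
    blurred = [smooth(signal, s) for s in sigmas]
    keypoints = set()
    for fine, coarse in zip(blurred, blurred[1:]):
        dog = [f - c for f, c in zip(fine, coarse)]
        for i in range(1, len(dog) - 1):
            is_max = dog[i] > dog[i - 1] and dog[i] > dog[i + 1]
            is_min = dog[i] < dog[i - 1] and dog[i] < dog[i + 1]
            if (is_max or is_min) and abs(dog[i]) > threshold:
                keypoints.add(i)
    return sorted(keypoints)

# A flat signal with a single short spike at sample 50 (synthetic example).
signal = [0.0] * 100
signal[50] = 1.0
keypoints = dog_keypoints(signal)
```

The spike location is reported as a keypoint, together with neighbouring extrema of the DoG response, while a completely flat signal produces no keypoints.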

### 3.2.3 Descriptor Aimed to Describe Significant Events for Analog IC Signals

The descriptor was developed to provide a highly distinctive representation of the regions surrounding the keypoints. This stage consists of two operations: first, a wavelet transformation block, and then a Discrete Cosine Transform (DCT) [ANR74].

The objective of the CWT is to accurately represent in the time-frequency domain the segments of the signal from each keypoint neighborhood. This representation, known as a scalogram, is well suited to describing the non-periodic variations and short-time transients that are common characteristics of our dataset. A time segment is defined for the frequency analysis, and since this happens for each keypoint located by the SIFT block, multiple frequency representations are computed for each individual signal, depending on the number of keypoints.

Fig. 3.1 Scale Invariant Feature Transform feature extraction block [GND<sup>+</sup>20]

In order to accurately represent brief occurrences with fluctuating frequency, we employed the generalized Morse wavelets due to their computational efficiency in examining isolated discontinuities, as noted in the literature [LO12].

With the help of this new ordered representation, it is possible to compress the scalogram by employing a number of coefficients that is lower than 64. In the context of the present work, the empirical research making use of the provided data revealed that a reconstruction error of less than 5% is observed within the coefficient range of [25-35].
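The compression step can be sketched as follows, under several assumptions that are not stated in the text: the scalogram patch is taken to be 8 x 8 (64 coefficients in total), the coefficients are ordered by increasing u + v as a zigzag-like low-to-high frequency ordering, and a smooth synthetic patch stands in for a real scalogram.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    rows = []
    for u in range(n):
        scale = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct2(block):
    """2D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT of a square block: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def keep_low_frequencies(coeffs, count):
    """Zero all but the `count` coefficients with the smallest u + v."""
    n = len(coeffs)
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda uv: (uv[0] + uv[1], uv[0]))
    kept = set(order[:count])
    return [[coeffs[u][v] if (u, v) in kept else 0.0 for v in range(n)] for u in range(n)]

def relative_error(original, approx):
    """Relative reconstruction error in the Euclidean (Frobenius) norm."""
    num = sum((o - a) ** 2 for ro, ra in zip(original, approx) for o, a in zip(ro, ra))
    den = sum(o ** 2 for ro in original for o in ro)
    return math.sqrt(num / den)

# Smooth synthetic 8 x 8 patch standing in for a scalogram (assumption).
size = 8
patch = [[math.exp(-((i - 3.5) ** 2 + (j - 3.5) ** 2) / 8.0) for j in range(size)]
         for i in range(size)]
coeffs = dct2(patch)
errors = {
    count: relative_error(patch, idct2(keep_low_frequencies(coeffs, count)))
    for count in (10, 31, 64)
}
```

Because the transform is orthonormal, the reconstruction error can only decrease as more coefficients are kept, which is the property exploited when choosing a coefficient count below 64. The error values obtained on this synthetic patch say nothing about the error measured on the dataset.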

The DCT coefficients were quantized using 31 quantization levels as described in [RS01], in order to achieve a better compression level and to allow the feature aggregation step.

Therefore, for each segment of signals defined by a keypoint, 31 coefficients will be used to characterize that region. These coefficients describe the proposed analog signal-specific version of the descriptor. By employing frequency analysis and compression techniques based on DCT, we are capable of describing transient events of short duration, which are of utmost importance for the verification process.

#### 3.2.4 Bag-of-Words optimal representation

This final phase of constructing the feature space consists of generating a dictionary of symbols that is used to describe signals as histograms of symbols throughout the clustering procedure. This signal characterization (also known as bag-of-words) has the benefit of ensuring signal length and keypoint number invariance. The reason we chose this strategy was to define the characteristics of a signal based on the shape of the signal within a window centred on the keypoint. In order to accomplish this, it was necessary to identify the most frequently encountered waveforms around the keypoints, and then create a dictionary based on these waveforms.

In order to create the bag-of-words representation of the dataset signals, it is necessary to partition the resulting DCT descriptor space based on the separability of the descriptors in that space. This is achieved using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [KuRA<sup>+</sup>14] on descriptors derived from the entire dataset. In our scenario, the 31 resulting clusters constitute the dictionary of symbols.
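The histogram construction can be sketched as below. The sketch assumes that each descriptor is assigned to the nearest dictionary symbol, represented here by a cluster centroid; this assignment rule, the three-symbol dictionary and the 2-D descriptors are simplifications for illustration (the dictionary described above has 31 symbols).

```python
import math

def bag_of_words(descriptors, dictionary):
    """Histogram of nearest-symbol assignments, normalized to sum to 1."""
    counts = [0] * len(dictionary)
    for d in descriptors:
        nearest = min(range(len(dictionary)), key=lambda k: math.dist(d, dictionary[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(dictionary)

# Hypothetical dictionary of three symbols in a 2-D descriptor space.
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A short signal with two keypoints and a longer one with six keypoints.
short_signal = [(0.1, 0.0), (0.9, 0.1)]
long_signal = [(0.0, 0.1), (0.1, 0.1), (1.0, 0.2), (0.9, 0.0), (0.2, 0.9), (0.1, 1.1)]

short_hist = bag_of_words(short_signal, dictionary)
long_hist = bag_of_words(long_signal, dictionary)
```

Both signals end up with a feature vector of the same length regardless of how many keypoints they contain, which is the invariance property the bag-of-words representation provides.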

#### **3.2.5** Clustering Block

This step is performed to separate the signals of each individual test in the dataset into separate groups based on their bag-of-words representation. This allows the verification engineer to visually check the signal sets more easily. For this purpose, we used k-means clustering [KYY20], which is a simple and efficient approach when there is a clear boundary between clusters in the multidimensional feature space. We also employed hierarchical clustering with an agglomerative approach [KYY20] for comparison.

K-means was chosen because it successfully groups points with linearly separable features in a multidimensional space. In this case, the clustering approach was applied to the multidimensional feature space. In order to apply the k-means approach, the number of clusters must be provided. This value was chosen based on the number of labels in the dataset for each test.

#### 3.2.6 Feature Space Visualisation

The visualization is also beneficial when there are outliers that are highly distinct from the other signals, which will be displayed in the two-dimensional space far away from the other locations. This visualization was realized by means of the Principal Component Analysis (PCA) algorithm [MR93], which had the role of reducing the space of features generated by the proposed algorithm to two dimensions.

Because this dimensionality reduction incurs a loss of information [GK12], it is vital to know how much information is kept in the first two components for effective separation and 2D visualization. To establish the suitability of this type of presentation for analog signals in a two-dimensional space while preserving multidimensional separability, it was imperative for our dataset to exhibit a significant proportion of variance in the first two principal components [DYL08].
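The proportion of variance retained by the two-dimensional projection can be written explicitly. As a sketch, assume that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d$ denote the eigenvalues of the covariance matrix of the $d$-dimensional feature space (these symbols are introduced here for illustration and do not appear in the original text):

$$
r_2 = \frac{\lambda_1 + \lambda_2}{\sum_{i=1}^{d} \lambda_i}
$$

A value of $r_2$ close to 1 indicates that the 2D plot preserves most of the variance, and therefore most of the separability, present in the full feature space.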

### **3.3 Experimental Results**

### 3.3.1 Dataset

In this stage of our research, we created a dataset using data from 10 simulations, for a total of 2,950 signals. These signals were obtained in a simulation environment using Low Dropout Regulators. For each test, a verification engineer manually labels the signals into two or three groups that reflect distinct positive behaviours and failure events. Since we are doing unsupervised classification (clustering), these labels are used only to evaluate the performance of the algorithm.

The length of the signals differs considerably among the whole dataset. This justifies the need for a clustering algorithm that provides length invariance. In addition, the dataset contains signals presenting phenomena relevant to the analog IC verification, including overshoot, undershoot, and oscillations.

#### 3.3.2 Experimental Use-Cases of Clustering Signals

In this work, a series of experiments were conducted with the primary goal of determining the extent to which the SIFT-based algorithm can extract valuable features from analog IC signals. Additionally, we determined which clustering algorithm performs best on the feature space produced by the SIFT-based algorithm. In order to examine all of these aspects, the ten groups of signals presented in the preceding section were used, along with the SIFT algorithm and two clustering algorithms, k-means and hierarchical clustering.

A first analysis was conducted based on the clustering purity metric to quantify the degree of separation between the signals relative to the manual annotation. Figure 3.8 presents the comparison, based on purity, of the two clustering algorithms applied to the same feature space. In the case of clustering based on k-means, as shown in Figure 3.8, we obtained an average purity of 97% for the tests, which demonstrates an excellent separability of the signals in the feature space. This analysis also demonstrates that hierarchical clustering yields less accurate clustering results, which suggests that organizing the signals in the form of a dendrogram may not be the optimal solution for the purpose of this work.
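For reference, purity can be computed as the fraction of signals that belong to the majority class of their assigned cluster. The sketch below uses hypothetical labels and cluster assignments, not data from the experiments.

```python
from collections import Counter

def purity(true_labels, cluster_ids):
    """Fraction of samples that carry the majority label of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_ids):
        clusters.setdefault(c, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority / len(true_labels)

# Ten signals annotated by hand; one "pass" signal lands in the wrong cluster.
true_labels = ["pass"] * 6 + ["overshoot"] * 4
cluster_ids = [0] * 5 + [1] + [1] * 4

score = purity(true_labels, cluster_ids)
```

Here nine of the ten signals agree with the majority label of their cluster, giving a purity of 0.9. Note that purity rewards over-segmentation, since assigning every signal to its own cluster yields a purity of 1.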



Fig. 3.8 Clustering results using purity metric

### 3.4 Summary and Conclusions

In this chapter, we have presented a method that can be used for clustering circuit test waveforms by their similarity in order to decrease the time needed for pre-silicon verification. We have achieved this by using a bag-of-words approach that uses a SIFT framework for extracting signal features. Besides very good clustering results with a 98% purity, this approach also has the advantage of obtaining time-scale and signal length invariances. Despite the availability of annotated datasets, we have developed an unsupervised learning algorithm with the purpose of using it on a significantly larger dataset, where the labeling process is impractical due to the associated effort.

### **Chapter 4**

# **Clustering Techniques for Post-Silicon Analog IC Verification**

### 4.1 Introduction

In this chapter, we developed two methods to aid the verification flow: one based on dynamic time warping (DTW) and the other on an autoencoder. Both techniques can be used for feature extraction from analog IC signals that exhibit unexpected behaviour. The characteristics of the data and the objectives of the analysis both play a role in the selection of the appropriate algorithm. In order to determine which method yields the best results for a specific application, it may be necessary to try out a variety of algorithms and combinations.

### 4.2 Dynamic Time Warping Feature Extraction

#### 4.2.1 Overview

The DTW-based algorithm is used in the proposed technique to measure waveform differences between time series originating from the same test. In order to evaluate the proposed method, we used multiple clustering metrics to measure the feature space separability between clusters, as well as the execution time of the algorithm. In its most basic form, this approach warps two signals against each other in order to compare them as clearly as possible. The algorithm yields a DTW coefficient that indicates the degree to which the two signals are similar.
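The warping idea can be sketched with the classic dynamic-programming formulation of DTW. The sketch assumes an absolute-difference local cost and no warping-window constraint, which may differ from the configuration used in this work; the example waveforms are synthetic.

```python
import math

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW with absolute
    difference as the local cost and no warping-window constraint."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# A step response, the same step sampled over a longer time base,
# and a step with a short overshoot (synthetic examples).
step = [0.0, 0.0, 1.0, 1.0, 1.0]
stretched = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
overshoot = [0.0, 0.0, 1.5, 1.0, 1.0]

d_stretched = dtw_distance(step, stretched)
d_overshoot = dtw_distance(step, overshoot)
```

The stretched step is at distance 0 from the original despite its different length, while the overshoot contributes a non-zero distance of 0.5. This is the behaviour that makes the measure attractive for signals of variable length.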

#### 4.2.2 Density-based spatial clustering for the selection of representative signals

The feature extraction method is based on the DTW algorithm and its associated measure, both of which are defined in works such as [HPB19], [MCC<sup>+</sup>15]. As the DTW technique compares signals in pairs, it is necessary to construct a collection of reference signals that can serve as a benchmark for the comparison of all other signals. By applying the DTW method between a reference signal and another signal, we obtain the warping matrix containing the coefficients that can be used to compress or extend a signal to achieve an ideal alignment between the signals.

Because the DTW algorithm performs very well when comparing relatively similar signals, it would be optimal to choose a few sample signals from each test and establish them as reference signals. For the selection process, we propose a first representation of the signals in a two-dimensional form, in which all signals from a test may be grouped. In line with the DTW technique, we use two random reference signals $T = [T_1, T_2]$, and the signals are projected into the space of the coefficients produced by the DTW algorithm with respect to these references. DBSCAN clustering is then executed to determine the clusters automatically. In order to select reference signals, it is necessary to establish a technique for extracting the centroids of each individual cluster.
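One possible way to turn the clusters into reference signals is sketched below. It assumes that the 2-D DTW coordinates and the DBSCAN cluster assignments are already available, that noise points carry the label -1 (a common convention, assumed here), and that the reference of a cluster is the signal closest to the cluster centroid. The selection rule and all values are illustrative and are not claimed to be the procedure used in this work.

```python
import math

def select_references(coords, cluster_ids, noise_label=-1):
    """For each cluster, return the index of the signal whose 2-D
    projection lies closest to the cluster centroid."""
    clusters = {}
    for idx, cid in enumerate(cluster_ids):
        if cid != noise_label:
            clusters.setdefault(cid, []).append(idx)
    refs = {}
    for cid, members in clusters.items():
        cx = sum(coords[i][0] for i in members) / len(members)
        cy = sum(coords[i][1] for i in members) / len(members)
        refs[cid] = min(members, key=lambda i: math.dist(coords[i], (cx, cy)))
    return refs

# Hypothetical DTW coordinates with respect to T1 and T2 for seven signals.
coords = [
    (1.0, 1.0), (1.2, 0.9), (1.1, 1.05),   # cluster 0
    (5.0, 5.2), (5.1, 5.0), (5.05, 5.1),   # cluster 1
    (9.0, 0.0),                            # isolated point, treated as noise
]
cluster_ids = [0, 0, 0, 1, 1, 1, -1]

references = select_references(coords, cluster_ids)
```

Choosing an actual signal rather than the numeric centroid keeps the reference a real waveform, which is needed because DTW is subsequently computed between signals and not between points of the projection.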

### 4.2.3 Constructing the Analog Signal Feature Space

The DTW-based algorithm is applied between the reference signals specified in the previous stage and all the signals within a test. Therefore, each unique signal is assigned a set of coefficients $C_j$ that define its degree of similarity to the reference signals.

At this point, the signals obtained from each individual verification test in the database possess a multidimensional representation, which enables their analysis for the purpose of identifying any linear separability that may exist between groups of signals. Applying a clustering strategy to laboratory experiments can potentially differentiate signals that arise from experiments with expected behavior from those that result from failed experiments or from anomalous behaviors that appear as outliers. This can be facilitated by standardizing the analog IC test input signals. The feature space is constructed using DBSCAN to automatically choose the reference signals. This approach is advantageous since it aims to minimise the number of dimensions, or features, involved.

### 4.3 Neural Network Model for Analog Signals

### 4.3.1 Overview

In this research, we offer a method of representation for analog IC signals. The initial step in the representation process is the selection of relevant characteristics from the acquired data. CNNs were selected in order to obtain the most relevant attributes in the most effective way. CNN has been employed efficiently in various cases requiring automated extraction of features, such as [KHH18] and [RMX<sup>+</sup>19].

### 4.3.2 Convolutional Neural Network Model

The objective of this CNN-AE model is to effectively extract the essential features of input images by compressing the network coefficients into the intermediate layer. Autoencoders are equipped with an intermediate layer commonly known as the "bottleneck," which is notably smaller in size than the input and output layers.

The use of Convolutional Neural Networks is motivated by the need to automatically extract signal properties for several analog IC events and test types. In addition, this is crucial when dealing with signals that may exhibit unexpected events, for which a robust representation is required to emphasize them throughout the visualization process. Conventional methodologies require the manual design of descriptors for specific events, which is a tedious process due to its empirical nature. The outcome is not guaranteed to be optimal, as it requires the consideration of numerous tests and analog ICs.

### 4.3.3 Autoencoder hyperparameters optimization

We employed Bayesian optimization [AR08] to tune the hyperparameters of the CNN-AE model. We partitioned the database into 70% training and 30% testing in order to optimize and evaluate the model efficiently and to prevent overfitting. By altering the number of convolution layer coefficients, we assessed many configurations for our particular use case. After executing the model optimization procedure, the hyperparameter settings in Table 4.3 were found to yield the best clustering results.

Table 4.3 CNN-AE model hyperparameters found after Bayesian optimization [GDB<sup>+</sup>23]

| Hyperparameter                   | <b>Optimal Value</b> |
|----------------------------------|----------------------|
| CNN - kernel                     | 2                    |
| MaxPooling - pool size           | 2                    |
| MaxPooling - strides             | 2                    |
| Dropout - rate                   | 0.3                  |
| Activation function              | ReLU                 |
| Bottleneck - No. of Coefficients | 128                  |

### 4.4 Clustering Metrics Analysis

### 4.4.1 Overview

The manner in which we evaluate the quality of clustering is of utmost relevance, since we examine the feature extraction methods through the lens of a clustering metric. Selecting the right clustering metric for our specific use case is not a simple task, since we wish to emphasize certain behaviours seen throughout the analog IC verification process. Therefore, a comprehensive examination of various metrics is necessary to identify which one is most suited for our application.

### 4.4.2 Sensitivity Analysis of Internal Evaluation Metrics

In this study, we ran 13 tests on points drawn from Gaussian distributions, and the results can be seen in Table 4.4. In these tests, we adjusted the distance between clusters for tests 1-4, created clusters with incorrectly annotated points for tests 5-7, and created clusters with outliers for tests 7 to 13. X1 and X2 are two random vectors that constitute the basis for the multivariate normal distributions for each set of generated points.

| Id     | No. of Points | Davies-Bouldin | Silhouette | CH   | FM    |
|--------|---------------|----------------|------------|------|-------|
| Set 1  | 400           | 0.149          | 0.851      | 6500 | 1     |
| Set 2  | 400           | 0.424          | 0.686      | 1527 | 0.985 |
| Set 3  | 400           | 0.798          | 0.426      | 444  | 0.794 |
| Set 4  | 400           | 11.4           | 0.026      | 2    | 0.506 |
| Set 5  | 403           | 0.238          | 0.825      | 4051 | 0.985 |
| Set 6  | 410           | 0.264          | 0.788      | 2438 | 0.952 |
| Set 7  | 430           | 0.394          | 0.676      | 1155 | 0.870 |
| Set 8  | 401           | 0.206          | 0.852      | 4449 | 1     |
| Set 9  | 403           | 0.233          | 0.834      | 2351 | 1     |
| Set 10 | 410           | 0.281          | 0.810      | 1251 | 1     |
| Set 11 | 410           | 0.474          | 0.755      | 322  | 0.952 |
| Set 12 | 410           | 0.464          | 0.759      | 333  | 0.952 |
| Set 13 | 200           | 0.211          | 0.850      | 3186 | 1     |

Table 4.4 Comparison between several clustering metrics

### 4.5 Dataset

In this study, the database was expanded from ten tests comprising 2,950 signals to thirty tests totaling 10,200 signals. Each test was composed of a limited number of signal categories. For each distinct test, a specialist grouped the signals into two or three classes and labeled each signal with the respective class. These labels represent the ground truth, which is used only for evaluating the clustering performance. Because we are using unsupervised machine learning techniques, comparing the results with the labels is the recommended way to report the overall performance.

### 4.6 Experimental Use-Cases of Clustering Signals

In this chapter, we attempted to determine which of the three previously proposed algorithms is able to extract the best features from the analog signals measured during the verification process. For the experiments, we used a more exhaustive database than the one used to evaluate the efficacy of the SIFT-based algorithm in the preceding chapter. As shown in Table 4.5, the DTW-based procedure yields the best results for the vast majority of signal tests. The instances in which the CNN-AE technique is not surpassed are those that exhibit a high degree of class differentiation and where the inter-cluster distance holds significance.

### 4.7 Summary and Conclusions

In this chapter, we presented and validated clustering strategies for optimizing the post-silicon verification process using a CNN-AE method and a DTW-based algorithm. The CNN-AE strategy was achieved by combining convolutional neural networks and autoencoder networks capable of generating a feature space suitable for analog signals, while the DTW-based algorithm was implemented by using the warping distance of the DTW matrix as features with automatically chosen waveform references. By comparing the purity and Davies-Bouldin metrics obtained in this work with those of the previously developed SIFT-based clustering approach, we demonstrated that both methods in this chapter are superior. Despite comparable external evaluation performance, as measured by purity, the DTW-based clustering method exhibits a distinct advantage over its counterpart when evaluated internally.

In this chapter, we have presented and validated two efficient methods for clustering IC test response signals based on their visual similarity. Because the algorithms proposed in this chapter are able to distinguish between distinct signal forms, they have an impact on the post-silicon analog IC verification methodology by reducing the amount of manual effort required to identify outlier signals.

| Test Name | Purity (%): SIFT | Purity (%): CNN-AE | Purity (%): DTW | Davies-Bouldin: SIFT | Davies-Bouldin: CNN-AE | Davies-Bouldin: DTW |
|-----------|------|------------|------|----------------|--------|-------|
| Test 1    | 100  | 100        | 100  | 0.365          | 0.281  | 0.192 |
| Test 2    | 100  | 100        | 100  | 0.544          | 0.255  | 0.179 |
| Test 3    | 100  | 100        | 100  | 0.002          | 0.007  | 0.004 |
| Test 4    | 98   | 98         | 100  | 0.546          | 0.239  | 0.152 |
| Test 5    | 98   | 100        | 100  | 0.976          | 0.334  | 0.188 |
| Test 6    | 97   | 100        | 100  | 1.422          | 0.318  | 0.001 |
| Test 7    | 92   | 96         | 96   | 0.681          | 0.552  | 0.466 |
| Test 8    | 93   | 97         | 100  | 0.421          | 0.418  | 0.211 |
| Test 9    | 100  | 100        | 100  | 0.420          | 0.277  | 0.009 |
| Test 10   | 88   | 99         | 100  | 1.526          | 0.401  | 0.250 |
| Test 11   | 100  | 100        | 100  | 0.592          | 0.363  | 0.176 |
| Test 12   | 100  | 100        | 100  | 0.370          | 0.229  | 0.194 |
| Test 13   | 72.2 | 91         | 100  | 1.423          | 0.729  | 0.015 |
| Test 14   | 73.6 | 77.6       | 100  | 1.488          | 1.198  | 0.265 |
| Test 15   | 100  | 100        | 100  | 0.470          | 0.372  | 0.197 |
| Test 16   | 100  | 100        | 100  | 0.311          | 0.214  | 0.163 |
| Test 17   | 100  | 100        | 100  | 0.249          | 0.152  | 0.171 |
| Test 18   | 94.3 | 100        | 100  | 0.527          | 0.418  | 0.364 |
| Test 19   | 64.6 | 59.6       | 90.6 | 1.422          | 3.017  | 0.728 |
| Test 20   | 94.8 | 96.8       | 100  | 0.663          | 0.558  | 0.242 |
| Test 21   | 96.3 | 100        | 100  | 0.486          | 0.233  | 0.219 |
| Test 22   | 100  | 100        | 100  | 0.336          | 0.185  | 0.091 |
| Test 23   | 100  | 100        | 100  | 0.103          | 0.069  | 0.085 |
| Test 24   | 87.3 | 100        | 100  | 0.881          | 0.365  | 0.222 |
| Test 25   | 98.6 | 100        | 100  | 0.647          | 0.140  | 0.064 |
| Test 26   | 100  | 100        | 100  | 0.335          | 0.277  | 0.255 |
| Test 27   | 99.3 | 100        | 100  | 0.481          | 0.368  | 0.299 |
| Test 28   | 100  | 100        | 100  | 0.164          | 0.082  | 0.104 |
| Test 29   | 62   | 64         | 88   | 0.929          | 1.049  | 0.588 |
| Test 30   | 74   | 86         | 92   | 0.953          | 0.729  | 0.662 |

Table 4.5 Comparison between SIFT, CNN-AE and DTW Results

### **Chapter 5**

# Support verification of wafer fabrication

### 5.1 Overview

This work addresses the issue of decreasing the amount of human labour required for the analysis of the production sensors. This is accomplished using a DTW-based algorithm and clustering metric analysis in conjunction with an SVM classifier or the Davies-Bouldin metric. The use of the DTW-based algorithm is motivated by its efficacy in encoding dissimilarities between analogous signals, which is exemplified in speech signal analysis software [HPB19] and also in our work [GDB<sup>+</sup>22].

Clustering is an effective methodology that can support analog IC fabrication. The process of clustering data can aid in the identification of fundamental patterns and failure modes, thereby facilitating the optimization of the manufacturing process and enhancing yield.

### 5.2 Dataset

The validation dataset for the proposed approach includes 972 measured and labeled wafers (171/801 fail/pass samples). Each wafer is defined by 56 sensor waveforms obtained during the testing procedure, totaling 54,432 distinct waveforms. In addition, the rating of each of the 56 related tests is presented. The experts generated these dataset annotations and ranks using a visual examination technique. Given this total of 54,432 unique waveforms, we can confidently assert that the dataset used is large enough to verify the proposed methodology. This dataset could not be used for the clustering scenario described in Chapters 3 and 4, as the labels were assigned differently: they were determined not by the waveform of the signals, as in previous instances, but by the behaviour of the wafer.

### 5.3 Wafer failure detection

The technique for extracting features that might highlight waveform (dis)similarities is based on Dynamic Time Warping and its related measures, as described in [HPB19], [MCC<sup>+</sup>15]. Because each sensor waveform set shares a certain degree of similarity with respect to non-faulty wafers, the DTW-based approach was chosen as the feature extraction algorithm. The rationale behind selecting the DTW-based algorithm is its superior performance in IC verification when compared to alternative algorithms. The algorithm's resistance to noise and invariance to length were also significant factors.

We used a trained SVM classifier to measure the degree of separation between each sensor waveform, by locating the separation hyperplane between faulty and non-faulty samples. The degree of class separability as determined by SVM classification accuracy is a ranking parameter for sensors in regard to the correlation with the defective wafers. An alternative ranking metric is a Davies-Bouldin cluster metric applied in the feature space.

For determining the separability of the two classes inside the feature space, we used the classification accuracy of a nonlinear classifier. We picked a Gaussian kernel for the SVM classifier [DS17], due to its optimal maximum classification margin characteristic [ZG08]. We have trained an SVM for each test sensor, with 75% of the data used for training and 25% for testing. The training approach involved hyperparameter optimization of the Gaussian kernel scale and box constraint utilizing a Bayesian Optimization algorithm as the acquisition function [FIPG18]. We have also utilized a balanced weight standardization to compensate for the imbalanced nature of the dataset.
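The effect of class balancing can be illustrated with the class counts of the dataset (171 fail and 801 pass wafers). The sketch assumes that balancing is done by inverse class-frequency weighting, weight = n_samples / (n_classes * n_class_samples); this is one common scheme and is not confirmed to be the exact weighting used in this work.

```python
def balanced_class_weights(class_counts):
    """Weight each class by n_samples / (n_classes * n_class_samples)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {label: total / (k * count) for label, count in class_counts.items()}

# Class counts taken from the dataset description (171 fail, 801 pass).
weights = balanced_class_weights({"fail": 171, "pass": 801})

# Total weight carried by each class after re-weighting.
fail_mass = weights["fail"] * 171
pass_mass = weights["pass"] * 801
```

Under this scheme a failing wafer weighs roughly 2.84 and a passing wafer roughly 0.61, so both classes contribute the same total weight to the training objective, which prevents the classifier from favouring the majority class.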

The classification accuracy on the test set was employed as a ranking metric for each sensor waveform feature set. The primary factors contributing to failure can be attributed to the tests that offer the greatest precision in categorizing waveform properties. The Davies-Bouldin metric has been employed to evaluate the degree of distinctiveness between the two categories in the feature space for the purpose of comparison. The clustering metric mentioned above is a simplified method that eliminates the requirement for hyper-parameter tuning and is effective for categories that are linearly separable.

Additional hyperparameters of the SVM comprise the degree parameter and coefficient parameter, which are exclusively applicable to polynomial kernels. Additionally, the cache size and convergence criteria are utilized to regulate the memory consumption during training and the halting criterion for the optimization algorithm, respectively.

The selection of suitable hyperparameters for SVM is contingent upon the particular problem at hand and the inherent attributes of the data. Experimentation and validation, which aid in choosing appropriate hyperparameter values, can help achieve optimal performance for a given problem.

### **5.4 Experimental Results**

During this experiment, the experts manually assessed which of the sensors in the dataset are connected with the failure behaviour and established an order of significance with respect to this behaviour. Table 5.1 lists the first 20 sensors in the order determined manually by the experts. This manual ranking serves as the reference for the analysis of the results. It is important to recognise that manual ranking can be a time-consuming task, potentially spanning several weeks. As a result, our efforts were concentrated on finding methods that automatically arrange these sensors in approximately the same order, in order to save time.

Although the three ranking methodologies exhibit variations, they maintain a similar ranking order in comparison to the manual ranking reference. All three ranking approaches produce an identical list of the most highly ranked sensors. The findings indicate that the system has the capability to automate the mandatory evaluation of sensor ratings in regard to their correlation with a repetitive failure, a process that typically involves human expertise and significant manual effort, while maintaining a similar level of accuracy.

| Manual Ranking | SVM Accuracy [%] | Davies-Bouldin Score |
|----------------|------------------|----------------------|
| Sensor 12      | 67.07            | 0.88                 |
| Sensor 4       | 66.66            | 0.96                 |
| Sensor 35      | 65.02            | 0.94                 |
| Sensor 7       | 62.69            | 0.98                 |
| Sensor 5       | 64.60            | 1.05                 |
| Sensor 50      | 59.25            | 1.26                 |
| Sensor 35      | 60.08            | 1.14                 |
| Sensor 21      | 61.31            | 1.51                 |
| Sensor 17      | 58.02            | 1.36                 |
| Sensor 38      | 57.01            | 1.51                 |
| Sensor 28      | 57.61            | 1.45                 |
| Sensor 16      | 56.91            | 1.63                 |
| Sensor 3       | 54.22            | 1.52                 |
| Sensor 36      | 50.11            | 1.55                 |
| Sensor 41      | 51.55            | 1.91                 |
| Sensor 29      | 53.39            | 1.98                 |
| Sensor 44      | 53.61            | 1.92                 |
| Sensor 19      | 52.18            | 1.98                 |
| Sensor 31      | 49.36            | 1.89                 |
| Sensor 14      | 47.88            | 2.03                 |

Table 5.1 Top 20 Sensor Ranking based on SVM Accuracy and Davies-Bouldin metric

In the third column of Table 5.1, we report the Davies-Bouldin index computed for each sensor, which is a simpler and faster method for assessing linear separability. When considering this index, it is important to keep in mind that a smaller value corresponds to a higher degree of separability. As shown in Table 5.1, this metric is validated by the manually established order.
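One way to quantify how closely each automatic metric follows the manual order is a Spearman rank correlation between the manual position (1 to 20) and the corresponding column of Table 5.1. The sketch below is an illustrative check added here, not part of the original evaluation; the two lists are copied from Table 5.1 in row order, and the helper names are assumptions.

```python
def average_ranks(values):
    """1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Columns of Table 5.1, listed in the manual ranking order (rows 1 to 20).
svm_accuracy = [67.07, 66.66, 65.02, 62.69, 64.60, 59.25, 60.08, 61.31, 58.02, 57.01,
                57.61, 56.91, 54.22, 50.11, 51.55, 53.39, 53.61, 52.18, 49.36, 47.88]
db_score = [0.88, 0.96, 0.94, 0.98, 1.05, 1.26, 1.14, 1.51, 1.36, 1.51,
            1.45, 1.63, 1.52, 1.55, 1.91, 1.98, 1.92, 1.98, 1.89, 2.03]
manual_position = list(range(1, len(svm_accuracy) + 1))

rho_svm = spearman(manual_position, svm_accuracy)
rho_db = spearman(manual_position, db_score)
print(f"Spearman vs manual order: SVM accuracy {rho_svm:.3f}, Davies-Bouldin {rho_db:.3f}")
```

Because a higher accuracy and a lower Davies-Bouldin index both indicate a better rank, agreement with the manual order shows up as a coefficient near -1 for the accuracy column and near +1 for the Davies-Bouldin column.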

### 5.5 Summary and Conclusions

An automated ranking method for sensors associated with failure is a viable approach to support wafer fabrication verification. The process entails the identification of sensors that are potentially linked to the failure and subsequently prioritizing them based on their significance or pertinence to the failure. In order to execute this methodology, it may be necessary to collect information from the sensors and scrutinize it to recognize patterns or tendencies that could be associated with the malfunction. One potential approach is to employ statistical analysis or machine learning methodologies to establish associations between sensor data and the occurrence of system malfunctions.

Once the sensors are ranked, the ranking can be used to investigate the fundamental reason for the malfunction. If a specific sensor is consistently ranked as the most important one, the aspect of the process that it measures is plausibly the underlying factor for the malfunction. Alternatively, it may be necessary to conduct additional research on the high-priority sensors in order to determine the underlying cause. Potential approaches include a thorough analysis of the data obtained from the sensors, experimental investigations to validate hypotheses about the underlying cause, or input from relevant experts or stakeholders to obtain supplementary insights.

This study presents an automated methodology utilizing machine learning to establish a reliable correlation between wafer manufacturing failures and the signal waveforms generated by the wafer testing sensors. The correlation rating holds significant value in identifying the primary factors contributing to production loss, as each output from the test sensor evaluates a unique aspect of the manufacturing process's quality. The methodology used in our approach produces results that are equivalent to those obtained through the application of human expertise. Nevertheless, it is deemed to be more efficient, namely in terms of the consistency of the ranking procedure and the mitigation of manual effort.

By applying either the SVM classification accuracy or the Davies-Bouldin metric to the features extracted with the DTW-based method, we obtained sensor rankings comparable to those produced by human experts. This sensor analysis can therefore provide substantial support for IC fabrication.

### **Chapter 6**

### **General Conclusions**

The purpose of this study was to investigate the use of machine learning methods for the optimization of the analog IC verification. The necessity of these techniques arises from the considerable amount of human labour required for verifying analog ICs, which may be substantially decreased.

### 6.1 General Objectives and Results

By combining individual signals into larger clusters, the first two phases of this process are able to alleviate a portion of the challenge of improving the efficiency with which ICs are verified. This is particularly helpful for the process of verification since it means that engineers only need to visually evaluate a limited number of clusters whose signals are mainly identical, as opposed to visually reviewing thousands of signals. Moreover, in case of unexpected signals, an engineer can promptly identify the anomalies due to the depiction of the characteristic space. This is due to the fact that the outliers will exhibit a significant deviation from the average waveform of the remaining data points in the dataset. In the third stage, we enhanced the process of identifying the root cause of an issue, as opposed to solely acknowledging its existence, thereby elevating our approach to a higher level. The aforementioned factor holds substantial influence over post-silicon verification, in which numerous test sensors are present for every single chip/wafer, necessitating correlation to determine the root cause of a recurring issue in failure behavior.

#### O1. Reducing the time required to verify analog ICs

A significant amount of time is allocated to the verification process in the development of ICs, which has grown highly expensive as requirements have increased. Hence, there is a need for reducing the manual work involved in this process and speeding up the IC verification.

#### **O1.1.** Feature extraction algorithm for expected behaviours

In order to accomplish this objective, we developed an algorithm capable of optimally extracting characteristics of certain behaviours that happen regularly during IC verification. This topic has been extensively discussed in Chapter 3, Section 3.2.

#### **O1.2.** Signal invariance optimization method for simulation conditions

This objective is achieved by implementing a DCT-type compression and Bag-Of-Words approach. This objective is described in more detail in Chapter 3, Section 3.2.3.

#### **O1.3.** Clustering of signals containing similar events

The outcome of this objective was clustering algorithms applied to the multidimensional space of features for the purpose of grouping similar signals. This step was described in Chapter 3, Section 3.2.5.

#### O2. Enabling an increased and faster coverage of the process of IC verification

Combinations of parameters must be checked to provide excellent coverage and a high quality standard, which creates a challenge for the verification process as the number of requirements increases.

#### **O2.1.** Effective methods for the extraction of features for unusual behaviours

This goal was achieved by developing two machine learning algorithms capable of extracting useful features from any kind of waveform, with the purpose of separating unusual behaviours that may be isolated. This was explained in more detail in Chapter 4, Sections 4.2 and 4.3.

#### **O2.2.** Defining most suited clustering metrics for analog IC signals

This aim was achieved by conducting a study on several clustering metrics in order to select the most appropriate metric for IC verification applications. This was further discussed in Chapter 4, Section 4.4.

#### **O2.3.** Noise resistant optimization method for laboratory conditions

This objective is achieved by implementing a DBSCAN selection process with the purpose of removing both redundant information and measurement-related noise. This objective is described in more detail in Chapter 4, Section 4.2.

#### O3. Optimising the verification of wafer production

During the testing of post-silicon circuits, it is vital to determine the source of repeated unexpected behaviours in order to ensure error-free and optimal manufacturing.

#### **O3.1.** Improving the Methodology for Testing Production Lines

This objective was accomplished by adapting the most effective machine learning method for extracting attributes from those we evaluated and applying it to a different database of wafers. This has been presented in Chapter 5, Section 5.3.

#### **O3.2.** Autonomous sensor ranking approach to support IC fabrication

This objective was achieved by implementing an automated method that combines classification methods with clustering metrics computed on the sensor results, yielding a ranking of sensors based on their correlation with a repetitive failure behaviour. This has been presented in Chapter 5, Sections 5.3 and 5.4.

### 6.2 Original contributions

The following is a list of the primary contributions made by Chapter 3:

- Developing an algorithm capable of optimally extracting specific properties that appear regularly in the signals we examine during the verification process. We created an innovative descriptor for this method based on a computer vision technique that recognizes points of interest [GND<sup>+</sup>20].
- Providing a method for visualising feature space to optimise the verification phase where human oversight is required [GND<sup>+</sup>20].

The following is a list of the primary contributions made by Chapter 4:

- Developing an unsupervised feature extraction technique that is based on dynamic time warping and is capable of accurately characterizing both typical and unusual occurrences [GDB<sup>+</sup>22].
- Designing an algorithm based on an autoencoder-type neural network that is optimized for the signals of interest and is capable of learning certain characteristics and producing a representation of the signals with very few coefficients [GDB<sup>+</sup>23].

The following is a list of the primary contributions made by Chapter 5:

- Adapting a machine learning feature extraction technique for use in the context of production line testing methodology [GBD<sup>+</sup>22].
- Developing an automated technique that merges classification methods with the metrics of sensor clustering findings, thereby obtaining a ranking of sensors based on their association with a recurrent failure behaviour [GBD<sup>+</sup>22].

### 6.3 List of Original Publications

- 1. [GND<sup>+</sup>20] Andrei Gaita, Georgian Nicolae, Emilian C. David, Andi Buzo, Corneliu Burileanu, and Georg Pelz. A sift-based waveform clustering method for aiding analog/mixed-signal ic verification. In *2020 IEEE European Test Symposium (ETS)*, pages 1–2, 2020, ISI WOS:000615974000037
- 2. [GDB<sup>+</sup>22] A. Gaita, E. David, A. Buzo, H. Cucu, and G. Pelz. Waveform clustering based on dynamic time warping used in analog ic verification. In *2022 International Symposium ELMAR*, pages 49–52, 2022, ISI WOS:000935062500011
- 3. [GBD<sup>+</sup>22] A. Gaita, A. Buzo, E. David, H. Cucu, and G. Pelz. A machine learning based wafer test ranking for root cause analysis. In *2022 International Symposium ELMAR*, pages 45–48, 2022, ISI WOS:000935062500010
- 4. [GDB<sup>+</sup>23] A. Gaita, E. David, A. Buzo, M. Grigore, C. Burileanu, H. Cucu, and G. Pelz. Convolutional neural network model used for aiding ic analog/mixed signal verification. UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 85(2):151–162, 2023, ISI WOS:001015488500009

### 6.4 List of Technical Reports

- 1. [Gai19b] A. Gaita. Ic optimization methods based on machine learning. *Technical Report No. 1, University Politehnica of Bucharest*, June 2019
- 2. [Gai19a] A. Gaita. Augmented ic analog signals verification with waveform clustering. *Technical Report No. 2, University Politehnica of Bucharest*, December 2019
- 3. [Gai20] A. Gaita. Assisted analog/mixed-signal integrated circuit verification using a dtw-based waveform clustering. *Technical Report No. 3, University Politehnica of Bucharest*, June 2020

### 6.5 Future Work

Prospective areas of research or improvements to the current work that could enhance the validation of analog ICs include the following subjects:

- A notable expansion in the databases that comprise annotated signals, specifically intended for the application of deep learning methodologies that require substantial amounts of data.
- Improving the existing visualization techniques for the feature space to highlight specific attributes for diverse use cases that may be of significance.
- Integration of the root cause algorithm within the verification framework, enabling the automated identification of the sensors with the highest correlation to repetitive failures.

### References

- [ANR74] N. Ahmed, T. Natarajan, and K.R. Rao. Discrete cosine transform. *IEEE Transactions on Computers*, C-23(1):90–93, 1974.
- [AR08] Chang Wook Ahn and R. S. Ramakrishna. On the scalability of real-coded bayesian optimization algorithm. *IEEE Transactions on Evolutionary Computation*, 12(3):307–322, 2008.
- [BY21] Adil Abdu Bushra and Gangman Yi. Comparative analysis review of pioneering dbscan and successive density-based clustering algorithms. *IEEE Access*, 9:87918–87935, 2021.
- [CH74] T. Caliński and J. Harabasz. A dendrite method for cluster analysis. *Communications in Statistics*, 3(1):1–27, 1974.
- [CK07] Henry Chang and Ken Kundert. Verification of complex analog and rf ic designs. *Proceedings of the IEEE*, 95(3):622–639, 2007.
- [CTGK22] Animesh Basak Chowdhury, Benjamin Tan, Siddharth Garg, and Ramesh Karri. Robust deep learning for ic test problems. *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, 41(1):183–195, 2022.
- [Den20] Dingsheng Deng. Dbscan clustering algorithm based on density. In 2020 7th International Forum on Electrical Engineering and Automation (IFEEA), pages 949–953, 2020.
- [DS17] Liliya Demidova and Yulia Sokolova. Two-level intellectual classifier based on the svm algorithm. In 2017 6th Mediterranean Conference on Embedded Computing (MECO), pages 1–4, 2017.
- [dSCF<sup>+</sup>12] Marcilio C.P. de Souto, André L.V. Coelho, Katti Faceli, Tiemi C. Sakata, Viviane Bonadia, and Ivan G. Costa. A comparison of external clustering evaluation indices in the context of imbalanced data sets. In 2012 Brazilian Symposium on Neural Networks, pages 49–54, 2012.
- [DYL08] Hong Duan, Ruohe Yan, and Kunhui Lin. Research on face recognition based on pca. In 2008 International Seminar on Future Information Technology and Management Engineering, pages 29–32, 2008.
- [FIPG18] Andrew Christian Flores, Rogelyn I. Icoy, Christine F. Peña, and Ken D. Gorro. An evaluation of svm and naive bayes with smote on sentiment analysis data set. In 2018 International Conference on Engineering, Applied Sciences, and Technology (ICEAST), pages 1–4, 2018.
- [FM83] E. B. Fowlkes and C. L. Mallows. A method for comparing two hierarchical clusterings. *Journal of the American Statistical Association*, 78(383):553–569, 1983.

- [Gai19a] A. Gaita. Augmented ic analog signals verification with waveform clustering. *Technical Report No. 2, University Politehnica of Bucharest*, December 2019.
- [Gai19b] A. Gaita. Ic optimization methods based on machine learning. *Technical Report No. 1, University Politehnica of Bucharest*, June 2019.
- [Gai20] A. Gaita. Assisted analog/mixed-signal integrated circuit verification using a dtw-based waveform clustering. *Technical Report No. 3, University Politehnica of Bucharest*, June 2020.
- [GBD<sup>+</sup>22] A. Gaita, A. Buzo, E. David, H. Cucu, and G. Pelz. A machine learning based wafer test ranking for root cause analysis. In *2022 International Symposium ELMAR*, pages 45–48, 2022.
- [GDB<sup>+</sup>22] A. Gaita, E. David, A. Buzo, H. Cucu, and G. Pelz. Waveform clustering based on dynamic time warping used in analog ic verification. In *2022 International Symposium ELMAR*, pages 49–52, 2022.
- [GDB<sup>+</sup>23] A. Gaita, E. David, A. Buzo, M. Grigore, C. Burileanu, H. Cucu, and G. Pelz. Convolutional neural network model used for aiding ic analog/mixed signal verification. UNIVERSITY POLITEHNICA OF BUCHAREST SCIENTIFIC BULLETIN SERIES C-ELECTRICAL ENGINEERING AND COMPUTER SCIENCE, 85(2):151–162, 2023.
- [GK12] Bernhard C. Geiger and Gernot Kubin. Relative information loss in the pca. In 2012 IEEE Information Theory Workshop, pages 562–566, 2012.
- [GND<sup>+</sup>20] Andrei Gaita, Georgian Nicolae, Emilian C. David, Andi Buzo, Corneliu Burileanu, and Georg Pelz. A sift-based waveform clustering method for aiding analog/mixed-signal ic verification. In 2020 IEEE European Test Symposium (ETS), pages 1–2, 2020.
- [GS19] Guruprasad and Kumara Shama. Design and verification of analog integrated circuits using free or open source eda tools. In 2019 International Conference on Communication and Electronics Systems (ICCES), pages 1–6, 2019.
- [GXGM19] Georges Gielen, Nektar Xama, Karthik Ganesan, and Subhasish Mitra. Review of methodologies for pre- and post-silicon analog verification in mixed-signal socs. In 2019 Design, Automation and Test in Europe Conference and Exhibition (DATE), pages 1006–1009, 2019.
- [HPB19] Jae Yeol Hong, Seung Hwan Park, and Jun-Geol Baek. Segmented dynamic time warping based signal pattern classification. In 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC), pages 263–265, 2019.
- [ISAS21] Khairul Nurmazianna Ismail, Ali Seman, and Khyrina Airin Fariza Abu Samah. A comparison between external and internal cluster validity indices. In 2021 IEEE 11th International Conference on System Engineering and Technology (ICSET), pages 229–233, 2021.
- [JJJ<sup>+</sup>20] Chen Jie, Zhang Jiyue, Wu Junhui, Wu Yusheng, Si Huiping, and Lin Kaiyan. Review on the research of k-means clustering algorithm in big data. In 2020 IEEE 3rd International Conference on Electronics and Communication Engineering (ICECE), pages 107–111, 2020.

- [KC06] Ken Kundert and Henry Chang. Verification of complex analog integrated circuits. In *IEEE Custom Integrated Circuits Conference 2006*, pages 177–184, 2006.
- [KD16] Ergin Kılıç and Erdi Doğan. Real-time feature extraction from emg signals. In 2016 24th Signal Processing and Communication Application Conference (SIU), pages 113–116, 2016.
- [KHH18] Shoji Kido, Yasusi Hirano, and Noriaki Hashimoto. Detection and classification of lung abnormalities by use of convolutional neural network (cnn) and regions with cnn features (r-cnn). In 2018 International Workshop on Advanced Image Technology (IWAIT), pages 1–4, 2018.
- [KMH17] Ichwanul Muslim Karo Karo, Kiki Maulana Adhinugraha, and Arief Fatchul Huda. A cluster validity for spatial clustering based on davies bouldin index and polygon dissimilarity function. In 2017 Second International Conference on Informatics and Computing (ICIC), pages 1–6, 2017.
- [KuRA<sup>+</sup>14] Kamran Khan, Saif ur Rehman, Kamran Aziz, Simon James Fong, Sababady Sarasvady, and Amrita Vishwa. Dbscan: Past, present and future. The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), pages 232–238, 2014.
- [KYY20] Naveen Kumar, Sanjay Kumar Yadav, and Divakar Singh Yadav. Similarity measure approaches applied in text document clustering for information retrieval. In 2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), pages 88–92, 2020.
- [LLX<sup>+</sup>10] Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, and Junjie Wu. Understanding of internal clustering validation measures. In 2010 IEEE International Conference on Data Mining, pages 911–916, 2010.
- [LLX<sup>+</sup>13] Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu, and Sen Wu. Understanding and enhancement of internal clustering validation measures. *IEEE Transactions on Cybernetics*, 43(3):982–994, 2013.
- [LO12] Jonathan Lilly and Sofia Olhede. Generalized morse wavelets as a superfamily of analytic wavelets. *IEEE Transactions on Signal Processing*, 60, 03 2012.
- [Low04] David G. Lowe. Distinctive image features from scale-invariant keypoints. *International Journal of Computer Vision*, 60(2):91–110, Nov 2004.
- [MCC<sup>+</sup>15] Victor Maus, Gilberto Câmara, Ricardo Cartaxo, Fernando M. Ramos, Alber Sanchez, and Gilberto Q. Ribeiro. Open boundary dynamic time warping for satellite image time series classification. In 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pages 3349–3352, 2015.
- [MCC<sup>+</sup>22] Cristian Manolache, Alexandru Caranica, Horia Cucu, Andi Buzo, Cristian Diaconu, and Georg Pelz. Enhanced candidate selection algorithm for analog circuit verification. In 2022 International Semiconductor Conference (CAS), pages 137–140, 2022.

- [MGK<sup>+</sup>05] H. Morgenstern, G. Groos, H. Kohne, M. Stecher, W. John, and H. Reichl. Algorithm for the automatic verification of complex mixed-signal ics regarding esd-stress. In *Research in Microelectronics and Electronics, 2005 PhD*, volume 1, pages 213–216, 2005.
- [MR93] Andrzej Maćkiewicz and Waldemar Ratajczak. Principal components analysis (pca). *Computers and Geosciences*, 19(3):303–342, 1993.
- [PSJ15] Sakshi Patel, Shivani Sihmar, and Aman Jatain. A study of hierarchical clustering algorithms. In 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom), pages 537–541, 2015.
- [RHW<sup>+</sup>16] Weijie Ren, Min Han, Jun Wang, Dan Wang, and Tieshan Li. Efficient feature extraction framework for eeg signals classification. In 2016 Seventh International Conference on Intelligent Control and Information Processing (ICICIP), pages 167–172, 2016.
- [RJ18] K.V.S.N. Rama Rao and B. Manjula Josephine. Exploring the impact of optimal clusters on cluster purity. In 2018 3rd International Conference on Communication and Electronics Systems (ICCES), pages 754–757, 2018.
- [RMX<sup>+</sup>19] Anushree Ramanath, Saipreethi Muthusrinivasan, Yiqun Xie, Shashi Shekhar, and Bharathkumar Ramachandra. Ndvi versus cnn features in deep learning for land cover clasification of aerial images. In *IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium*, pages 6483–6486, 2019.
- [RS01] Mark Robertson and Robert Stevenson. Dct quantization noise in compressed images. 1, 01 2001.
- [RZWD15] Narender Rana, Yunlin Zhang, Donald Wall, and Bachir Dirahoui. Predictive data analytics and machine learning enabling metrology and process control for advanced node ic fabrication. In 2015 26th Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), pages 313–319, 2015.
- [SMMS20] Akhilesh Kumar Singh, Shantanu Mittal, Prashant Malhotra, and Yash Vardhan Srivastava. Clustering evaluation by davies-bouldin index(dbi) in cereal data using k-means. In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), pages 306–310, 2020.
- [SN20] Ketan Rajshekhar Shahapure and Charles Nicholas. Cluster quality analysis using silhouette score. In 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pages 747–748, 2020.
- [VK16] Ankit Vij and Padmavati Khandnor. Validity of internal cluster indices. In 2016 International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), pages 388–395, 2016.
- [XW10] Rui Xu and Donald C. Wunsch. Clustering algorithms in biomedical research: A review. *IEEE Reviews in Biomedical Engineering*, 3:120–154, 2010.
- [YWCW21] Yueyi Yang, Lide Wang, Huang Chen, and Chong Wang. An end-to-end denoising autoencoder-based deep neural network approach for fault diagnosis of analog circuit. *Analog Integr. Circuits Signal Process.*, 107(3):605–616, June 2021.

- [ZG08] Cai Zhili and Jiang Guiyan. Application of multiple svm classifier fusion technique in freeway automatic incident detection. In 2008 27th Chinese Control Conference, pages 581–585, 2008.