Abstract
Rapid progress in CMOS technology over the past 25 years has increased the vulnerability of processors to faults. Consequently, the focus of computer architects has shifted toward designing fault-tolerance methods for processor architectures, while chip designers have faced significant challenges in building fault-tolerant processors. For processor cores, this survey discusses redundancy-based fault-tolerance methods for fault detection at the core, microarchitectural, thread, and software levels. Analogous redundancy-based methods for cache memory and hardware accelerators are also presented. Recent trends in fault-tolerant quantum computing and quantum error correction are discussed as well. The classification of state-of-the-art techniques presented in this survey should help researchers organize their work along established lines.
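The abstract distinguishes redundancy used for fault detection from redundancy used for recovery. The following minimal Python sketch illustrates that distinction at the software level. It is not taken from any surveyed technique. Dual modular redundancy (DMR) runs a computation twice and compares the results, which detects a fault but cannot identify which copy is wrong. Triple modular redundancy (TMR) runs three copies and takes a majority vote, which masks a single faulty copy. The sketch assumes the computation is deterministic and returns a hashable value. The names `dmr`, `tmr`, `FaultDetected`, and the bit-flip injector `with_fault_on_call` are hypothetical. Real schemes replicate at instruction, thread, or core granularity and must also handle non-deterministic inputs.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


class FaultDetected(Exception):
    """Raised when redundant copies of a computation disagree."""


def dmr(compute: Callable[[], T]) -> T:
    """Dual modular redundancy: run twice and compare (detection only)."""
    primary = compute()
    shadow = compute()
    if primary != shadow:
        # Two copies disagree, but DMR cannot tell which one is faulty.
        raise FaultDetected(f"mismatch: {primary!r} != {shadow!r}")
    return primary


def tmr(compute: Callable[[], T]) -> T:
    """Triple modular redundancy: run three copies and majority-vote."""
    results = [compute() for _ in range(3)]  # results must be hashable
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        # All three copies differ, so no majority exists.
        raise FaultDetected(f"no majority among {results!r}")
    return value


def with_fault_on_call(compute: Callable[[], int], faulty_call: int,
                       flip_bit: int = 0) -> Callable[[], int]:
    """Hypothetical fault injector: flips one result bit on the Nth call."""
    calls = 0

    def wrapped() -> int:
        nonlocal calls
        calls += 1
        result = compute()
        if calls == faulty_call:
            return result ^ (1 << flip_bit)  # simulated transient bit flip
        return result

    return wrapped


if __name__ == "__main__":
    add = lambda: 40 + 2
    # The second copy returns 43 (bit 0 flipped); the vote still yields 42.
    print(tmr(with_fault_on_call(add, faulty_call=2)))
    try:
        dmr(with_fault_on_call(add, faulty_call=1))
    except FaultDetected as err:
        print("DMR detected:", err)
```

Running the module prints `42` for the TMR case and a mismatch message for the DMR case. The design choice mirrors the trade-off discussed in the survey: DMR costs one extra copy but needs a separate recovery mechanism (for example checkpointing and re-execution), whereas TMR spends a third copy to obtain forward recovery through voting.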
- [1] Moore, G.E. 1998. Cramming more components onto integrated circuits. Proceedings of the IEEE 86, 1(1998), 82-85.Google Scholar
- [2] Moore, G.E. 2006. Lithography and the Future of Moore's Law. IEEE Solid-State Circuits Society Newsletter 11, 3 (2006), 37-42.Google Scholar
- [3] F Pollack. Pollack's rule of thumb for microprocessor and area. Retrieved 8 December 2023 from http://en.wikipedia.org/wiki/Pollack's_Rule.Google Scholar
- [4] Dennard, R.H., Gaensslen, F.H., Yu, H.N., Rideout, V.L., Bassous, E. and LeBlanc, A.R. 1974. Design of ion-implanted MOSFET's with very small physical dimensions. IEEE Journal of Solid-State Circuits 9, 5(1974), 256-268.Google Scholar
- [5] Tullsen, D.M., Eggers, S.J. and Levy, H.M. 1995. Simultaneous multithreading: Maximizing on-chip parallelism. In Proceedings of the 22nd annual international symposium on Computer architecture, June 22-24, 1995, Santa Margherita Ligure, Italy, 392-403.Google Scholar
- [6] Xbit Labs.2002. Intel Pentium 4 3.06 GHz CPU with hyper-threading technology: Killing two birds with astone…, Available[online]:http://www.xbitlabs.com/articles/cpu/display/pentium4-3066.html.Google Scholar
- [7] Borkar, S. 2005. Designing reliable systems from unreliable components: the challenges of transistor variability and degradation. IEEE Micro 25 ,6(2005),10-16.Google Scholar
- [8] Gizopoulos, D., Psarakis, M., Adve, S.V., Ramachandran, P., Hari, S.K.S., Sorin, D., Meixner, A., Biswas, A. and Vera, X. 2011. Architectures for online error detection and recovery in multicore processors. In 2011 IEEE Design, Automation & Test in Europe, March 14-18 , 2011, Grenoble, France, 1-6.Google Scholar
- [9] Ray, J., Hoe, J.C. and Falsafi, B. 2001. Dual use of superscalar Datapath for transient-fault detection and recovery. In Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture, December 1-5, 2001, Austin, TX, USA, 214-224.Google Scholar
- [10] Parashar, A., Gurumurthi, S. and Sivasubramaniam, A. 2004. A complexity-effective approach to Alu bandwidth enhancement for instruction-level temporal redundancy. In Proceedings. 31st Annual International Symposium on Computer Architecture, June 19-23, 2004, Munich, Germany, 376-386.Google Scholar
- [11] Nickel, J.B. and Somani, A.K. 2001. REESE: A method of soft error detection in microprocessors. In proceedings. IEEE International Conference on Dependable Systems and Networks , July 1-4, 2001, Gothenburg, Sweden, 401-410.Google Scholar
- [12] Gomaa, M.A. and Vijaykumar, T.N. 2005. Opportunistic transient-fault detection. In 32nd IEEE International Symposium on Computer Architecture , June 4-8, 2005, Madison, WI, USA, 172-183.Google Scholar
- [13] Shyam, S., Constantinides, K., Phadke, S., Bertacco, V. and Austin, T. 2006. Ultra low-cost defect protection for microprocessor pipelines. ACM SIGARCH Computer Architecture News 34, 5 (2006), 3-82.Google Scholar
- [14] Meixner, A., Bauer, M.E. and Sorin, D. 2007. Argus: Low-cost, comprehensive error detection in simple cores. In 40th Annual IEEE/ACM International Symposium on Microarchitecture, December 1 -5, 2007, Chicago, IL, USA, 210-222.Google Scholar
- [15] Hu, J.S., Link, G.M., John, J.K., Wang, S. and Ziavras, S.G. 2005. Resource-driven optimizations for transient-fault detecting superscalar microarchitectures. In 10th Asia-Pacific Conference on Advances in Computer Systems Architecture, October 24-26, 2005, Singapore, 200-214.Google Scholar
- [16] Soman, J., Miralaei, N., Mycroft, A. and Jones, T.M. 2015. REPAIR: Hard-error recovery via re-execution. In 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, October 12-14 , 2015, Amherst, MA, USA , 76-79.Google Scholar
- [17] Bernick, D., Bruckert, B., Vigna, P.D., Garcia, D., Jardine, R., Klecka, J. and Smullen, J. 2005. NonStop advanced architecture. In 2005 IEEE International Conference on Dependable Systems and Networks, June 28- July 01, 2005, Yokohama, Japan, 12-21.Google Scholar
- [18] Austin, T.M. 1999. DIVA: A reliable substrate for deep submicron microarchitecture design. In MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture, November 16-18, 1999, Haifa, Israel, 196-207.Google ScholarCross Ref
- [19] Purser, Z., Sundaramoorthy, K. and Rotenberg, E. 2000. A study of slipstream processors. In Proceedings of the 33rd annual ACM/IEEE International Symposium on Microarchitecture, December 10-13, 2000,Monterey, CA, USA, 269-280.Google Scholar
- [20] Rashid, M.W., Tan, E.J., Huang, M.C. and Albonesi, D.H. 2005. Exploiting coarse-grain verification parallelism for power-efficient fault tolerance. In 14th International Conference on Parallel Architectures and Compilation Techniques, September 17 – 21, 2005, St. Louis, MO, USA, 315-325.Google Scholar
- [21] Li, H.T., Chou, C.Y., Hsieh, Y.T., Chu, W.C. and Wu, A.Y. 2017. Variation-aware reliable many-core system design by exploiting inherent core redundancy. IEEE Transactions on Very Large-Scale Integration (VLSI) Systems 25, 10(2017), 2803-2816.Google Scholar
- [22] Iturbe, X., Venu, B., Penton, J. and Ozer, E.2017. A" high resilience" mode to minimize soft error vulnerabilities in ARM cortex-R CPU pipelines: work-in-progress. In Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion , October 15 -20, 2017, Seoul, Korea ,1-2.Google Scholar
- [23] Ainsworth, S. and Jones, T.M.2018. Parallel error detection using heterogeneous cores. In 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 25-28, 2018, Luxembourg, Luxembourg, 338-349.Google Scholar
- [24] Spainhower, L. and Gregg, T.A. 1999. IBM S/390 parallel enterprise server G5 fault tolerance: A historical perspective. IBM Journal of Research and Development 43, 5.6(1999), 863-873.Google Scholar
- [25] Rotenberg, E. 1999. AR-SMT: A Microarchitectural approach to fault tolerance in microprocessors. In IEEE Digest of Papers. Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing (Cat. No. 99CB36352), June 15-18, 1999, Madison, WI, USA, 84-91.Google ScholarCross Ref
- [26] Reinhardt, S.K. and Mukherjee, S.S.2000. Transient fault detection via simultaneous multithreading. In Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No. RS00201), June 12-14, 2000, Vancouver, BC, Canada, 25-36.Google Scholar
- [27] Vijaykumar, T.N., Pomeranz, I. and Cheng, K. 2002. Transient-fault recovery using simultaneous multithreading. In Proceedings 29th Annual International Symposium on Computer Architecture, May 25-29, 2002, Anchorage, AK, USA,87-98.Google Scholar
- [28] Gomaa, M., Scarbrough, C., Vijaykumar, T.N. and Pomeranz, I. 2003. Transient-fault recovery for chip multiprocessors. In Proceedings 30th Annual International Symposium on Computer Architecture, June 9 -11, 2003, San Diego, CA, USA, 98-109.Google Scholar
- [29] Huang, B., Sass, R., Debardeleben, N. and Blanchard, S. 2014. Harnessing unreliable cores in heterogeneous architecture: The PyDac programming model and runtime. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 23 -26, 2014, Atlanta, GA, USA ,744-749.Google Scholar
- [30] APPLE A12X BIONIC details. Retrieved from:<https:// www. apple.com /iPhone/ iPhone XS>[1 June 2018].Google Scholar
- [31] Kanawati, G.A., Nair, V.S., Krishnamurthy, N. and Abraham, J.A. 1996. Evaluation of integrated system-level checks for on-line error detection. In Proceedings of IEEE International Computer Performance and Dependability Symposium, September 4-6, 1996, Urbana-Champaign, IL, USA, 292-301.Google ScholarCross Ref
- [32] Reis, G.A., Chang, J., Vachharajani, N., Rangan, R. and August, D.I. 2005. SWIFT: Software implemented fault tolerance. In International symposium on Code generation and optimization, March 20-23, 2005, San Jose, CA, USA, 243-254.Google ScholarDigital Library
- [33] Z. Liu, Z. Zhang, R. Xi, P. Zhu and B. Ma.2021. SoK: A Survey on Redundant Execution Technology, International Conference on Advanced Computing and Endogenous Security, April 21-22, 2022, Nanjing, China, pp. 1-14.Google Scholar
- [34] Q. Shi and O. Khan.2013.Toward Holistic Soft-Error-Resilient Shared-Memory Multicores. Computer 46,10 (2013), 56-64.Google ScholarDigital Library
- [35] Venkatesha S, and Parthasarathi R.2023. Design of Low-Cost Reliable and Fault-Tolerant 32-Bit One Instruction Core for Multi-Core Systems. Quality Control - An Anthology of Cases. IntechOpen, England. http://dx.doi.org/10.5772/intechopen.102823Google ScholarCross Ref
- [36] Zhang, Y., Lee, J.W., Johnson, N.P. and August, D.I. 2010. DAFT: Decoupled acyclic fault tolerance. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, September 11-15, 2010, Vienna, Austria, 87-98).Google ScholarDigital Library
- [37] Liu, Q., Jung, C., Lee, D. and Tiwari, D. 2016. Compiler-directed soft error detection and recovery to avoid DUE and SDC via Tail-DMR. ACM Transactions on Embedded Computing Systems (TECS) 16, 2(2016),1-26.Google Scholar
- [38] Upasani, G., Vera, X. and González, A. 2014. Avoiding core's due & sdc via acoustic wave detectors and tailored error containment and recovery. ACM SIGARCH Computer Architecture News 42, 3 (2014), 37-48.Google Scholar
- [39] Mahmoud, A., Venkatagiri, R., Ahmed, K., Misailovic, S., Marinov, D., Fletcher, C.W. and Adve, S.V. 2019. Minotaur: Adapting software testing techniques for hardware errors. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, April 13-17, 2019, Providence, RI, USA, 1087-1103Google Scholar
- [40] Sorin, D.J., Martin, M.M., Hill, M.D. and Wood, D.A. 2002. SafetyNet: Improving the availability of shared memory multiprocessors with global checkpoint/recovery. In Proceedings 29th Annual International Symposium on Computer Architecture, May 25-29, 2002, Anchorage, AK, USA,123-134.Google Scholar
- [41] Prvulovic, M., Zhang, Z. and Torrellas, J. 2002. ReVive: Cost-effective architectural support for rollback recovery in shared-memory multiprocessors. ACM SIGARCH Computer Architecture News 30, 2(2002),111-122.Google Scholar
- [42] Nakano, J., Montesinos, P., Gharachorloo, K. and Torrellas, J. 2006. ReViveI/O: Efficient handling of I/O in highly-available rollback-recovery servers. In The Twelfth International Symposium on High-Performance Computer Architecture, February 11-15, 2006, Austin, TX, USA, 200-211.Google Scholar
- [43] Doudalis, I. and Prvulovic, M. 2012. Euripus: A flexible unified hardware memory checkpointing accelerator for bidirectional-debugging and reliability. In 2012 39th Annual International Symposium on Computer Architecture, June 9-13, 2012, Portland, OR, USA, 261-272.Google Scholar
- [44] Agarwal, R., Garg, P. and Torrellas, J. 2011. Rebound: scalable checkpointing for coherent shared memory. In Proceedings of the 38th annual international symposium on Computer architecture, June 4 -8, 2011, San Jose, CA, USA , 53-164.Google Scholar
- [45] Sarangi, S.R., Greskamp, B. and Torrellas, J. 2006. Cadre: Cycle-accurate deterministic replay for hardware debugging. In International Conference on Dependable Systems and Networks, June 25 - 28, 2006 , Philadelphia, PA, USA, 301-312.Google Scholar
- [46] X W. Bartlett and B. Ball.1998. Tandems approach to fault tolerance. Tandem Systems 4, 1(1998), 84-95.Google Scholar
- [47] Fair, M.L., Conklin, C.R., Swaney, S.B., Meaney, P.J., Clarke, W.J., Alves, L.C., Modi, I.N., Freier, F., Fischer, W. and Weber, N.E. 2004. Reliability, Availability, and Serviceability (RAS) of the IBM eServer z990. IBM Journal of Research and Development 48, 3.4(2004), 519-534.Google Scholar
- [48] Aggarwal, N., Ranganathan, P., Jouppi, N.P. and Smith, J.E. 2007. Configurable isolation: building high availability systems with commodity multi-core processors. ACM SIGARCH Computer Architecture News 35, 2(2007), 470-481.Google Scholar
- [49] Smolens, J.C., Gold, B.T., Kim, J., Falsafi, B., Hoe, J.C. and Nowatzyk, A.G. 2004. Fingerprinting: Bounding soft-error detection latency and bandwidth. ACM SIGOPS Operating Systems Review 38, 5(2004), 224-234.Google ScholarDigital Library
- [50] Smolens, J.C., Gold, B.T., Falsafi, B. and Hoe, J.C. 2006. December. Reunion: Complexity-effective multicore redundancy. In 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture, December 09-13, 2006, Orlando, FL, USA, 223-234.Google Scholar
- [51] LaFrieda, C., Ipek, E., Martinez, J.F. and Manohar, R. 2007. Utilizing dynamically coupled cores to form a resilient chip multiprocessor. In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 25-28, 2007, Edinburgh, UK, 317-326.Google Scholar
- [52] Sundaramoorthy, K., Purser, Z. and Rotenberg, E. 2000. Slipstream processors: Improving both performance and fault tolerance. ACM SIGPLAN Notices 35, 11(2000), 257-268.Google Scholar
- [53] Subramanyan, P, Singh, V, Saluja, KK and Larsson, E. 2009. Power-Efficient Redundant Execution for Chip Multiprocessors. In Proceedings of IEEE 3rd workshop on Dependable and Secure Nano computing held in conjunction with IEEE DSN June 29 -July 2 , 2009, Lisbon, Portugal, 1-6.Google Scholar
- [54] Subramanyan, P, Singh, V, Saluja, KK and Larsson, E.2010. Energy-Efficient Redundant Execution for Chip Multiprocessors. In Proceedings of the twentieth ACM Great Lakes Symposium on VLSI, June 28 – July 1, 2010, Chicago, IL, USA, 143-146.Google Scholar
- [55] Subramanyan, P, Singh V, KK, Saluja & Larsson, E.2010. Energy-Efficient Fault Tolerance in Chip Multiprocessors Using Critical Value Forwarding. In Proceedings of IEEE International conference on Dependable Systems and Networks, June 28 – July 1, 2010, Chicago, IL, USA, 121 -130.Google ScholarCross Ref
- [56] Gopalakrishnan, S. and Singh, V. 2017. REMORA: a hybrid low-cost soft-error reliable fault tolerant architecture. In 2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), October 23-25, 2017, Cambridge, UK ,1-6.Google Scholar
- [57] Soman, J. and Jones, T.M. 2017. High performance fault tolerance through predictive instruction re-execution. In 2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), October 23-25, 2017, Cambridge, UK, 1-4 .Google Scholar
- [58] Ainsworth, S. and Jones, T.M.2018. Parallel error detection using heterogeneous cores. In 2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 25-28,2018, Luxembourg, Luxembourg,338-349.Google Scholar
- [59] Smolens, J.C., Kim, J., Hoe, J.C. and Falsafi, B. 2004. Efficient resource sharing in concurrent error detecting superscalar microarchitectures. In 37th International Symposium on Microarchitecture (MICRO-37'04), December 4-8,2004, Portland, OR, USA, 257-268.Google Scholar
- [60] Vera, X., Abella, J., Carretero, J. and González, A. 2010. Selective replication: A lightweight technique for soft errors. ACM Transactions on Computer Systems (TOCS) 27, 4(2010), 1-30.Google Scholar
- [61] Mukherjee, S. 2011. Architecture design for soft errors, Morgan Kaufmann, Burlington, Massachusetts, USAGoogle ScholarDigital Library
- [62] Mukherjee, S.S., Kontz, M. and Reinhardt, S.K. 2002. Detailed design and evaluation of redundant multi-threading alternatives. In Proceedings 29th annual international symposium on computer architecture, May 25- 29,2002, Anchorage, AK, USA, 99-110. IEEE.Google Scholar
- [63] Parashar, A., Sivasubramaniam, A. and Gurumurthi, S.2006. SlicK: slice-based locality exploitation for efficient redundant multithreading. ACM SIGOPS Operating Systems Review 40, 5(2006), 95-105.Google Scholar
- [64] Kumar, S. and Aggarwal, A. 2008. Speculative instruction validation for performance-reliability trade-off. In 2008 IEEE 14th International Symposium on High Performance Computer Architecture, February 16-20 , 2018, Salt Lake City, UT, USA , 405-414.Google Scholar
- [65] Huang, B., Sass, R., Debardeleben, N. and Blanchard, S. 2014. Harnessing unreliable cores in heterogeneous architecture: The PyDac programming model and runtime. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks , June 23 -26, 2014, Atlanta, GA, USA, 744-749.Google Scholar
- [66] Schuette, M.A. and Shen, J.P.1987. Processor control flow monitoring using signatured instruction streams. IEEE Transactions on Computers 36, 3(1987), 264-276.Google Scholar
- [67] Namjoo, M.1982. Techniques for Concurrent Testing of VLSI Processor. In Proc. of the International Test Conference (ITC),1982, Philadelphia, PA, USA ,416-468.Google Scholar
- [68] Wilken, K. and Shen, J.P.1990. Continuous signature monitoring: low-cost concurrent detection of processor control errors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 9, 6(1990), 629-641.Google Scholar
- [69] Oh, N., Shirvani, P.P. and McCluskey, E.J.2002. Control-flow checking by software signatures. IEEE Transactions on Reliability 51, 1(2002),111-122.Google Scholar
- [70] Oh, N., Shirvani, P.P. and McCluskey, E.J.2002. Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability 51, 1(2002), 63-75.Google Scholar
- [71] Reis, G.A., Chang, J., Vachharajani, N., Mukherjee, S.S., Rangan, R. and August, D.I.2005. Design and evaluation of hybrid fault-detection systems. In 32nd International Symposium on Computer Architecture (ISCA'05) June 4-8, ,2005, Madison, WI, USA, 148-159. IEEE.Google Scholar
- [72] Wang, C., Kim, H.S., Wu, Y. and Ying, V. 2007. Compiler-managed software-based redundant multi-threading for transient fault detection. In International Symposium on Code Generation and Optimization (CGO'07), March 11-14, 2007, San Jose, CA, USA 244-258.Google Scholar
- [73] Chang, J., Reis, G.A. and August, D.I. 2006. Automatic instruction-level software-only recovery. In International Conference on Dependable Systems and Networks (DSN'06),June 25-28, 2006, Philadelphia, PA, USA, 83-92.Google ScholarDigital Library
- [74] Liu, Q., Jung, C., Lee, D. and Tiwari, D. 2016. Compiler-directed soft error detection and recovery to avoid DUE and SDC via Tail-DMR. ACM Transactions on Embedded Computing Systems 16, 2(2016),1-26.Google Scholar
- [75] Mitropoulou, K., Porpodas, V. and Jones, T.M.2016. COMET: Communication-optimized multi-threaded error-detection technique. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, October 2-7, 2016, Pittsburgh, PA, USA,1-10.Google ScholarDigital Library
- [76] So, H., Didehban, M., Ko, Y., Shrivastava, A. and Lee, K. 2018. Expert: Effective and flexible error protection by redundant multithreading. In 2018 Design, Automation & Test in Europe Conference & Exhibition, March 19-23, 2018, Dresden, Germany,533-538.Google Scholar
- [77] So, H., Didehban, M., Shrivastava, A. and Lee, K.2019. A software-level redundant multithreading for soft/hard error detection and recovery. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 25-29, 2019, Florence, Italy, 1559-1562.Google Scholar
- [78] Wu, H., Guo, R. and Hu, Y.2021. FERNANDO: A software transient fault tolerance approach for embedded systems based on redundant multi-threading. IEEE Access 9, 67154-67166.Google ScholarCross Ref
- [79] So, H., Didehban, M., Ko, Y., Shrivastava, A. and Lee, K. 2022. EXPERTISE: An Effective Software-level Redundant Multithreading Scheme against Hardware Faults. ACM Transactions on Architecture and Code Optimization 19, 4(2022), 1-26.Google ScholarDigital Library
- [80] Döbel, B., Härtig, H. and Engel, M. 2012. Operating system support for redundant multithreading. In Proceedings of the tenth ACM international conference on Embedded software, October 7 - 12, 2012, Tampere Finland, 83-92.Google ScholarDigital Library
- [81] Döbel, B. and Härtig, H. 2014. Can we put concurrency back into redundant multithreading? In Proceedings of the 14th International Conference on Embedded Software, October 12-17, 2014, Uttar Pradesh, India, 1-10.Google ScholarDigital Library
- [82] Hukerikar, S., Teranishi, K., Diniz, P.C. and Lucas, R.F. 2018. Redthreads: An interface for application-level fault detection/correction through adaptive redundant multithreading. International Journal of Parallel Programming 46, 225-251.Google ScholarDigital Library
- [83] Hukerikar, S. and Lucas, R.F. 2016. Rolex: Resilience-oriented language extensions for extreme-scale systems. The Journal of Supercomputing 72, 4662-4695.Google ScholarCross Ref
- [84] Hukerikar, S., Diniz, P.C., Lucas, R.F. and Teranishi, K. 2014. Opportunistic application-level fault detection through adaptive redundant multithreading. In 2014 International Conference on High Performance Computing & Simulation (HPCS), July 21-25, 2014, Bologna, Italy ,243-250.Google ScholarCross Ref
- [85] Chen, Y.S. and Chen, P.S. 2016. A software-based redundant execution programming model for transient fault detection and correction. In 2016 45th International Conference on Parallel Processing Workshops (ICPPW), August 16-19, 2016, Philadelphia, PA, USA, 66-71.Google ScholarCross Ref
- [86] Arslan, S. and Unsal, O.2021. Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading. The Journal of Supercomputing 77, 12(2021), 4130-14160.Google ScholarDigital Library
- [87] Gong, R., Dai, K. and Wang, Z. 2008. Transient fault recovery on chip multiprocessor based on dual core redundancy and context saving. In 2008 The 9th International Conference for Young Computer Scientists, November 18-21, 2008, Hunan, China,148-153.Google ScholarDigital Library
- [88] Rashid, M.W. and Huang, M.C. 2008. Supporting highly-decoupled thread-level redundancy for parallel programs. In 2008 IEEE 14th International Symposium on High Performance Computer Architecture, February 16-20, 2008, Salt Lake City, UT, USA, 393-404.Google ScholarCross Ref
- [89] Greskamp, B. and Torrellas, J. 2007. Paceline: Improving single-thread performance in nanoscale CMPs through core overclocking. In 16th International Conference on Parallel Architecture and Compilation Techniques, September 15-19, 2017, Brasov, Romania , 213-224.Google Scholar
- [90] Didehban, M. and Shrivastava, A. 2016. nZDC: A compiler technique for near zero silent data corruption. In Proceedings of the 53rd Annual Design Automation Conference, June 5-9, 2016, Austin, TX, USA,1-6.Google ScholarDigital Library
- [91] Didehban, M. and Shrivastava, A. 2018. A compiler technique for processor-wide protection from soft errors in multithreaded environments. IEEE Transactions on Reliability 67, 1(2018), 249-263.Google ScholarCross Ref
- [92] Didehban, M., So, H., Gali, P., Shrivastava, A. and Lee, K. 2024. Generic Soft Error Data and Control Flow Error Detection by Instruction Duplication. IEEE Transactions on Dependable and Secure Computing 21, 1(2024), 78-92.Google ScholarDigital Library
- [93] Mavaddat, F. and Parhami, B. 1988. URISC: the ultimate reduced instruction set computer. International Journal of Electrical Engineering Education, 25, 4(1988),327-334.Google ScholarCross Ref
- [94] Nürnberg, P.J., Wiil, U.K. and Hicks, D.L. 2003. A grand unified theory for structural computing. In International Symposium on Metainformatics, September 17-20, 2003, Graz, Austria, 1-16.Google ScholarCross Ref
- [95] Mazonka, O and Kolodin, A. 2011. A simple multi-processor computer based on subleq. arXiv: 1106.2593. Retrieved from https://arxiv.org/ftp/arxiv/papers/1106/1106.2593.pdfGoogle Scholar
- [96] Rajendiran A. 2012. Reliable computing with ultra-reduced instruction set co-processors, Proceedings of the forty-ninth Annual IEEE Design Automation Conference, June 03-07, 2012, San Francisco, CA, USA , 697-702.Google ScholarDigital Library
- [97] Ananthanarayan, S., Garg, S. and Patel, H.D. 2013. Low -cost permanent fault detection using ultra-reduced instruction set co-processors. In 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 18-22, 2013, Grenoble, France, 933-938.Google Scholar
- [98] Shashikiran,Venkatesha and Ranjani, Parthasarathi, 2019. 32-Bit One Instruction Core: A Low-Cost, Reliable, and Fault-Tolerant Core for Multicore Systems. Journal of Testing and Evaluation 47, 6(2019), 3941–3962.Google ScholarCross Ref
- [99] Hennessy, J.L. and Patterson, D.A. 2011. Computer architecture: a quantitative approach. Elsevier.Google Scholar
- [100] Kalayappan, R. and Sarangi, S.R. 2013. A survey of checker architectures. ACM Computing Surveys (CSUR) 45, 4(2013), 1-34.Google ScholarDigital Library
- [101] Lee, H., Kim, J., Park, J. and Kang, S.2023. STRAIT: Self-Test and Self-Recovery for AI Accelerator. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 9(2023), 3092-3104.Google ScholarDigital Library
- [102] Mittal, S., and Vetter, J.S. 2015. A survey of techniques for modeling and improving reliability of computing systems. IEEE Transactions on Parallel and Distributed Systems 27, 4(2015),1226-1238.Google ScholarDigital Library
- [103] Li, T., Ambrose, J.A., Ragel, R. and Parameswaran, S.2016. Processor design for soft errors: Challenges and state of the art. ACM Computing Surveys (CSUR) 49, 3(2016), 1-44.Google ScholarDigital Library
- [104] Alcaide, S., Kosmidis, L., Hernandez, C. and Abella, J.2021. Achieving Diverse Redundancy for GPU Kernels. IEEE Transactions on Emerging Topics in Computing 10, 2(2021), 618-634.Google Scholar
- [105] Oz, I. and Arslan, S.2019. A survey on multithreading alternatives for soft error fault tolerance. ACM Computing Surveys (CSUR) 52, 2(2019), 1-38.Google ScholarDigital Library
- [106] Mittal, S.2020. A survey on modeling and improving reliability of DNN algorithms and accelerators. Journal of Systems Architecture 104, C(2020),101689.Google Scholar
- [107] Kundu, S., Basu, K., Sadi, M., Titirsha, T., Song, S., Das, A. and Guin, U. 2021. Special session: Reliability analysis for ML/AI hardware. arXiv : 2103.12166. Retrieved from https://arxiv.org/abs/2103.12166.Google Scholar
- [108] Postman, J. and Chiang, P. 2012. A survey addressing on-chip interconnect: Energy and reliability considerations. International Scholarly Research Notices, 2012. https://doi.org/10.5402/2012/916259Google ScholarCross Ref
- [109] Koomey, J., Berard, S., Sanchez, M. and Wong, H.2010. Implications of historical trends in the electrical efficiency of computing. IEEE Annals of the History of Computing 33,3(2010),46-54.Google ScholarDigital Library
- [110] Horowitz, M. 2014. Computing's energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers, February 9-13, 2014, San Francisco, CA, USA, 10-14.Google ScholarCross Ref
- [111] Jeyapaul, R., Hong, F., Rhisheekesan, A., Shrivastava, A. and Lee, K.2011. UnSync: A soft error resilient redundant multicore architecture. In 2011 International Conference on Parallel Processing, September 13-16, 2011, Taipei, Taiwan , 632-641.Google ScholarDigital Library
- [112] Venkatesha, S. and Parthasarathi, R.2022. One Shot System Based Reliability Modelling and Analysis for Low-Cost Fault-Tolerant Computing System Comprising of One Instruction Cores. In 2022 International Conference on Smart Generation Computing, Communication and Networking, December 23-25, 2022, Bangalore, India, 1-9.Google Scholar
- [113] M. W. Rashid, E. J. Tan, M. C. Huang, and D. H. Albonesi. 2005. Exploiting coarse-grain verification parallelism for power-efficient fault tolerance. In 14th International Conference on Parallel Architectures and Compilation Techniques., Sept. 17 – 21, 2005, St. Louis, MO, USA , 315-325 .Google Scholar
- [114] N. Madan and R. Balasubramonian. 2007. Power efficient approaches to redundant multithreading. IEEE Transactions on Parallel and Distributed Systems 18, 8 (2007), 1066–1079.Google ScholarDigital Library
- [115] A. Meixner and D. J. Sorin. 2007. Error Detection Using Dynamic Dataflow Verification. In Proc. of the Int'l Conf. on Parallel Architectures and Compilation Techniques, Brasov, September 15-19 , 2007,Romania ,104-118.Google Scholar
- [116] Zhang, W., Gurumurthi, S., Kandemir, M.T. and Sivasubramaniam, A. 2003. ICR: In-Cache Replication for Enhancing Data Cache Reliability. In Proceedings of International Conference on Dependable Systems and Networks, June 22-25, 2003, San Francisco, CA, USA, 291-300.Google Scholar
- [117] Zhang, W.2005. Replication cache: A small fully associative cache to improve data cache reliability. IEEE Transactions on Computers, 54, 12(2005), 1547-1555.Google ScholarDigital Library
- [118] Sugihara, M., Ishihara, T. and Murakami, K. 2007. Task scheduling for reliable cache architectures of multiprocessor systems. In 2007 Design, Automation & Test in Europe Conference & Exhibition, April 16-20, 2007, Nice, France, 1-6.
- [119] Kim, S. 2006. Area-efficient error protection for caches. In Proceedings of the Design Automation & Test in Europe Conference, March 06-10, 2006, Munich, Germany, 1-6.
- [120] Mukherjee, S.S., Emer, J., Fossum, T. and Reinhardt, S.K. 2004. Cache scrubbing in microprocessors: Myth or necessity? In 10th IEEE Pacific Rim International Symposium on Dependable Computing, March 3-5, 2004, Papeete, France, 37-42.
- [121] Saleh, A.M., Serrano, J.J. and Patel, J.H. 1990. Reliability of scrubbing recovery-techniques for memory systems. IEEE Transactions on Reliability 39, 1 (1990), 114-122.
- [122] Sridharan, V., Asadi, H., Tahoori, M.B. and Kaeli, D. 2006. Reducing data cache susceptibility to soft errors. IEEE Transactions on Dependable and Secure Computing 3, 4 (2006), 353-364.
- [123] Li, L., Degalahal, V., Vijaykrishnan, N., Kandemir, M. and Irwin, M.J. 2004. Soft error and energy consumption interactions: A data cache perspective. In Proceedings of the 2004 International Symposium on Low Power Electronics and Design, August 11, 2004, Newport Beach, CA, USA, 132-137.
- [124] Asadi, G.H., Sridharan, V., Tahoori, M.B. and Kaeli, D. 2005. Balancing performance and reliability in the memory hierarchy. In IEEE International Symposium on Performance Analysis of Systems and Software, March 20-22, 2005, Austin, TX, USA, 269-279.
- [125] Kadayif, I. and Kandemir, M. 2007. Modeling and improving data cache reliability. ACM SIGMETRICS Performance Evaluation Review 35, 1 (2007), 12.
- [126] Cai, Y., Schmitz, M.T., Ejlali, A., Al-Hashimi, B.M. and Reddy, S.M. 2006. Cache size selection for performance, energy and reliability of time-constrained systems. In Proceedings of the 2006 Asia and South Pacific Design Automation Conference, January 24-27, 2006, Yokohama, Japan, 923-928.
- [127] Jeyapaul, R. and Shrivastava, A. 2013. Enabling energy efficient reliability in embedded systems through smart cache cleaning. ACM Transactions on Design Automation of Electronic Systems 18, 4 (2013), 1-25.
- [128] Hashmi, A., Berry, H., Temam, O. and Lipasti, M. 2011. Automatic abstraction and fault tolerance in cortical micro-architectures. In 38th Annual International Symposium on Computer Architecture, June 4-8, 2011, San Jose, CA, USA, 1-10.
- [129] Azizimazreah, A., Gu, Y., Gu, X. and Chen, L. 2018. Tolerating soft errors in deep learning accelerators with reliable on-chip memory designs. In 2018 IEEE International Conference on Networking, Architecture and Storage, October 11-14, 2018, Chongqing, China, 1-10.
- [130] Libano, F., Wilson, B., Anderson, J., Wirthlin, M.J., Cazzaniga, C., Frost, C. and Rech, P. 2018. Selective hardening for neural networks in FPGAs. IEEE Transactions on Nuclear Science 66, 1 (2018), 216-222.
- [131] Eldridge, S. and Joshi, A. 2015. Exploiting hidden layer modular redundancy for fault-tolerance in neural network accelerators. In Proceedings of the Boston Area ARChitecture (BARC) Workshop.
- [132] Mahdiani, H.R., Fakhraie, S.M. and Lucas, C. 2012. Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors. IEEE Transactions on Neural Networks and Learning Systems 23, 8 (2012), 1215-1228.
- [133] Dimitrov, M., Mantor, M. and Zhou, H. 2009. Understanding software approaches for GPGPU reliability. In Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units, March 8, 2009, Washington, D.C., USA, 94-104.
- [134] Jeon, H. and Annavaram, M. 2012. Warped-DMR: Light-weight error detection for GPGPU. In 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, December 01-05, 2012, Vancouver, BC, Canada, 37-47.
- [135] Wadden, J., Lyashevsky, A., Gurumurthi, S., Sridharan, V. and Skadron, K. 2014. Real-world design and evaluation of compiler-managed GPU redundant multithreading. ACM SIGARCH Computer Architecture News 42, 3 (2014), 73-84.
- [136] Gupta, M., Lowell, D., Kalamatianos, J., Raasch, S., Sridharan, V., Tullsen, D. and Gupta, R. 2017. Compiler techniques to reduce the synchronization overhead of GPU redundant multithreading. In Proceedings of the 54th Annual Design Automation Conference, June 18-22, 2017, Austin, TX, USA, 1-6.
- [137] Schorn, C., Guntoro, A. and Ascheid, G. 2018. Accurate neuron resilience prediction for a flexible reliability management in neural network accelerators. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 19-23, 2018, Dresden, Germany, 979-984.
- [138] dos Santos, F.F., Draghetti, L., Weigel, L., Carro, L., Navaux, P. and Rech, P. 2017. Evaluation and mitigation of soft-errors in neural network-based object detection in three GPU architectures. In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), June 26-29, 2017, Denver, CO, USA, 169-176.
- [139] Lunardi, C., Previlon, F., Kaeli, D. and Rech, P. 2018. On the efficacy of ECC and the benefits of FinFET transistor layout for GPU reliability. IEEE Transactions on Nuclear Science 65, 8 (2018), 1843-1850.
- [140] Omar, H., Shi, Q., Ahmad, M., Dogan, H. and Khan, O. 2018. Declarative resilience: A holistic soft-error resilient multicore architecture that trades off program accuracy for efficiency. ACM Transactions on Embedded Computing Systems (TECS) 17, 4 (2018), 1-27.
- [141] Mahmoud, A., Hari, S.K.S., Sullivan, M.B., Tsai, T. and Keckler, S.W. 2018. Optimizing software-directed instruction replication for GPU error detection. In SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, November 11-16, 2018, Dallas, TX, USA, 842-854.
- [142] Kalra, C., Previlon, F., Rubin, N. and Kaeli, D. 2020. ArmorAll: Compiler-based resilience targeting GPU applications. ACM Transactions on Architecture and Code Optimization 17, 2 (2020), 1-24.
- [143] Lapedus, M. 2021. The Great Quantum Computing Race. Retrieved August 6, 2022 from https://semiengineering.com/thegreat-quantum-computing-race/
- [144] Gibney, E. 2020. Quantum Computer Race Intensifies as Alternative Technology Gains Steam. Retrieved August 6, 2022 from https://www.nature.com/articles/d41586-020-03237-w
- [145] Pednault, E., Gunnels, J., Maslov, D. and Gambetta, J. 2019. On quantum supremacy. IBM Research Blog 21.
- [146] Arute, F., Arya, K., Babbush, R., Bacon, D., Bardin, J.C., Barends, R., Biswas, R., Boixo, S., Brandao, F.G., Buell, D.A. and Burkett, B. 2019. Quantum supremacy using a programmable superconducting processor. Nature 574, 7779 (2019), 505-510. https://doi.org/10.1038/s41586-019-1666-5
- [147] Bobier, J.F., Langione, M., Tao, E. and Gourevitch, A. 2021. What happens when 'if' turns to 'when' in quantum computing? Boston Consulting Group.
- [148] Kitaev, A.Y. 1995. Quantum measurements and the Abelian stabilizer problem. arXiv:quant-ph/9511026. Retrieved from https://arxiv.org/abs/quant-ph/9511026
- [149] Nielsen, M.A. and Chuang, I. 2002. Quantum computation and quantum information. Amer. J. Phys. 70, 5 (2002), 558-559.
- [150] Shor, P.W. 1994. Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings 35th Annual Symposium on Foundations of Computer Science, November 20-22, 1994, Santa Fe, NM, USA, 124-134.
- [151] Grover, L.K. 1996. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, May 22-24, 1996, Philadelphia, PA, USA, 212-219.
- [152] Wu, Y., Bao, W.S., Cao, S., Chen, F., Chen, M.C., Chen, X., Chung, T.H., Deng, H., Du, Y., Fan, D. and Gong, M. 2021. Strong quantum computational advantage using a superconducting quantum processor. Physical Review Letters 127, 18 (2021), 180501.
- [153] Fellner, M., Messinger, A., Ender, K. and Lechner, W. 2022. Universal parity quantum computing. Physical Review Letters 129, 18 (2022), 180503.
- [154] Akhtar, M., Bonus, F., Lebrun-Gallagher, F.R., Johnson, N.I., Siegele-Brown, M., Hong, S., Hile, S.J., Kulmiya, S.A., Weidt, S. and Hensinger, W.K. 2023. A high-fidelity quantum matter-link between ion-trap microchip modules. Nature Communications 14, 1 (2023), 531. https://doi.org/10.1038/s41467-022-35285-3
- [155] Kim, Y., Eddins, A., Anand, S., Wei, K.X., Van Den Berg, E., Rosenblatt, S., Nayfeh, H., Wu, Y., Zaletel, M., Temme, K. and Kandala, A. 2023. Evidence for the utility of quantum computing before fault tolerance. Nature 618, 7965 (2023), 500-505. https://doi.org/10.1038/s41586-023-06096-3
- [156] Wang, Y., Simsek, S., Gatterman, T.M., Gerber, J.A., Gilmore, K., Gresh, D., Hewitt, N., Horst, C.V., Matheny, M., Mengle, T. and Neyenhuis, B. 2023. Fault-Tolerant One-Bit Addition with the Smallest Interesting Colour Code. arXiv:2309.09893. Retrieved from https://arxiv.org/abs/2309.09893
- [157] Lechner, W., Hauke, P. and Zoller, P. 2015. A quantum annealing architecture with all-to-all connectivity from local interactions. Science Advances 1, 9 (2015), 1500838.
- [158] Lvovsky, A.I., Sanders, B.C. and Tittel, W. 2009. Optical quantum memory. Nature Photonics 3, 12 (2009), 706-714.
- [159] Fu, X., Rol, M.A., Bultink, C.C., Van Someren, J., Khammassi, N., Ashraf, I., Vermeulen, R.F.L., De Sterke, J.C., Vlothuizen, W.J., Schouten, R.N. and Almudever, C.G. 2017. An experimental microarchitecture for a superconducting quantum processor. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, October 14-18, 2017, Cambridge, MA, USA, 813-825.
- [160] Fu, X., Riesebos, L., Rol, M.A., Van Straten, J., Van Someren, J., Khammassi, N., Ashraf, I., Vermeulen, R.F.L., Newsum, V., Loh, K.K.L. and De Sterke, J.C. 2019. eQASM: An executable quantum instruction set architecture. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), February 16-20, 2019, Washington, DC, USA, 224-237.
- [161] IBM. 2018. IBMQ Backend Information. Retrieved November 1, 2018 from https://github.com/Qiskit/ibmq-device-information
- [162] Caldwell, S.A., Didier, N., Ryan, C.A., Sete, E.A., Hudson, A., Karalekas, P., Manenti, R., da Silva, M.P., Sinclair, R., Acala, E. and Alidoust, N. 2018. Parametrically activated entangling gates using transmon qubits. Physical Review Applied 10, 3 (2018), 034050.
- [163] Debnath, S., Linke, N.M., Figgatt, C., Landsman, K.A., Wright, K. and Monroe, C. 2016. Demonstration of a small programmable quantum computer with atomic qubits. Nature 536, 7614 (2016), 63-66. https://doi.org/10.1038/nature18648
- [164] Murali, P., Linke, N.M., Martonosi, M., Abhari, A.J., Nguyen, N.H. and Alderete, C.H. 2019. Full-stack, real-system quantum computer studies: Architectural comparisons and design insights. In Proceedings of the 46th International Symposium on Computer Architecture, June 22-26, 2019, Phoenix, AZ, USA, 527-540.
- [165] IBM. 2018. IBM Qiskit. Retrieved August 5, 2018 from https://qiskit.org/
- [166] Rigetti. 2018. PyQuil. Retrieved August 1, 2018 from https://github.com/rigetticomputing/pyquil
- [167] Google. 2018. A Preview of Bristlecone, Google's New Quantum Processor. Retrieved August 5, 2018 from https://ai.googleblog.com/2018/03/a-preview-of-bristlecone-googles-new.html
- [168] Suzuki, Y., Sugiyama, T., Arai, T., Liao, W., Inoue, K. and Tanimoto, T. 2022. Q3DE: A fault-tolerant quantum computer architecture for multi-bit burst errors by cosmic rays. In 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), October 01-05, 2022, Chicago, IL, USA, 1110-1125.
- [169] McEwen, M., Faoro, L., Arya, K., Dunsworth, A., Huang, T., Kim, S., Burkett, B., Fowler, A., Arute, F., Bardin, J.C. and Bengtsson, A. 2022. Resolving catastrophic error bursts from cosmic rays in large arrays of superconducting qubits. Nature Physics 18, 1 (2022), 107-111. https://doi.org/10.1038/s41567-021-01432-8
- [170] Oliveira, D., Giusto, E., Dri, E., Casciola, N., Baheri, B., Guan, Q., Montrucchio, B. and Rech, P. 2022. QuFI: A quantum fault injector to measure the reliability of qubits and quantum circuits. In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 27-30, 2022, Baltimore, MD, USA, 137-149.
- [171] Google Quantum AI. 2021. Exponential suppression of bit or phase errors with cyclic error correction. Nature 595, 7876 (2021), 383-387. https://doi.org/10.1038/s41586-021-03588-y
- [172] Kukkonen, H., Rovamo, J., Tiippana, K. and Näsänen, R. 1993. Michelson contrast, RMS contrast and energy of various spatial stimuli at threshold. Vision Research 33, 10 (1993), 1431-1436.
- [173] Peres, A. 1985. Reversible logic and quantum computers. Physical Review A 32, 6 (1985), 3266.
- [174] Shor, P.W. 1995. Scheme for reducing decoherence in quantum computer memory. Physical Review A 52, 4 (1995), R2493.
- [175] Gottesman, D. 1997. Stabilizer codes and quantum error correction. arXiv:quant-ph/9705052. Retrieved from https://arxiv.org/abs/quant-ph/9705052
- [176] Cai, W., Ma, Y., Wang, W., Zou, C.L. and Sun, L. 2021. Bosonic quantum error correction codes in superconducting quantum circuits. Fundamental Research 1, 1 (2021), 50-67.
- [177] Litinski, D. 2019. A game of surface codes: Large-scale quantum computing with lattice surgery. Quantum 3 (2019), 128.
- [178] Dennis, E., Kitaev, A., Landahl, A. and Preskill, J. 2002. Topological quantum memory. Journal of Mathematical Physics 43, 9 (2002), 4452-4505.
- [179] Bacon, D. 2006. Operator quantum error-correcting subsystems for self-correcting quantum memories. Physical Review A 73, 1 (2006), 012340.
- [180] Gottesman, D. 1996. Class of quantum error-correcting codes saturating the quantum Hamming bound. Physical Review A 54, 3 (1996), 1862.
- [181] Steane, A. 1996. Multiple-particle interference and quantum error correction. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences 452, 1954 (1996), 2551-2577.
- [182] Knill, E., Laflamme, R., Martinez, R. and Negrevergne, C. 2001. Benchmarking quantum computers: the five-qubit error correcting code. Physical Review Letters 86, 25 (2001), 5811.
- [183] Shor, P.W. 1995. Scheme for reducing decoherence in quantum computer memory. Physical Review A 52, 4 (1995), R2493.
- [184] Holmes, A., Jokar, M.R., Pasandi, G., Ding, Y., Pedram, M. and Chong, F.T. 2020. NISQ+: Boosting quantum computing power by approximating quantum error correction. In 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, May 30-June 3, 2020, Virtual Event, 556-569.
- [185] Ueno, Y., Kondo, M., Tanaka, M., Suzuki, Y. and Tabuchi, Y. 2021. QECOOL: On-line quantum error correction with a superconducting decoder for surface code. In 2021 58th ACM/IEEE Design Automation Conference, December 5-9, 2021, San Francisco, CA, USA, 451-456.
- [186] Das, P., Pattison, C.A., Manne, S., Carmean, D.M., Svore, K.M., Qureshi, M. and Delfosse, N. 2022. AFS: Accurate, fast, and scalable error-decoding for fault-tolerant quantum computers. In 2022 IEEE International Symposium on High-Performance Computer Architecture, April 02-06, 2022, Seoul, Korea, 259-273.
- [187] Ueno, Y., Kondo, M., Tanaka, M., Suzuki, Y. and Tabuchi, Y. 2022. QULATIS: A Quantum Error Correction Methodology toward Lattice Surgery. In 2022 IEEE International Symposium on High-Performance Computer Architecture, April 02-06, 2022, Seoul, Korea, 274-287.
- [188] Das, P., Locharla, A. and Jones, C. 2022. LILLIPUT: A lightweight low-latency lookup-table decoder for near-term quantum error correction. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, February 28-March 04, 2022, Lausanne, Switzerland, 541-553.
- [189] Vittal, S., Das, P. and Qureshi, M. 2023. Astrea: Accurate Quantum Error-Decoding via Practical Minimum-Weight Perfect-Matching. In Proceedings of the 50th Annual International Symposium on Computer Architecture, June 17-21, 2023, Orlando, FL, USA, 1-16.
- [190] Ravi, G.S., Baker, J.M., Fayyazi, A., Lin, S.F., Javadi-Abhari, A., Pedram, M. and Chong, F.T. 2023. Better than worst-case decoding for quantum error correction. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 25-29, 2023, Vancouver, BC, Canada, 88-102.
- [191] Google Quantum AI. 2023. Suppressing quantum errors by scaling a surface code logical qubit. Nature 614, 7949 (2023), 676-681. https://doi.org/10.1038/s41586-022-05434-1
- [192] Krinner, S., Lacroix, N., Remm, A., Di Paolo, A., Genois, E., Leroux, C., Hellings, C., Lazar, S., Swiadek, F., Herrmann, J. and Norris, G.J. 2022. Realizing repeated quantum error correction in a distance-three surface code. Nature 605, 7911 (2022), 669-674.
- [193] Vittal, S., Das, P. and Qureshi, M. 2023. ERASER: Towards Adaptive Leakage Suppression for Fault-Tolerant Quantum Computing. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 28-November 1, 2023, Toronto, ON, Canada, 509-525.
- [194] Balkind, J., Lim, K., Schaffner, M., Gao, F., Chirkov, G., Li, A., Lavrov, A., Nguyen, T.M., Fu, Y., Zaruba, F. and Gulati, K. 2020. BYOC: A "bring your own core" framework for heterogeneous-ISA research. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March 16-20, 2020, Lausanne, Switzerland, 699-714.
- [195] Foutris, N., Kotselidis, C. and Luján, M. 2019. Simulating Wear-out Effects of Asymmetric Multicores at the Architecture Level. In 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, October 02-04, 2019, Noordwijk, Netherlands, 1-6.
- [196] Li, A., Ning, A. and Wentzlaff, D. 2023. Duet: Creating Harmony between Processors and Embedded FPGAs. In 2023 IEEE International Symposium on High-Performance Computer Architecture, February 25-March 1, 2023, Montreal, QC, Canada, 745-758.
- [197] Leng, J., Buyuktosunoglu, A., Bertran, R., Bose, P., Chen, Q., Guo, M. and Reddi, V.J. 2020. Asymmetric resilience: Exploiting task-level idempotency for transient error recovery in accelerator-based systems. In 2020 IEEE International Symposium on High Performance Computer Architecture, February 22-26, 2020, San Diego, CA, USA, 44-57.
- [198] Papadimitriou, G. and Gizopoulos, D. 2023. Avgi: Microarchitecture-driven, fast, and accurate vulnerability assessment. In 2023 IEEE International Symposium on High-Performance Computer Architecture, February 25-March 01, 2023, Montreal, QC, Canada, 935-948.
- [199] Papadimitriou, G. and Gizopoulos, D. 2021. Demystifying the system vulnerability stack: Transient fault effects across the layers. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture, June 14-18, 2021, Valencia, Spain, 902-915.
- [200] Tyagi, A., Gan, Y., Liu, S., Yu, B., Whatmough, P. and Zhu, Y. 2022. Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators. arXiv:2212.02649. Retrieved from https://arxiv.org/abs/2212.02649
- [201] Chatzidimitriou, A., Bodmann, P., Papadimitriou, G., Gizopoulos, D. and Rech, P. 2019. Demystifying soft error assessment strategies on Arm CPUs: Microarchitectural fault injection vs. neutron beam experiments. In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 24-27, 2019, Portland, OR, USA, 26-38.
- [202] Hussain, Z., Znati, T. and Melhem, R. 2020. Enhancing reliability-aware speedup modelling via replication. In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, June 29-July 02, 2020, Valencia, Spain, 528-539.
- [203] Agiakatsikas, D., Papadimitriou, G., Karakostas, V., Gizopoulos, D., Psarakis, M., Belanger-Champagne, C. and Blackmore, E. 2023. Impact of Voltage Scaling on Soft Errors Susceptibility of Multicore Server CPUs. In Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, October 28-November 1, 2023, Toronto, ON, Canada, 957-971.
- [204] Papadimitriou, G. and Gizopoulos, D. 2023. Silent Data Corruptions: Microarchitectural Perspectives. IEEE Transactions on Computers 72, 11 (2023), 3072-3085.
- [205] Zhang, Y. and Jung, C. 2022. Featherweight soft error resilience for GPUs. In 2022 55th IEEE/ACM International Symposium on Microarchitecture, October 1-5, 2022, Chicago, IL, USA, 245-262.
- [206] Sullivan, M.B., Hari, S.K.S., Zimmer, B., Tsai, T. and Keckler, S.W. 2018. SwapCodes: Error codes for hardware-software cooperative GPU pipeline error detection. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture, October 20-24, 2018, Fukuoka, Japan, 762-774.
- [207] Raghunandana, K.K., BKSVL, V., Reorda, M.S. and Singh, V. 2023. TREFU: An Online Error Detecting and Correcting Fault Tolerant GPGPU Architecture. In 2023 IEEE 29th International Symposium on On-Line Testing and Robust System Design, July 03-05, 2023, Crete, Greece, 1-7.
- [208] Raghunandana, K.K., BKSVL, V., Reorda, M.S. and Singh, V. 2022. REFU: Redundant Execution with Idle Functional Units, Fault Tolerant GPGPU architecture. In 2022 IEEE Computer Society Annual Symposium on VLSI, July 04-06, 2022, Nicosia, Cyprus, 394-397.