Lifeng Nai

1600 Amphitheatre Parkway · Mountain View, CA 94043
佴立峰 · nailifeng [at] gmail.com

I am an accelerator architect at Google, where I explore architectures for future machine learning accelerators (TPUs). My research interests include machine learning acceleration, graph computing systems and architecture, and emerging memory technologies. Before joining Google, I received my Ph.D. from the Georgia Institute of Technology, where I worked in the HPArch lab and was advised by Prof. Hyesoon Kim. I also worked in the MARS lab, advised by Dr. Hsien-Hsin S. Lee and co-advised by Dr. Bo Hong.


Publications

  • Thermal-Aware Processing-in-memory Instruction Offloading
    Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim
    Journal of Parallel and Distributed Computing (JPDC), 2019

  • CODA: Enabling Co-location of Computation and Data for Near-Data Processing
    Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh
    ACM Transactions on Architecture and Code Optimization (TACO), 2018

  • CoolPIM: Thermal-Aware Source Throttling for Efficient PIM Instruction Offloading
    Lifeng Nai, Ramyad Hadidi, He Xiao, Hyojong Kim, Jaewoong Sim, Hyesoon Kim
    Proc. of the International Parallel and Distributed Processing Symposium (IPDPS), Vancouver, British Columbia, Canada, May 2018

  • CAIRO: A Compiler-Assisted Technique for Enabling Instruction-Level Offloading of Processing-In-Memory
    Ramyad Hadidi, Lifeng Nai, Hyojong Kim, Hyesoon Kim
    ACM Transactions on Architecture and Code Optimization (TACO), 2017

  • SimProf: A Sampling Framework for Data Analytic Workloads
    Jen-Cheng Huang, Lifeng Nai, Pranith Kumar, Hyojong Kim, Hyesoon Kim
    Proc. of the International Parallel and Distributed Processing Symposium (IPDPS), Orlando, FL, May 2017

  • GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks
    Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, Hyesoon Kim
    Proc. of the 23rd International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, Feb. 2017
    [PDF] [Slides] [Lightning]

  • Exploring Big Graph Computing --- an Empirical Study from Architectural Perspective
    Lifeng Nai, Yinglong Xia, Ilie G. Tanase, Hyesoon Kim
    Journal of Parallel and Distributed Computing (JPDC), 2016
    [PDF]

  • Analyzing Consistency Issues In HMC Atomics
    Pranith Kumar, Lifeng Nai, Hyesoon Kim
    Proc. of International Symposium on Memory Systems (MEMSYS), Washington, DC, Oct. 2016
    [PDF]

  • LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms
    Alexandru Iosup, Tim Hegeman, Wing Lung Ngai, Stijn Heldens, Arnau Prat Perez, Thomas Manhardt, Mihai Capota, Narayanan Sundaram, Michael Anderson, Ilie G. Tanase, Yinglong Xia, Lifeng Nai, Peter Boncz
    Proc. of International Conference on Very Large Data Bases (VLDB), New Delhi, India, Sept. 2016
    [PDF]

  • GraphBIG: Understanding Graph Computing in the Context of Industrial Solutions
    Lifeng Nai, Yinglong Xia, Ilie G. Tanase, Hyesoon Kim, Ching-Yung Lin
    Proc. of International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Austin, TX, Nov. 2015
    [PDF] [SC15 Presentation] [Code Repository] [GraphBIG Doc] [GraphBIG Wiki]

  • Instruction Offloading with HMC 2.0 Standard - A Case Study for Graph Traversals
    Lifeng Nai, Hyesoon Kim
    Proc. of International Symposium on Memory Systems (MEMSYS), Washington, DC, Oct. 2015
    [PDF] [Slides]

  • Towards Balance-Affinity Tradeoff in Concurrent Subgraph Traversals
    Yinglong Xia, Lifeng Nai, Jui-Hsin Lai
    Proc. of IEEE International Parallel and Distributed Processing Symposium (IPDPS), Hyderabad, India, May 2015
    [PDF]

  • Explore Efficient Data Organization for Large Scale Graph Analytics and Storage
    Yinglong Xia, Ilie G. Tanase, Lifeng Nai, Wei Tan, Yanbin Liu, Jason Crawford, Ching-Yung Lin
    Proc. of IEEE International Conference on Big Data (BigData), Washington, DC, Oct. 2014
    [PDF]

  • Concurrent Image Query Using Local Random Walk with Restart on Large Scale Graphs
    Yinglong Xia, Jui-Hsin Lai, Lifeng Nai, Ching-Yung Lin
    Proc. of Workshop on Multimedia Big Data Computing (MBDC) in conjunction with ICME, Chengdu, China, July 2014
    [PDF]

  • A Highly Efficient Runtime and Graph Library for Large Scale Graph Analytics
    Ilie G. Tanase, Yinglong Xia, Lifeng Nai, Wei Tan, Yanbin Liu, Jason Crawford, Ching-Yung Lin
    Proc. of Workshop on Graph Data Management Experiences and Systems (GRADES) in conjunction with SIGMOD, Snowbird, UT, June 2014
    [PDF]

  • Cache-Conscious Graph Collaborative Filtering on Multisocket Multicore Systems
    Lifeng Nai, Yinglong Xia, Ching-Yung Lin, Bo Hong, Hsien-Hsin Lee
    Proc. of ACM International Conference on Computing Frontiers (CF), Cagliari, Italy, May 2014
    [PDF]

  • TBPoint: Reducing Simulation Time for Large Scale GPGPU Kernels
    Jen-Cheng Huang, Lifeng Nai, Hyesoon Kim, Hsien-Hsin Lee
    Proc. of IEEE International Parallel and Distributed Processing Symposium (IPDPS), Phoenix, Arizona, May 2014
    [PDF]

  • Reducing False Transactional Conflicts with Speculative Sub-blocking State - An Empirical Study for ASF Transactional Memory System
    Lifeng Nai, Hsien-Hsin Lee
    Proc. of IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Boston, MA, May 2013
    [PDF]

Notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Patents

  • Graph-based Online Image Query
    US Patent App. 15/215,864

  • Efficient Property Graph Storage for Streaming/Multi-versioning Graphs
    US Patent App. 15/264,570

  • Trace/Trajectory Reconstruction via Wearable and/or Mobile Sensors for Indoor/Outdoor Location
    US Patent App. 15/263,314

  • Wearable Sensor based System for Person Identification
    US Patent 9,769,166

  • Remote Control System with Muscle Sensor and Alerting Sensor
    US Patent App. 15/286,528

  • A Differential Processing Mechanism for Spark-based Graph Computing
    Filed at IBM, June 2015

  • A Controlling Method of Host Storage Device for Embedded Systems
    CN 200910116305.8

  • A New Video Decoding Method
    CN 200910116303.9

  • A New Embedded Storage Device Management Method for Multiple Hosts
    CN 200910116304.3

Experience

    Academic Activities

  • TPC member, IEEE International Conference on Big Data (BigData 2019)
  • ERC member, ACM International Conference on Supercomputing (ICS 2019)
  • TPC member, IEEE International Parallel and Distributed Processing Symposium (IPDPS 2019)
  • TPC member, International Conference on Contemporary Computing (IC3 2017/2018)
  • TPC member, International Workshop on Big Graph Processing (BGP 2017, in conjunction with ICDCS 2017)
  • TPC member, IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014)

    Performance Engineer/Accelerator Architect, Google
  • Explore and define future TPU chip and system architecture
  • Optimize ML infrastructure performance

    Visiting Scholar/Research Intern, IBM T. J. Watson Research Center
  • High-performance large-scale graph computing system and database (IBM SystemG), 2013 - 2015

Education

    Georgia Institute of Technology

    Ph.D.
    Electrical and Computer Engineering - Computer Architecture

    Shanghai Jiao Tong University

    B.S. & M.S.
    Electronic Engineering