BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Denver
X-LIC-LOCATION:America/Denver
BEGIN:DAYLIGHT
TZOFFSETFROM:-0700
TZOFFSETTO:-0600
TZNAME:MDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0600
TZOFFSETTO:-0700
TZNAME:MST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20200129T163600Z
LOCATION:607
DTSTART;TZID=America/Denver:20191118T121000
DTEND;TZID=America/Denver:20191118T123000
UID:submissions.supercomputing.org_SC19_sess124_ws_lasalss109@linklings.co
 m
SUMMARY:Toward Half-Precision Computation for Complex Matrices: A Case Stu
 dy for Mixed Precision Solvers on GPUs
DESCRIPTION:Workshop\n\nToward Half-Precision Computation for Complex Matr
 ices: A Case Study for Mixed Precision Solvers on GPUs\n\nAbdelfattah, Tom
 ov, Dongarra\n\nLow-precision computations are popular in machine learning
  and artificial intelligence (AI) applications. Hardware architectures, su
 ch as high-end GPUs, now support native 16-bit floating point arithmetic (
 i.e. half-precision). While half-precision provides a natural 2x/4x speed
 ups against the performance of single/double precisions, modern GPUs are e
 quipped with hardware accelerators for even more FP16 performance. These a
 ccelerators, which are called tensor cores, have a theoretical peak perfor
 mance that is 8x/16x faster than FP32/FP64 performance, respectively. Such
  a high level of performance has encouraged researchers to harness the com
 pute power of the tensor cores outside AI applications.\n\nThis paper pre
 sents a mixed-precision dense linear solver (Ax = b) for complex matrices 
 using the tensor core units of the GPU. Unlike similar efforts that have d
 iscussed accelerating Ax=b using real FP16 arithmetic, this paper focuses 
 on complex precisions. The developed solution uses a "half-complex" prec
 ision to accelerate the solution of Ax=b while maintaining single-complex 
 precision accuracy. The proposed solver requires a matrix multiplication k
 ernel that can accept half-complex inputs. We discuss two possible designs
  for such a kernel, and integrate both of them into a mixed-precision LU f
 actorization. The other component of our solution is an iterative refineme
 nt solver, which recovers the single-complex accuracy using a precondition
 ed GMRES solver. Our experiments, which are conducted on a V100 GPU, show 
 that the mixed-precision solver can be up to 2.5x faster than a full singl
 e-complex precision solver.\n\nTag: Workshop Reg Pass, Algorithms, Scalabl
 e Computing\n\nRegistration Category: Workshop Reg Pass, Algorithms, Scala
 ble Computing
URL:https://sc19.supercomputing.org/presentation/?id=ws_lasalss109&sess=se
 ss124
END:VEVENT
END:VCALENDAR

