
I am planning to build a very high end deep learning machine with as many Xeon or i7 CPU cores and as many Titan X GPU cards as possible on a single motherboard. So what is the maximum limit for this? Which motherboard will give the maximum leverage without getting saturated?

The single motherboard assumption is to minimize latency keeping in mind the broader goal of extracting maximum performance from the system.

Answers to the comments to make it very specific:

Standard form factor or proprietary?

Standard preferred but not mandatory

Are more GPU sockets or more CPU sockets the priority (e.g. would you rather have 2 GPU/4 CPU, or 4 GPU/2 CPU)?

More GPU sockets is the priority.

How important is memory? Do you have a price limit? Do you have a topology requirement?

Not much importance, even 64GB is OK. No price limit. Should not be saturated.

What is required from the hardware:

Run as many CPU- and GPU-intensive individual (and possibly containerized) applications as possible without being bottlenecked by memory or by the topology of the motherboard. The motherboard may not be easily available, so please also suggest whom to contact to get one. Supermicro seems to offer some solutions.

4
  • You would ideally be looking for a server motherboard, and probably something that is not easily available - especially if you want multiple CPUs
    – Rubydesic
    Commented Mar 29, 2016 at 11:42
  • In your place I would definitely wait for the new Pascal architecture based Titan GPUs. They are not that far off (less than half a year AFAIK) and the promised boost in performance is very significant IMHO.
    – jaskij
    Commented Mar 29, 2016 at 14:15
  • S4S server motherboards with 11 PCI-E x8/x16 slots would seem to be what you're looking for. Not many of them are standard form factor or easily available, though. Even 2S motherboards are difficult to find. You might want to contact Supermicro.
    – timuzhti
    Commented Mar 31, 2016 at 1:35
  • In case the answer is removed: Supermicro has a motherboard model [1] that can hold up to 8 GPUs in x16 mode. I also found a YouTube video [2] about it. [1]: supermicro.com/products/motherboard/Xeon/C600/X10DRG-O_-CPU.cfm [2]: youtube.com/watch?v=DhZJ66l82r8
    – hw101
    Commented Apr 4, 2016 at 10:35

3 Answers

1

NVIDIA has a desktop-sized DevBox. Fair warning: it isn't cheap. You can order a fully configured, NVIDIA-supported one for approximately $15,000. If you want to build one yourself, it is going to cost you about $8,000-9,000 (at least according to the math done in this article; I haven't verified that the components actually cost that much).

NVIDIA DevBox

The DevBox has the following specifications:

  • Four TITAN X GPUs with 12GB of memory per GPU
  • 64GB DDR4
  • Asus X99-E WS workstation class motherboard with 4-way PCI-E Gen3 x16 support
  • Core i7-5930K 6 Core 3.5GHz desktop processor
  • Three 3TB SATA 6Gb 3.5” Enterprise Hard Drive in RAID5
  • 512GB PCI-E M.2 SSD cache for RAID
  • 250GB SATA 6Gb Internal SSD
  • 1600W Power Supply Unit from premium suppliers including EVGA
  • Ubuntu 14.04
  • NVIDIA-qualified driver
  • NVIDIA® CUDA® Toolkit 7.0
  • NVIDIA® DIGITS™ SW
  • Caffe, Theano, Torch, BIDMach

This is a beast of a machine, and the price point (both self-built and preconfigured) shows that.

1

It all depends on what you mean by "very high-end". I am not aware of any consumer applications that would utilize more than 2-3 GPUs with a single user. But if, for some reason, you want to run several of the most demanding applications at the same time and therefore need to take everything to the very max, here you go:

http://www.supermicro.com/products/system/4U/4028/SYS-4028GR-TRT.cfm

(This is the highest end server motherboard I could find, with support for the most [8] PCIE 3.0 GPUs.)

You did not make your requirements for GPUs clear, but since you are trying to build "a very high-end" system, I am guessing you would want to go with 8 980 Tis. Of course, you will not be able to run all the graphics cards in SLI (or CrossFire if you decide to go with AMD), since neither of the two manufacturers has bridges with support for more than 4 cards.

As for the CPU: https://www.amazon.com/gp/product/B01DX5O20W/ref=as_li_qf_sp_asin_il_tl?tag=linustechtips-20&ie=UTF8&camp=1789&creative=9325&creativeASIN=B01DX5O20W&linkCode=as2&linkId=AQM7FRJICIOEDYTX

Now, I would like to note that though I am not exactly sure what you will be using this computer for, I think it is a total waste of money to build a system like the one above. Unless there are many users, or a LOT of very demanding applications being run at the same time, there is no way you could use up all this power.

Instead I would recommend getting the parts linked below, which will still probably be an absolute overkill for your needs:

Mobo: https://www.amazon.com/gp/product/B00O1AXIHM/ref=as_li_qf_sp_asin_il_tl?tag=linustechtips-20&ie=UTF8&camp=1789&creative=9325&creativeASIN=B00O1AXIHM&linkCode=as2&linkId=EJA47R5CG7ZOFSSN

Case: http://m.newegg.com/Product/index?itemnumber=N82E16811129218&nm_mc=KNC-GoogleAdwords-Mobile&cm_mmc=KNC-GoogleAdwords-Mobile--pla--Cases+%28Computer+Cases+-+ATX+Form%29-_-N82E16811129218&gclid=CKrhi_2zvM4CFYcfhgodADABkg&gclsrc=aw.ds

Processors: https://www.amazon.com/gp/search/ref=as_li_qf_sp_sr_il_tl?tag=linustechtips-20&ie=UTF8&camp=1789&creative=9325&index=aps&keywords=Intel+5960X&linkCode=as2

Graphics Cards: http://www.vgastore.com/2021301/asus-gtxtitanx_12gd5-geforce-gtx-titan-x-12gb-384-bit-gddr5-pci-express-3-0-hdcp-ready-sli-support-video-card

PSU: http://m.newegg.com/Product/index?itemnumber=9SIA91N4DC5875&nm_mc=KNC-GoogleMKP-Mobile&cm_mmc=KNC-GoogleMKP-Mobile--pla--Power+Supplies-_-9SIA91N4DC5875&gclid=CIHOlvq0vM4CFZFZhgod5moDNQ&gclsrc=aw.ds

And any drives and cables of choice...

2
  • "you will not be able to run all the graphics cards in SLI" Since we are talking about deep learning, I think we don't care about SLI/CrossFire here (I might be wrong). Actually, even when you are developing with CUDA for GPU computation, you can't use the advantages of SLI.
    – comicurus
    Commented Aug 18, 2016 at 14:18
  • True! I just thought I would mention it so he will know that he will not be able to enjoy all the advantages of all the cards. Commented Aug 18, 2016 at 14:57
0

This is a standard form-factor solution with a non-standard GPU configuration designed to overcome standard case mounting limitations and the limitations inherent to card size:

The PC itself:
PCPartPicker part list: http://pcpartpicker.com/list/TPTFtJ
Price breakdown by merchant: http://pcpartpicker.com/list/TPTFtJ/by_merchant/

CPU: Intel Xeon E5-2699 V4 2.2GHz 22-Core OEM/Tray Processor 
CPU: Intel Xeon E5-2699 V4 2.2GHz 22-Core OEM/Tray Processor 
CPU Cooler: Dynatron R27 65.4 CFM Ball Bearing CPU Cooler  ($29.88 @ OutletPC) 
CPU Cooler: Dynatron R27 65.4 CFM Ball Bearing CPU Cooler  ($29.88 @ OutletPC) 
Motherboard: Asus Z10PE-D16 WS SSI EEB Dual-CPU LGA2011-3 Motherboard  ($484.99 @ SuperBiiz) 
Memory: Kingston ValueRAM 16GB (1 x 16GB) DDR4-2133 Memory  ($73.99 @ SuperBiiz) 
Memory: Kingston ValueRAM 16GB (1 x 16GB) DDR4-2133 Memory  ($73.99 @ SuperBiiz) 
Memory: Kingston ValueRAM 16GB (1 x 16GB) DDR4-2133 Memory  ($73.99 @ SuperBiiz) 
Memory: Kingston ValueRAM 16GB (1 x 16GB) DDR4-2133 Memory  ($73.99 @ SuperBiiz) 
Storage: Western Digital Red Pro 6TB 3.5" 7200RPM Internal Hard Drive  ($279.99 @ Amazon) 
Storage: Western Digital Red Pro 6TB 3.5" 7200RPM Internal Hard Drive  ($279.99 @ Amazon) 
Storage: Western Digital Red Pro 6TB 3.5" 7200RPM Internal Hard Drive  ($279.99 @ Amazon) 
Case: Silverstone FT04B-W ATX Full Tower Case  ($229.00 @ Amazon) 
Power Supply: EVGA SuperNOVA T2 750W 80+ Titanium Certified Fully-Modular ATX Power Supply  ($167.04 @ Newegg) 
Optical Drive: Lite-On iHDS118-04 DVD/CD Drive  ($13.99 @ Newegg) 
Case Fan: Delta Electronics FFB1212EH-F00 150.3 CFM  120mm Fan  ($32.81 @ Amazon) 
Case Fan: Delta Electronics FFB1212EH-F00 150.3 CFM  120mm Fan  ($32.81 @ Amazon) 
Case Fan: Delta Electronics FFB1212EH-F00 150.3 CFM  120mm Fan  ($32.81 @ Amazon) 
Case Fan: Delta Electronics FFB1212EH-F00 150.3 CFM  120mm Fan  ($32.81 @ Amazon) 
Fan Controller: Lamptron FC-FC5V2-B Fan Controller  ($61.99 @ Amazon) 
UPS: APC BR1500G UPS  ($163.70 @ Amazon) 
Total: $2447.64 (excludes the two CPUs, which have no listed price)
Prices include shipping, taxes, and discounts when available
Generated by PCPartPicker 2016-07-12 17:51 EDT-0400
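
As a quick sanity check on the list above, here is a minimal sketch that re-adds the listed prices. The item labels are shortened by hand and the `PRICED_ITEMS` and `list_total` names are made up for illustration; the prices and quantities are copied from the list. The two Xeon entries are left out because the list shows no price for them, so the real build cost will be higher than the printed total.

```python
from decimal import Decimal

# (shortened item label, listed unit price in USD, quantity).
# The two Xeon E5-2699 V4 entries are omitted: the list gives no price for them.
PRICED_ITEMS = [
    ("Dynatron R27 CPU cooler", "29.88", 2),
    ("Asus Z10PE-D16 WS motherboard", "484.99", 1),
    ("Kingston ValueRAM 16GB DDR4-2133", "73.99", 4),
    ("WD Red Pro 6TB hard drive", "279.99", 3),
    ("Silverstone FT04B-W case", "229.00", 1),
    ("EVGA SuperNOVA T2 750W PSU", "167.04", 1),
    ("Lite-On iHDS118-04 optical drive", "13.99", 1),
    ("Delta FFB1212EH-F00 case fan", "32.81", 4),
    ("Lamptron FC-FC5V2-B fan controller", "61.99", 1),
    ("APC BR1500G UPS", "163.70", 1),
]


def list_total(items):
    """Sum unit price times quantity using exact decimal arithmetic."""
    return sum((Decimal(price) * qty for _, price, qty in items), Decimal("0"))


if __name__ == "__main__":
    print(f"Priced items total: ${list_total(PRICED_ITEMS)}")
```

The sum of the priced items matches the total printed in the list, which suggests the CPUs are not included in that figure.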

The GPU solution:

This setup depends on you not needing to saturate the PCI-E bus, much like in GPU bitcoin mining, where an x1 connection to each of the mining GPUs was often sufficient. Putting a splitter into each of the native PCI-E slots, you get 24 GPU connections at, I think, x1 PCI-E 2.0 each, which gives each GPU a theoretical transfer rate (up + down) of 1000 MB/s. However, there are only 40 lanes on each CPU, so unless the PCH on the motherboard takes care of that part of the issue (I'm not certain how the PCH works), each GPU will get something more like 833 MB/s theoretical maximum bandwidth:

(4000 MB/s of x4 PCI-E 2.0 bandwidth / 4 GPUs) × 0.8325 lane saturation ≈ 833 MB/s per GPU
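
The arithmetic above can be reproduced with a small sketch. The assumptions are worth stating: the 1000 MB/s per-lane figure is the up + down number used in this answer for PCI-E 2.0, the 6 slots and 4-way splitters are inferred from the "24 GPU connections" and "/ 4 GPUs" figures, and the 0.8325 factor is taken as given from the estimate above, not derived. The function and constant names are made up for illustration.

```python
# Theoretical PCI-E 2.0 bandwidth per lane, counting both directions
# (500 MB/s each way), as this answer counts it.
PCIE2_MB_PER_S_PER_LANE = 1000

# Assumed layout, inferred from the answer's numbers (24 connections, "/ 4 GPUs").
SLOTS = 6
GPUS_PER_SLOT = 4


def per_gpu_bandwidth(lanes_per_slot, gpus_per_slot, saturation=1.0):
    """Split one slot's theoretical bandwidth evenly across its GPUs,
    then scale by a lane-saturation factor (1.0 means no shortfall)."""
    slot_bandwidth = lanes_per_slot * PCIE2_MB_PER_S_PER_LANE
    return slot_bandwidth / gpus_per_slot * saturation


if __name__ == "__main__":
    print("GPU connections:", SLOTS * GPUS_PER_SLOT)
    print("Ideal MB/s per GPU:", per_gpu_bandwidth(4, GPUS_PER_SLOT))
    print("Derated MB/s per GPU:", round(per_gpu_bandwidth(4, GPUS_PER_SLOT, 0.8325), 1))
```

Changing `lanes_per_slot` or `saturation` shows how sensitive the per-GPU figure is to how many lanes each slot really gets.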

There are probably other ways of doing this that would net you more CPUs and thus more lanes, and I do know they make GPU backplanes that might be of use here, but this was the configuration I was comfortable recommending and I do like that it fits nicely within the desire to keep things working on a standard format workstation motherboard.

The fans, coolers, RAM, storage, and PSU are all just suggestions, not strict requirements. They are a common kind of configuration for builds like this that I have seen recommended before. Note that this system, as configured, would be extremely loud and might have to run headless to make the most use of your GPU/PCI-E resources.

2
  • Would it be possible to replace the Titan X's with Tesla P100s?
    – SEJPM
    Commented Jul 13, 2016 at 18:25
  • 1
    Assuming the P100 is like most other Teslas, yes, with the caveat that you may need to use one slot to add a GPU so you have a video out. However I also know some of the newer Teslas have their own basic video out functionality, so that should work identically to the above solution.
    – Adam Wykes
    Commented Jul 13, 2016 at 18:28
