Update-ULL (Ultra Low Latency) Architectures for Electronic Trading @ NYU Summer 2017


ULL (Ultra Low Latency) Architectures for Electronic Trading
NYU SPS Summer 2017 – 9 sessions (approx. 2½ hours each) –– Ted Hruzd
Estimated schedule: Tuesdays May 30, June 6, 13, 20, 27, July 11, 18, 25, plus Thursday July 6; 6:15-8:45 pm

Online registration to begin in February 2017 (approx. cost $650-$675):

https://www.sps.nyu.edu/professional-pathways/topics.html

Course Objectives

Develop advanced skills in architecting electronic trading (ET) and market data applications for ultra low latency (ULL), for competitive advantage, and for positive ROI. By the end of the course, students will have developed expertise in end-to-end architecture of ET applications and infrastructure, including:

  • Tick-2-Trade applications with single-digit microsecond, even sub-microsecond, latencies
  • How to architect for deterministic latencies even in times of volume spikes
  • Why ‘Meta-Speed’ (information on how to use speed) is more important than pure speed
  • Proper use of multi-layer 48-port $18K ULL switches, FPGAs, GPUs, microwave technologies
  • Integration of FPGAs and Intel cores via high speed caches, eventually FPGAs and cores on the same die (Intel-Altera current and upcoming enhancements)
  • When and how to architect market data order books and FIX engines in FPGA-based NICs
  • Multi-core Intel-based servers with high speed caches
  • Linux 7.2 kernel and NIC tuning
  • Kernel bypass technologies including RDMA and LDMA
  • Leading FPGA-based NICs – from SolarFlare, ExaBlaze, Enyx
  • Single-tier networks (or simplified spine-leaf); ex: from Plexxi
  • Layer 1 network switches (Metamako & ExaBlaze)
  • SDN (Software Defined Networks) – when applicable for ULL trading applications
  • New binary FIX protocol for ULL order routing
  • ULL messaging middleware (29West LBM/UME and 60East Technologies AMPS)
  • ULL software design (deep vector instructions, ex: Intel’s AVX-512, & multi-threading – OpenMP, TBB)
  • Storage, including NVMe Flash
  • Tools (some free) to attain performance optimization insights
  • Network appliances – detailed timings/analytics – network, market data, and order routing
  • Big Data and Event Stream Processing, real time analytics for seeking alpha (trade opportunities)
  • Fundamentals of FPGA design and programming
  • Network performance analysis via Wireshark and potentially also via Corvil
  • Programming trading algos via basic Python
  • Machine learning / neural networks for seeking alpha via basic R programming
  • ROI analysis

Prerequisites – (for most topics, basic to intermediate expertise is expected, unless noted)

  • Most important: at least 2 years working with electronic trading applications/infrastructures as a developer, SA, network admin/engineer, architect, QA analyst, tech project manager, operations engineer, manager, CTO, CIO, CEO, or a vendor or consultant providing technology to Wall Street IT
  • TCP/IP, UDP, multicast (basic knowledge)
  • Linux OS and shell or scripting (ex: bash, Perl); at minimum, basic familiarity with the output and usefulness of core Linux commands such as sysctl -a, ethtool, ifconfig, top, ls, grep, awk, sed, and others listed later in this syllabus
  • Intel servers, cores, sockets, GHz clock speed, NUMA
  • Network routers, switches
  • 1 or more network protocols from BGP, OSPF, EIGRP, MPLS, IB
  • FIX protocol
  • Market data, at minimum the contents of equities consolidated feeds
  • Visio (will be used for homework assignments)
  • Python (very basic will be fine – a 2 hour reading assignment will be arranged for beginners). We will use a text written for traders with zero programming experience that quickly trains them to use a small subset of Python for creating trading algos
  • R programming (nice to have; we will use basics that one can learn in 1-2 hours)

Course Logistics

  • 8 or 9 sessions, 2½ hours each (ex: 6:30-9:00 pm or 6:15-8:45 pm), starting May 30 (Tue) or 31 (Wed), once a week
  • Tech book(s) to download to Kindle:
    • Architects of Electronic Trading, Stephanie Hammer, Wiley, 2013
    • The Ultimate Algorithmic Trading System Toolbox, George Pruitt, Wiley, 2016
    • (optional) Trading and Electronic Markets: What Investment Professionals Need to Know, Larry Harris, CFA Institute, 2015
  • Multiple web site links to technical white papers and tech analyses, ex: nextplatform.com, http://intelligenttradingtechnology.com/, http://datamanagementreview.com, www.tabbforum.com, www.tradersmagazine.com
  • Visio (some homework assignments)
  • Extensive use of the white board by instructor and students. Sessions will present students with a few infrastructures to architect per specific business success criteria
  • Grading:
    • 1/3 class participation in in-class architecture designs (white-board sessions)
    • 1/3 quizzes / tests
    • 1/3 homework – Visio, Wireshark analysis, basic Python algo programming

Session 1 – Tue May 30

ULL components: CoLo, switches, FPGA, servers, OS, networks, software & middleware, market data

  • Will present a Visio diagram with a co-lo ULL architecture that generates orders destined for trading venues, utilizing Layer 1 switching + FPGAs for market data & order flow, with a target of sub-1-microsecond Tick-2-Trade (T2T) latencies. Latencies will be deterministic, even at peak loads, as long as the switch and FPGAs process at line speed
  • Will briefly present an alternative architecture utilizing a Single Tier network (Plexxi)
  • We will periodically revisit the co-lo ULL architecture throughout this course when we cover specific architecture components in depth (Algo Trading and/or SOR that feeds this architecture, use of FPGA, and Layer 1 switching)
  • Partial-FPGA and non-FPGA alternative architectures
  • Key advantages of FPGAs (Ted’s A-Team doc)
  • Why speed of processing (& ULL market data) still matters & will for at least the next several years
  • Why Meta-Speed is more important than pure speed (reference Corvil-Tabb Doc)
    • Meta-Speed Deep Dive (10 minutes)
  • Why Layer 1 switches
  • Layer 1 switch with integrated cores and FPGA for risk checks
  • High speed real time analytics for seeking alpha (trade opportunities) & infrastructure analytics
  • Exchange (Trading Venue) connectivity
  • Layer 2/3 aggregation in new switch appliances
  • Some leading ULL vendors:
    • Metamako
    • Algo-Logic
    • Nova-Sparks, with Nova-Link product
    • Corvil
    • Intel / Lenovo
    • SolarFlare
    • ExaBlaze
    • Enyx
  • Role of Linux kernel tuning for ULL – use the network-latency profile & common Linux best practices
  • Present some Linux configurations to critique (ex: no kernel bypass, same NIC for market data & order flow)
  • Which electronic trading organizations will prosper in the ULL ET space now & in the future? Which may very well fail, even disappear? Why is the role of ROI critical? Difficulties of proper ROI analysis

Class Exercise (at end of class we will do this together) – Given a few server and Linux configurations with flaws, respond with measures to optimize performance & lower latencies. The short Python sketch below shows the kind of latency-distribution check we will use to judge whether a configuration is deterministic.
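A minimal Python sketch (synthetic data; the latency figures are hypothetical) of the percentile summary used to judge whether a tick-to-trade path is deterministic:

    # Summarize tick-to-trade latency samples; synthetic data for illustration.
    import random

    random.seed(42)
    # Hypothetical T2T samples in nanoseconds: a tight ~900 ns core with a
    # few large outliers standing in for queuing/jitter events at peak load.
    samples = [random.gauss(900, 25) for _ in range(9_990)]
    samples += [random.uniform(5_000, 50_000) for _ in range(10)]

    def pct(data, q):
        """q-th percentile (0-100) by nearest rank."""
        s = sorted(data)
        return s[min(len(s) - 1, int(round(q / 100 * len(s))))]

    p50, p99, p999 = (pct(samples, q) for q in (50, 99, 99.9))
    print(f"p50 = {p50:,.0f} ns, p99 = {p99:,.0f} ns, p99.9 = {p999:,.0f} ns")
    # A deterministic path keeps tail/median close to 1; outliers inflate it.
    print(f"jitter ratio (p99.9 / p50) = {p999 / p50:.1f}")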

Session 2 – Tue June 6

Deep Dive into Red Hat Linux 7.2 low latency configuration & tuning, kernel bypass, PTP & NTP, then more details regarding ULL architectures from Class 1

Review of last week, plus questions & discussion of the assigned readings.

Next:

I will present/explain the following best practices regarding Linux tuning in a way that will lead to some white-boarding designs. To a large extent, the below is an outline of the Linux reading assignment. **** Do we have access to NYU Linux server(s) for review of Linux configurations and to run some basic commands – or else is the Dev CoLo available? Other alternative – Ted will have sample Linux and server configurations with flaws, open for optimization

  • Deep dive into the Linux 7.2 network-latency profile configuration
    • Base config includes (performance over power saving):
      • tcp_fastopen=3 (the client receives an encrypted cookie at the initial connection; reconnects then carry data in the handshake using the cookie)
      • Enable intel_pstate & min_perf_pct=100 (steady GHz; disables frequency fluctuations)
      • Disable THP (Transparent Huge Pages of 2 MB under kernel control)
      • cpu_dma_latency
        • Limits C-states, keeps cores from sleeping; part of the PM QoS interface
      • busy_read 50 µs (100 µs for large packet counts) & busy_poll 50 µs (socket polls the NIC receive queue, disabling network interrupts); cores stay “active”
        • BUT kernel bypass is much better (we will discuss 3 methods of kernel bypass)
      • numa_balancing=0 (no automatic NUMA management)
    • Disable unnecessary daemons and services (ex: firewalld & iptables)
    • Maximize ring buffer size
      • The device driver drains the buffer via soft IRQ (other tasks are not interrupted, unlike hard interrupts)
    • Set RFS (Receive Flow Steering) – increases CPU cache hits by forwarding packets to the core running the consuming app
    • TCP SACK (retransmits only the missed bytes) – tcp_sack=1
    • TCP window scaling – up to 1 GB
    • sysctl -w net.ipv4.tcp_low_latency=1
    • Timing and scheduling:
      • sched_latency_ns (20 ms default; increase!!)
      • sched_min_granularity_ns (4 ms default; increase!)
        • Increasing # procs/threads – the scheduler’s formula may lower this 4 ms
      • Some applications may benefit from a tickless kernel
        • (ex: a small number of procs/threads, no more than # cores)
      • sched_migration_cost_ns (default 500 µs; increase!)
        • This pertains to the period the cache is considered “hot”, preventing premature task migration
      • Basic Linux and server measures and utilities for performance analytics:
        • BIOS updates and tuning
        • turbostat
        • lstopo
        • lscpu
        • numactl
        • numastat
        • tuned
        • tuned-adm network-latency configuration (set profile)
        • isolcpus
        • Interrupt affinity or isolation
        • irqbalance
        • busy_poll
        • Check the gamut of process (pid) info, much of it pertaining to performance, in /proc/<pid>; for ex: the files numa_maps, stat, syscall
        • tuna – control processor and scheduler affinity
          • Options: isolate sockets from user space, push to socket 0
        • VTune Amplifier 2016
          • CPU, GPU, threads, bandwidth, cache, locks, spin time, function calls, serial + parallel time
          • Identify code sections for parallelization; ex: TBB – more control than OpenMP
          • MPI analysis, ex: locks, MCDRAM analysis
        • Intel’s PCM (Performance Counter Monitor) – major enhancements
          • Ex: times specific threads hit/miss L1-2-3 caches, and measures cache times and the impacts of misses; helps identify priority procs/threads for cache
        • Transaction profilers – Wily, VisualVM, BEA WLS, Valgrind, custom free tools – troubleshooting, ESP correlation, ML
        • perf: perf top -g (functions)
          • perf counters in hardware (CPU), with kernel trace points (ex: cache-misses, cpu-migrations, softirqs)
        • strace
        • ftrace – built-in kernel function tracer
          • Syscalls of procs and threads
          • Dynamic kernel function tracing, including latencies (ex: how long until a proc wakes/starts)
          • /sys/kernel/debug/tracing
          • trace_clock
        • DTrace for Linux:
          • Dynamic – CPU, fs, net resources by active procs; can be quite specific
          • Log of args / functions
          • Procs accessing specific files
          • # New processes with arguments
          • dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

          • # Pages paged in by process: dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'
          • # Syscall count by process: dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }' (specific syscall count per process or thread)
          • Also ‘canned’ scripts for processes with top TCP and UDP traffic, and ranking of processes by bandwidth
        • SystemTap – ex: probe tcp.setsockopt.return
          • Uses trace points for kernel and user probes
          • Script thief.stp – histogram of interrupts by process
          • Dynamically instruments running production Linux kernel-based operating systems; system administrators can use SystemTap to extract, filter, and summarize data in order to enable diagnosis of complex performance or functional problems
        • SysDig tool – syscalls only; dump for post-processing and scripting

 

  • OProfile uses hw counters; tracks memory access, L2 cache, hw interrupts
    • mpstat, vmstat, iostat, nicstat, free, top, netstat, ss [filter/script for analytics]
  • VM (Virtual Memory) and page flushes; optimize market data caches
    • Slab allocation = memory management for kernel objects; eliminates fragmentation
  • Slow network connections and packet drops
  • Intro to the netperf tool
  • NIC tuning
  • Kernel bypass, LDMA, RDMA
  • Kernel bypass with NIC vendors (SolarFlare, Mellanox, ExaBlaze) – description of how each works
    • SolarFlare OpenOnload sets up all socket calls in user space instead of kernel space, with a dedicated socket connection & data handled in NIC memory
    • Mellanox VMA library, linked into user space, also sets up user space calls to the NIC; the Connect-IB NIC allows non-contiguous memory transfers app-to-app; RV offload speeds up MC; MLNX OFED open fabric verbs for IB and Ethernet; PCIe switch & NVMe over Fabrics; MPI offloads; 2 ports at 100 Gbps; IB & Ethernet connections < 600 ns latency
    • Enyx NICs: differ from the SolarFlare and Mellanox approach of a network stack in user space (which can be CPU intensive)
      • Enyx places the full TCP stack in hardware (FPGA), reducing jitter
    • Network appliances:
      • ExaBlaze Fusion
      • Metamako MetaApp
      • Fixnetix ZeroLatency
    • Precision timing – PTP and NTP
    • Symmetricom SyncServer S300 – NTP & PTP grandmaster (GM); Symmetricom is owned by Microsemi
    • A GPS satellite satisfies the UTC requirement
    • MiFID II and PTP (software (sw) + hardware (hw) both critical for accuracy; requirement: within 100 µs of UTC)
      • Symmetricom PTP GM: 6 ports, +/- 4 ns, < 25 ns to UTC
      • GPS -> GM (Spectracom) -> Boundary Clock (Arista 7150S – FPGA timing + NAT) -> servers running PTP software with FPGA-based NICs (ex: Exablaze ExaNIC models) – or SolarFlare NICs with hw timestamps
        • linuxptp – ptp4l & phc2sys (can act as a Boundary Clock) sync the PTP hardware clock on the client, including VLAN-tagged and bonded interfaces, to the master (GM), but via the kernel; its daemons can’t consume MC; the kernel delivers packets to the bonded interface. SolarFlare’s sfptpd does everything in hw and can sync every SF adapter. ptpd – multiple platforms, but software only.
          • timemaster – on start, reads the NTP & PTP time server config, starts the daemons, and can sync the system clock to all time servers across multiple PTP domains
        • Master-slave time sync; PTP timing within 6 ns
        • Consider disabling the tickless kernel: nohz=off (for accuracy), BUT test this and its application impact
        • PTP in hardware is best but costs more; do the ROI
        • If multiple interfaces sit in different networks, set reverse-path forwarding to loose mode
        • Cmd: ethtool -T <interface> – verify timestamping capabilities (for hw)
        • timemaster reads the config of the PTP time source
        • Cmd: systemctl start timemaster
        • The ExaNIC FPGA can be programmed for extra analytics; some base programs are available
        • MC for sync messages from the Master, but UDP unicast delay messages from slave to Master
        • PTP assumptions:
          • Network path symmetry (hence switches, routers, FWs, and the OS impact this)
          • Master and slave accurately measure timestamps at the point of send/receive
          • Every hop can reduce PTP accuracy
        • PTP options:
          • Direct cables from each slave clock to the master .. but complexity, cost …
          • Dedicated PTP switch infrastructure; switches are PTP-aware & eliminate switch delay or act as a PTP master / Boundary Clock; do not mix traffic
          • In a dedicated LAN, PTP goes through the switch via L2 broadcast to a PTP bridge (a server acting as Boundary Clock & bonded interface manager), which sends MC to the FW (if no SolarFlare; the FW has the list of MC groups, IGMPv3 config), then MC to PTP clients for time sync; best if clients have SolarFlare NICs <add PICTURE>
            • The FW is configured for IGMPv3 and has the necessary config allowing the PTP bridge & clients to join the standard PTP MC group 224.0.1.129
            • sfptpd can work on bonded interfaces, so PTP clients need only specify the management interface to get PTP timestamps (from the PTP bridge)
          • Hardware timestamps at every point
        • More PTP details (the short Python sketch after the class exercise below illustrates the offset & path-delay arithmetic):
          • Slaves periodically send messages back to the Master (sync)
          • sfptpd -> file or syslog; ptp4l -> stdout
          • Offset: amount the slave clock is off from the Master
          • Frequency adjustment: how much the clock oscillator adjusts to run at the same rate as the Master
          • Path delay: how long to the slave & vice versa
          • Metrics – collectd, applies RegEx
        • NTP – selects accurate time servers from multiple (ex: 3); polls 3 or more servers
          • Keep stratum levels to no more than 2
          • Keep 3 clock sources near for sync
          • Use switches with light or no queuing
          • Use “TimeKeeper” – transforms any server into a timing appliance
        • Class Exercise (we will do together in class) – Explain the different approaches to kernel bypass of the following: ExaBlaze, SolarFlare, Mellanox, Enyx. Explain the strengths and advantages of each; advise which specific electronic trading applications would best benefit from each.
          • Explain how the following Linux tuning options will impact latencies (the Python sketch after this list shows how to inspect several of them):
            • swappiness=0
            • dirty_ratio=10
            • dirty_background_ratio=10
            • NIC interrupt coalescing (pre kernel-bypass)
            • Ring buffer increase
            • UDP receive buffer at 32 MB
            • netdev_max_backlog 1000000 (traffic stored before TCP/IP processing; one queue per core)
          • Explain what the following commands produce for latency analysis:
            • ifconfig command
            • netstat -s (send/recv queues)
            • ss utility
          • Detail the major benefits of VTune and DTrace, and when you would use either
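To tie the exercise together, here is a minimal Python sketch (assumes a Linux host; the target values below are simply the class examples above, not universal recommendations) that reads several of these tunables from /proc/sys and flags deviations:

    # Read selected kernel tunables and compare against class example targets.
    from pathlib import Path

    TARGETS = {
        "net/ipv4/tcp_sack": "1",
        "vm/swappiness": "0",
        "vm/dirty_ratio": "10",
        "vm/dirty_background_ratio": "10",
        "net/core/netdev_max_backlog": "1000000",
    }

    for rel, want in TARGETS.items():
        path = Path("/proc/sys") / rel
        try:
            have = path.read_text().strip()
        except OSError:
            print(f"{rel}: not available on this kernel")
            continue
        print(f"{rel} = {have} " + ("(OK)" if have == want else f"(target {want})"))

And to make the PTP Offset and Path Delay definitions concrete, a second sketch of the standard PTP sync / delay-request arithmetic, using four hypothetical timestamps:

    # PTP offset & path delay from the four message timestamps (hypothetical ns).
    # t1: master sends Sync        t2: slave receives Sync
    # t3: slave sends Delay_Req    t4: master receives Delay_Req
    t1, t2, t3, t4 = 1_000, 1_550, 2_000, 2_450

    # Assumes a symmetric network path (the key PTP assumption noted above).
    offset = ((t2 - t1) - (t4 - t3)) / 2      # slave clock minus master clock
    path_delay = ((t2 - t1) + (t4 - t3)) / 2  # one-way delay estimate

    print(f"offset = {offset:+.0f} ns, path delay = {path_delay:.0f} ns")
    # Here: offset = +50 ns (slave ahead of master), path delay = 500 ns each way.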

 HOMEWORK – complete the prior week’s reading assignments; prepare for a 30-minute quiz next class regarding:

  • Optimal Linux kernel tuning per application requirements (Red Hat doc + class notes)
  • Benefits of FPGAs and GPUs (text book + Algo-Logic doc)
  • How multi-layer switches work (Metamako doc)
  • Differences in tuning for Speed 1 (raw) vs Meta-Speed (Corvil & Tabb doc)
  • **** IN CLASS — I will spend 15-20 minutes detailing what is most important from the above.

Session 3 – Tue June 13

Quiz, then FPGAs, multicast, market data

 QUIZ (30 minutes)

 *** we will immediately review the quiz over the next 30 minutes

Remaining 1½ hours of class:

FPGAs & Market Data

  • Hardware accelerated appliances for ULL and deterministic performance
  • Ted’s FPGA hand-out – intro to FPGAs, including intro to FPGA design & programming (I/O blocks + logic blocks; OpenCL for creating “kernels” + synchronization for parallelism)
  • Why performance tends to be very deterministic with FPGAs & why deterministic performance (latencies) is critical for HFT and algo traders
  • Pitfalls of FPGAs
  • FPGAs vs GPUs, Intel Xeon Phi (Intel doc), and multi-core CPUs
  • Feeds in FPGA – architecture, performance, design, support
  • Switch crossbars or caches for fan out with TCP distribution
  • Ted’s MC hand-out
  • Multicast (MC) performance considerations (see the Python subscriber sketch after this list)
    • Turn on IGMP snooping on the switch
      • The switch listens to IGMP conversations between hosts/routers and maps the links that require MC streams; routers periodically query; 1 member per MC group per subnet reports.
    • Clients issue IGMP join requests to MC groups
    • Routers solicit group membership requests from directly connected hosts
    • PIM-SM (Sparse Mode … low % of MC traffic) requires a Rendezvous Point (RP) router
    • Routers in a PIM domain provide mappings to the RP (which exchanges info with other routers)
      • PIM domain: enable PIM on each router
      • Enable PIM sparse mode on each interface
    • After the RP, traffic is forwarded to receivers down a shared distribution tree
    • When a receiver’s 1st-hop router learns the source, it sends a join message directly to the source
    • Protocol Independent Multicast (PIM) is used between the local and remote MC routers to direct MC traffic from the MC server to many MC clients.
  • Message based appliances, including FPGA based
  • Direct feed normalization
  • Conflation to conserve bandwidth
  • NBBO
  • Levels 1 and 2 market data
  • Depth-of-book builds (in FPGAs or new multi-core servers)
  • Smart order routers
  • Doc regarding leading vendor solutions
  • Exablaze NICs and switches vs Metamako switches for market data
  • Enyx FPGA NICs and appliances for market data and order flow
  • NovaSparks FPGA-based market data ticker plant
  • Fixnetix ZeroLatency – multi-threaded risk checks in FPGA, with order processing in parallel on a core
  • Other products – Exegy, Algo-Logic, Redline, SR Labs
  • Consolidated feed vendors Bloomberg and Thomson Reuters
  • Use of new Intel Technologies (hardware & software) for alpha seeking strategies
  • Class Ex – (1) White-board sessions where students design ULL market data and multicast architectures per specific business/application criteria. (2) Given a Visio of a large network with only a few MC groups and subscribers, identify the likely path(s) to the few sources. Include the choice of router as RP.
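As promised above, a minimal Python sketch of a multicast subscriber; opening this socket is what triggers the IGMP join discussed in this session (the group/port shown are hypothetical, not a real feed):

    # Join a multicast group and print received datagrams (hypothetical feed).
    import socket
    import struct

    GROUP, PORT = "239.1.1.1", 30001  # example group/port, not a real venue feed

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # IP_ADD_MEMBERSHIP makes the kernel send an IGMP membership report, so
    # snooping switches and PIM routers start forwarding this group to us.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        data, src = sock.recvfrom(2048)  # one market data packet per datagram
        print(f"{len(data)} bytes from {src}")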

HW – market data white paper.

Week 4 will also have Visio assignments:

HOMEWORK – 2 Visio designs: one for a 1 µs T2T, a second for a more modest 10 µs T2T (includes internal alpha seeking)

 Session 4 – Tue June 20

Review of Visio assignments, then intros to Python for algo trading, the FIX protocol, Wireshark (maybe Corvil), and R / neural networks

Quick intro to Python – reference: The Ultimate Algorithmic Trading System Toolbox, George Pruitt, Wiley, 2016

  • Python algo trading examples
  • Intro to Wireshark
  • Intro to the FIX protocol (a minimal Python FIX parser sketch follows this list)
  • Intro to Wireshark with the FIX protocol “plug-in”
  • TCP, UDP, multicast (MC), then analysis via Wireshark, Corvil
  • Intro to R / neural networks
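As a first taste of FIX before the Wireshark work, here is a minimal Python sketch that splits a raw FIX message on the SOH (0x01) delimiter into tag/value pairs; the sample new-order message is illustrative only:

    # Parse one FIX message into {tag: value}; sample message is illustrative.
    SOH = "\x01"
    TAG_NAMES = {8: "BeginString", 35: "MsgType", 49: "SenderCompID",
                 56: "TargetCompID", 55: "Symbol", 54: "Side",
                 38: "OrderQty", 44: "Price", 10: "CheckSum"}

    raw = SOH.join([
        "8=FIX.4.2", "35=D", "49=BUYSIDE", "56=VENUE",   # 35=D: New Order Single
        "55=IBM", "54=1", "38=100", "44=142.50", "10=123",
    ]) + SOH

    def parse_fix(msg: str) -> dict:
        fields = {}
        for part in msg.strip(SOH).split(SOH):
            tag, _, value = part.partition("=")
            fields[int(tag)] = value
        return fields

    for tag, value in parse_fix(raw).items():
        print(f"{tag:>3} {TAG_NAMES.get(tag, '?'):<12} = {value}")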

Last hour of class – FPGA and 1 µs T2T deep dive – slot open for FPGA expert John Lockwood of Algo-Logic; if John is unavailable, we will start with Session 5 content and hope John can join us later

 HOMEWORK – additional Visio design TBD + a basic Python algo trading program to code + reading assignments – Python, Wireshark, R / neural networks

 Session 5 – Tue June 27

Review of the Visio assignment, then continue with Python for algo trading, Wireshark (maybe Corvil), and R / neural networks; then cover the latest ULL Intel technologies, other server, memory, and Flash devices, ULL messaging architectures, network protocols, and SDN

 Programming with multiple cores, multiple threads, parallelism

  • Vectorize application code
  • Design – internal loops with deep vector instructions, outer loops with parallelization (threads)
  • Servers, sockets, cores, caches, MCDRAM (Intel Phi)
  • Core speeds (GHz) vs more cores, larger and faster caches
  • Over-clocked servers – features and which applications can benefit
  • Linux, Solaris, Windows, others, ex: SmartOS, Mesosphere DC/OS
  • How to benchmark performance, analyze, tune
  • NUMA-aware processes and threads
  • Optimize cache assignments for high priority threads
  • Intel technologies including …
  • AVX-512 deep vector instructions (speeds up FP ops)
    • 6-8 registers; more ops/instruction; less power
  • TBB (Threading Building Blocks) – limits oversubscription of threads
    • OpenMP – can produce an explosion of threads
  • Omni-Path high speed / bandwidth interconnect (no HBA, fabric QoS, MTU to 10K, OFA verbs, 105 ns through switch ports, 50 GB/s bidirectional) & QPI
    • Uses silicon photonics (constant light beam; lower, more deterministic latencies)
  • QuickPath: multiple pairs of serial links, 25.6 GB/s (prior to Omni-Path)
    • Memory controllers integrated with microprocessors
    • Replaced legacy bus technology
    • Cache coherent
  • Shared memory is faster than memory maps; it allows multiple procs to read/write memory shared among the procs – without OS read/write calls. Procs just access the part of shared memory of interest. (See the Python sketch after this list.)
    • Discuss the example of a server proc sending an HTML file to a client: the file is read into memory, then a network function copies that memory to OS memory; the client calls an OS function which copies it into its own memory; contrast with shared memory
  • PCIe
  • C++ vs Java for ULL
  • Lists vs vectors
  • Iterating lists
  • Role of FPGA, GPU, microwave networks for ULL
  • C/C++, Java, Python, CUDA, FPGA-OpenCL: programming design considerations
  • Java 8 new Streams API and lambda expressions – for analytics
  • Class Ex – Explain how QuickPath & Omni-Path both improve latencies and advise which is preferred for ULL and why
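As referenced in the shared memory item above, a minimal Python sketch (Python 3.8+ multiprocessing.shared_memory; production systems would more likely use C++ over POSIX shared memory) showing two processes sharing one region with no per-read OS copy of the data:

    # Producer writes a (price, qty) tick into shared memory; the consumer
    # process attaches to the same region and reads it without an OS copy.
    from multiprocessing import Process, shared_memory
    import struct

    def consumer(name: str) -> None:
        shm = shared_memory.SharedMemory(name=name)  # attach, no data copy
        price, qty = struct.unpack_from("di", shm.buf, 0)
        print(f"consumer sees price={price} qty={qty}")
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=16)
        struct.pack_into("di", shm.buf, 0, 142.50, 100)  # producer writes a tick
        p = Process(target=consumer, args=(shm.name,))
        p.start()
        p.join()
        shm.close()
        shm.unlink()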

 

  • New age networks – spine-leaf to single tier
  • SDN (Software Defined Networks)
    • Cisco ACI + Tetration
    • Cloudistics
    • Plexxi
    • NSX
  • Pico – ULL SDN vendor
  • Options-IT – colo managed ULL infrastructure
  • Cisco and Arista switches for ULL
  • Cisco ACI and Cisco Tetration – deep machine learning to automatically optimize large networks
  • Switches with deep buffers, great for Big Data Analytics
  • Configure Routers for ULL – LLDP, MLAG, VRRP, VARP (active-active L3 gateway)
  • Network protocols – BGP, OSPF, HSRP
  • Arista 7124FX with  EOS
  • Plexxi switches – a disruptive technology – single tier
  • Plexxi optimal bandwidth via its SDN
  • Optimal VLANs configuration for analytics
    • Use trunks from one switch to another after defining a VLAN, or use a router
  • VPLS (Virtual Private LAN Service), also for analytics
    • Ethernet-based multipoint-to-multipoint over IP or MPLS
  • Decrease network hops for speed
    • (ex: Slim-Fly – a low-diameter network architecture, if not ready for single tier)
  • Network protocols:
    • EBGP: external, path vector, via paths, network policies, rule sets, finite state machine; BGP peering AS-to-AS
    • BGP-MP: multi-protocol + IPv6, unicast & MC; use for MPLS-VPN
    • OSPF: interior, within an AS; link state routing with metrics of RTT, amount of data through specific links, and link reliability
    • MOSPF: uses group membership info from IGMP + the OSPF DB; builds MC trees
    • EIGRP: OSPF + more criteria: latencies, effective BW, delays, MTU
    • MPLS: between network nodes, short path labels – avoids complex lookups in the route table; multi-protocol – includes ATM, Frame Relay, DSL
    • IB: hardware-based, lightweight, no packet reordering; link level flow control, lossless, QoS virtual lanes 0-14, RDMA verbs (adds latency), UFM tool, 4096 byte MTU (adds latency, as this MTU must fill before transmission)
    • OPA: Intel’s Omni-Path Architecture: 100 Gbps, new 48-port switch silicon, silicon photonics, no HBA, 50% less infrastructure vs IB, 100-110 ns / port, congestion control – reroutes traffic, MTU up to 10K
    • FC: Fibre Channel – optical packets, units of 4 10-bit codes; 4 codes = transmission word; metadata sets up link and sequence; tools: Agilent, CATC, Finisar, Xyratex; FC frames are similar to Ethernet packets, ex: multiple frames assembled with source, destination
    • IGMP: used by hosts and adjacent routers on IPv4 networks to establish multicast group memberships
  • Next-gen firewalls (ex: Fortinet)
    • One platform end-to-end, with multiple security-related aspects including anti-virus, malware protection, intrusion detection, database and OS access controls, web filtering, web app security, user ID awareness, standard rules access, and internal segmentation (into functional security zones, which limits the spread of malware & mischief, identifies mischief & quarantines infected devices); shares all info via its fabric with the whole network; Zero Trust policy – places the FW in the network center, in front of data
    • Empow – security orchestration product
  • Kerberos
    • Client/server network authentication protocol via secret key cryptography; stronger in one respect than traditional firewalls, as firewalls focus on external threats whereas Kerberos focuses on internal ones
    • Each KDC has a copy of the Kerberos DB; the Master KDC holds the realm DB, which is replicated to slave KDCs at regular intervals; DB password changes are made on the Master; slaves grant Kerberos tickets for servers/services; time synchronization is critical; ACLs are created too; Kerberos daemons are started on the Master, which assigns hostnames to Kerberos realms, ports, slaves
    • Opportunity for Docker container security:
      • Kerberos for access to multiple levels of container types (ex: checking account KYC vs withdrawal .. account manager vs authenticated client)
    • iptables – may opt to disable for ULL and rely on an external FW
      • Sets up and inspects tables of IP packet filter rules; each table has built-in “chains” & user-defined chains; chains list rules to match sets of packets
      • Required for servers acting as NAT-aware “routers”
        • The router intercepts packets and determines the NAT address
      • END – OPTIONAL
      • Class Exercise – Determine whether single-tier networks improve ULL versus spine-leaf. If so, explain why. Several scenarios will be presented, and students will architect networks on white boards.

 

Session 6 – Thu July 6 (1½ hour class, location TBD)

  • Special lab session: Python for algo trading, extra Wireshark training, and extra R / neural networks

 

 

Session 7 – Tue July 11

Complete / review topics from last week; start with the role of Big Data for alpha seeking and as input into ULL trading algos

 

 

Middleware, Analytics, Machine Learning, leading to end-end ULL Architectures

 Analytics & Machine Learning: to seek alpha and for infrastructure analytics

  • Intro to Big Data analytics & machine learning (focus on neural networks)
  • Role of the new Java 8 Streams API
    • Speeds up extracting insight from large collections via methods such as:
      • filter, sorted, max, map, flatMap, reduce, collect
    • Use with ArrayLists, HashMaps (does not replace them)
    • A stream is a one-time-use object
  • Intro to Complex Event Processing (CEP) and Event Stream Processing (ESP)
  • Databases – never in the path of ULL
  • Column-based (contiguous memory) vs relational
  • kdb+ and OneTick – leading players in high speed market data tick databases
  • Event Stream Processing (ESP) – use ESP to seek alpha
  • Combine market data with news sentiment analytics to seek alpha
  • Intro to RavenPack news sentiment analytics
  • Intro to Spark
  • Role of new storage technology (ex: NVMe Flash drives)
  • In-memory analytics, ex: HANA, Spark
  • Corvil – intro to how to configure Corvils and how to analyze FIX order flow with them
  • Machine learning / neural networks in R or Python – create equations to project latencies
  • Machine learning for latency analysis, tuning insight, and seeking alpha (trade opportunities)
  • Programming for multi-threaded trading risk analytics
  • Class Ex – Output from Corvil streams will be provided. Students will analyze it and determine how latencies can be projected using neural networks (design only – no programming) – or we may do this together in class. Ted to obtain sample data as input to neural networks for latency predictions + sample data for alpha seeking. (A small Python sketch of such a latency-projection model follows.)
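Although the class exercise is design-only, the sketch below shows the shape of such a latency-projection model in Python (assumes scikit-learn is installed; the features and data are synthetic stand-ins for Corvil-style measurements, so the printed fit quality says nothing about real systems):

    # Train a small neural network to project latency from load features.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    n = 2_000
    X = np.column_stack([
        rng.uniform(1e4, 1e6, n),  # market data messages/sec (synthetic)
        rng.uniform(10, 5e3, n),   # orders/sec (synthetic)
        rng.uniform(0, 100, n),    # switch queue depth (synthetic)
    ])
    # Synthetic "true" latency in ns: base + load effects + noise.
    y = 800 + 4e-4 * X[:, 0] + 0.05 * X[:, 1] + 6 * X[:, 2] + rng.normal(0, 20, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = make_pipeline(
        StandardScaler(),  # scale features so the MLP converges sensibly
        MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2_000, random_state=0),
    ).fit(X_tr, y_tr)
    print(f"R^2 on held-out data: {model.score(X_te, y_te):.3f}")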

 

HOMEWORK – prepare for a 30-minute quiz at the start of session 8 regarding Big Data analytics for ULL trading; key points will be stressed in class. Also: 2 weeks to complete a Visio integrating alpha-seeking opportunities into a ULL trading architecture – full details in class

 

 Session 8 – Tue July 18

Quiz, then go over the quiz; then cover high speed messaging & middleware, infrastructure ROI, and cloud technologies for ULL, plus application-specific details regarding ULL apps

 Middleware, High Speed Messaging

  • 60East Technologies AMPS
  • 29West LBM (UME)
  • New binary FIX protocol in beta promises to lower latencies
  • Importance of high speed messaging for algo trading
  • Intro to basic algos for trading equities (ex: VWAP, volume participation, use of AVX and RSI) – a minimal Python VWAP sketch follows this list
  • How to back-test trading algos
  • How to conduct ROI analysis for new ULL architectures
  • Why traditional cloud architectures fall short for ULL
  • Cloud for analytics – pitfalls vs best practices
  • Micro services potential
  • Class Ex – Output from application logs will be provided. Students will analyze it and determine how AMPS can be configured for both high speed middleware and event stream processing for analytics
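To make the VWAP benchmark concrete, a minimal Python sketch computing VWAP over a handful of hypothetical trades; a real VWAP algo slices a parent order against this benchmark intraday:

    # VWAP = sum(price * size) / sum(size); trades below are hypothetical.
    trades = [(142.50, 200), (142.55, 100), (142.48, 300), (142.60, 150)]

    volume = sum(size for _, size in trades)
    vwap = sum(price * size for price, size in trades) / volume
    print(f"market VWAP = {vwap:.4f} over {volume} shares")

    # Simple execution check: did our (hypothetical) buy fills beat VWAP?
    fills = [(142.49, 250), (142.52, 250)]
    avg_px = sum(p * s for p, s in fills) / sum(s for _, s in fills)
    print(f"our avg px = {avg_px:.4f} -> {'beat' if avg_px < vwap else 'missed'} VWAP")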

End-End ULL Architectures

  • Co-lo with 500 ns order ack times (revisited with our new knowledge)
  • Dark pools
  • Algo trading (servers, appliances, or FPGAs) in the architecture
  • Smart Order Routers (a toy Python routing sketch follows this list)
  • Prop trading
  • Exchanges
  • Class Ex – Visio or white-boarding of a new trading system TBD, applying all learned in the course
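As a toy version of the smart order routing noted above, a minimal Python sketch that routes a buy order to the venue showing the best ask (quotes are hypothetical; real SORs also weigh fees, fill probability, and per-venue latency):

    # Route a buy to the lowest ask; break ties on displayed size.
    from dataclasses import dataclass

    @dataclass
    class Quote:
        venue: str
        ask: float
        ask_size: int

    book = [Quote("VENUE_A", 142.51, 300),   # hypothetical venues & quotes
            Quote("VENUE_B", 142.50, 200),
            Quote("VENUE_C", 142.50, 500)]

    def route_buy(quotes):
        """Lowest ask wins; prefer larger displayed size on ties."""
        return min(quotes, key=lambda q: (q.ask, -q.ask_size))

    best = route_buy(book)
    print(f"route buy to {best.venue} @ {best.ask} (size {best.ask_size})")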

 

Session 9 – Tue July 25

Review of the Visio assignment, then cover future ULL architectures, including cloud architectures for ULL

 Futures, including Cloud Architectures for ULL

 Ted’s A-Team strategic projections for the next 2 years

  • Projections on new technologies’ impacts on ULL – may include:
    • New Intel cores & software
    • Adoption of single-tier networks
    • Impact of in-memory machine learning for alpha generation of trading signals
    • Integration of deep machine learning from the cloud into live trading networks via high speed interconnects, to an asynchronous queue, with NO latency impact
    • Applicability of blockchains
    • Site reliability engineering (SRE)

< open slot to catch up on past material + students’ questions pertaining to the future of ULL for ET >

 

 
