Update-ULL (Ultra Low Latency) Architectures for Electronic Trading @ NYU Summer 2017


ULL (Ultra Low Latency) Architectures for Electronic Trading
NYU SPS Summer 2017 – 9 sessions (approx. 2½ hours each) –– Ted Hruzd
Estimated schedule: Tuesdays May 30, June 6, 13, 20, 27, July 11, 18, 25, plus Thursday July 6; 6:15-8:45 pm

Online registration to begin in February 2017 (approx. cost $650-$675):

https://www.sps.nyu.edu/professional-pathways/topics.html

Course Objectives

Develop advanced skills in architecting electronic trading (ET) and market data applications for ultra low latency (ULL), for competitive advantage, and for positive ROI. By the end of the course, students will have developed expertise in end-to-end architecture of ET applications and infrastructure, including:

  • Tick-2-Trade applications with single-digit microsecond, even sub-microsecond, latencies
  • How to architect for deterministic latencies even in times of volume spikes
  • Why ‘Meta-Speed’ (information on how to use speed) is more important than pure speed
  • Proper use of multi-layer 48-port $18K ULL switches, FPGAs, GPUs, microwave technologies
  • Integration of FPGAs and Intel cores via high speed caches, eventually FPGAs and cores on the same die (Intel-Altera current and upcoming enhancements)
  • When and how to architect market data order books and FIX engines in FPGA-based NICs
  • Multi-core Intel-based servers with high speed caches
  • Linux 7.2 kernel and NIC tuning
  • Kernel bypass technologies including RDMA and LDMA
  • Leading FPGA-based NICs – from SolarFlare, ExaBlaze, Enyx
  • Single-tier networks (or simplified spine-leaf); ex: from Plexxi
  • Layer 1 network switches (Metamako & ExaBlaze)
  • SDN (Software Defined Networks) – when applicable for ULL trading applications
  • New binary FIX protocol for ULL order routing
  • ULL messaging middleware (29West LBM/UME and 60East Technologies AMPS)
  • ULL software design (deep vector instructions, ex: Intel’s AVX-512, & multi-threading – OpenMP, TBB)
  • Storage, including NVMe Flash
  • Tools (some free) to attain performance optimization insights
  • Network appliances – detailed timings/analytics – network, market data, and order routing
  • Big Data and Event Stream Processing, real time analytics for seeking alpha (trade opportunities)
  • Fundamentals of FPGA design and programming
  • Network performance analysis via Wireshark and potentially also via Corvil
  • Programming trading algos via basic Python
  • Machine learning / neural networks for seeking alpha via basic R programming
  • ROI analysis

Prerequisites – (for most topics, basic to intermediate expertise is expected, unless noted)

  • Most important: at least 2 years working with electronic trading applications/infrastructures as a developer, SA, network admin/engineer, architect, QA analyst, tech project manager, operations engineer, manager, CTO, CIO, CEO, or a vendor or consultant providing technology to Wall Street IT
  • TCP/IP, UDP, multicast (basic knowledge)
  • Linux OS and shell or scripting (ex: bash, Perl); at minimum, basic familiarity with the output and usefulness of core Linux commands such as sysctl -a, ethtool, ifconfig, top, ls, grep, awk, sed, and others listed later in this syllabus
  • Intel servers, cores, sockets, GHz clock speed, NUMA
  • Network routers, switches
  • 1 or more network protocols from BGP, OSPF, EIGRP, MPLS, IB
  • FIX protocol
  • Market data, at minimum the contents of equities consolidated feeds
  • Visio (will be used for homework assignments)
  • Python (very basic will be fine – a 2 hour reading assignment will be arranged for beginners). We will use a text written for traders with zero programming experience that quickly trains them to use a small subset of Python for creating trading algos
  • R programming (nice to have; we will use basics that one can learn in 1-2 hours)

Course Logistics

  • 8 or 9 sessions, 2½ hours each (ex: 6:30-9:00 pm or 6:15-8:45 pm), starting May 30 (Tue) or 31 (Wed), once a week
  • Tech book(s) to download to Kindle:
    • Architects of Electronic Trading, Stephanie Hammer, Wiley, 2013
    • The Ultimate Algorithmic Trading System Toolbox, George Pruitt, Wiley, 2016
    • (optional) Trading and Electronic Markets: What Investment Professionals Need to Know, Larry Harris, CFA Institute, 2015
  • Multiple web site links to technical white papers and tech analyses, ex: nextplatform.com, http://intelligenttradingtechnology.com/, http://datamanagementreview.com, www.tabbforum.com, www.tradersmagazine.com
  • Visio (some homework assignments)
  • Extensive use of the white board by instructor and students. Sessions will present students with a few infrastructures to architect per specific business success criteria
  • Grading:
    • 1/3 class participation in in-class architecture designs (white-board sessions)
    • 1/3 quizzes / tests
    • 1/3 homework – Visio, Wireshark analysis, basic Python algo programming

Session 1 – Tue May 30

ULL components: CoLo, switches, FPGA, servers, OS, networks, software & middleware, market data

  • Will present a Visio diagram with a co-lo ULL architecture that generates orders destined for trading venues, utilizing Layer 1 switching + FPGAs for market data & order flow, with a target of sub-1-microsecond Tick-2-Trade (T2T) latencies. Latencies will be deterministic, even at peak loads, as long as the switch and FPGAs process at line speed
  • Will briefly present an alternative architecture utilizing a Single Tier network (Plexxi)
  • We will periodically revisit the co-lo ULL architecture throughout this course when we cover specific architecture components in depth (Algo Trading and/or SOR that feeds this architecture, use of FPGA, and Layer 1 switching)
  • Partial-FPGA and non-FPGA alternative architectures
  • Key advantages of FPGAs (Ted’s A-Team doc)
  • Why speed of processing (& ULL market data) still matters & will for at least the next several years
  • Why Meta-Speed is more important than pure speed (reference Corvil-Tabb Doc)
    • Meta-Speed Deep Dive (10 minutes)
  • Why Layer 1 switches
  • Layer 1 switch with integrated cores and FPGA for risk checks
  • High speed real time analytics for seeking alpha (trade opportunities) & infrastructure analytics
  • Exchange (Trading Venue) connectivity
  • Layer 2/3 aggregation in new switch appliances
  • Some leading ULL vendors:
    • Metamako
    • Algo-Logic
    • Nova-Sparks, with Nova-Link product
    • Corvil
    • Intel / Lenovo
    • SolarFlare
    • ExaBlaze
    • Enyx
  • Role of Linux kernel tuning for ULL – use the network-latency profile & common Linux best practices
  • Present some Linux configurations to critique (ex: no kernel bypass, same NIC for market data & order flow)
  • Which electronic trading organizations will prosper in the ULL ET space now & in the future? Which may very well fail, even disappear? Why is the role of ROI critical? Difficulties of proper ROI analysis

Class Exercise (at end of class we will do this together) – Given a few server and Linux configurations with flaws, respond with measures to optimize performance & lower latencies. The short Python sketch below shows the kind of latency-distribution check we will use to judge whether a configuration is deterministic.
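A minimal Python sketch (synthetic data; the latency figures are hypothetical) of the percentile summary used to judge whether a tick-to-trade path is deterministic:

    # Summarize tick-to-trade latency samples; synthetic data for illustration.
    import random

    random.seed(42)
    # Hypothetical T2T samples in nanoseconds: a tight ~900 ns core with a
    # few large outliers standing in for queuing/jitter events at peak load.
    samples = [random.gauss(900, 25) for _ in range(9_990)]
    samples += [random.uniform(5_000, 50_000) for _ in range(10)]

    def pct(data, q):
        """q-th percentile (0-100) by nearest rank."""
        s = sorted(data)
        return s[min(len(s) - 1, int(round(q / 100 * len(s))))]

    p50, p99, p999 = (pct(samples, q) for q in (50, 99, 99.9))
    print(f"p50 = {p50:,.0f} ns, p99 = {p99:,.0f} ns, p99.9 = {p999:,.0f} ns")
    # A deterministic path keeps tail/median close to 1; outliers inflate it.
    print(f"jitter ratio (p99.9 / p50) = {p999 / p50:.1f}")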

Session 2 – Tue June 6

Deep Dive into Red Hat Linux 7.2 low latency configuration & tuning, kernel bypass, PTP & NTP, then more details regarding ULL architectures from Class 1

Review of last week, plus questions & discussion of the assigned readings.

Next:

I will present/explain the following best practices regarding Linux tuning in a way that will lead to some white-boarding designs. To a large extent, the below is an outline of the Linux reading assignment. **** Do we have access to NYU Linux server(s) for review of Linux configurations and to run some basic commands – or else is the Dev CoLo available? Other alternative – Ted will have sample Linux and server configurations with flaws, open for optimization

  • Deep dive into the Linux 7.2 network-latency profile configuration
    • Base config includes (performance over power saving):
      • tcp_fastopen=3 (the client receives an encrypted cookie at the initial connection; reconnects then carry data in the handshake using the cookie)
      • Enable intel_pstate & min_perf_pct=100 (steady GHz; disables frequency fluctuations)
      • Disable THP (Transparent Huge Pages of 2 MB under kernel control)
      • cpu_dma_latency
        • Limits C-states, keeps cores from sleeping; part of the PM QoS interface
      • busy_read 50 µs (100 µs for large packet counts) & busy_poll 50 µs (socket polls the NIC receive queue, disabling network interrupts); cores stay “active”
        • BUT kernel bypass is much better (we will discuss 3 methods of kernel bypass)
      • numa_balancing=0 (no automatic NUMA management)
    • Disable unnecessary daemons and services (ex: firewalld & iptables)
    • Maximize ring buffer size
      • The device driver drains the buffer via soft IRQ (other tasks are not interrupted, unlike hard interrupts)
    • Set RFS (Receive Flow Steering) – increases CPU cache hits by forwarding packets to the core running the consuming app
    • TCP SACK (retransmits only the missed bytes) – tcp_sack=1
    • TCP window scaling – up to 1 GB
    • sysctl -w net.ipv4.tcp_low_latency=1
    • Timing and scheduling:
      • sched_latency_ns (20 ms default; increase!!)
      • sched_min_granularity_ns (4 ms default; increase!)
        • Increasing # procs/threads – the scheduler’s formula may lower this 4 ms
      • Some applications may benefit from a tickless kernel
        • (ex: a small number of procs/threads, no more than # cores)
      • sched_migration_cost_ns (default 500 µs; increase!)
        • This pertains to the period the cache is considered “hot”, preventing premature task migration
      • Basic Linux and server measures and utilities for performance analytics:
        • BIOS updates and tuning
        • turbostat
        • lstopo
        • lscpu
        • numactl
        • numastat
        • tuned
        • tuned-adm network-latency configuration (set profile)
        • isolcpus
        • Interrupt affinity or isolation
        • irqbalance
        • busy_poll
        • Check the gamut of process (pid) info, much of it pertaining to performance, in /proc/<pid>; for ex: the files numa_maps, stat, syscall
        • tuna – control processor and scheduler affinity
          • Options: isolate sockets from user space, push to socket 0
        • VTune Amplifier 2016
          • CPU, GPU, threads, bandwidth, cache, locks, spin time, function calls, serial + parallel time
          • Identify code sections for parallelization; ex: TBB – more control than OpenMP
          • MPI analysis, ex: locks, MCDRAM analysis
        • Intel’s PCM (Performance Counter Monitor) – major enhancements
          • Ex: times specific threads hit/miss L1-2-3 caches, and measures cache times and the impacts of misses; helps identify priority procs/threads for cache
        • Transaction profilers – Wily, VisualVM, BEA WLS, Valgrind, custom free tools – troubleshooting, ESP correlation, ML
        • perf: perf top -g (functions)
          • perf counters in hardware (CPU), with kernel trace points (ex: cache-misses, cpu-migrations, softirqs)
        • strace
        • ftrace – built-in kernel function tracer
          • Syscalls of procs and threads
          • Dynamic kernel function tracing, including latencies (ex: how long until a proc wakes/starts)
          • /sys/kernel/debug/tracing
          • trace_clock
        • DTrace for Linux:
          • Dynamic – CPU, fs, net resources by active procs; can be quite specific
          • Log of args / functions
          • Procs accessing specific files
          • # New processes with arguments
          • dtrace -n 'proc:::exec-success { trace(curpsinfo->pr_psargs); }'

          • # Pages paged in by process: dtrace -n 'vminfo:::pgpgin { @pg[execname] = sum(arg0); }'
          • # Syscall count by process: dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }' (specific syscall count per process or thread)
          • Also ‘canned’ scripts for processes with top TCP and UDP traffic, and ranking of processes by bandwidth
        • SystemTap – ex: probe tcp.setsockopt.return
          • Uses trace points for kernel and user probes
          • Script thief.stp – histogram of interrupts by process
          • Dynamically instruments running production Linux kernel-based operating systems; system administrators can use SystemTap to extract, filter, and summarize data in order to enable diagnosis of complex performance or functional problems
        • SysDig tool – syscalls only; dump for post-processing and scripting

 

  • OProfile uses hw counters; tracks memory access, L2 cache, hw interrupts
    • mpstat, vmstat, iostat, nicstat, free, top, netstat, ss [filter/script for analytics]
  • VM (Virtual Memory) and page flushes; optimize market data caches
    • Slab allocation = memory management for kernel objects; eliminates fragmentation
  • Slow network connections and packet drops
  • Intro to the netperf tool
  • NIC tuning
  • Kernel bypass, LDMA, RDMA
  • Kernel bypass with NIC vendors (SolarFlare, Mellanox, ExaBlaze) – description of how each works
    • SolarFlare OpenOnload sets up all socket calls in user space instead of kernel space, with a dedicated socket connection & data handled in NIC memory
    • Mellanox VMA library, linked into user space, also sets up user space calls to the NIC; the Connect-IB NIC allows non-contiguous memory transfers app-to-app; RV offload speeds up MC; MLNX OFED open fabric verbs for IB and Ethernet; PCIe switch & NVMe over Fabrics; MPI offloads; 2 ports at 100 Gbps; IB & Ethernet connections < 600 ns latency
    • Enyx NICs: differ from the SolarFlare and Mellanox approach of a network stack in user space (which can be CPU intensive)
      • Enyx places the full TCP stack in hardware (FPGA), reducing jitter
    • Network appliances:
      • ExaBlaze Fusion
      • Metamako MetaApp
      • Fixnetix ZeroLatency
    • Precision timing – PTP and NTP
    • Symmetricom SyncServer S300 – NTP & PTP grandmaster (GM); Symmetricom is owned by Microsemi
    • A GPS satellite satisfies the UTC requirement
    • MiFID II and PTP (software (sw) + hardware (hw) both critical for accuracy; requirement: within 100 µs of UTC)
      • Symmetricom PTP GM: 6 ports, +/- 4 ns, < 25 ns to UTC
      • GPS -> GM (Spectracom) -> Boundary Clock (Arista 7150S – FPGA timing + NAT) -> servers running PTP software with FPGA-based NICs (ex: Exablaze ExaNIC models) – or SolarFlare NICs with hw timestamps
        • linuxptp – ptp4l & phc2sys (can act as a Boundary Clock) sync the PTP hardware clock on the client, including VLAN-tagged and bonded interfaces, to the master (GM), but via the kernel; its daemons can’t consume MC; the kernel delivers packets to the bonded interface. SolarFlare’s sfptpd does everything in hw and can sync every SF adapter. ptpd – multiple platforms, but software only.
          • timemaster – on start, reads the NTP & PTP time server config, starts the daemons, and can sync the system clock to all time servers across multiple PTP domains
        • Master-slave time sync; PTP timing within 6 ns
        • Consider disabling the tickless kernel: nohz=off (for accuracy), BUT test this and its application impact
        • PTP in hardware is best but costs more; do the ROI
        • If multiple interfaces sit in different networks, set reverse-path forwarding to loose mode
        • Cmd: ethtool -T <interface> – verify timestamping capabilities (for hw)
        • timemaster reads the config of the PTP time source
        • Cmd: systemctl start timemaster
        • The ExaNIC FPGA can be programmed for extra analytics; some base programs are available
        • MC for sync messages from the Master, but UDP unicast delay messages from slave to Master
        • PTP assumptions:
          • Network path symmetry (hence switches, routers, FWs, and the OS impact this)
          • Master and slave accurately measure timestamps at the point of send/receive
          • Every hop can reduce PTP accuracy
        • PTP options:
          • Direct cables from each slave clock to the master .. but complexity, cost …
          • Dedicated PTP switch infrastructure; switches are PTP-aware & eliminate switch delay or act as a PTP master / Boundary Clock; do not mix traffic
          • In a dedicated LAN, PTP goes through the switch via L2 broadcast to a PTP bridge (a server acting as Boundary Clock & bonded interface manager), which sends MC to the FW (if no SolarFlare; the FW has the list of MC groups, IGMPv3 config), then MC to PTP clients for time sync; best if clients have SolarFlare NICs <add PICTURE>
            • The FW is configured for IGMPv3 and has the necessary config allowing the PTP bridge & clients to join the standard PTP MC group 224.0.1.129
            • sfptpd can work on bonded interfaces, so PTP clients need only specify the management interface to get PTP timestamps (from the PTP bridge)
          • Hardware timestamps at every point
        • More PTP details (the short Python sketch after the class exercise below illustrates the offset & path-delay arithmetic):
          • Slaves periodically send messages back to the Master (sync)
          • sfptpd -> file or syslog; ptp4l -> stdout
          • Offset: amount the slave clock is off from the Master
          • Frequency adjustment: how much the clock oscillator adjusts to run at the same rate as the Master
          • Path delay: how long to the slave & vice versa
          • Metrics – collectd, applies RegEx
        • NTP – selects accurate time servers from multiple (ex: 3); polls 3 or more servers
          • Keep stratum levels to no more than 2
          • Keep 3 clock sources near for sync
          • Use switches with light or no queuing
          • Use “TimeKeeper” – transforms any server into a timing appliance
        • Class Exercise (we will do together in class) – Explain the different approaches to kernel bypass of the following: ExaBlaze, SolarFlare, Mellanox, Enyx. Explain the strengths and advantages of each; advise which specific electronic trading applications would best benefit from each.
          • Explain how the following Linux tuning options will impact latencies (the Python sketch after this list shows how to inspect several of them):
            • swappiness=0
            • dirty_ratio=10
            • dirty_background_ratio=10
            • NIC interrupt coalescing (pre kernel-bypass)
            • Ring buffer increase
            • UDP receive buffer at 32 MB
            • netdev_max_backlog 1000000 (traffic stored before TCP/IP processing; one queue per core)
          • Explain what the following commands produce for latency analysis:
            • ifconfig command
            • netstat -s (send/recv queues)
            • ss utility
          • Detail the major benefits of VTune and DTrace, and when you would use either
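To tie the exercise together, here is a minimal Python sketch (assumes a Linux host; the target values below are simply the class examples above, not universal recommendations) that reads several of these tunables from /proc/sys and flags deviations:

    # Read selected kernel tunables and compare against class example targets.
    from pathlib import Path

    TARGETS = {
        "net/ipv4/tcp_sack": "1",
        "vm/swappiness": "0",
        "vm/dirty_ratio": "10",
        "vm/dirty_background_ratio": "10",
        "net/core/netdev_max_backlog": "1000000",
    }

    for rel, want in TARGETS.items():
        path = Path("/proc/sys") / rel
        try:
            have = path.read_text().strip()
        except OSError:
            print(f"{rel}: not available on this kernel")
            continue
        print(f"{rel} = {have} " + ("(OK)" if have == want else f"(target {want})"))

And to make the PTP Offset and Path Delay definitions concrete, a second sketch of the standard PTP sync / delay-request arithmetic, using four hypothetical timestamps:

    # PTP offset & path delay from the four message timestamps (hypothetical ns).
    # t1: master sends Sync        t2: slave receives Sync
    # t3: slave sends Delay_Req    t4: master receives Delay_Req
    t1, t2, t3, t4 = 1_000, 1_550, 2_000, 2_450

    # Assumes a symmetric network path (the key PTP assumption noted above).
    offset = ((t2 - t1) - (t4 - t3)) / 2      # slave clock minus master clock
    path_delay = ((t2 - t1) + (t4 - t3)) / 2  # one-way delay estimate

    print(f"offset = {offset:+.0f} ns, path delay = {path_delay:.0f} ns")
    # Here: offset = +50 ns (slave ahead of master), path delay = 500 ns each way.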

 HOMEWORK – complete the prior week’s reading assignments; prepare for a 30-minute quiz next class regarding:

  • Optimal Linux kernel tuning per application requirements (Red Hat doc + class notes)
  • Benefits of FPGAs and GPUs (text book + Algo-Logic doc)
  • How multi-layer switches work (Metamako doc)
  • Differences in tuning for Speed 1 (raw) vs Meta-Speed (Corvil & Tabb doc)
  • **** IN CLASS — I will spend 15-20 minutes detailing what is most important from the above.

Session 3 – Tue June 13

Quiz, then FPGAs, multicast, market data

 QUIZ (30 minutes)

 *** we will immediately review the quiz over the next 30 minutes

Remaining 1½ hours of class:

FPGAs & Market Data

  • Hardware accelerated appliances for ULL and deterministic performance
  • Ted’s FPGA hand-out – intro to FPGAs, including intro to FPGA design & programming (I/O blocks + logic blocks; OpenCL for creating “kernels” + synchronization for parallelism)
  • Why performance tends to be very deterministic with FPGAs & why deterministic performance (latencies) is critical for HFT and algo traders
  • Pitfalls of FPGAs
  • FPGAs vs GPUs, Intel Xeon Phi (Intel doc), and multi-core CPUs
  • Feeds in FPGA – architecture, performance, design, support
  • Switch crossbars or caches for fan out with TCP distribution
  • Ted’s MC hand-out
  • Multicast (MC) performance considerations (see the Python subscriber sketch after this list)
    • Turn on IGMP snooping on the switch
      • The switch listens to IGMP conversations between hosts/routers and maps the links that require MC streams; routers periodically query; 1 member per MC group per subnet reports.
    • Clients issue IGMP join requests to MC groups
    • Routers solicit group membership requests from directly connected hosts
    • PIM-SM (Sparse Mode … low % of MC traffic) requires a Rendezvous Point (RP) router
    • Routers in a PIM domain provide mappings to the RP (which exchanges info with other routers)
      • PIM domain: enable PIM on each router
      • Enable PIM sparse mode on each interface
    • After the RP, traffic is forwarded to receivers down a shared distribution tree
    • When a receiver’s 1st-hop router learns the source, it sends a join message directly to the source
    • Protocol Independent Multicast (PIM) is used between the local and remote MC routers to direct MC traffic from the MC server to many MC clients.
  • Message based appliances, including FPGA based
  • Direct feed normalization
  • Conflation to conserve bandwidth
  • NBBO
  • Levels 1 and 2 market data
  • Depth-of-book builds (in FPGAs or new multi-core servers)
  • Smart order routers
  • Doc regarding leading vendor solutions
  • Exablaze NICs and switches vs Metamako switches for market data
  • Enyx FPGA NICs and appliances for market data and order flow
  • NovaSparks FPGA-based market data ticker plant
  • Fixnetix ZeroLatency – multi-threaded risk checks in FPGA, with order processing in parallel on a core
  • Other products – Exegy, Algo-Logic, Redline, SR Labs
  • Consolidated feed vendors Bloomberg and Thomson Reuters
  • Use of new Intel Technologies (hardware & software) for alpha seeking strategies
  • Class Ex – (1) White-board sessions where students design ULL market data and multicast architectures per specific business/application criteria. (2) Given a Visio of a large network with only a few MC groups and subscribers, identify the likely path(s) to the few sources. Include the choice of router as RP.
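As promised above, a minimal Python sketch of a multicast subscriber; opening this socket is what triggers the IGMP join discussed in this session (the group/port shown are hypothetical, not a real feed):

    # Join a multicast group and print received datagrams (hypothetical feed).
    import socket
    import struct

    GROUP, PORT = "239.1.1.1", 30001  # example group/port, not a real venue feed

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))

    # IP_ADD_MEMBERSHIP makes the kernel send an IGMP membership report, so
    # snooping switches and PIM routers start forwarding this group to us.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        data, src = sock.recvfrom(2048)  # one market data packet per datagram
        print(f"{len(data)} bytes from {src}")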

HW – market data white paper.

Week 4 will also have Visio assignments:

HOMEWORK – 2 Visio designs: one for a 1 µs T2T, a second for a more modest 10 µs T2T (includes internal alpha seeking)

 Session 4 – Tue June 20

Review of Visio assignments, then intros to Python for algo trading, the FIX protocol, Wireshark (maybe Corvil), and R / neural networks

Quick intro to Python – reference: The Ultimate Algorithmic Trading System Toolbox, George Pruitt, Wiley, 2016

  • Python algo trading examples
  • Intro to Wireshark
  • Intro to the FIX protocol (a minimal Python FIX parser sketch follows this list)
  • Intro to Wireshark with the FIX protocol “plug-in”
  • TCP, UDP, multicast (MC), then analysis via Wireshark, Corvil
  • Intro to R / neural networks
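As a first taste of FIX before the Wireshark work, here is a minimal Python sketch that splits a raw FIX message on the SOH (0x01) delimiter into tag/value pairs; the sample new-order message is illustrative only:

    # Parse one FIX message into {tag: value}; sample message is illustrative.
    SOH = "\x01"
    TAG_NAMES = {8: "BeginString", 35: "MsgType", 49: "SenderCompID",
                 56: "TargetCompID", 55: "Symbol", 54: "Side",
                 38: "OrderQty", 44: "Price", 10: "CheckSum"}

    raw = SOH.join([
        "8=FIX.4.2", "35=D", "49=BUYSIDE", "56=VENUE",   # 35=D: New Order Single
        "55=IBM", "54=1", "38=100", "44=142.50", "10=123",
    ]) + SOH

    def parse_fix(msg: str) -> dict:
        fields = {}
        for part in msg.strip(SOH).split(SOH):
            tag, _, value = part.partition("=")
            fields[int(tag)] = value
        return fields

    for tag, value in parse_fix(raw).items():
        print(f"{tag:>3} {TAG_NAMES.get(tag, '?'):<12} = {value}")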

Last hour of class – FPGA and 1 µs T2T deep dive – slot open for FPGA expert John Lockwood of Algo-Logic; if John is unavailable, we will start with Session 5 content and hope John can join us later

 HOMEWORK – additional Visio design TBD + a basic Python algo trading program to code + reading assignments – Python, Wireshark, R / neural networks

 Session 5 – Tue June 27

Review of the Visio assignment, then continue with Python for algo trading, Wireshark (maybe Corvil), and R / neural networks; then cover the latest ULL Intel technologies, other server, memory, and Flash devices, ULL messaging architectures, network protocols, and SDN

 Programming with multiple cores, multiple threads, parallelism

  • Vectorize application code
  • Design – internal loops with deep vector instructions, outer loops with parallelization (threads)
  • Servers, sockets, cores, caches, MCDRAM (Intel Phi)
  • Core speeds (GHz) vs more cores, larger and faster caches
  • Over-clocked servers – features and which applications can benefit
  • Linux, Solaris, Windows, others, ex: SmartOS, Mesosphere DC/OS
  • How to benchmark performance, analyze, tune
  • NUMA-aware processes and threads
  • Optimize cache assignments for high priority threads
  • Intel technologies including …
  • AVX-512 deep vector instructions (speeds up FP ops)
    • 6-8 registers; more ops/instruction; less power
  • TBB (Threading Building Blocks) – limits oversubscription of threads
    • OpenMP – can produce an explosion of threads
  • Omni-Path high speed / bandwidth interconnect (no HBA, fabric QoS, MTU to 10K, OFA verbs, 105 ns through switch ports, 50 GB/s bidirectional) & QPI
    • Uses silicon photonics (constant light beam; lower, more deterministic latencies)
  • QuickPath: multiple pairs of serial links, 25.6 GB/s (prior to Omni-Path)
    • Memory controllers integrated with microprocessors
    • Replaced legacy bus technology
    • Cache coherent
  • Shared memory is faster than memory maps; it allows multiple procs to read/write memory shared among the procs – without OS read/write calls. Procs just access the part of shared memory of interest. (See the Python sketch after this list.)
    • Discuss the example of a server proc sending an HTML file to a client: the file is read into memory, then a network function copies that memory to OS memory; the client calls an OS function which copies it into its own memory; contrast with shared memory
  • PCIe
  • C++ vs Java for ULL
  • Lists vs vectors
  • Iterating lists
  • Role of FPGA, GPU, microwave networks for ULL
  • C/C++, Java, Python, CUDA, FPGA-OpenCL: programming design considerations
  • Java 8 new Streams API and lambda expressions – for analytics
  • Class Ex – Explain how QuickPath & Omni-Path both improve latencies and advise which is preferred for ULL and why
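As referenced in the shared memory item above, a minimal Python sketch (Python 3.8+ multiprocessing.shared_memory; production systems would more likely use C++ over POSIX shared memory) showing two processes sharing one region with no per-read OS copy of the data:

    # Producer writes a (price, qty) tick into shared memory; the consumer
    # process attaches to the same region and reads it without an OS copy.
    from multiprocessing import Process, shared_memory
    import struct

    def consumer(name: str) -> None:
        shm = shared_memory.SharedMemory(name=name)  # attach, no data copy
        price, qty = struct.unpack_from("di", shm.buf, 0)
        print(f"consumer sees price={price} qty={qty}")
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=16)
        struct.pack_into("di", shm.buf, 0, 142.50, 100)  # producer writes a tick
        p = Process(target=consumer, args=(shm.name,))
        p.start()
        p.join()
        shm.close()
        shm.unlink()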

 

  • New age networks – spine-leaf to single tier
  • SDN (Software Defined Networks)
    • Cisco ACI + Tetration
    • Cloudistics
    • Plexxi
    • NSX
  • Pico – ULL SDN vendor
  • Options-IT – colo managed ULL infrastructure
  • Cisco and Arista switches for ULL
  • Cisco ACI and Cisco Tetration – deep machine learning to automatically optimize large networks
  • Switches with deep buffers, great for Big Data Analytics
  • Configure Routers for ULL – LLDP, MLAG, VRRP, VARP (active-active L3 gateway)
  • Network protocols – BGP, OSPF, HSRP
  • Arista 7124FX with  EOS
  • Plexxi switches – a disruptive technology – single tier
  • Plexxi optimal bandwidth via its SDN
  • Optimal VLANs configuration for analytics
    • Use trunks from one switch to another after defining a VLAN, or use a router
  • VPLS (Virtual Private LAN Service), also for analytics
    • Ethernet-based multipoint-to-multipoint over IP or MPLS
  • Decrease network hops for speed
    • (ex: Slim-Fly – a low-diameter network architecture, if not ready for single tier)
  • Network protocols:
    • EBGP: external, path vector, via paths, network policies, rule sets, finite state machine; BGP peering AS-to-AS
    • BGP-MP: multi-protocol + IPv6, unicast & MC; use for MPLS-VPN
    • OSPF: interior, within an AS; link state routing with metrics of RTT, amount of data through specific links, and link reliability
    • MOSPF: uses group membership info from IGMP + the OSPF DB; builds MC trees
    • EIGRP: OSPF + more criteria: latencies, effective BW, delays, MTU
    • MPLS: between network nodes, short path labels – avoids complex lookups in the route table; multi-protocol – includes ATM, Frame Relay, DSL
    • IB: hardware-based, lightweight, no packet reordering; link level flow control, lossless, QoS virtual lanes 0-14, RDMA verbs (adds latency), UFM tool, 4096 byte MTU (adds latency, as this MTU must fill before transmission)
    • OPA: Intel’s Omni-Path Architecture: 100 Gbps, new 48-port switch silicon, silicon photonics, no HBA, 50% less infrastructure vs IB, 100-110 ns / port, congestion control – reroutes traffic, MTU up to 10K
    • FC: Fibre Channel – optical packets, units of 4 10-bit codes; 4 codes = transmission word; metadata sets up link and sequence; tools: Agilent, CATC, Finisar, Xyratex; FC frames are similar to Ethernet packets, ex: multiple frames assembled with source, destination
    • IGMP: used by hosts and adjacent routers on IPv4 networks to establish multicast group memberships
  • Next-gen firewalls (ex: Fortinet)
    • One platform end-to-end, with multiple security-related aspects including anti-virus, malware protection, intrusion detection, database and OS access controls, web filtering, web app security, user ID awareness, standard rules access, and internal segmentation (into functional security zones, which limits the spread of malware & mischief, identifies mischief & quarantines infected devices); shares all info via its fabric with the whole network; Zero Trust policy – places the FW in the network center, in front of data
    • Empow – security orchestration product
  • Kerberos
    • Client/server network authentication protocol via secret key cryptography; stronger in one respect than traditional firewalls, as firewalls focus on external threats whereas Kerberos focuses on internal ones
    • Each KDC has a copy of the Kerberos DB; the Master KDC holds the realm DB, which is replicated to slave KDCs at regular intervals; DB password changes are made on the Master; slaves grant Kerberos tickets for servers/services; time synchronization is critical; ACLs are created too; Kerberos daemons are started on the Master, which assigns hostnames to Kerberos realms, ports, slaves
    • Opportunity for Docker container security:
      • Kerberos for access to multiple levels of container types (ex: checking account KYC vs withdrawal .. account manager vs authenticated client)
    • iptables – may opt to disable for ULL and rely on an external FW
      • Sets up and inspects tables of IP packet filter rules; each table has built-in “chains” & user-defined chains; chains list rules to match sets of packets
      • Required for servers acting as NAT-aware “routers”
        • The router intercepts packets and determines the NAT address
      • END – OPTIONAL
      • Class Exercise – Determine whether single-tier networks improve ULL versus spine-leaf. If so, explain why. Several scenarios will be presented, and students will architect networks on white boards.

 

Session 6 – Thu July 6 (1½ hour class, location TBD)

  • Special lab session: Python for algo trading, extra Wireshark training, and extra R / neural networks

 

 

Session 7 – Tue July 11

Complete / review topics from last week; start with the role of Big Data for alpha seeking and as input into ULL trading algos

 

 

Middleware, Analytics, Machine Learning, leading to end-end ULL Architectures

 Analytics & Machine Learning: to seek alpha and for infrastructure analytics

  • Intro to Big Data analytics & machine learning (focus on neural networks)
  • Role of the new Java 8 Streams API
    • Speeds up extracting insight from large collections via methods such as:
      • filter, sorted, max, map, flatMap, reduce, collect
    • Use with ArrayLists, HashMaps (does not replace them)
    • A stream is a one-time-use object
  • Intro to Complex Event Processing (CEP) and Event Stream Processing (ESP)
  • Databases – never in the path of ULL
  • Column-based (contiguous memory) vs relational
  • kdb+ and OneTick – leading players in high speed market data tick databases
  • Event Stream Processing (ESP) – use ESP to seek alpha
  • Combine market data with news sentiment analytics to seek alpha
  • Intro to RavenPack news sentiment analytics
  • Intro to Spark
  • Role of new storage technology (ex: NVMe Flash drives)
  • In-memory analytics, ex: HANA, Spark
  • Corvil – intro to how to configure Corvils and how to analyze FIX order flow with them
  • Machine learning / neural networks in R or Python – create equations to project latencies
  • Machine learning for latency analysis, tuning insight, and seeking alpha (trade opportunities)
  • Programming for multi-threaded trading risk analytics
  • Class Ex – Output from Corvil streams will be provided. Students will analyze it and determine how latencies can be projected using neural networks (design only – no programming) – or we may do this together in class. Ted to obtain sample data as input to neural networks for latency predictions + sample data for alpha seeking. (A small Python sketch of such a latency-projection model follows.)
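Although the class exercise is design-only, the sketch below shows the shape of such a latency-projection model in Python (assumes scikit-learn is installed; the features and data are synthetic stand-ins for Corvil-style measurements, so the printed fit quality says nothing about real systems):

    # Train a small neural network to project latency from load features.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    n = 2_000
    X = np.column_stack([
        rng.uniform(1e4, 1e6, n),  # market data messages/sec (synthetic)
        rng.uniform(10, 5e3, n),   # orders/sec (synthetic)
        rng.uniform(0, 100, n),    # switch queue depth (synthetic)
    ])
    # Synthetic "true" latency in ns: base + load effects + noise.
    y = 800 + 4e-4 * X[:, 0] + 0.05 * X[:, 1] + 6 * X[:, 2] + rng.normal(0, 20, n)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = make_pipeline(
        StandardScaler(),  # scale features so the MLP converges sensibly
        MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2_000, random_state=0),
    ).fit(X_tr, y_tr)
    print(f"R^2 on held-out data: {model.score(X_te, y_te):.3f}")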

 

HOMEWORK – prepare for a 30-minute quiz at the start of session 8 regarding Big Data analytics for ULL trading; key points will be stressed in class. Also: 2 weeks to complete a Visio integrating alpha-seeking opportunities into a ULL trading architecture – full details in class

 

 Session 8 – Tue July 18

Quiz, then go over the quiz; then cover high speed messaging & middleware, infrastructure ROI, and cloud technologies for ULL, plus application-specific details regarding ULL apps

 Middleware, High Speed Messaging

  • 60East Technologies AMPS
  • 29West LBM (UME)
  • New binary FIX protocol in beta promises to lower latencies
  • Importance of high speed messaging for algo trading
  • Intro to basic algos for trading equities (ex: VWAP, volume participation, use of AVX and RSI) – a minimal Python VWAP sketch follows this list
  • How to back-test trading algos
  • How to conduct ROI analysis for new ULL architectures
  • Why traditional cloud architectures fall short for ULL
  • Cloud for analytics – pitfalls vs best practices
  • Micro services potential
  • Class Ex – Output from application logs will be provided. Students will analyze it and determine how AMPS can be configured for both high speed middleware and event stream processing for analytics
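To make the VWAP benchmark concrete, a minimal Python sketch computing VWAP over a handful of hypothetical trades; a real VWAP algo slices a parent order against this benchmark intraday:

    # VWAP = sum(price * size) / sum(size); trades below are hypothetical.
    trades = [(142.50, 200), (142.55, 100), (142.48, 300), (142.60, 150)]

    volume = sum(size for _, size in trades)
    vwap = sum(price * size for price, size in trades) / volume
    print(f"market VWAP = {vwap:.4f} over {volume} shares")

    # Simple execution check: did our (hypothetical) buy fills beat VWAP?
    fills = [(142.49, 250), (142.52, 250)]
    avg_px = sum(p * s for p, s in fills) / sum(s for _, s in fills)
    print(f"our avg px = {avg_px:.4f} -> {'beat' if avg_px < vwap else 'missed'} VWAP")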

End-End ULL Architectures

  • Co-lo with 500 ns order ack times (revisited with our new knowledge)
  • Dark pools
  • Algo trading (servers, appliances, or FPGAs) in the architecture
  • Smart Order Routers (a toy Python routing sketch follows this list)
  • Prop trading
  • Exchanges
  • Class Ex – Visio or white-boarding of a new trading system TBD, applying all learned in the course
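As a toy version of the smart order routing noted above, a minimal Python sketch that routes a buy order to the venue showing the best ask (quotes are hypothetical; real SORs also weigh fees, fill probability, and per-venue latency):

    # Route a buy to the lowest ask; break ties on displayed size.
    from dataclasses import dataclass

    @dataclass
    class Quote:
        venue: str
        ask: float
        ask_size: int

    book = [Quote("VENUE_A", 142.51, 300),   # hypothetical venues & quotes
            Quote("VENUE_B", 142.50, 200),
            Quote("VENUE_C", 142.50, 500)]

    def route_buy(quotes):
        """Lowest ask wins; prefer larger displayed size on ties."""
        return min(quotes, key=lambda q: (q.ask, -q.ask_size))

    best = route_buy(book)
    print(f"route buy to {best.venue} @ {best.ask} (size {best.ask_size})")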

 

Session 9 – Tue July 25

Review of the Visio assignment, then cover future ULL architectures, including cloud architectures for ULL

 Futures, including Cloud Architectures for ULL

 Ted’s A-Team strategic projections for the next 2 years

  • Projections on new technologies’ impacts on ULL – may include:
    • New Intel cores & software
    • Adoption of single-tier networks
    • Impact of in-memory machine learning for alpha generation of trading signals
    • Integration of deep machine learning from the cloud into live trading networks via high speed interconnects, to an asynchronous queue, with NO latency impact
    • Applicability of blockchains
    • Site reliability engineering (SRE)

< open slot to catch up on past material + students’ questions pertaining to the future of ULL for ET >

 

 
