NUMA 구조

3 분 소요

1. NUMA란?

NUMA = Non Uniform Memory Access
메모리를 관리하는 방법 중 하나

1.1. UMA

UMA = Uniform Memory Access

모든 프로세서들이 CPU와 메모리 사이의 공용 BUS를 이용해서 메모리에 접근
문제점
- 여러 소켓들이 BUS를 동시에 사용할 수 없다는 점
- 하나의 CPU가 메모리에 접근하게 되면 나머지 CPU들은 block 되어 메모리 처리가 늦어짐(병목현상)

1.2. NUMA

UMA의 문제를 해결하기 위해 생겨난 메모리 관리 기법

위 그림은 NUMA 구조를 표현하며, 4개의 CPU를 각 그룹으로 나누고 메모리 일부를 연결 시킴
분류된 CPU-메모리를 하나의 Node라고 지칭하며 동일한 CPU 내의 메모리 접근은 Local Access라고 부름
서로 다른 Node 간의 메모리 접근은 Remote Access 라고 부름

NUMA는 CPU별로 접근하는 메모리 area를 나누기 때문에 해당 메모리만들 사용하도록 하여 병목현상을 해결하는 장점을 가짐
다시 말해, CPU n개를 하나의 NUMA node로 묶어서 사용
NUMA node 내에서 접근을 local access라고 하며 각각의 NUMA node에서 각각의 local access는 동시에 일어날 수 있고 이를 바탕으로 병목현상을 해결
NUMA의 성능은 설계에서 remote access를 고려해야 됨
- Remote access는 local memory allocation 등의 작업을 진행할 때 문제가 발생하면,
- remote access가 발생하고 현재 CPU가 존재하는 NUMA node가 아닌 다른 NUMA node에서 작업을 처리
- 이때 발생하는 시간적인 소요가 15% 정도 나는 것으로 보임
- 따라서 NUMA를 사용할 때 최적화를 위해서는 어떻게 NUMA node를 설계해서 remote access를 최소화 하는 지가 관건

2. NUMA 구조

x86에서 NUMA 시스템을 위한 초기화 설정 부분은 x86_numa_init 함수에서 동작

void __init x86_numa_init(void)
{
	if (!numa_off) {
#ifdef CONFIG_ACPI_NUMA
		if (!numa_init(x86_acpi_numa_init))
			return;
#endif
#ifdef CONFIG_AMD_NUMA
		if (!numa_init(amd_numa_init))
			return;
#endif
	}

	numa_init(dummy_numa_init);
}

numa_init()을 호출하게 되고 numa_init_array(), numa_set_node() 함수를 통해서 CPU에 NUMA Node를 설정
NUMA의 각 메모리 노드들은 아래의 pglist_data로 관리

/*
 * On NUMA machines, each NUMA node would have a pg_data_t to describe
 * it's memory layout. On UMA machines there is a single pglist_data which
 * describes the whole memory.
 *
 * Memory statistics and page replacement data structures are maintained on a
 * per-zone basis.
 */
typedef struct pglist_data {
	struct zone node_zones[MAX_NR_ZONES];
	struct zonelist node_zonelists[MAX_ZONELISTS];
	int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP	/* means !SPARSEMEM */
	struct page *node_mem_map;
#ifdef CONFIG_PAGE_EXTENSION
	struct page_ext *node_page_ext;
#endif
#endif
#ifndef CONFIG_NO_BOOTMEM
	struct bootmem_data *bdata;
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
	/*
	 * Must be held any time you expect node_start_pfn, node_present_pages
	 * or node_spanned_pages stay constant.  Holding this will also
	 * guarantee that any pfn_valid() stays that way.
	 *
	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG.
	 *
	 * Nests above zone->lock and zone->span_seqlock
	 */
	spinlock_t node_size_lock;
#endif
	unsigned long node_start_pfn;
	unsigned long node_present_pages; /* total number of physical pages */
	unsigned long node_spanned_pages; /* total size of physical page
					     range, including holes */
	int node_id;
	wait_queue_head_t kswapd_wait;
	wait_queue_head_t pfmemalloc_wait;
	struct task_struct *kswapd;	/* Protected by
					   mem_hotplug_begin/end() */
	int kswapd_order;
	enum zone_type kswapd_classzone_idx;

	int kswapd_failures;		/* Number of 'reclaimed == 0' runs */

#ifdef CONFIG_COMPACTION
	int kcompactd_max_order;
	enum zone_type kcompactd_classzone_idx;
	wait_queue_head_t kcompactd_wait;
	struct task_struct *kcompactd;
#endif
#ifdef CONFIG_NUMA_BALANCING
	/* Lock serializing the migrate rate limiting window */
	spinlock_t numabalancing_migrate_lock;

	/* Rate limiting time interval */
	unsigned long numabalancing_migrate_next_window;

	/* Number of pages migrated during the rate limiting time interval */
	unsigned long numabalancing_migrate_nr_pages;
#endif
	/*
	 * This is a per-node reserve of pages that are not available
	 * to userspace allocations.
	 */
	unsigned long		totalreserve_pages;

#ifdef CONFIG_NUMA
	/*
	 * zone reclaim becomes active if more unmapped pages exist.
	 */
	unsigned long		min_unmapped_pages;
	unsigned long		min_slab_pages;
#endif /* CONFIG_NUMA */

	/* Write-intensive fields used by page reclaim */
	ZONE_PADDING(_pad1_)
	spinlock_t		lru_lock;

#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
	/*
	 * If memory initialisation on large machines is deferred then this
	 * is the first PFN that needs to be initialised.
	 */
	unsigned long first_deferred_pfn;
	/* Number of non-deferred pages */
	unsigned long static_init_pgcnt;
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	spinlock_t split_queue_lock;
	struct list_head split_queue;
	unsigned long split_queue_len;
#endif

	/* Fields commonly accessed by the page reclaim scanner */
	struct lruvec		lruvec;

	/*
	 * The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on
	 * this node's LRU.  Maintained by the pageout code.
	 */
	unsigned int inactive_ratio;

	unsigned long		flags;

	ZONE_PADDING(_pad2_)

	/* Per-node vmstats */
	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
} pg_data_t;

결국 하나의 노드는 pg_data_t 구조체를 통해 관리되며
이 구조체는 해당 노드에 속해있는 physical 메모리의 실제 양(node_present_pages)
physical 메모리가 메모리 map의 몇 번지에 위치하고 있는지를 나타내는 변수(node_start_pfn) 등이 정의

참고

https://wogh8732.tistory.com/399
https://bluemoon-1st.tistory.com/85
https://youngswooyoung.tistory.com/64

This is personal diary for study documents.
Please comment if I'm wrong or missing something else 😄. 

Top

Twitter Facebook LinkedIn

NUMA 구조

1. NUMA란?

1.1. UMA

1.2. NUMA

2. NUMA 구조

참고

공유하기

댓글남기기

참고

[C++] 부스트 파이버(Boost fiber)

[Design Pattern] 반복자 패턴(Iterator) in C++

[Design Pattern] 커맨드 패턴(Command) in C++

[Design Pattern] 책임 연쇄 패턴(Chain of Responsibility) in C++