<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>CSAPP Chapter 5 on Jiho Kim</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/</link><description>Recent content in CSAPP Chapter 5 on Jiho Kim</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 Jiho Kim</copyright><lastBuildDate>Wed, 08 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/index.xml" rel="self" type="application/rss+xml"/><item><title>CSAPP 5.2 Expressing Program Performance</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.2-expressing-program-performance/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.2-expressing-program-performance/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
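&lt;li&gt;To express program performance, we use the metric CPE (cycles per element): the number of clock cycles needed per element processed. If the total cycle count is plotted against the number of elements n, the CPE is the slope of the fitted line.
&lt;ul&gt;
&lt;li&gt;It is well suited to describing the performance of loops in iterative programs,&lt;/li&gt;
&lt;li&gt;such as programs that process the pixels of an image or multiply matrices.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;As covered earlier, the sequencing of activities in a processor is controlled by a clock, and the pipeline advances one stage per clock cycle. Measuring in clock cycles rather than nanoseconds therefore captures how many instructions are executed, independent of how fast the clock runs.&lt;/li&gt;
&lt;li&gt;In code that computes a simple prefix sum, computing two elements per iteration instead of one reduces the number of iterations, and with it the loop overhead.
&lt;ul&gt;
&lt;li&gt;This technique is called &lt;strong&gt;loop unrolling&lt;/strong&gt;.&lt;/li&gt;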
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
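&lt;p&gt;A minimal C sketch of the prefix-sum example, modeled on the book's &lt;code&gt;psum1&lt;/code&gt; and &lt;code&gt;psum2&lt;/code&gt;. It assumes &lt;code&gt;float&lt;/code&gt; elements and at least one element; &lt;code&gt;psum2&lt;/code&gt; is the same computation unrolled by a factor of 2, with a cleanup step for the leftover element when n is even.&lt;/p&gt;

```c
/* Prefix sum, one element per iteration: p[i] = a[0] + ... + a[i].
   Assumes n is at least 1. */
void psum1(float a[], float p[], long n)
{
    long i;
    p[0] = a[0];
    for (i = 1; i != n; i++)
        p[i] = p[i - 1] + a[i];
}

/* Same result, unrolled by 2: two elements per iteration.
   Assumes n is at least 1. */
void psum2(float a[], float p[], long n)
{
    long i;
    p[0] = a[0];
    /* Keep going while at least two elements remain. */
    for (i = 1; n - i >= 2; i += 2) {
        float mid_val = p[i - 1] + a[i];
        p[i] = mid_val;
        p[i + 1] = mid_val + a[i + 1];
    }
    /* For even n, one element is left over. */
    if (i != n)
        p[i] = p[i - 1] + a[i];
}
```

&lt;p&gt;Both functions produce the same results; unrolling only changes how much per-iteration overhead (index update, comparison, branch) is paid per element, and lets &lt;code&gt;psum2&lt;/code&gt; reuse &lt;code&gt;mid_val&lt;/code&gt; instead of reloading &lt;code&gt;p[i]&lt;/code&gt;.&lt;/p&gt;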

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>CSAPP 5.3 Program Example</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.3-program-example/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.3-program-example/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
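&lt;li&gt;From here on, the examples are built around a vector data structure.
&lt;ul&gt;
&lt;li&gt;A vector is represented by two blocks of memory: a header, which holds the length and a pointer, and the data array itself.&lt;/li&gt;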
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	/* Create abstract data type for vector */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	typedef struct {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3		long len;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4		data_t *data;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5	} vec_rec, *vec_ptr;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
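&lt;li&gt;Compiling the combining function with no optimization and with -O1, for each choice of data_t (integer or floating point) and OP (+ or *), shows that the exact figures depend on the type and operation, but -O1 improves performance substantially in every case.
&lt;ul&gt;
&lt;li&gt;So, in general, it is a good habit to enable at least some level of optimization.&lt;/li&gt;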
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
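&lt;p&gt;A minimal, self-contained sketch of the vector abstraction together with the book's baseline combining function &lt;code&gt;combine1&lt;/code&gt;. Assumptions: &lt;code&gt;data_t&lt;/code&gt; is &lt;code&gt;long&lt;/code&gt;, &lt;code&gt;IDENT&lt;/code&gt; is 0 and &lt;code&gt;OP&lt;/code&gt; is + (the book compiles each combination separately), and the caller supplies the data array instead of allocating it through the book's &lt;code&gt;new_vec&lt;/code&gt;, which is not reproduced here.&lt;/p&gt;

```c
/* Illustrative choices; swap in double and * to get the other variants. */
typedef long data_t;
#define IDENT 0
#define OP +

typedef struct {
    long len;       /* header: number of elements */
    data_t *data;   /* pointer to the separate data array */
} vec_rec, *vec_ptr;

long vec_length(vec_ptr v)
{
    return v->len;
}

/* Copy element index into *dest. Return 0 if index is out of bounds, 1 otherwise. */
int get_vec_element(vec_ptr v, long index, data_t *dest)
{
    /* Casting to unsigned turns negative indices into huge values,
       so a single comparison rejects both ends of the range. */
    if ((unsigned long)index >= (unsigned long)v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Baseline: calls vec_length and get_vec_element on every iteration. */
void combine1(vec_ptr v, data_t *dest)
{
    long i;
    data_t val[1];  /* one-element buffer used as the out-parameter */

    *dest = IDENT;
    for (i = 0; i != vec_length(v); i++) {
        get_vec_element(v, i, val);
        *dest = *dest OP val[0];
    }
}
```

&lt;p&gt;With a four-element vector {1, 2, 3, 4}, &lt;code&gt;combine1&lt;/code&gt; stores 10 in &lt;code&gt;*dest&lt;/code&gt;. The function calls and bounds checks inside the loop are exactly what the following sections remove.&lt;/p&gt;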

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>CSAPP 5.4 Eliminating Loop Inefficiencies</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.4-elimination-loop-inefficiencies/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.4-elimination-loop-inefficiencies/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Let's optimize the following part of the combining function.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7		for (i = 0; i &amp;lt; vec_length(v); i++) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8			data_t val;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;9			get_vec_element(v, i, &amp;amp;val);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;10			*dest = *dest OP val;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11		}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
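&lt;li&gt;vec_length(v) is called on every iteration so that i can be compared against it.
&lt;ul&gt;
&lt;li&gt;If we compute the length once before the loop, store it in a variable, and compare against that variable instead, the repeated computation disappears.&lt;/li&gt;
&lt;li&gt;This optimization is called &lt;strong&gt;code motion&lt;/strong&gt;: a computation that produces the same result on every iteration is moved out of the loop so that it is performed only once.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Optimizing compilers attempt code motion, but they are quite cautious: as with the optimization blockers discussed earlier, they usually cannot tell whether a called function has side effects or returns the same value every time.
&lt;ul&gt;
&lt;li&gt;It can therefore be better for the programmer to perform code motion explicitly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;As an extreme example, consider the following.&lt;/li&gt;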
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	/* Convert string to lowercase: slow */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	void lower1(char *s)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4		long i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;6		for (i = 0; i &amp;lt; strlen(s); i++)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7			if (s[i] &amp;gt;= `A&amp;#39; &amp;amp;&amp;amp; s[i] &amp;lt;= `Z&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8				s[i] -= (`A&amp;#39; - `a&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;9	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;10	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11	/* Convert string to lowercase: faster */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;12	void lower2(char *s)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;13	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;14		long i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;15		long len = strlen(s);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;16	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;17		for (i = 0; i &amp;lt; len; i++)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;18			if (s[i] &amp;gt;= `A&amp;#39; &amp;amp;&amp;amp; s[i] &amp;lt;= `Z&amp;#39;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;19				s[i] -= (`A&amp;#39; - `a&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;20	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;21	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;22	/* Sample implementation of library function strlen */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;23	/* Compute length of string */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;24	size_t strlen(const char *s)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;25	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;26		long length = 0;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;27		while (*s != `\0&amp;#39;) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;28				s++;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;29				length++;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;30		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;31		return length;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;32	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
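&lt;li&gt;strlen itself takes $O(N)$ time on a string of length N, and lower1 calls it on each of its N iterations, so lower1 runs in $O(N^2)$ time while lower2 runs in $O(N)$. For long strings the difference is dramatic.&lt;/li&gt;
&lt;li&gt;The compiler, however, has a hard time recognizing this: lower1 modifies s inside the loop, so the compiler would have to prove that the string's length never changes. The programmer must therefore perform the transformation by hand.&lt;/li&gt;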
&lt;/ul&gt;
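&lt;p&gt;To make the quadratic cost concrete, here is a sketch that counts how many characters the length computation examines. The counter &lt;code&gt;chars_scanned&lt;/code&gt; and the helpers &lt;code&gt;counting_strlen&lt;/code&gt;, &lt;code&gt;is_upper_ascii&lt;/code&gt;, &lt;code&gt;lower1_counted&lt;/code&gt; and &lt;code&gt;lower2_counted&lt;/code&gt; are hypothetical instrumentation, not part of the book; like the book's code, it assumes ASCII letters.&lt;/p&gt;

```c
/* Hypothetical instrumentation: total characters examined by
   counting_strlen, including each terminating '\0'. */
long chars_scanned = 0;

long counting_strlen(const char *s)
{
    long n = 0;
    while (s[n] != '\0')
        n++;
    chars_scanned += n + 1;
    return n;
}

/* True if c is in 'A'..'Z' (ASCII). The unsigned cast makes characters
   below 'A' wrap to large values, so one comparison checks both ends. */
int is_upper_ascii(char c)
{
    return (unsigned)('Z' - 'A') >= (unsigned)(c - 'A');
}

/* Like lower1: the length is recomputed before every iteration. */
void lower1_counted(char *s)
{
    long i;
    for (i = 0; i != counting_strlen(s); i++)
        if (is_upper_ascii(s[i]))
            s[i] -= 'A' - 'a';
}

/* Like lower2: the length is computed once (code motion). */
void lower2_counted(char *s)
{
    long i;
    long len = counting_strlen(s);
    for (i = 0; i != len; i++)
        if (is_upper_ascii(s[i]))
            s[i] -= 'A' - 'a';
}
```

&lt;p&gt;For a 1000-character string, the loop test in &lt;code&gt;lower1_counted&lt;/code&gt; runs 1001 times and each call scans 1001 characters, so 1001 * 1001 = 1,002,001 characters are examined, versus 1001 for &lt;code&gt;lower2_counted&lt;/code&gt;. Doubling the length roughly quadruples the first count and only doubles the second.&lt;/p&gt;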

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>CSAPP 5.5 Reducing Procedure Calls</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.5-reducing-procedure-calls/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.5-reducing-procedure-calls/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;As seen earlier, procedure calls incur overhead and can also block program optimizations.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8		for (i = 0; i &amp;lt; length; i++) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;9			data_t val;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;10			get_vec_element(v, i, &amp;amp;val);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11			*dest = *dest OP val;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;12		}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
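&lt;li&gt;Here, get_vec_element is called on every loop iteration, and each call checks whether the index i is within bounds. That is also inefficient, since i is always a valid index in this loop.&lt;/li&gt;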
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	data_t *get_vec_start(vec_ptr v)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3		return v-&amp;gt;data;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	/* Direct access to vector data */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	void combine3(vec_ptr v, data_t *dest)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4		long i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5		long length = vec_length(v);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;6		data_t *data = get_vec_start(v); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8		*dest = IDENT;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;9		for (i = 0; i &amp;lt; length; i++) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;10			*dest = *dest OP data[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;12	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
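&lt;li&gt;With this change, the loop accesses the array directly instead of calling a function, so we would expect it to run faster.&lt;/li&gt;
&lt;li&gt;When actually measured, however, there is no clear performance improvement, and integer addition even got slightly slower.
&lt;ul&gt;
&lt;li&gt;Section 5.11.2 explains why this change does not improve performance.&lt;/li&gt;
&lt;li&gt;For now, it is enough to know that this transformation is one of the steps toward the eventual performance gains.&lt;/li&gt;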
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
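&lt;p&gt;A self-contained sketch contrasting the two loops: &lt;code&gt;combine2&lt;/code&gt; (length hoisted, but still one bounds-checked call per element) and &lt;code&gt;combine3&lt;/code&gt; (direct array access through &lt;code&gt;get_vec_start&lt;/code&gt;). As in the earlier sketches, &lt;code&gt;data_t&lt;/code&gt; = &lt;code&gt;long&lt;/code&gt;, &lt;code&gt;IDENT&lt;/code&gt; = 0 and &lt;code&gt;OP&lt;/code&gt; = + are illustrative assumptions. Note that &lt;code&gt;get_vec_start&lt;/code&gt; exposes the vector's internal array, trading some modularity for direct access.&lt;/p&gt;

```c
typedef long data_t;    /* illustrative choice */
#define IDENT 0
#define OP +

typedef struct {
    long len;
    data_t *data;
} vec_rec, *vec_ptr;

long vec_length(vec_ptr v) { return v->len; }

data_t *get_vec_start(vec_ptr v) { return v->data; }

/* Bounds-checked access: one call and one check per element. */
int get_vec_element(vec_ptr v, long index, data_t *dest)
{
    if ((unsigned long)index >= (unsigned long)v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* combine2: vec_length hoisted out of the loop (code motion). */
void combine2(vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(v);
    data_t val[1];  /* one-element out-parameter buffer */

    *dest = IDENT;
    for (i = 0; i != length; i++) {
        get_vec_element(v, i, val);
        *dest = *dest OP val[0];
    }
}

/* combine3: no call and no bounds check inside the loop. */
void combine3(vec_ptr v, data_t *dest)
{
    long i;
    long length = vec_length(v);
    data_t *data = get_vec_start(v);

    *dest = IDENT;
    for (i = 0; i != length; i++)
        *dest = *dest OP data[i];
}
```

&lt;p&gt;Both functions compute the same result; the difference lies only in the work done per iteration. As noted above, removing the call alone does not make the loop measurably faster, because every iteration still reads and writes &lt;code&gt;*dest&lt;/code&gt; in memory.&lt;/p&gt;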

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>CSAPP 5.6 Eliminating Unneeded Memory References</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.6-eliminating-unneeded-memory-references/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.6-eliminating-unneeded-memory-references/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Compiling the code optimized in &lt;a href="https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.5-reducing-procedure-calls/" &gt;CSAPP 5.5 Reducing Procedure Calls&lt;/a&gt; (combine3, with data_t = double and OP = *) produces the following inner loop:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	. L17:				loop:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	 vmovsd (%rbx), %xmm0		 Read product from dest
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3	 vmulsd (%rdx), %xmm0, %xmm0	 Multiply product by data[i]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4	 vmovsd %xmm0, (%rbx)		 Store product at dest
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5	 addq $8, %rdx		 Increment data+i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;6	 cmpq %rax, %rdx		 Compare to data+length
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7	 jne .L17			 If !=, goto loop&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
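&lt;li&gt;This loop has a visible inefficiency.
&lt;ul&gt;
&lt;li&gt;The address dest is held in %rbx. Instead of the index i, the compiler keeps a pointer to data[i] in %rdx, and the loop ends when that pointer reaches data+length, held in %rax.&lt;/li&gt;
&lt;li&gt;On every iteration the accumulated product is read from memory at dest and written back there through (%rbx). This is unnecessary: the value read at the start of each iteration is simply the value written at the end of the previous one.&lt;/li&gt;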
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	/* Accumulate result in local variable */
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	void combine4(vec_ptr v, data_t *dest)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4		long i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5		long length = vec_length(v);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;6		data_t *data = get_vec_start(v);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7		data_t acc = IDENT;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;9		for (i = 0; i &amp;lt; length; i++) {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;10			acc = acc OP data[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;12		*dest = acc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;13	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
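&lt;li&gt;Let's rewrite the function as above (combine4).
&lt;ul&gt;
&lt;li&gt;The difference is the new local variable data_t acc, which accumulates the result.&lt;/li&gt;
&lt;li&gt;acc can live in a register, so the loop no longer reads and writes memory on every iteration, and *dest is written only once, after the loop. This makes the loop much faster.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Why didn't the compiler make this transformation automatically?
&lt;ul&gt;
&lt;li&gt;Suppose dest points to an element of v. With this kind of aliasing, the transformation changes the result: combine3 overwrites that array element (first with IDENT, then with each partial result) while it is still being read as input, whereas combine4 leaves the array untouched until the final store.&lt;/li&gt;
&lt;li&gt;Because the compiler cannot rule out such aliasing, it must keep the memory reads and writes, and the code stays slow.&lt;/li&gt;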
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
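&lt;p&gt;A sketch of why the two versions are not interchangeable under aliasing. To stay short it uses hypothetical simplified signatures, &lt;code&gt;combine3_arr&lt;/code&gt; and &lt;code&gt;combine4_arr&lt;/code&gt;, that take the data array and its length directly instead of a &lt;code&gt;vec_ptr&lt;/code&gt;; &lt;code&gt;data_t&lt;/code&gt; = &lt;code&gt;double&lt;/code&gt; and &lt;code&gt;OP&lt;/code&gt; = * match the assembly above.&lt;/p&gt;

```c
typedef double data_t;
#define IDENT 1
#define OP *

/* combine3-style: accumulate through dest, reading and writing
   memory on every iteration. */
void combine3_arr(data_t *data, long length, data_t *dest)
{
    long i;
    *dest = IDENT;
    for (i = 0; i != length; i++)
        *dest = *dest OP data[i];
}

/* combine4-style: accumulate in a local variable and store once. */
void combine4_arr(data_t *data, long length, data_t *dest)
{
    long i;
    data_t acc = IDENT;
    for (i = 0; i != length; i++)
        acc = acc OP data[i];
    *dest = acc;
}
```

&lt;p&gt;With the data [2, 3, 5] and dest pointing at the last element, &lt;code&gt;combine3_arr&lt;/code&gt; leaves 36 there (the 5 is overwritten by 1, then 2, then 6, and the last step computes 6 * 6), while &lt;code&gt;combine4_arr&lt;/code&gt; leaves 30. When dest is a separate variable, both give 30.&lt;/p&gt;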

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;</description></item><item><title>CSAPP 5.7 Understanding Modern Processors</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.7-understanding-modern-processors/</link><pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260408_til_csapp-5.7-understanding-modern-processors/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
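&lt;li&gt;So far, the optimizations have not relied on any features of the target machine: they reduced procedure-call overhead and removed optimization blockers that keep the compiler from doing its job.&lt;/li&gt;
&lt;li&gt;To improve performance further, we now optimize with the processor's underlying design in mind, that is, with how it actually executes instructions.&lt;/li&gt;
&lt;li&gt;Modern processors use complex hardware to maximize performance, so what they actually do can differ considerably from what the machine-level code suggests.
&lt;ul&gt;
&lt;li&gt;At the code level, instructions appear to execute one at a time, in order,&lt;/li&gt;
&lt;li&gt;but in reality many instructions are being evaluated simultaneously.&lt;/li&gt;
&lt;li&gt;We already saw a simple form of this with pipelining in Chapter 4.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Two lower bounds on the CPE characterize the maximum performance of a program:
&lt;ul&gt;
&lt;li&gt;Latency bound
&lt;ul&gt;
&lt;li&gt;Arises when the result of one operation is needed before the next one can begin,&lt;/li&gt;
&lt;li&gt;so that a series of operations must be performed in strict sequence.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Throughput bound
&lt;ul&gt;
&lt;li&gt;Characterizes the raw computing capacity of the processor's functional units: a program cannot run faster than those units can execute its operations.&lt;/li&gt;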
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
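&lt;p&gt;A sketch of the kind of loop the latency bound applies to. In &lt;code&gt;sum_one_chain&lt;/code&gt; every addition needs the previous value of &lt;code&gt;acc&lt;/code&gt;, forming one long dependency chain. &lt;code&gt;sum_two_chains&lt;/code&gt; (a hypothetical name, anticipating a technique the book develops later in the chapter) splits the work into two independent chains whose additions can overlap in a pipelined functional unit. Both assume n is not negative.&lt;/p&gt;

```c
/* One dependency chain: each add must wait for the previous result,
   so the loop is limited by the latency of addition. */
long sum_one_chain(const long *data, long n)
{
    long acc = 0;
    long i;
    for (i = 0; i != n; i++)
        acc = acc + data[i];
    return acc;
}

/* Two independent chains: acc0 and acc1 never depend on each other,
   so their additions can be in flight at the same time. */
long sum_two_chains(const long *data, long n)
{
    long acc0 = 0;
    long acc1 = 0;
    long i;
    for (i = 0; n - i >= 2; i += 2) {
        acc0 = acc0 + data[i];
        acc1 = acc1 + data[i + 1];
    }
    if (i != n)     /* odd n: one element left over */
        acc0 = acc0 + data[i];
    return acc0 + acc1;
}
```

&lt;p&gt;Both functions return the same sum. Whether the second one is actually faster depends on the processor and on the compiler; at higher optimization levels a compiler may already reassociate or vectorize an integer loop like this on its own.&lt;/p&gt;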

&lt;h3 class="relative group"&gt;5.7.1 Overall Operation
 &lt;div id="571-overall-operation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#571-overall-operation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item><item><title>CSAPP 5.1 Capabilities and Limitations of Optimizing Compilers</title><link>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260407_til_csapp-5.1-capabilities-and-limitations-of-optimizing-compilers/</link><pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate><guid>https://blog.wlgh7407.com/posts/books/csapp/chapter-05-optimizing-program-performance/260407_til_csapp-5.1-capabilities-and-limitations-of-optimizing-compilers/</guid><description>&lt;h2 class="relative group"&gt;📝 Detailed Notes
 &lt;div id="-상세-정리" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%83%81%ec%84%b8-%ec%a0%95%eb%a6%ac" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
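&lt;li&gt;Modern compilers use sophisticated algorithms to determine what values are computed in a program and how they are used.
&lt;ul&gt;
&lt;li&gt;They can then simplify expressions, reuse a single computation in several places, and reduce the number of times a given computation is performed.&lt;/li&gt;
&lt;li&gt;Most compilers, GCC included, also let you choose an optimization level such as -Og, -O1, -O2, and so on.&lt;/li&gt;
&lt;li&gt;Here we mainly consider code compiled with -O1.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Compilers must apply only safe optimizations: the optimized program must behave exactly like the unoptimized one in every case it may encounter.
&lt;ul&gt;
&lt;li&gt;Consider the following two functions.&lt;/li&gt;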
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1	void twiddlel(long *xp, long *yp)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3		*xp += *yp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4		*xp += *yp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;6	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7	void twiddle2(long *xp, long *yp)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;9		*xp += 2* *yp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;10	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
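&lt;li&gt;twiddle1 needs six memory references (two reads of *xp, two reads of *yp, two writes of *xp), while twiddle2 needs only three (read *xp, read *yp, write *xp). So one might expect the compiler to turn twiddle1 into twiddle2 on its own.&lt;/li&gt;
&lt;li&gt;But if xp and yp are equal, the two functions give different results: twiddle1 multiplies the value at xp by 4, while twiddle2 multiplies it by 3.&lt;/li&gt;
&lt;li&gt;The case where two pointers may designate the same memory location is called &lt;strong&gt;memory aliasing&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;A compiler that performs only safe optimizations must assume that different pointers may be aliased.&lt;/li&gt;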
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
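&lt;p&gt;A small sketch that makes the aliased case observable. &lt;code&gt;twiddle1&lt;/code&gt; and &lt;code&gt;twiddle2&lt;/code&gt; are the functions from the text; &lt;code&gt;compare_aliased&lt;/code&gt; is a hypothetical helper that runs each of them with xp == yp on its own copy of x, held in a one-element array so that the same pointer can be passed twice.&lt;/p&gt;

```c
/* From the text: six memory references. */
void twiddle1(long *xp, long *yp)
{
    *xp += *yp;
    *xp += *yp;
}

/* From the text: three memory references. */
void twiddle2(long *xp, long *yp)
{
    *xp += 2 * *yp;
}

/* Hypothetical helper: call both functions with aliased arguments.
   out[0] receives twiddle1's result, out[1] twiddle2's. */
void compare_aliased(long x, long out[2])
{
    long a[1] = { x };
    long b[1] = { x };

    twiddle1(a, a);     /* x + x = 2x, then 2x + 2x = 4x */
    twiddle2(b, b);     /* x + 2x = 3x */
    out[0] = a[0];
    out[1] = b[0];
}
```

&lt;p&gt;For x = 3, &lt;code&gt;compare_aliased&lt;/code&gt; yields 12 and 9. With distinct pointers both functions add twice the value at yp, which is why the rewrite would be correct only if aliasing could be ruled out.&lt;/p&gt;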

&lt;h2 class="relative group"&gt;❔질문 사항
 &lt;div id="질문-사항" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#%ec%a7%88%eb%ac%b8-%ec%82%ac%ed%95%ad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;🔗 참고 자료
 &lt;div id="-참고-자료" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#-%ec%b0%b8%ea%b3%a0-%ec%9e%90%eb%a3%8c" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>