
Institute of Aeronautics & Space Studies - Blida University (Algeria)

Algorithmic Linear Algebra and Computational Methods in Regular Matrix Polynomials

Authored by: BEKHITI Belkacem
2020
Professor Kamel Hariche
In the name of Allah, the Most Gracious, the Most Merciful

Introduction

Praise be to Allah, Lord of the Worlds, and may peace and blessings be upon our master Muhammad and upon his good and pure family and companions. To proceed:
The scholar Abu Abdullah Muhammad ibn Musa al-Khwarizmi is regarded as the founder of the science of algebra: in his book on al-jabr wa'l-muqabala (calculation by completion and balancing) he sought systematic methods of solving linear and quadratic equations. That treatise, a mathematical book written around 830 CE, gave algebra its name, the term being derived from one of the basic operations on equations described in it. The book was translated into Latin under the title Liber algebrae et almucabala by Robert of Chester (Segovia, 1145), and again by Gerard of Cremona. A unique Arabic copy is preserved at Oxford and was translated in 1831, and a Latin translation is preserved in Cambridge.
In the modern era, Hermann Grassmann published his "theory of extension" in 1844, containing foundational new topics of what is today called linear algebra. Linear algebra itself first grew out of the study of determinants, which were used to solve systems of linear equations. Determinants were employed by Gottfried Wilhelm Leibniz in 1693, and later Gabriel Cramer devised Cramer's rule, which made it possible to solve linear systems, around 1750. Gauss subsequently advanced the theory of linear systems through the method of Gaussian elimination. The study of matrices as such then appeared in England in the nineteenth century: in 1848 James Joseph Sylvester introduced the term "matrix" (Latin for "womb", rendered in Arabic as masfufa), the word being chosen because a matrix is the womb from which systems of linear equations are generated. When the mathematician Arthur Cayley studied compositions of linear transformations, he was led to define the product of matrices and the inverse of a matrix, and he also established the relationship linking matrices to determinants. In 1882 the Ottoman mathematician Hüseyin Tevfik Pasha wrote a book entitled "Linear Algebra". Up to that time, however, matrices were not understood in the modern sense; they were merely a shorthand for systems of linear equations.
As for algebraic equations, systems of linear equations arose in Europe with the introduction of coordinates into geometry in 1637 by René Descartes. In this geometry, now called Cartesian (analytic) geometry, lines and planes are represented by linear equations, and computing their intersections amounts to solving systems of linear equations. Determinants were first used by Leibniz in 1693 as a systematic method for solving linear systems; in 1750 the mathematician Gabriel Cramer used them to give explicit solutions of linear systems, in what is now called Cramer's rule; later, Gauss described the elimination method, which was initially presented as an advance in geodesy. Linear algebra began with the study of vectors in two- and three-dimensional Cartesian spaces and was then extended into what is called modern linear algebra, which also admits spaces of arbitrary or even infinite dimension: one may study a vector space of any number n of dimensions, the so-called n-dimensional space, and most of the results established for two- and three-dimensional spaces generalize to these higher-dimensional spaces. Although it is usually difficult to visualize vectors of dimension greater than three, such vectors can be regarded as ordered algebraic objects whose geometric meaning can still be understood and exploited, which makes them convenient for representing the data one wishes to process in many sciences: a vector is simply an ordered list of components, and this abstract viewpoint makes it possible to manipulate data effectively. As an abstract notion, the vector space (or linear space) emerged in the work of Grassmann and Cayley, and the final definitions were formulated by Giuseppe Peano in 1888; the subject may be regarded as a branch of abstract algebra, with which it fits perfectly. Among the most important topics studied in linear algebra are the following:
▪ Vectors in ℝⁿ and ℂⁿ
▪ Matrices
▪ Square matrices
▪ Systems of linear algebraic equations
▪ Vector spaces and subspaces
▪ Linear dependence, basis, and dimension
▪ Change of basis and similarity
▪ Orthogonality and projections
▪ Determinants over a field
▪ Canonical forms
▪ Mappings and linear mappings
▪ Matrices and linear mappings
▪ Spaces of linear mappings
▪ Linear functionals and the dual space
▪ Bilinear, quadratic, and Hermitian forms
▪ Linear operators on inner-product spaces
▪ Applications in geometry and numerical computation
(Note: by "bilinear forms" is meant forms that are linear in each of two arguments.)
Early matrix theory focused on determinants rather than on matrices as objects in their own right; the concept of a matrix did not appear independently until 1858, when Cayley published his theories on matrices. Matrix theory is the branch of mathematics concentrating on the study of matrices; while it is counted among the branches of linear algebra, it soon came to touch other subjects connected with graph theory, algebra, combinatorics, and statistics. In 1848 the English mathematician James Joseph Sylvester coined the term "matrix" as a name for an ordered array of numbers, so that a matrix denotes a rectangular array of numbers. In 1855 Arthur Cayley presented the matrix as a representation of a linear map. This period is regarded as the beginning of linear algebra and of matrix theory. At that time the study of vector spaces and of determinants over a given field gave rise to new branches of the subject and led naturally to the study and use of matrices over a given field.

At the beginning of 1858 Arthur Cayley published a memoir on geometric transformations using matrices that were no longer mere tabulations of the coefficients under investigation, as had previously been the case. Instead, he defined operations such as addition, subtraction, multiplication, and division as transformations of those matrices and showed that the associative and distributive properties hold. Cayley also brought out the non-commutativity of matrix multiplication, alongside the commutativity of matrix addition. Early matrix theory had confined matrices almost exclusively to the service of determinants, and Arthur Cayley's abstract matrix operations were revolutionary in the full sense of the word. He was instrumental in proposing the concept of a matrix independent of systems of equations, and in 1858 he also proposed the Cayley-Hamilton theorem. Later came the English mathematician C. E. Cullis, who in 1913 was the first to introduce the modern bracket notation for matrices and who, at the same time, demonstrated the first significant use of the symbol A = [a_ij] to represent a matrix, where a_ij denotes the entry in row i and column j. That year may be regarded as the year in which matrices fully entered applied mathematics.

In the nineteenth century the modern study of determinants grew from several sources. Carl Friedrich Gauss (1777-1855) developed these notions, and after him Gotthold F. Eisenstein (1823-1852) developed them further, including the observation that, in modern language, the product of two matrices is non-commutative. The Frenchman Cauchy (1789-1857) was the first to prove general statements about determinants, starting from a definition of the determinant of a matrix A = [a_ij]; Augustin-Louis Cauchy also proved, in 1829, that the eigenvalues of symmetric matrices are real. Carl Gustav Jacob Jacobi (1804-1851) studied "functional determinants", later called Jacobians by James Joseph Sylvester, which can be used to describe geometric transformations at a local (infinitesimal) level. Many theorems were first established for small matrices only; for example, the Cayley-Hamilton theorem was proved for 2x2 matrices by Cayley in the memoir mentioned above, and by Hamilton for 4x4 matrices. Frobenius, working on bilinear forms, generalized the theorem to all dimensions in 1898. Also at the end of the nineteenth century the Gauss-Jordan elimination method was established as a generalization of the earlier Gaussian elimination, and at the turn of the twentieth century matrices acquired a central role in linear algebra, partly owing to their use in the classification of the hypercomplex number systems of the previous century.

Linear algebra is fundamental to almost every area of mathematics. For example, it is basic to modern presentations of geometry, including the definition of its fundamental objects; likewise, functional analysis, a branch of mathematical analysis, may essentially be viewed as the application of linear algebra to spaces of functions. Linear algebra is also used in most areas of science and engineering, because it can model many natural phenomena and allows efficient computation with those models; and as for nonlinear systems, which cannot be modelled by linear algebra, it is routinely used to handle first-order approximations, since the differential of a multivariable function at a point is the best linear approximation of the function near that point. Historically, the notion of determinant was in use long before matrices were invented. Originally the determinant was regarded as a property of a system of linear equations: it allows one to check whether the system has a unique solution, and the system has a unique solution exactly when the determinant is different from zero. Determinants in this sense were first used in the Chinese mathematics book entitled "The Nine Chapters on the Mathematical Art" (written around the third century BCE). In Europe, the 2x2 determinant was introduced by the mathematician Cardano at the end of the sixteenth century, and determinants of higher order by the German mathematician Leibniz.
‫إهلل نل‬ ‫ال طي الاله هللىة‪ ،‬الالري هلل‬ ‫ف ك ال قالل ه ه نك سقة قالاة ال ال‬
‫في نر ‪1637‬م؛ الفي هاك اله هللىة ال هللاهللال‬ ‫اإلحهللاثان الهللاكن راة الاىطة ا اه هللاكن‬
‫(في الك الالق )‪ ،‬الري رىمل اآلل اله هللىة الهللاكن راة‪ ،‬ارر رمثال ال قنط الاىطة اإلحهللاثان‬
‫الهللاكن راة‪ ،‬الهي ن ال ل رىاىس مل ثسثة قنر حقاقاة (في حنلة التضنح ثسثي األ لنهلل‬
‫الملرنهلل)؛ كمن ارر رمثال الكنئ ن األىنىاة لاه هللىة‪ ،‬الهي ال طالط الالمىرالان نلملنهللت‬
‫ال طاة؛ ال نلرنلي‪ ،‬فإل حىنا رقنطلن ال طالط الالمىرالان ا قل إلل حل ظمة‬
‫ال طي الالا اُىر هللر في‬ ‫الملنهللت ال طاة؛ الكنل هاا حهلل الهللالاف ال ئاىاة لرطالا ال‬
‫ما م نت ال انضان رق ا ًن‪ ،‬الاىر هللامه المت ط في غاا ف ال ال انضان كنل ممن‬
‫لاه الثان الواة ما الم نت الرط اقاة الري رىر هللر ال انضان رق ا ًن؛ الإاا ظ ن‬
‫إلل هاك الم نت فإ ه امكل رقىامهن إلل هللال فئن الاىلة مثل اله هللىــــــــــــــة الت اغاـــة‬
‫‪ geometry Space‬الالرحااــــــل الالظاتي ‪ functional analysis‬الهلل اىة األ ظمة‬
‫المــــلقهللال ‪ Study complex analysis‬الالحىـــــــــنا اللامــــــــــــــي ‪Scientific‬‬
‫‪ computations‬ال غا هن كثا ‪ ...‬فمل الم ظال الرط اقي مثس ررضمل ما الحىن ن‬
‫ال طي وكل ك ا ؛‬ ‫ال طي؛ ال نلرنلي‪ ،‬رر رحىال الا زمان ال‬ ‫اللاماة رق ا ًن ال‬
‫الالهلل كل مل ‪ BLAS‬ال ‪ LAPACK‬فضل الرط اقن المل الفة؛ اللرحىال الكتنحال‪ ،‬اقالر‬
‫مل ال نحثال ركالال ال الا زمان راقنئاًن‪ ،‬الفي الق الرووال‪ ،‬لركااتهن م‬ ‫ال ل‬
‫والوان الكم االر (ح ر ااك ال الر زال المؤق ‪ ،‬هللهلل ال الى المرنحة‪.)… ،‬‬
As for numerical linear algebra, sometimes called applied linear algebra, it is the study of how matrix operations can be used to create computer algorithms that provide, efficiently and with high accuracy, approximate answers to the questions posed in continuous mathematics. Numerical linear algebra is a branch of numerical analysis and a kind of linear algebra. In it, computers use what is called floating-point arithmetic, which cannot represent irrational data exactly; hence, when a computer algorithm is applied to a matrix of data, the gap between the number stored in the computer and the true number it approximates can sometimes grow. Numerical linear algebra uses the properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and it is also concerned with ensuring that the algorithm is as efficient as possible. It aims to solve problems of continuous mathematics on computers of finite precision, so its applications in the natural and social sciences are as broad as those of continuous mathematics itself. It is often an essential part of engineering and computational-science problems, such as image and signal processing, telecommunications, computational finance, materials-science simulation, structural biology, data mining, bioinformatics, and fluid dynamics. Matrix methods are used in particular in finite-difference methods, finite-element methods, and the modelling of differential equations. The standard problems of numerical linear algebra include computing factorizations or decompositions of a matrix into parts, such as the singular value decomposition, the QR factorization, the LU factorization, or the eigen-decomposition, which can then be used to answer common linear-algebraic questions such as solving systems of linear equations, determining eigenvalues (Eigenvalues), or least-squares optimisation (Least Squares Optimisation). The central concern of numerical linear algebra is the development of algorithms that do not introduce errors when applied to real data on a computer of finite precision, relying on iterative techniques rather than on direct methods.

Numerical linear algebra was developed by computer pioneers such as John von Neumann, Alan Turing, James H. Wilkinson, Alston Scott Householder, George Forsythe, and Heinz Rutishauser, in order to apply the earliest computers to problems in continuous mathematics, such as ballistics problems and the solution of systems of partial differential equations. The first serious attempt to minimize computer error in the application of algorithms to real data was the work of von Neumann and Herman Goldstine in 1947. It is worth noting here that, as this field grew in step with advances in technology and computers, researchers became increasingly able to solve complex problems involving very large matrices to high accuracy, and some numerical algorithms rose to prominence as parallel-computing techniques turned them into practical approaches for solving scientific problems. And since this science developed hand in hand with computers, it is fitting that we also say something about the birth and development of algorithms and programming languages in that era.
■ First: The language Algol, whose name abbreviates "Algorithmic Language", is one of the high-level languages dedicated to scientific and computational programming; it was designed by an international committee so as to become a high-level language. It was developed in the late 1950s: a first, preliminary version appeared as "Algol 58" and evolved through successive reports into Algol 60 and then Algol 68, and it had a great influence on other languages. Algol is considered the most important language of its era in terms of its strong influence on the languages that came after it; its vocabulary and the clarity of its syntax were such that, for an entire period, virtually every programming language was said to be "Algol-like". Among the descendants of this language are Pascal (1970), then C (1972), then C++ (1979), then Python (1991). The most important use of Algol was for scientific research and computation by scientists in Europe and America. Algol was the first language to allow nested function definitions with lexical scope, and it was the first language to receive serious attention to formal language description.
■ Second: Fortran (from the English "FORmula TRANslation", that is, the translation of equations) is a general-purpose programming language. The language is distinguished by its simplicity, its conciseness, and its capacity for numerical computation; among its most important characteristics is its longevity, as it is one of the most prominent languages to have remained alive for more than half a century. In 1954 John Backus, together with a group at IBM, created Fortran; it was the first of the high-level programming languages and is used mainly in numerical analysis and in scientific computing.
■ Third: In 1969-1970 the programming language Pascal was developed, principally by Niklaus Wirth, a member of the International Federation for Information Processing (IFIP). Professor Wirth developed Pascal to provide features that the programming languages of the time lacked. It is known for its clarity, its strength, and the ease with which programs can be written in it, which is what made Pascal one of the programming languages most widely used in teaching up to the present day. In addition to its simplicity and power, Pascal shares a common syntactic structure with the C language. Pascal was, however, originally designed for purely educational purposes and was at first restricted in several respects (for example, there was initially no support for character strings); moreover, the versions of Pascal actually used throughout the educational world are modified dialects such as Turbo Pascal, Delphi, or Free Pascal.

■ Fourth: The language B is a programming language developed at Bell Laboratories in 1969; it is now practically extinct, having been superseded by the language C. It was created by the American scientists Dennis Ritchie and Ken Thompson, drawing on BCPL (the Basic Combined Programming Language). The language C was then designed at Bell Laboratories between 1972 and 1973, and was called C because a large part of it was taken from B; the language matured to the point where it became powerful enough for the UNIX kernel to be converted from assembly language into C. Had C not appeared, and had its fame not lasted more than thirty years, the modern object-oriented languages such as C++ would not have appeared, or at least would not have become what we know today. Both Microsoft and IBM contributed to the development of the language, which is why the general principles of its style of writing are still used today in many of the advanced languages that grew out of it, down to the most recent member of the C family, C# (C sharp), which may be regarded as the latest development of C.
■ Fifth: The language C++ (pronounced "C plus plus") is an object-oriented, multi-paradigm, compiled, statically typed programming language. It combines many features of both high-level and low-level programming languages. Development of the language began as an extension of C, under the name "C with Classes", by Bjarne Stroustrup at Bell Laboratories in 1979; many further features were added later, and in 1983 the name was changed to C++. C++ is considered one of the most widespread languages: it has been used on a very large scale in building operating systems and in programming close to the computer's hardware, from system software and application programs, through device drivers and embedded systems, to high-performance servers and entertainment software such as video games, owing to its ability to be compiled into highly efficient low-level code.
■ Sixth: The language Python is a high-level, easy-to-learn, open-source, extensible programming language that supports the object-oriented programming (OOP) style. Python is an interpreted, general-purpose language used widely in many fields, from building stand-alone programs with graphical interfaces to web applications, and it can also be used as a scripting language to control the behaviour of many other programs. In general, Python can be used to write simple programs for beginners and, at the same time, to carry out very large projects. Newcomers to programming are often advised to learn this language because it is among the easiest programming languages to learn. Python was developed by Guido van Rossum at the Dutch national institute for mathematics and computer science (CWI) in Amsterdam in the late 1980s, and it was first released in 1991. The core of the language was written in C.
■ Seventh: MATLAB (an abbreviation of Matrix Laboratory) is a leading program for engineering and mathematical applications produced by the company MathWorks. MATLAB allows matrix manipulations, the plotting of mathematical functions, the execution of various algorithms, the creation of graphical user interfaces, and interfacing with programs written in other languages, including C, C++, Java, and Fortran. The program was founded by two men, Cleve Moler and Jack Little. Cleve Moler, a mathematician and computer scientist, is the principal author of MATLAB; he spent about five years designing and building the program and then enlisted his colleague Jack Little (an electrical and electronics engineer specializing in automatic control), who laid out the architecture of the MATLAB product. As for Cleve Barry Moler, he is a mathematician and computer programmer specializing in numerical analysis. From the mid to the late 1970s he was one of the authors of LINPACK and EISPACK, two Fortran software libraries for numerical computation. He created MATLAB, a package for carrying out numerical computations, to give his students at the University of New Mexico simple and easy access to those libraries. In 1984, together with Jack Little, he co-founded MathWorks in order to commercialize the program. Cleve Moler received a bachelor's degree in mathematics from the California Institute of Technology in 1961 and a doctorate from Stanford University. He taught mathematics and computer science for nearly twenty years at the universities of Michigan, Stanford, and New Mexico. Before joining MathWorks full time in 1989, he also worked for five years with the Intel Hypercube organization and with Ardent Computer. Throughout his studies he stayed close to applied mathematics; whenever he took courses in pure mathematics his grades in those subjects were, alas, modest. At Stanford he studied under George Forsythe, an eminent mathematician in this field, the founder and first head of the computer science department at Stanford and, for a period, president of the Association for Computing Machinery (ACM). Jack Little, for his part, has been the president, CEO, and co-founder of MathWorks, sharing the MATLAB program with his friend Cleve Moler and authoring the Signal Processing Toolbox and the Control Systems Toolbox; he is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) and holds a bachelor's degree (1978) in electrical engineering from the Massachusetts Institute of Technology (MIT) and a master's degree (1980) from Stanford University (Stanford).
The MATLAB program is an engineering program (though it serves other fields as well) that performs analysis and representation of data by processing those data according to its own rules. For example, the program can carry out differentiation (Differentiation) and integration (Integration); it can solve algebraic equations (Algebraic Equations) as well as differential equations (Differential Equations) of high order, however difficult they may be; beyond that, it can perform partial differentiation and carry out partial-fraction expansion (Partial fraction) easily and quickly, an operation that would require considerable time by traditional hand methods. That is the academic side; on the applied side, the program can be used across the engineering fields, such as control systems (Control System), the mechanical field (Mechanical Field), the simulation of electronics (Electronics), the automotive industry (Automotive Industry), aerospace and defense (Aerospace and Defense), and many other engineering applications. With the rapid advance of technology, learning such a program has become a pressing need if we are to keep up in the race of industrial competition. As for the library LAPACK ("Linear Algebra Package"), it is a standard software library for numerical linear algebra that appeared in 1992. It provides routines for solving systems of linear equations and linear least-squares problems, eigenvalue problems, and singular value decompositions. It also includes routines for carrying out the associated matrix factorizations such as the LU, QR, Cholesky, and Schur decompositions. LAPACK was originally written in FORTRAN.

Finally, we owe the reader a comprehensive and clear picture, by way of a brief general overview of the development of numerical linear algebra from the days before the Second World War up to the present. The first thing to speak of is the appearance of the concept of eigenvalues, which emerged through the study of quadratic forms and differential equations. In the eighteenth century, Leonhard Euler (1707-1783) studied the rotational motion of a rigid body geometrically and discovered through that study the importance of the principal axes. After him, Joseph-Louis Lagrange (1736-1813) realized that the principal axes are the eigenvectors of the inertia matrix, although he did not give them that name; he only uncovered the property. At that time a "matrix" was still merely a system of linear equations and had not yet been given a general formulation. Then came the German mathematician David Hilbert (1862-1943), who in 1904 gave quantities of this kind the name "eigenvalues"; for a long time eigenvalues remained a property attached to differential equations only, until 1913, the year in which matrices were given a general mathematical treatment, with rows and columns displayed explicitly, by the British mathematician C. E. Cullis. From then on abstract algebra and the world of matrices were wedded, linear algebra was born of their union, and the art flourished: its theories multiplied, its appearance grew ever more elegant, it won the hearts of mathematicians and researchers day after day, linear transformations appeared, and the notions of eigenvectors and eigenvalues took their place in the world of matrices. This art continued to spread, recorded with pen and paper, until the appearance of the Turing machine, conceived in connection with breaking the riddle of the German Enigma, that perplexing machine which was then regarded as the pride of the German army. The Enigma was an electro-mechanical cipher machine, and the Americans together with the British and their allies recruited a band of researchers to break the baffling code; Alan Turing devised the Turing machine, the simple theoretical model that imitates the way a computer works, in 1936. Three years before that, in 1933, the concept of polynomial matrices, also called matrix polynomials, had appeared, and applied linear algebra began to take shape and to develop in parallel with computer technology. It is therefore no surprise that in those days, around 1961-1962, the British mathematician Francis discovered the algorithm of the age, a method for computing the eigenvalues of a matrix whatever its condition, which came to be called the "QR iteration". In that period a group of mathematicians across the globe joined efforts to develop numerical algorithms and techniques of mathematical computation: Wilkinson (1919-1986) in Britain; Lanczos (1893-1974) from Hungary; Eduard Stiefel (1909-1978) in Switzerland, together with his student Heinz Rutishauser (1918-1970) and his second student Peter Henrici (1923-1987). These last three all worked at the Institute for Applied Mathematics in Zurich, founded by Eduard Stiefel (who had become professor at ETH in 1943), its aim at the time being the design and construction of an effective computer; Heinz Rutishauser worked there as a researcher and produced many things in that period, most notably contributions to the language Algol 58, the LR algorithm, the QD algorithm, and the design of the first Swiss computer. Dmitry Faddeev (1907-1989) was in Russia; he worked in numerical analysis and numerical linear algebra and discovered many algorithms, the most famous being the one that bears his name. There was also Alexander Aitken (1895-1967) in New Zealand, among whose most important works was the computation of the roots of polynomials using Hankel determinants, work which later helped give rise to the QD algorithm; contemporary with him was the mathematician Jacques Salomon Hadamard (1865-1963) in France, who worked on methods for finding all the roots of a polynomial. In Russia there also appeared Issai Schur (1875-1941), who spent most of his life in Germany and studied under Frobenius (1849-1917); most of their work was in matrix algebra, group representations, group theory, and number theory, but Schur has the further merit of opening a new area of mathematics that still carries his name, and one of the most important algorithms of mathematical computation, the Schur decomposition, rests essentially on one of his theorems. Another mathematician who appeared in Russia was Krylov (1863-1945), an engineer in the Russian navy and the discoverer of the vector space that bears his name, which is a cornerstone of iterative computation and of automatic control. In Denmark there was Jørgen Pedersen Gram (1850-1916), and contemporary with him the German mathematician Erhard Schmidt (1876-1959); from the work of this pair came one of the most important matrix-factorization algorithms, the orthogonal factorization known as the QR decomposition. Cholesky (1875-1918) appeared in France; he was an officer in the French artillery and left behind his famous factorization of positive definite matrices, which bears his name. In Germany appeared the professor, engineer, and mathematician Karl Hessenberg (1904-1959), who worked in matrix algebra and discovered the matrix form that carries his name, which plays a pivotal role in the Schur factorization and in the computation of eigenvalues. Contemporary with him was Walter Arnoldi (1917-1995) in America, whose work was mainly in linear algebra and eigenvalues; he is famous for the iterative algorithm that bears his name, which is of the greatest importance for sparse matrices and for the solution of partial differential equations. Also in America, and contemporary with them, was the mathematician Householder (1904-1993), whose specialities were mathematical biology and numerical analysis; he did not turn his attention to matrices until the middle of his life, as computers came on the scene, and he then discovered the algorithm that bears his name, one of the fundamental methods for finding the characteristic values of matrices. In America, too, the mathematician James Givens (1910-1993) discovered an algorithm similar to Householder's but relying on the technique of rotation. Among the other prominent figures of that time were Pete Stewart, Gene Howard Golub, Charles Van Loan, and George Forsythe. All of this concerns ordinary numerical matrices; as for matrix polynomials, that field was developed by Peter Lancaster of Canada together with his colleague Israel Gohberg, a Russian-born mathematician who settled in Tel Aviv, and Leiba Rodman, and these men advanced this side of the subject from the seventies through the nineties. In the 1980s a school also formed in America at the University of Houston, with many researchers working in this field, too many to name them all; among the most notable I mention our professor Dr. Kamel Hariche (1987) at the University of Houston, Texas, followed by Professor Abdelhakim Dahimene (1992), both of whom worked in this same field and produced works to which history bears witness, recorded in their dissertations. Nor should we fail to mention the man who opened for us the concept of the state space, which appeared at the hands of Rudolf Kalman around 1961, a pioneer in the field of automatic control; fortunately this science grew in parallel with the development of computers and of mathematical computation, so that control theory, both theoretical and applied, became one of the sciences that consumed, in every sense of the word, the largest share of applied linear algebra.

In closing, we would remind the honourable reader that our mention of the names of the enemies of Allah above was made only for the sake of the historical sequence of the science and of recounting the details of the development of this art; otherwise we give no praise to the enemies of Allah, who denied Allah and His Messenger, rejected His signs, and were arrogant. We praise neither them nor those ignorant Muslims who take them as allies and glorify the Jews and the Christians on the claim that all religions are equal, or in the name of freedom, open-mindedness, and love of the other; that is a hollow, baseless call which rests on no proof and carries no authority from Allah, and those who make it, and their like, say what they do not do. It is in no way reasonable to love one who hates you, or to make peace with one who has wronged you, taken you captive, and shed your blood. The Jews insult Allah with their saying that Uzayr is the son of Allah, the Christians insult Allah with their saying that He is the third of three, and the adherents of the other creeds likewise insult Allah and His Messenger by the tongue of their condition if not by the tongue of speech; how then could it be reasonable to love the enemies of Allah and still claim to love Allah? In such a case one's love of Allah can never be complete. Allah the Exalted says: "You will not find a people who believe in Allah and the Last Day having affection for those who oppose Allah and His Messenger, even if they were their fathers or their brothers or their kindred." And He says: "There has already appeared between us and you enmity and hatred forever until you believe in Allah alone." And He says: "O you who have believed, do not take as intimates those other than yourselves, for they will not spare you any ruin; they wish you would have hardship. Hatred has already appeared from their mouths, and what their breasts conceal is greater. We have certainly made clear to you the signs, if you will use reason." And He the Exalted says: "You see many of them becoming allies of those who disbelieved. How wretched is that which they have put forward for themselves... And if they had believed in Allah and the Prophet and in what was revealed to him, they would not have taken them as allies; but many of them are defiantly disobedient."

Shaykh al-Islam Ibn Taymiyyah says in explanation of this verse: "He mentioned a conditional sentence which implies that when the condition exists the consequence exists, using the particle law (if), which implies, together with the condition, the negation of the consequence. So He said: 'And if they had believed in Allah and the Prophet and in what was revealed to him, they would not have taken them as allies.' This shows that the faith mentioned negates taking them as allies and is incompatible with it; faith and taking them as allies cannot coexist in the heart. And it shows that whoever takes them as allies has not fulfilled the obligatory faith in Allah, the Prophet, and what was revealed to him." End of quotation. ▪

Allah the Exalted says: "Your ally is none but Allah and His Messenger and those who have believed, those who establish prayer and give zakah while they bow. And whoever takes Allah and His Messenger and those who have believed as allies, then indeed the party of Allah, they will be the victorious." And He the Exalted says: "The believing men and believing women are allies of one another. They enjoin what is right and forbid what is wrong, establish prayer, give zakah, and obey Allah and His Messenger. Those, Allah will have mercy upon them. Indeed, Allah is Exalted in Might and Wise."

Shaykh al-Islam Ibn Taymiyyah said: "Whoever has in him both faith and sinfulness is given allegiance in proportion to his faith and dislike in proportion to his sinfulness; he does not leave faith altogether merely on account of sins and acts of disobedience, as the Kharijites and the Mu'tazilah claim. Nor are the prophets, the truthful, the martyrs, and the righteous placed on the level of the wicked in faith, religion, love, hatred, allegiance, and enmity." End of quotation. ▪

It has been said: the disbelievers are not loved in any respect but are disliked absolutely, while the Muslims are not disliked in every respect; among them are those who are loved in every respect, such as the prophets, the truthful, the martyrs, and the righteous, and among them are those in whom both of these combine, according to the measure of faith that is with them.

Shaykh Salih al-Fawzan said, concerning the categories of people with respect to the allegiance and disavowal due to them: "With respect to allegiance and disavowal, people fall into three categories: those of the foremost rank, who are loved with a love containing no dislike; those who are disliked with a dislike containing no love; and those in whom both of these combine." And Shaykh Muhammad ibn Salih al-Uthaymin said that the believer must never give his allegiance to other than the believers, whatever friendliness they may display, for Allah the Exalted says of them: "They wish you would disbelieve as they disbelieved so you would be alike. So do not take from among them allies until they emigrate for the cause of Allah. But if they turn away, then seize them and kill them wherever you find them, and take not from among them any ally or helper." And He, glorified be He, says: "Never will the Jews or the Christians be pleased with you until you follow their religion. Say, 'Indeed, the guidance of Allah is the only guidance.' If you were to follow their desires after what has come to you of knowledge, you would have against Allah no protector or helper." End of quotation. ▪

Whoever wishes to pursue this subject further may consult:

The book al-Furqan bayna Awliya' al-Rahman wa-Awliya' al-Shaytan by Ibn Taymiyyah; the book al-Sarim al-Maslul 'ala Shatim al-Rasul by Ibn Taymiyyah; and the book Ahkam Ahl al-Dhimmah by Ibn al-Qayyim.
The book al-Irshad ila Sahih al-I'tiqad by Salih al-Fawzan; the treatise al-Wala' wa-l-Bara' fi al-Islam by Salih al-Fawzan; and the book Kalimat Haqq by Ahmad Muhammad Shakir.
In addition: the book Ibtal Nazariyyat al-Khalt bayna Din al-Islam wa-Ghayrihi min al-Adyan by Bakr Abu Zayd;
the treatise on the nullifiers of faith by Muhammad ibn Abd al-Wahhab;
the treatise of Abd al-Aziz ibn Baz concerning the Jews, the polytheists, and the other disbelievers;
the third volume of the collected fatwas of Ibn Uthaymin;
the second volume of the fatwas of the Permanent Committee;
and the book Iqtida' al-Sirat al-Mustaqim li-Mukhalafat Ashab al-Jahim by Ibn Taymiyyah.
Written by Dr. BEKHITI Belkacem
CHAPTER I: Birth of Computational Linear Algebra

 
Birth of Computational Linear Algebra
Introduction: When talking about computational algebra, the first thing that comes to
mind is the following question: what is a MATRIX, and when were matrices born? Such
questions carry real implications for understanding the subject, so it is natural to ask
them. The term "matrix" (Latin for "womb", derived from mater, mother) was
introduced by the 19th-century English mathematician James Sylvester, but it was his
friend the mathematician Arthur Cayley who developed the algebraic aspect of matrices
in two papers in the 1850s. Cayley first applied them to the study of systems of linear
equations, where they are still very useful. They are also important because, as Cayley
recognized, certain sets of matrices form algebraic systems in which many of the
ordinary laws of arithmetic (e.g., the associative and distributive laws) are valid but in
which other laws (e.g., the commutative law) are not valid. An English mathematician
named C. E. Cullis was the first to use modern bracket notation for matrices in 1913
and he simultaneously demonstrated the first significant use of the notation A = [a_ij] to
represent a matrix, where a_ij denotes the entry in the i-th row and j-th column.

Numerical linear algebra, sometimes called applied linear algebra, is the study of how
matrix operations can be used to create computer algorithms which efficiently and
accurately provide approximate answers to questions in continuous mathematics. It is a
subfield of numerical analysis, and a type of linear algebra. Numerical linear algebra
uses properties of vectors and matrices to develop computer algorithms that minimize
the error introduced by the computer, and is also concerned with ensuring that the
algorithm is as efficient as possible. Common problems in numerical linear algebra
include obtaining matrix decompositions like the singular value decomposition, the QR
factorization, the LU factorization, or the Eigen-decomposition, which can then be used to
answer common linear algebraic problems like solving linear systems of equations,
locating eigenvalues, or least squares optimization. Numerical linear algebra's central
concern with developing algorithms that do not introduce errors when applied to real
data on a finite precision computer is often achieved by iterative methods rather than
direct ones (from Wikipedia).
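To make the list of standard problems above concrete, here is a minimal sketch (my own illustration, not code from this book) using NumPy, the Python library mentioned later in this chapter; each call is a thin wrapper over LAPACK routines of the kind discussed here:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

# Solve the linear system A x = b (an LU-based LAPACK driver underneath)
x = np.linalg.solve(A, b)

# Eigen-decomposition: A v = lambda v
eigvals, eigvecs = np.linalg.eig(A)

# QR factorization: A = Q R with Q orthogonal and R upper triangular
Q, R = np.linalg.qr(A)

# Singular value decomposition: A = U diag(s) V^T
U, s, Vt = np.linalg.svd(A)

# Least-squares: minimize ||C y - d||_2 for a rectangular C
C = rng.standard_normal((6, 3))
d = rng.standard_normal(6)
y, residual, rank, sv = np.linalg.lstsq(C, d, rcond=None)

print(np.allclose(A @ x, b), np.allclose(Q @ R, A))
```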

Numerical linear algebra was developed


by computer pioneers like Alan Turing,
James H. Wilkinson, Alston Scott
Householder, George Forsythe, and
Heinz Rutishauser, in order to apply the
earliest computers to problems in
continuous mathematics, such as
ballistics problems and the solutions to
systems of partial differential equations.
The first serious attempt to minimize
computer error in the application of
algorithms to real data is John von
Neumann and Goldstine's work in 1947.
[Photo: Wilkinson, Givens, Forsythe, Householder, Henrici, and Bauer, 1964]
The field has grown as technology has increasingly enabled researchers to solve complex
problems on extremely large high-precision matrices, and some numerical algorithms
have grown in prominence as technologies like parallel computing have made them
practical approaches to scientific problems.

Algebra Software: Several programming


languages use numerical linear algebra
optimization techniques and are designed
to implement numerical linear algebra
algorithms. These languages include
MATLAB, Analytica, Maple, and
Mathematica. Other programming
languages which are not explicitly
designed for numerical linear algebra
have libraries that provide numerical
linear algebra routines and optimization;
C and Fortran have packages like Basic
Linear Algebra Subprograms and
LAPACK, python has the library NumPy,
and Perl has the Perl Data Language.
[Photo: Early Days at Argonne, Tek4081]

Cleve Moler, the chairman of the computer science department at the University of New
Mexico, started developing MATLAB in the late 1970s. He designed it to give his students
access to LINPACK and EISPACK without them having to learn Fortran. It soon spread to
other universities and found a strong audience within the applied mathematics
community. Jack Little, an engineer, was exposed to it during a visit Moler made to
Stanford University in 1983. Recognizing its commercial potential, he joined with Moler
and Steve Bangert. They rewrote MATLAB in C and founded MathWorks in 1984 to
continue its development. These rewritten libraries were known as JACKPAC. In 2000,
MATLAB was rewritten to use a newer set of libraries for matrix manipulation, LAPACK.

MATLAB was first adopted by researchers and practitioners in


control engineering, Little's specialty, but quickly spread to
many other domains. It is now also used in education, in
particular the teaching of linear algebra and numerical analysis,
and is popular amongst scientists involved in image processing.

In this year, 2020, Chinese state media reported that MATLAB had withdrawn its services from two Chinese universities as a result of the American sanctions, and said that this would be answered by increasing the use of open-source alternatives and by developing domestic ones. So, a word of advice to anyone who writes his algorithms in MATLAB: take your time, or try to write them with LAPACK, for it is the mother language of algebra.

There are a number of competitors to MATLAB. Some notable examples include: Maple,
IDL, and Wolfram Mathematica. There are also free open source alternatives to MATLAB,
in particular: GNU Octave, Scilab, FreeMat, Julia, and SageMath, which are somewhat
compatible with the MATLAB language. GNU Octave is unique from the others in that it
aims to be drop-in compatible with MATLAB syntax-wise.
MATLAB started out as a simple "Matrix Laboratory", hence the name. Three individuals,
J. H. Wilkinson, George Forsythe, and John Todd, played important roles in the origins
of MATLAB.

⦁ 1967: "Computer Solution of Linear Algebraic Systems", Forsythe and Moler
⦁ 1971: "Handbook for Automatic Computation" in ALGOL, J. H. Wilkinson
⦁ 1976: "Matrix Eigensystem Routines, EISPACK Guide" in FORTRAN
⦁ 1979: "LINPACK" in FORTRAN (Jack Dongarra, Cleve Moler, Pete Stewart, Jim Bunch)
⦁ ~1977: "MATLAB Environment", Cleve Moler
⦁ 1979: "Numerical analysis" lecture at Stanford, met with Jack Little.
⦁ 1984: MathWorks founded by Jack and Moler.

[Photos: Cleve Moler, Jack Dongarra, Pete Stewart, and Jim Bunch, 2011; DEC PDP-1, 1963]

The SWAC (Standards Western Automatic Computer) was an early electronic digital
computer built in 1950 by the U.S. National Bureau of Standards (NBS) in Los Angeles,
California. It was designed by Harry Huskey.
Contributors in Numerical linear algebra and algorithms: Numerical linear algebra
(NLA) is a relatively vast area of research, with about two hundred active participants.
However, it is an integral component of numerical analysis, which draws contributors from a
wide variety of disciplines, whose ideas are often helpful to research in many other areas. It
was developed by computer pioneers like John von Neumann, James H. Wilkinson,
Alston Scott Householder, George Forsythe, and Heinz Rutishauser, in order to apply the
earliest computers to problems in continuous mathematics. Here we give a brief
overview of the most famous contributors in this area.

Issai Schur (1875–1941) was a Russian mathematician who worked in Germany for
most of his life. He studied at the University of Berlin. He obtained his
doctorate in 1901, became lecturer in 1903 and, after a stay at the
University of Bonn, professor in 1919. As a student of Ferdinand Georg
Frobenius, he worked on group representations (the subject with which
he is most closely associated), but also in combinatorics and number
theory and even theoretical physics. He is perhaps best known today for
his result on the existence of the Schur decomposition and for his work
on group representations (Schur's lemma). Concepts named after him:
Schur algebra, Schur complement, Schur product, Schur's theorem, Jordan–Schur
theorem, Schur decomposition.

Jacques Salomon Hadamard (1865–1963) was a French mathematician who made


major contributions in number theory, complex analysis, differential
geometry and partial differential equations. He obtained his doctorate in
1892 and in the same year was awarded the Grand Prix des Sciences
Mathématiques for his essay on the Riemann zeta function. In the same
year, 1892, he took up a lectureship at the University of Bordeaux, where
he proved his inequality on determinants, which led to the discovery of
Hadamard matrices when equality holds. In 1896 he proved the prime number theorem,
using complex function theory, one of two important contributions he made that year.
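For reference, the determinant inequality referred to here is the standard statement from the general literature (not a quotation from this book): for an n-by-n matrix A with entries a_ij,

$$|\det A| \;\le\; \prod_{i=1}^{n} \Big(\sum_{j=1}^{n} |a_{ij}|^{2}\Big)^{1/2},$$

and for a matrix whose entries are all of modulus 1 equality holds exactly when the rows are mutually orthogonal, which is the defining property of a Hadamard matrix.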

Philipp Ludwig von Seidel (1821–1896) was a German mathematician. The Gauss–
Seidel method is a useful numerical iterative method for solving linear
systems. Seidel progressed rapidly at Munich. He was appointed as an
extraordinary professor in Munich in 1847 and then an ordinary
professor in 1855. He received many honours such as appointment as a
Royal Privy Councillor. He received many medals for his work and, in
1851, was elected to the Bavarian Academy of Sciences. Other academies
also honoured him, for example he was elected to the academies of
Göttingen and of Berlin. He lectured on probability theory, and also on the method of
least squares.

Johann Carl Friedrich Gauss (1777–1855) was a German mathematician


and physicist who made significant contributions to many fields in
mathematics and science. In his 1799 doctorate in absentia, A new proof
of the theorem that every integral rational algebraic function of one
variable can be resolved into real factors of the first or second degree,
Gauss proved the fundamental theorem of algebra which states that every non-constant
single-variable polynomial with complex coefficients has at least one complex root.

André-Louis Cholesky (1875–1918) was a French military officer and mathematician.


He is known for the Cholesky decomposition. He served in the French military as an
artillery officer and was killed in battle a few months before the end of
World War I; his discovery was published posthumously by his fellow
officer Commandant Benoît in the Bulletin Géodésique.

Sir William Rowan Hamilton (1805-1865) was an Irish mathematician, Andrews


Professor of Astronomy at Trinity College Dublin, and Royal Astronomer of
Ireland. He worked in both pure mathematics and mathematics for
physics. He made important contributions to optics, classical mechanics
and algebra. Although Hamilton was not a physicist–he regarded himself
as a pure mathematician–his work was of major importance to physics,
particularly his reformulation of Newtonian mechanics, now called
Hamiltonian mechanics.

Arthur Cayley FRS (1821–1895) was a prolific British mathematician who worked
mostly on algebra. He helped found the modern British school of pure
mathematics. He postulated the Cayley–Hamilton theorem—that every
square matrix is a root of its own characteristic polynomial, and verified it
for matrices of order 2 and 3.
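As a quick numerical illustration of the theorem (a sketch of my own, not taken from the book), it can be checked for a small matrix with NumPy, whose np.poly returns the coefficients of the characteristic polynomial det(lam*I - A):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Coefficients of det(lam*I - A) = lam^2 - 5*lam - 2, highest degree first
c = np.poly(A)

# Evaluate p(A) = A^2 - 5*A - 2*I; Cayley-Hamilton says this is the zero matrix
p_of_A = c[0] * A @ A + c[1] * A + c[2] * np.eye(2)
print(np.allclose(p_of_A, np.zeros((2, 2))))   # True (up to rounding)
```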

Ferdinand Georg Frobenius (1849–1917) was a German mathematician, best known for
his contributions to the theory of elliptic functions, differential equations,
number theory, and to group theory. He is known for the famous
determinantal identities, known as Frobenius–Stickelberger formulae,
governing elliptic functions, and for developing the theory of biquadratic
forms. He was also the first to introduce the notion of rational
approximations of functions (nowadays known as Padé approximants),
and gave the first full proof for the Cayley–Hamilton theorem.

Erhard Schmidt (1876-1959) was a Baltic German mathematician whose work


significantly influenced the direction of mathematics in the twentieth
century. Schmidt was born in Tartu (German: Dorpat), in the
Governorate of Livonia (now Estonia). He attended his local university in
Dorpat before going to Berlin where he studied with Schwarz. His
doctorate was obtained from the University of Göttingen in 1905 under
Hilbert's supervision. After leaving Bonn, Schmidt held positions in
Zürich, Erlangen and Breslau before he was appointed to a professorship
at the University of Berlin in 1917.
Alexander Craig "Alec" Aitken (1895–1967) was one of New Zealand's most eminent
mathematicians. In a 1935 paper he introduced the concept of
generalized least squares, along with now standard vector/matrix
notation for the linear regression model. Aitken was one of the best
mental calculators known, and had a prodigious memory. He knew the
first 1000 digits of π, the 96 recurring digits of 1/97, and memorised the
Aeneid in high school. However, his inability to forget the horrors he
witnessed in World War I led to recurrent depression throughout his life.

Aleksey Nikolaevich Krylov (1863–1945) was a Russian naval engineer, applied


mathematician and memoirist. In 1931 he published a paper on what is
now called the Krylov subspace and Krylov subspace methods. The
paper deals with eigenvalue problems, namely, with computation of the
characteristic polynomial coefficients of a given matrix. In 1888 Krylov
joined the department of ship construction of St Petersburg Maritime
Academy. There he was taught advanced mathematics by Aleksandr
Nikolaevich Korkin, a student of Chebyshev, who was an expert in
partial differential equations.

Jørgen Pedersen Gram (1850-1916) was a Danish actuary and mathematician who was
born in Nustrup, Duchy of Schleswig, Denmark and died in Copenhagen,
Denmark. The mathematical method that bears his name, the Gram–
Schmidt process, was first published by Gram in 1883. Gram's
theorem and the Gramian matrix are also named after him. In 1873 Gram
graduated with a Master's degree in mathematics. This degree was of a
higher level than the present British/American Master's degree and more
on a par with today's British/American Ph.D. Gram had published his first important
mathematics paper before he had graduated. This was a work on modern algebra which
appeared first in Tidsskrift for Mathematik but, in 1874, Gram published a fuller
account of the same material in French.
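As a side note (an illustrative sketch under the usual textbook formulation, not code from this book), the orthogonalization process that carries Gram's and Schmidt's names can be written in a few lines of NumPy; it is the algorithmic content behind the QR decomposition mentioned in the introduction:

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A (for illustration only;
    in floating point the modified variant or Householder QR is preferred).
    Returns Q with orthonormal columns and upper-triangular R with A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # projection coefficient
            v -= R[i, j] * Q[:, i]        # remove the component along q_i
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]             # assumes the columns of A are independent
    return Q, R

A = np.random.rand(5, 3)
Q, R = gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))
```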

Dmitry Konstantinovich Faddeev (1907–1989) was a Soviet mathematician. Dmitri


was born June 30, 1907, about 200 kilometers southwest of Moscow. In
1928 he graduated from Petrograd State University, as it was then called.
His teachers included Ivan Matveyevich Vinogradov and Boris
Nicolaevich Delone. In 1930 he married Vera Nicolaevna Zamyatina. The
couple also wrote Numerical Methods in Linear Algebra in 1960 with an
enlarged edition in 1963. For instance, they developed an idea of Urbain
Leverrier into an algorithm that finds the resolvent matrix by
iterations; the method also yields the adjugate matrix and the characteristic
polynomial of A. Dmitri was committed to mathematics education and aware of the need
for graded sets of mathematical exercises. With Iliya Samuilovich Sominskii he wrote
Problems in Higher Algebra.
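To make the preceding description concrete, here is a short sketch of the Faddeev-LeVerrier recursion in NumPy (my own illustration of the textbook form of the method, not code from the book); it returns the characteristic polynomial coefficients and, as a by-product, the matrices that form the numerator of the resolvent (lam*I - A)^(-1):

```python
import numpy as np

def faddeev_leverrier(A):
    # Returns c = [1, c1, ..., cn] with det(lam*I - A) = lam^n + c1*lam^(n-1) + ... + cn,
    # and the sequence N_1, ..., N_{n+1} of the recursion (N_{n+1} should be ~0).
    n = A.shape[0]
    I = np.eye(n)
    N = I                      # N_1 = I
    coeffs = [1.0]
    Ns = [N]
    for k in range(1, n + 1):
        AN = A @ N
        ck = -np.trace(AN) / k
        coeffs.append(ck)
        N = AN + ck * I        # N_{k+1} = A N_k + c_k I
        Ns.append(N)
    return np.array(coeffs), Ns

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
c, Ns = faddeev_leverrier(A)
print(c)                       # [ 1. -5.  6.]  i.e. lam^2 - 5*lam + 6
print(np.poly(A))              # NumPy's characteristic polynomial, for comparison
# When c[-1] != 0 the inverse (hence the adjugate) follows from the last nonzero step:
print(np.allclose(np.linalg.inv(A), -Ns[-2] / c[-1]))   # A^{-1} = -N_n / c_n
```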
Peter Karl Henrici (1923–1987) was a Swiss mathematician best known for his
contributions to the field of numerical analysis. Henrici was born in
Basel and studied law for two years at University of Basel. After World
War II he transferred to ETH Zürich where he received a diploma in
electrical engineering (1948) and a doctorate in mathematics with
Eduard Stiefel as his advisor (1952). In 1951 he moved to the United
States and worked on a joint contract with American University and the
National Bureau of Standards. Then, from 1956 to 1962, he taught at
University of California, Los Angeles where he became a professor. In
1962 he returned to ETH Zürich as a professor, a position he kept for the rest of his life.
Notable among his students was Gilbert Strang.

Cornelius (Cornel) Lanczos: was a Hungarian mathematician and physicist, who was
born in the Kingdom of Hungary on February 2, 1893, and died on June 25,
1974. According to György Marx he was one of The Martians. Lanczos'
Ph.D. thesis (1921) was on relativity theory. He sent his thesis copy to
Albert Einstein, and Einstein wrote back, saying: "I studied your thesis and
decided that the degree of doctorate should be obtainable. I gladly
accept the honorable [dedication]". Lanczos developed a number of techniques for
mathematical calculations using digital computers, including: the
Lanczos algorithm for finding eigenvalues of large symmetric matrices,
the Lanczos approximation for the gamma function, and the conjugate gradient method
for solving systems of linear equations.

Eduard L. Stiefel (1909–1978) was a Swiss mathematician. Together with Cornelius


Lanczos and Magnus Hestenes, he invented the conjugate gradient
method, and gave what is now understood to be a partial construction of
the Stiefel–Whitney classes of a real vector bundle, thus co-founding the
study of characteristic classes. Stiefel achieved his full professorship at
ETH Zurich in 1943, founding the Institute for Applied Mathematics five
years later. The objective of the new institute was to design and construct
an electronic computer. See https://math.ethz.ch/sam/the-institute/history.html

Heinz Rutishauser (1918–1970) was a Swiss mathematician and a pioneer of modern


numerical mathematics and computer science. Rutishauser studied
mathematics at the ETH Zürich where he graduated in 1942. From
1942 to 1945, he was assistant of Walter Saxer at the ETH, and from
1945 to 1948, a mathematics teacher in Glarisegg and Trogen. In 1948,
he received his Doctor of Philosophy (PhD) from ETH with a well-
received thesis on complex analysis. From 1949 to 1955, he was a
research associate at the Institute for Applied Mathematics at ETH
Zürich recently founded by Eduard Stiefel. He contributed especially in
the field of compiler pioneering work and was eventually involved in defining the
languages ALGOL 58 and ALGOL 60.
Karl Adolf Hessenberg (1904–1959) was a German mathematician and engineer. The
Hessenberg matrix form is named after him. From 1925 to 1930 he
studied electrical engineering at the Technische Hochschule Darmstadt
(today Technische Universität Darmstadt) and graduated with a
diploma. From 1931 to 1932 he was an assistant to Alwin Walther at the
Technische Hochschule Darmstadt, afterwards he worked at the power
station in Worms, Germany. From 1936 he worked as an engineer at
AEG, first in Berlin and later in Frankfurt. In 1940 he received his PhD
from Alwin Walther at the Technische Hochschule in Darmstadt.

Walter Edwin Arnoldi (New York, 1917–1995) was an American engineer mainly known
for the Arnoldi iteration, an eigenvalue algorithm used in numerical
linear algebra. Arnoldi graduated in mechanical engineering at the
Stevens Institute of Technology in 1937 and attended a Master of
Science course at Harvard. He worked at United Aircraft Corp. from
1939 to 1977. His main research interests included modelling
vibrations, acoustics and aerodynamics of aircraft propellers. His 1951
paper "the principle of minimized iterations in the solution of the
eigenvalue problem" is one of the most cited papers in numerical linear algebra.

Alston Scott Householder (1904–1993) was an American mathematician who


specialized in mathematical biology and numerical analysis. He is the
inventor of the Householder transformation and of Householder's
method. Householder taught mathematics in a number of different
places and began to work for his doctorate in mathematics in 1934. He
was awarded a Ph.D. by the University of Chicago in 1937 for a thesis
on the calculus of variations. However his interests were moving
towards applications of mathematics, particularly applications of
mathematics to biology. When the computer began to evolve he changed
topic, leaving behind his research interest of mathematical biology and
moving into numerical analysis which was increasing in importance due to the advances
in computers. He started publishing on this new topic with Some numerical methods for
solving systems of linear equations which appeared in 1950. Even before this first
publication in numerical analysis, Householder had been appointed Head of the
Mathematics Panel of the Oak Ridge National Laboratory in 1948. This role certainly did
not prevent him from taking a leading role in research in numerical analysis in general
and in numerical linear algebra in particular.
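For reference, the transformation named here has the standard form found in the general literature (not a quotation from this biography): for a nonzero vector v,

$$H \;=\; I - 2\,\frac{v v^{T}}{v^{T} v},$$

an orthogonal and symmetric matrix that reflects vectors across the hyperplane orthogonal to v. Choosing v from a column of a matrix lets H annihilate all entries of that column below a chosen position, which is the building block of Householder QR factorization and of the reduction to Hessenberg form used before eigenvalue iterations.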

James Hardy Wilkinson (1919-1986, England): Early in his career Wilkinson seemed set
to become a classical analyst, but the Second World War changed the
direction of his research. In 1940, rather than going into the infantry,
Wilkinson began war work on the thermodynamics of explosions,
ballistics, supersonic flow, and fragmentation of shells. At first he tried
to solve analytically the problems he was presented with, which after all
was the way he had learnt at university to solve problems, but he soon
realized that he would have to use approximate numerical methods if
he was to obtain useful results. He began to put his greatest efforts into
the numerical solution of hyperbolic partial differential equations, using finite difference
methods and the method of characteristics. He carried out the calculations on a
mechanical calculating machine which he operated by turning a handle. Taking up war
work in 1940, he began working on ballistics but transferred to the National Physical
Laboratory in 1946, where he worked with Alan Turing on the ACE computer project.
Later, Wilkinson's interests took him into the numerical analysis field, where he
discovered many significant algorithms. He continued work becoming more involved in
writing many high quality papers on numerical analysis, particularly numerical linear
algebra. Having written subroutines to do floating-point arithmetic before a computer
had been built to run them on, he was now in the fortunate position of being able to
progress rapidly, gaining experience with floating-point computing. In numerical linear
algebra he developed backward error analysis methods. He worked on numerical
methods for solving systems of linear equations and eigenvalue problems.

James Wallace Givens, (1910–1993) was an American mathematician and a pioneer in


computer science. He is the eponym of the well-known Givens rotations.
His master's degree is from the University of Virginia under Ben Zion
Linfield in 1931 (after a one-year fellowship at the University of
Kentucky); and his doctorate from Princeton University in 1936 under
Oswald Veblen. (Dissertation title: Tensor Coordinates of Linear Spaces.)
He is the inventor of the Givens transformation.

G.W. "Pete" Stewart: is a world-renowned expert in computational linear algebra. He


has made fundamental and highly cited advances in the analysis of
rounding error in numerical computations, perturbation of eigensystems,
generalized inverses, least squares problems, and matrix factorizations.
He has developed efficient algorithms for the singular value
decomposition, updating and downdating matrix factorizations, and the
eigenproblem that are widely used in applications. Stewart received his
doctorate from the University of Tennessee in 1968.

Gene Howard Golub: (1932–2007), Born in Chicago, he was educated at the University
of Illinois at Urbana-Champaign, receiving his B.S. (1953), M.A. (1954)
and Ph.D. (1959) all in mathematics. His M.A. degree was more
specifically in Mathematical Statistics. His PhD dissertation was entitled
"The Use of Chebyshev Matrix Polynomials in the Iterative Solution of
Linear Equations Compared to the Method of Successive Overrelaxation".
Gene Golub served as the president (SIAM) from 1985 to 1987 and was
founding editor of both the SIAM Journal on Scientific Computing (SISC) and the SIAM
Journal on Matrix Analysis and Applications (SIMAX). In 1983 he published Matrix
computations written jointly with Charles F Van Loan.

John G.F. Francis (born 1934) is an English computer scientist, who in 1961 published
the QR algorithm for computing the eigenvalues and eigenvectors of
matrices, which has been named as one of the ten most important
algorithms of the twentieth century. The algorithm was also proposed
independently by Vera N. Kublanovskaya of the Soviet Union in the same
year. By 1962, Francis had left the field of numerical analysis, and
subsequently had no idea of the impact his work on the QR algorithm had had, until re-
contacted by Gene Golub in 2007, by which time he was retired and living in Hove,
England (near Brighton).

Cleve Barry Moler: is an American mathematician and computer programmer


specializing in numerical analysis. In the mid to late 1970s, he was one of
the authors of LINPACK and EISPACK, Fortran libraries for numerical
computing. He invented MATLAB, a numerical computing package, to
give his students at the University of New Mexico easy access to these
libraries without writing Fortran. In 1984, he co-founded MathWorks with
Jack Little to commercialize this program.

Charles Francis Van Loan (born September 20, 1947) Originally from Orange, New
Jersey, Van Loan attended the University of Michigan, where he
obtained the B.S. in applied mathematics (1969) and the M.A. (1970)
and Ph.D (1973) in mathematics. His PhD dissertation was entitled
"Generalized Singular Values with Algorithms and Applications" and
his thesis adviser was Cleve Moler. Now he is an emeritus professor of
computer science and the Joseph C. Ford Professor of Engineering at
Cornell University, He is known for his expertise in numerical analysis,
especially matrix computations. In 2016, Van Loan became the Dean of
Faculty at Cornell University.

Magnus Rudolph Hestenes (1906-1991) Born in Bricelyn, Minnesota, was an American


mathematician best known for his contributions to calculus of
variations and optimal control. Hestenes earned his Ph.D. at the
University of Chicago in 1932 under Gilbert Bliss. His dissertation was
titled "Sufficient Conditions for the General Problem of Mayer with
Variable End-Points." After teaching as an associate professor at
Chicago, in 1947 he moved to a professorship at UCLA. He continued
there until his retirement in 1973, and during that time he served as
department chair from 1950–58. As a pioneer in computer science, he devised the
conjugate gradient method, published jointly with Eduard Stiefel.

Peter Lancaster (born 14 November 1929) is a British-Canadian mathematician. He is


professor emeritus at the University of Calgary, where he has worked
since 1962. His research focuses on matrix analysis and related fields,
motivated by problems from vibration theory, numerical analysis,
systems theory, and signal processing. Lancaster served as Department
Chairman from 1973 to 1977, and President of the Canadian
Mathematical Society from 1979 to 1981. He was elected a Fellow of the
Royal Society of Canada in 1984, and received a Humboldt Research
Award in 2000. In 2018 the Canadian Mathematical Society listed him in their inaugural
class of fellows. A brief biography can be found in "An interview with Peter Lancaster" by
N.J.Higham of the University of Manchester.
Israel Gohberg (1928–2009) came from a Jewish family. He was a Bessarabian-born Soviet
and Israeli mathematician, best known for his work in operator
theory and functional analysis, in particular linear operators and
integral equations. He was appointed to a professorship at Tel Aviv
University and was also appointed to the Weizmann Institute in
Rehovot. He wrote many international books and articles and is
considered a cornerstone of operator theory, and he authored the
article "Peter Lancaster, my Friend and Co-author".

What is Matrix Algorithms in MATLAB: Matrix computations are very important to
many scientific and engineering disciplines. Many successful public and commercial
software packages for matrix computations, such as LAPACK and MATLAB, have
been widely used for decades. LAPACK stands for Linear Algebra PACKage. It is a
Fortran library of routines for solving systems of linear equations, linear least squares
problems, eigenvalue problems and singular value problems for dense and banded real and
complex matrices. MATLAB stands for MATrix LABoratory. It is an interpretive computer
language and numerical computation environment. It includes a large number of built-in matrix
computation algorithms, most of which are built upon LAPACK. Its powerful sub-matrix
indexing capability makes it a good tool for the rapid prototyping of numerical algorithms.

This book tries to shorten the wide gap between the rigorous mathematics of matrix
algorithms and their computer code implementations. It presents many matrix
algorithms using real MATLAB code. For each algorithm, the presentation starts with a
brief but simple mathematical exposition. The algorithm is usually explained before
delivering the code, step by step, in the same order in which the algorithm is executed on a
computer. The MATLAB codes do not look very different from pseudocode. For the sake
of clarity of the presentation, most of the MATLAB codes presented in the book are
kept within a few lines. This book is intended for people working in the field of
matrix computations who need to master, implement and improve matrix
algorithms. Students in computer science, applied mathematics, computer engineering
and other engineering disciplines can benefit from studying the book. The book is also
useful to researchers and professionals involved in the numerical analysis side of engineering and
scientific research.

Objectives of the topics covered in this Book are

Fundamentals of linear algebra


⦁ Vector spaces and matrices [theoretical]
⦁ Understanding bases, ranks, linear independence
⦁ Improve mathematical reasoning skills [proofs]
Computational linear algebra
⦁ Understanding common computational problems
⦁ Solving linear systems
⦁ Get a working knowledge of MATLAB
⦁ Understanding computational complexity
⦁ See how numerical linear algebra arises in a few computer science-related applications.
The road ahead: Plan in a nutshell

If some things are not clear or a few topics (e.g., computational complexity) are not fully
covered in the text, then we strongly advise readers and students to consult them
in the following textbooks:

⦁ Gene H. Golub and Charles F. Van Loan (1996) Matrix Computations, Third Edition


⦁ G. W. Stewart (1998) Matrix Algorithms, Volume I and II
⦁ G.W. Stewart (1973) Introduction to Matrix Computations
⦁ Nicholas J. Higham (1996) Accuracy and Stability of Numerical Algorithms
⦁ P Arbenz (2006) Software for Numerical Linear Algebra, at the Computer Science
Department of ETH Zurich

⦁ Charles F. Van Loan (2010) Insight Through Computing: A MATLAB Introduction to


Computational Science and Engineering
⦁ Cleve B. Moler (2004) Numerical computing with MATLAB-Society for Industrial and
Applied Mathematics
⦁ Cleve B. Moler (2011) Experiments with MATLAB
CHAPTER II:
Elements of Numerical Linear
Algebra
Elements of Numerical Linear Algebra
It is often useful to treat the rows or columns
of a matrix as vectors. Terms such as linear independence that we have defined for
vectors (i.e. in Linear Algebra see BEKHITI Belkacem 2020) also apply to rows and/or
columns of a matrix. The vector space generated by the columns of the 𝑚 × 𝑛 matrix 𝑨 is
of order 𝑚 and of dimension 𝑛 or less, and is called the column space of 𝑨, the range of
𝑨, or the manifold of 𝑨. This vector space is sometimes denoted by 𝒱(𝑨) or span(𝑨).

The linear dependence or independence of the vectors forming the


rows or columns of a matrix is an important characteristic of the matrix. The maximum
number of linearly independent vectors (those forming either the rows or the columns) is
called the rank of the matrix. We use the notation rank(𝑨) to denote the rank of the
matrix 𝑨. Because multiplication by a nonzero scalar does not change the linear
independence of vectors, for the scalar 𝑎 with 𝑎 ≠ 0, we have rank(𝑎𝑨) = rank(𝑨). If 𝑨 is
an 𝑚 × 𝑛 matrix then rank(𝑨) ≤ min(𝑛, 𝑚).

We have defined matrix rank in terms of numbers of linearly independent rows or


columns. This is because the number of linearly independent rows is the same as the
number of linearly independent columns. Although we may use the terms “row rank”
or “column rank”, the single word “rank” is sufficient because they are the same.

If a matrix is not of full rank, we say it is rank deficient and define the rank deficiency as
the difference between its smaller dimension and its rank. Let 𝑨 be any 𝑚 × 𝑛 matrix
such that 𝑟 = rank(𝑨) and 𝜈 = nullity (𝑨), [the dimension of 𝒩(𝑨), the null space or
kernel of 𝑨, i.e., the dimension of {𝐱: 𝑨𝐱 = 𝟎}]. Then 𝑟 + 𝜈 = 𝑛.

Theorem: rank(𝑨) = rank(𝑨𝑇 ) = rank(𝑨𝑇 𝑨) = rank(𝑨𝑨𝑇 ).

Proof: 𝑨𝐱 = 𝟎 ⟹ 𝑨𝑇 𝑨𝐱 = 𝟎 ⟹ 𝐱 𝑇 𝑨𝑇 𝑨𝐱 = 0 ⟹ (𝑨𝐱)𝑇 𝑨𝐱 = 0 ⟹ ‖𝑨𝐱‖𝟐 = 𝟎 ⟹ 𝑨𝐱 = 𝟎. Hence


the null-spaces of 𝑨 and 𝑨𝑇 𝑨 are the same. Since 𝑨 and 𝑨𝑇 𝑨 have the same number of
columns, it follows that rank(𝑨) = rank(𝑨𝑇 𝑨). Similarly, rank(𝑨𝑇 ) = rank(𝑨𝑇 𝑨) and the
result follows.
▪ rank(𝑨) = rank(𝑨^𝑇)          ▪ rank(𝑨) = dim(𝒱(𝑨))
▪ rank(𝑨^𝑇𝑨) = rank(𝑨𝑨^𝑇)      ▪ dim(𝒱(𝑨)) = dim(𝒱(𝑨^𝑇))
▪ rank(𝑨) = rank(𝑨^𝑇𝑨)         ▪ rank(𝑨^𝑇) = rank(𝑨𝑨^𝑇)

(Note, of course, that in general 𝒱(𝑨) ≠ 𝒱(𝑨𝑇 ) ; the orders of the vector spaces are
possibly different.)
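These rank identities are easy to check numerically in MATLAB (a minimal illustrative sketch on a random rectangular matrix; the built-in rank function is based on the SVD):

clear all, clc, A=10*rand(7,4);     % a random 7-by-4 matrix (full column rank almost surely)
r1=rank(A), r2=rank(A'), r3=rank(A'*A), r4=rank(A*A')
% the four computed ranks coincide (here all equal 4)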

Theorem: If 𝑨 and 𝑩 are conformable matrices, then: rank(𝑨𝑩) ≤ min(rank(𝑨), rank(𝑩))

Proof: The rows of 𝑨𝑩 are linear combinations of the rows of 𝑩, so that the number of
linearly independent rows of 𝑨𝑩 is less than or equal to that of 𝑩; thus rank(𝑨𝑩) ≤
rank(𝑩). Similarly, the columns of 𝑨𝑩 are linear combinations of the columns of 𝑨, so
that rank(𝑨𝑩) ≤ rank(𝑨).
Theorem: ❶ If 𝒞(𝑨) is the column space of 𝑨 (the space spanned by the columns of 𝑨),
then 𝒞(𝑨𝑇 𝑨) = 𝒞(𝑨𝑇 ) and 𝒞(𝑨𝑨𝑇 ) = 𝒞(𝑨).

❷ If 𝑨 is any matrix, and 𝜬 and 𝑸 are any conformable nonsingular matrices, then
rank(𝑷𝑨𝑸) = rank(𝑨).

❸ If 𝑨 is a square, then rank(𝑨) is equal to the number of nonzero eigenvalues.

Proof: ❶ 𝑨𝑇 𝑨𝐱 = 𝑨𝑇 𝐲 for 𝐲 = 𝑨𝐱, so that 𝒞(𝑨𝑇 𝑨) ⊂ 𝒞(𝑨𝑇 ). However, these two spaces must
be the same, as they have the same dimension.

❷ rank(𝑨) ≤ rank(𝑨𝑸) ≤ rank(𝑨𝑸𝑸^{-1}) = rank(𝑨), so that rank(𝑨) = rank(𝑨𝑸), etc.

❸ we know that, rank(𝑨) = rank(𝑻−1 𝑨𝑻) = rank(𝜦). Then the rank(𝑨) is equal to the
number of nonzero eigenvalues.

In order to perform certain operations on matrices and


vectors, it is often useful first to reshape a matrix. The most common reshaping
operation is the transpose, which we define in this section. Sometimes we may need to
rearrange the elements of a matrix or form a vector into a special matrix. In this section,
we define some operators for doing this.

✔ Transpose The transpose of a matrix is the matrix whose 𝑖 𝑡ℎ row is the 𝑖 𝑡ℎ column of
the original matrix and whose 𝑗 𝑡ℎ column is the 𝑗 𝑡ℎ row of the original matrix. We use a
superscript “𝑇” to denote the transpose of a matrix; thus, if 𝑨 = (𝒂𝑖𝑗 ) then 𝑨𝑻 = (𝒂𝑗𝑖 ). If
the elements of the matrix are from the field of complex numbers, the conjugate
transpose, also called the adjoint, is more useful than the transpose. We use a
superscript “𝐻” to denote the conjugate transpose of a matrix; thus, if 𝑨 = (𝒂𝑖𝑗) then
𝑨^𝐻 = (𝒂̄𝑗𝑖).
▪ (𝑨𝑩)−1 = 𝑩−1 𝑨−1 ▪ (𝑨𝑩𝑪 … )𝑇 = ⋯ 𝑪𝑇 𝑩𝑇 𝑨𝑇
▪ (𝑨𝑩𝑪 … )−1 = ⋯ 𝑪−1 𝑩−1 𝑨−1 ▪ (𝑨𝐻 )−1 = (𝑨−1 )𝐻
▪ (𝑨𝑇 )−1 = (𝑨−1 )𝑇 ▪ (𝑨 + 𝑩)𝐻 = 𝑩𝐻 + 𝑨𝐻
▪ (𝑨 + 𝑩)𝑇 = 𝑩𝑇 + 𝑨𝑇 ▪ (𝑨𝑩)𝐻 = 𝑩𝐻 𝑨𝐻
▪ (𝑨𝑩)𝑇 = 𝑩𝑇 𝑨𝑇 ▪ (𝑨𝑩𝑪 … )𝐻 = ⋯ 𝑪𝐻 𝑩𝐻 𝑨𝐻

✔ Diagonal Matrices and Diagonal Vectors: A square diagonal matrix can be specified
by the diag(.) constructor function that operates on a vector and forms a diagonal matrix
with the elements of the vector along the diagonal: diag([𝒂1 ⋯ 𝒂𝑛]) is the 𝑛 × 𝑛 matrix
whose (𝑖, 𝑖) entry is 𝒂𝑖 and whose off-diagonal entries are zero. The vecdiag(.) function
forms a vector from the principal diagonal elements of a matrix: if 𝑨 is an 𝑛 × 𝑚 matrix
and 𝑘 = min(𝑛, 𝑚), then vecdiag(𝑨) = [𝒂11 ⋯ 𝒂𝑘𝑘].
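In MATLAB a single built-in function, diag, plays both roles: applied to a vector it acts as the diag(.) constructor, and applied to a matrix it acts as vecdiag(.). A minimal sketch:

clear all, clc
a=[3 1 4 1 5]; D=diag(a)   % constructor: 5-by-5 diagonal matrix with a on its diagonal
A=magic(4);    d=diag(A)   % extractor: vector of the principal diagonal elements of A
B=rand(3,5);   dB=diag(B)  % for a rectangular matrix, the k=min(3,5)=3 diagonal elements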

✔ Partitioned Matrices We often find it useful to partition a matrix into submatrices;


for example, in many applications in data analysis, it is often convenient to work with
submatrices of various types representing different subsets of the data. We usually
denote the submatrices with capital letters with subscripts indicating the relative
positions of the submatrices. Hence, we may write 𝑨 = (𝑨11 𝑨12 ; 𝑨21 𝑨22), where the
semicolon separates the block rows. Of course, the
submatrices in a partitioned matrix may be denoted by different letters. Also, for clarity,
sometimes we use a vertical bar to indicate a partition: 𝑨 = [𝑩|𝑪] . The vertical bar is
used just for clarity and has no special meaning in this representation. The term “
submatrix” is also used to refer to a matrix formed from a given matrix by deleting
various rows and columns of the given matrix.

Partitioned matrices may have useful patterns. A “block diagonal” matrix is one of the
form blckdiag(𝑨1 ⋯ 𝑨𝑘): the block diagonal matrix with the submatrices 𝑨1, … , 𝑨𝑘 along
the diagonal and zeros elsewhere. The diag(.) function previously introduced for a
vector is also defined for a list of matrices, and in that case it coincides with blckdiag. A
matrix formed in this way is sometimes called a direct sum of 𝑨1 ⋯ 𝑨𝑘, and the operation
is denoted by ⊕: 𝑨1 ⊕ ⋯ ⊕ 𝑨𝑘 = blckdiag(𝑨1 ⋯ 𝑨𝑘). The transpose of a partitioned matrix is
formed in the obvious way; for example,

𝑨^𝑇 = (𝑨11 𝑨12 𝑨13 ; 𝑨21 𝑨22 𝑨23)^𝑇 = (𝑨11^𝑇 𝑨21^𝑇 ; 𝑨12^𝑇 𝑨22^𝑇 ; 𝑨13^𝑇 𝑨23^𝑇)
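MATLAB offers the built-in function blkdiag for forming direct sums, and the transpose rule for partitioned matrices can be verified directly (a small illustrative sketch):

clear all, clc
A1=rand(2); A2=rand(3); A3=rand(1);
D=blkdiag(A1,A2,A3)                    % the direct sum A1 ⊕ A2 ⊕ A3 (block diagonal)
A11=rand(2,2); A12=rand(2,3); A13=rand(2,1);
A21=rand(1,2); A22=rand(1,3); A23=rand(1,1);
A =[A11 A12 A13; A21 A22 A23];         % a partitioned matrix with 2-by-3 blocks
At=[A11' A21'; A12' A22'; A13' A23'];  % transpose formed block by block
Zero=norm(A'-At)                       % should be exactly 0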

✔ The Trace and Determinant There are several useful mappings from matrices to real
numbers; that is, from ℝ𝑛×𝑚 to ℝ. Some important ones are norms, which are similar to
vector norms and which we will consider later. In this section and the next, we define
two scalar-valued operators, the trace and the determinant, that apply to square
matrices.

The sum of the diagonal elements of a square matrix is called the trace of the matrix. We
use the notation “𝑡𝑟(𝑨)” to denote the trace of the matrix 𝑨: 𝑡𝑟(𝑨) = ∑𝑖 𝑎𝑖𝑖 . The
determinant, like the trace, is a mapping from ℝ𝑛×𝑛 to ℝ. Although it may not be obvious
from the definition below, the determinant has far-reaching applications in matrix
theory.

▪ 𝑡𝑟(𝑨) = ∑𝑖 𝑎𝑖𝑖                      ▪ 𝑡𝑟(𝑨𝑩𝑪) = 𝑡𝑟(𝑩𝑪𝑨) = 𝑡𝑟(𝑪𝑨𝑩)
▪ 𝑡𝑟(𝑨) = ∑𝑖 𝜆𝑖 , 𝜆𝑖 = eig(𝑨)          ▪ det(𝑨) = ∏𝑖 𝜆𝑖 , 𝜆𝑖 = eig(𝑨)
▪ 𝑡𝑟(𝑨) = 𝑡𝑟(𝑨^𝑇)                     ▪ det(𝑨𝑩) = det(𝑨) det(𝑩)
▪ 𝑡𝑟(𝑨𝑩) = 𝑡𝑟(𝑩𝑨)                     ▪ det(𝑨^{-1}) = 1/det(𝑨)
▪ 𝑡𝑟(𝑨 + 𝑩) = 𝑡𝑟(𝑨) + 𝑡𝑟(𝑩)           ▪ det(𝑰 + 𝐮𝐯^𝑇) = 1 + 𝐯^𝑇𝐮
▪ 𝑡𝑟(𝐮𝐯^𝑇) = 𝐮^𝑇𝐯                     ▪ 𝑡𝑟(𝐗𝐯𝐮^𝑇) = 𝐮^𝑇𝐗𝐯
▪ det(𝑰 + 𝑨^𝑇𝑩) = det(𝑰 + 𝑨𝑩^𝑇) = det(𝑰 + 𝑩^𝑇𝑨) = det(𝑰 + 𝑩𝑨^𝑇)
▪ det(𝑨 + 𝐮𝐯^𝑇) = (1 + 𝐯^𝑇𝑨^{-1}𝐮) det(𝑨)
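Most of these trace and determinant identities can be checked numerically in a few lines (an illustrative sketch on random matrices; small nonzero outputs are rounding errors):

clear all, clc, n=5; A=rand(n); B=rand(n); C=rand(n); u=rand(n,1); v=rand(n,1);
e1=trace(A*B*C)-trace(B*C*A)      % cyclic property of the trace (≈ 0)
e2=det(A*B)-det(A)*det(B)         % multiplicativity of the determinant (≈ 0)
e3=det(eye(n)+u*v')-(1+v'*u)      % det(I + u v^T) = 1 + v^T u (≈ 0)
e4=trace(A)-sum(eig(A))           % trace = sum of eigenvalues (≈ 0, up to a negligible imaginary part)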

✔ Vectorization The vector formed by concatenating all the columns of 𝑿 is written


𝐯𝐞𝐜(𝑿) and here is some basic properties of such operator

▪ 𝒂 ⊗ 𝒃 = 𝐯𝐞𝐜(𝒃𝒂𝑇 ) where ⊗ denotes the Kronecker product.


▪ 𝐯𝐞𝐜(𝑨𝑩) = (𝑰 ⊗ 𝑨)𝐯𝐞𝐜(𝑩) = (𝑩𝑇 ⊗ 𝑰)𝐯𝐞𝐜(𝑨) = (𝑩𝑇 ⊗ 𝑨)𝐯𝐞𝐜(𝑰)
▪ 𝐯𝐞𝐜(𝑨𝒃𝒄𝑇 ) = (𝒄 ⊗ 𝑨)𝒃 = 𝒄 ⊗ 𝑨𝒃 ▪ 𝑨𝑩𝒄 = (𝒄𝑇 ⊗ 𝑨) 𝐯𝐞𝐜(𝑩)
▪ 𝒂𝑇 𝑩𝒄 = (𝒄 ⊗ 𝒂)𝑇 𝐯𝐞𝐜(𝑩) = (𝒄𝑇 ⊗ 𝒂𝑇 )𝐯𝐞𝐜(𝑩)
= 𝐯𝐞𝐜(𝒂𝒄𝑇 )𝑇 𝐯𝐞𝐜(𝑩) = 𝐯𝐞𝐜(𝑩)𝑇 (𝒂 ⊗ 𝒄) = 𝐯𝐞𝐜(𝑩)𝑇 𝐯𝐞𝐜(𝒄𝒂𝑇 )
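In MATLAB the vec operator is simply the colon reshaping X(:), and the Kronecker product is kron; the identities above can then be verified numerically (an illustrative sketch):

clear all, clc, A=rand(3,4); B=rand(4,2);
vecAB=reshape(A*B,[],1);                  % vec(AB), the same as (A*B)(:)
e1=norm(vecAB-kron(eye(2),A)*B(:))        % vec(AB) = (I ⊗ A) vec(B)   (≈ 0)
e2=norm(vecAB-kron(B',eye(3))*A(:))       % vec(AB) = (B^T ⊗ I) vec(A) (≈ 0)
a=rand(3,1); b=rand(5,1);
e3=norm(kron(a,b)-reshape(b*a',[],1))     % a ⊗ b = vec(b a^T)         (≈ 0)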
A symmetric matrix 𝑨 such
that for any (conformable and real) vector 𝐱 the quadratic form 𝐱 𝑇 𝑨𝐱 is nonnegative, that
is, 𝐱 𝑇 𝑨𝐱 ≥ 𝟎, is called a nonnegative definite matrix (or positive-semidefinite). We denote
the fact that 𝑨 is nonnegative definite by 𝑨 ≽ 𝟎. (Note that we consider 𝟎𝑛×𝑛 to be
nonnegative definite). A symmetric matrix 𝑨 is called a positive definite matrix if the
quadratic form 𝐱 𝑇 𝑨𝐱 > 𝟎 for any (conformable) vector 𝐱 ≠ 𝟎 . We denote the fact that A is
positive definite by 𝑨 ≻ 𝟎.

Theorem: ❶ The eigenvalues of a positive-semidefinite "p.s.d" matrix are nonnegative,


and the eigenvalues of a p.d. matrix 𝑨 are all positive.

❷ If 𝑨 is p.s.d., then 𝑡𝑟(𝑨) ≥ 0. This follows from the previous result.

❸ 𝑨 is p.s.d. of rank 𝑟 if and only if there exists an 𝑛 × 𝑛 matrix 𝑹 of rank 𝑟 such that
𝑨 = 𝑹𝑹𝑇 . Also 𝑨 is p.d. if and only if there exists a nonsingular 𝑹 such that 𝑨 = 𝑹𝑹𝑇 .

Proof:

❶ If 𝑻𝑇 𝑨𝑻 = 𝜦, then substituting 𝐱 = 𝑻𝐲, we have 𝐱 𝑇 𝑨𝐱 = 𝐲 𝑇 𝑻𝑇 𝑨𝑻𝐲 = 𝐲 𝑇 𝜦𝐲 = ∑𝑖 𝜆𝑖 y𝑖 2 ≥ 0,


leads to 0 ≤ 𝜆𝑖 .

❷ If 𝑨 is p.s.d., then 𝑡𝑟(𝑨) ≥ 0. This can be deduced directly from the previous result, since the trace is the sum of the eigenvalues, which are all nonnegative.

❸ Given a p.s.d. matrix 𝑨 of rank 𝑟, we have 𝜦 = diag(𝜆1, 𝜆2, . . . , 𝜆𝑟, 0, … , 0), where 𝜆𝑖 > 0 for
𝑖 = 1,2, … , 𝑟. Let 𝜦^{1/2} = diag(𝜆1^{1/2}, 𝜆2^{1/2}, . . . , 𝜆𝑟^{1/2}, 0, … , 0); then 𝑻^𝑇𝑨𝑻 = 𝜦 implies that
𝑨 = 𝑻𝜦^{1/2}𝜦^{1/2}𝑻^𝑇 = 𝑹𝑹^𝑇 with 𝑹 = 𝑻𝜦^{1/2}, where rank(𝑹) = rank(𝜦^{1/2}) = 𝑟. Conversely, if 𝑨 = 𝑹𝑹^𝑇, then
rank(𝑨) = rank(𝑹) = 𝑟 and 𝐱^𝑇𝑨𝐱 = 𝐱^𝑇𝑹𝑹^𝑇𝐱 = 𝐲^𝑇𝐲 ≥ 0, where 𝐲 = 𝑹^𝑇𝐱.

Theorem: ❶ If 𝑨 is an 𝑛 × 𝑛 p.s.d. matrix of rank 𝑟, then there exists an 𝑛 × 𝑟 matrix 𝑺 of


rank 𝑟 such that 𝑺𝑇 𝑨𝑺 = 𝑰𝑟 .
❷ If 𝑨 is p.s.d., then 𝐗^𝑇𝑨𝐗 = 𝟎 ⟹ 𝑨𝐗 = 𝟎. And if 𝑨 is p.d., then so is 𝑨^{-1}.
❸ If 𝑨 is p.d., then rank(𝑪𝑨𝑪𝑻 ) = rank(𝑪).

❹ If 𝑨 is an 𝑛 × 𝑛 p.d. matrix and 𝑪 is 𝑝 × 𝑛 of rank 𝑝, then 𝑪𝑨𝑪𝑻 is p.d.


❺ If 𝑿 is 𝑛 × 𝑝 of rank 𝑝, then 𝑿𝑇 𝑿 is p.d.
❻ 𝑨 is p.d. if and only if all the leading minor determinants of 𝑨 [including det(𝑨) itself]
are positive.
❼ The diagonal elements of a positive-definite matrix are all positive.

Proof: ❶ From 𝑻^𝑇𝑨𝑻 = (𝜦𝑟 𝟎 ; 𝟎 𝟎) we have 𝑻1^𝑇𝑨𝑻1 = 𝜦𝑟, where 𝑻1 consists of the first 𝑟
columns of 𝑻. Setting 𝑺 = 𝑻1𝜦𝑟^{-1/2} leads to the required result, since then
𝑺^𝑇𝑨𝑺 = 𝜦𝑟^{-1/2}𝜦𝑟𝜦𝑟^{-1/2} = 𝑰𝑟.

❷ we know that 𝟎 = 𝐗 𝑇 𝑨𝐗 = 𝐗 𝑇 𝑹𝑹𝑇 𝐗 = 𝑩𝑇 𝑩 (𝑩 = 𝑹𝑇 𝐗), which implies that 𝒃𝑖 𝑇 𝒃𝑖 = 𝟎; that


is, 𝒃𝑖 = 𝟎 for every column 𝒃𝑖 of 𝑩. Hence 𝑨𝑿 = 𝑹𝑩 = 𝟎.

Moreover, 𝑨−1 = (𝑹𝑹𝑇 )−1 = (𝑹𝑇 )−1 𝑹−1 = (𝑹−1 )𝑇 𝑹−1 = 𝑺𝑺𝑇 where 𝑺 is nonsingular. The result
then follows from above.

❸ rank(𝑪𝑨𝑪𝑻 ) = rank(𝑪𝑹𝑹𝑇 𝑪𝑻 ) = rank(𝑪𝑹) = rank(𝑪).


❹ 𝐱 𝑇 𝑪𝑨𝑪𝑻 𝐱 = 𝐲 𝑇 𝑨𝐲 > 0 with equality ⟺ 𝐲 = 𝟎 ⟺ 𝑪𝑻 𝐱 = 𝟎 ⟺ 𝐱 = 𝟎 (since the columns of 𝑪
are linearly independent). Hence 𝐱 𝑇 𝑪𝑨𝑪𝑻 𝐱 > 0 all 𝐱, 𝐱 ≠ 𝟎.

❺ 𝐱 𝑇 𝑿𝑻 𝑿𝐱 = 𝐲 𝑇 𝐲 > 0 with equality ⟺ 𝑿𝐱 = 𝟎 ⟺ 𝐱 = 𝟎 (since the columns of 𝑿 are linearly


independent).

❻ If 𝑨 is p.d., then det(𝑨) = det(𝑻𝜦𝑻𝑇 ) = det(𝜦) = ∏𝑖 𝜆𝑖 > 0. The complete proof can be
found in Seber.G and Lee.A 2003.

❼ Setting 𝐱 = 𝐞𝑖 (the 𝑖th standard basis vector, i.e. 𝑥𝑗 = 𝛿𝑖𝑗 = 1 when 𝑗 = 𝑖 and 0 otherwise),
we have 0 < 𝐱^𝑇𝑨𝐱 = 𝑎𝑖𝑖.

Theorem: (Cholesky decomposition) If 𝑨 is p.d, there exists a unique upper triangular


matrix 𝑹 with positive diagonal elements such that 𝑨 = 𝑹𝑇 𝑹.

Proof: The complete proof can be found in Seber.G and Lee.A 2003.

Theorem: (Square root of a positive-definite matrix) If 𝑨 is p.d., there exists a p.d. square
root 𝑨^{1/2} such that (𝑨^{1/2})^2 = 𝑨.

Proof: Let 𝑨 = 𝑻𝜦𝑻^𝑇 be the spectral decomposition of 𝑨, where the diagonal elements of
𝜦 are positive. Let 𝑨^{1/2} = 𝑻𝜦^{1/2}𝑻^𝑇; then (𝑨^{1/2})^2 = (𝑻𝜦^{1/2}𝑻^𝑇)(𝑻𝜦^{1/2}𝑻^𝑇) = 𝑻𝜦𝑻^𝑇 = 𝑨 (since
𝑻^𝑇𝑻 = 𝑰𝑛).
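Both factorizations are available as MATLAB built-ins: chol returns the upper triangular Cholesky factor and sqrtm the principal (p.d.) square root. A minimal sketch on a randomly generated p.d. matrix:

clear all, clc, M=rand(5); A=M*M'+5*eye(5); A=(A+A')/2;  % a symmetric positive definite matrix
R=chol(A);  e1=norm(A-R'*R)   % Cholesky factor: A = R^T R (≈ 0)
S=sqrtm(A); e2=norm(A-S*S)    % principal square root: S*S = A (≈ 0)
[T,L]=eig(A); e3=norm(S-T*sqrt(L)*T')  % agrees with T Λ^(1/2) T^T from the spectral decomposition (≈ 0)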

The most common kind of product of two


matrices is the Cayley product (the well-known), and when we speak of matrix
multiplication without qualification, we mean the Cayley product. Three other types of
matrix multiplication that are useful are Hadamard multiplication, Kronecker
multiplication, and dot product multiplication.

✔ The Hadamard Product Hadamard multiplication denoted by ⨀ is defined for


matrices of the same shape as the multiplication of each element of one matrix by the
corresponding element of the other matrix. Hadamard multiplication immediately
inherits the commutativity, associativity, and distribution over addition of the ordinary
multiplication of the underlying field of scalars. Hadamard multiplication is also called
array multiplication and element-wise multiplication (in some books it is named the Schur
product). Hadamard matrix multiplication is a mapping ℝ𝑚×𝑛 × ℝ𝑚×𝑛 → ℝ𝑚×𝑛 . The
identity for Hadamard multiplication is the matrix of appropriate shape whose elements
are all 1s.

✔ The Kronecker Product Kronecker multiplication, denoted by ⊗, is defined for any


two matrices 𝑨𝑚×𝑛 and 𝑩𝑝×𝑞 as 𝑨 ⊗ 𝑩 = (𝑎11𝑩 ⋯ 𝑎1𝑛𝑩 ; ⋮ ⋱ ⋮ ; 𝑎𝑚1𝑩 ⋯ 𝑎𝑚𝑛𝑩). The Kronecker product of 𝑨
and 𝑩 is 𝑚𝑝 × 𝑛𝑞; that is, Kronecker operator is a mapping ℝ𝑚×𝑛 × ℝ𝑝×𝑞 → ℝ𝑚𝑝×𝑛𝑞 . The
Kronecker product is also called the “right direct product” or just direct product. (A left
direct product is a Kronecker product with the factors reversed.) Kronecker
multiplication is not commutative, but it is associative and it is distributive over
addition. The identity for Kronecker multiplication is the 1 × 1 matrix with the element 1;
that is, it is the same as the scalar 1.
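In MATLAB the Hadamard product is the element-wise operator .* and the Kronecker product is kron. The sketch below also checks the standard mixed-product property (𝑨1 ⊗ 𝑩1)(𝑨2 ⊗ 𝑩2) = (𝑨1𝑨2) ⊗ (𝑩1𝑩2), which is not stated above but follows directly from the definition:

clear all, clc, A=rand(2,3); B=rand(2,3);
H=A.*B                                   % Hadamard (element-wise) product of two same-shape matrices
A1=rand(2,3); A2=rand(3,2); B1=rand(4,2); B2=rand(2,5);
K=kron(A1,B1);                           % Kronecker product, of size (2*4)-by-(3*2)
e=norm(kron(A1,B1)*kron(A2,B2)-kron(A1*A2,B1*B2))  % mixed-product property (≈ 0)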
✔ The Dot Product or the Inner Product of Matrices Another product of two matrices
of the same shape is defined as the sum of the dot products of the vectors formed from
the columns of one matrix with vectors formed from the corresponding columns of the
other matrix; that is, if 𝒂1 , . . . , 𝒂𝑛 are the columns of 𝑨 and 𝒃1 , . . . , 𝒃𝑛 are the columns of 𝑩,
then the dot product of 𝑨 and 𝑩, denoted 〈𝑨, 𝑩〉 is 〈𝑨, 𝑩〉 = ∑𝑛𝑖=1 𝒂𝑖 𝑇 𝒃𝑖 . For conformable
matrices 𝑨, 𝑩, and 𝑪, we can easily confirm that this product satisfies the general
properties of an inner product

• If 𝑨 ≠ 𝟎, 〈𝑨, 𝑨〉 > 0, and 〈𝟎, 𝑨〉 = 〈𝑨, 𝟎〉 = 〈𝟎, 𝟎〉 = 0.
• 〈𝑨, 𝑩〉 = 〈𝑩, 𝑨〉 .
• 〈𝜂𝑨, 𝑩〉 = 𝜂〈𝑨, 𝑩〉 , for a scalar 𝜂.
• 〈𝑨 + 𝑩, 𝑪〉 = 〈𝑨, 𝑪〉 + 〈𝑩, 𝑪〉 .

The dot product of the matrices 𝑨 and 𝑩 with the same shape is denoted by 𝑨⦁𝑩, or 〈𝑨, 𝑩〉,
just like the dot product of vectors. We see from the definition above that the dot
product of matrices satisfies 〈𝑨, 𝑩〉 = 〈𝑨𝑻 , 𝑩𝑻 〉 = 𝑡𝑟(𝑨𝑻 𝑩).

Like any inner product, dot products of matrices obey the Cauchy-Schwarz inequality

|〈𝑨, 𝑩〉| ≤ 〈𝑨, 𝑨〉^{1/2} 〈𝑩, 𝑩〉^{1/2}

We can likewise define an orthogonal binary relationship between two matrices in terms
of dot products of matrices. We say the matrices 𝑨 and 𝑩 of the same shape are
orthogonal to each other if 〈𝑨, 𝑩〉 = 0

From the definition above, ''〈𝑨, 𝑩〉 = 𝑡𝑟(𝑨^𝑻𝑩)'', we see that the matrices 𝑨 and 𝑩 are
orthogonal to each other whenever 𝑨^𝑻𝑩 and 𝑩^𝑻𝑨 are hollow (that is, they have 0s in all
diagonal positions), since the trace is then zero; more generally, orthogonality only requires
that the diagonal elements of 𝑨^𝑻𝑩 sum to zero. We also use the term “orthonormal” to refer to
matrices that are orthogonal to each other and for which each has a dot product with itself of 1.

Let the matrix 𝑨 be partitioned as 𝑨 = (𝑨11 𝑨12 ; 𝑨21 𝑨22). Then the number of linearly
independent rows of 𝑨 must be at least as great as the number of linearly independent
rows of [𝑨11|𝑨12] and the number of linearly independent rows of [𝑨21|𝑨22]. We could go
through a similar argument relating to the number of linearly independent columns and
arrive at the inequality rank(𝑨𝑖𝑗) ≤ rank(𝑨). Furthermore, we see that

rank(𝑨) ≤ rank([𝑨11|𝑨12]) + rank([𝑨21|𝑨22])

Likewise, we have rank(𝑨) ≤ rank([𝑨11 ; 𝑨21]) + rank([𝑨12 ; 𝑨22]). In a similar manner, by
merely counting the number of independent rows, we see that

if 𝒱([𝑨11|𝑨12]^𝑇) ⊥ 𝒱([𝑨21|𝑨22]^𝑇) then rank(𝑨) = rank([𝑨11|𝑨12]) + rank([𝑨21|𝑨22])

if 𝒱([𝑨11 ; 𝑨21]) ⊥ 𝒱([𝑨12 ; 𝑨22]) then rank(𝑨) = rank([𝑨11 ; 𝑨21]) + rank([𝑨12 ; 𝑨22])
The invariance of the trace to
permutations of the factors in a product (i. e. 𝑡𝑟(𝑨𝑩) = 𝑡𝑟(𝑩𝑨)) is particularly useful in
working with quadratic forms. Because the quadratic form itself is a scalar (or a 1 × 1
matrix), and because of the invariance, we have the very useful fact

𝐱 𝑇 𝑨𝐱 = 𝑡𝑟(𝐱 𝑇 𝑨𝐱) = 𝑡𝑟(𝑨𝐱𝐱 𝑇 ).

Given a real matrix 𝑨, an important matrix product is


𝑇
𝑨 𝑨. (This is called a Gramian matrix). First, for any 𝑚 × 𝑛 matrix 𝑨, we have the fact
that 𝑨𝑇 𝑨 = 𝟎 if and only if 𝑨 = 𝟎. Also, 𝑡𝑟(𝑨𝑇 𝑨) = 𝟎 if and only if 𝑨 = 𝟎. Another useful fact
about 𝑨𝑇 𝑨 is that it is nonnegative definite. In addition, we see that 𝑨𝑇 𝑨 is positive
definite if and only if 𝑨 is of full column rank.

Theorem: rank(𝑨) = rank(𝑨𝑇 ) = rank(𝑨𝑇 𝑨) = rank(𝑨𝑨𝑇 ) "an alternative way of proof"

Proof: Let 𝑨 be an 𝑚 × 𝑛 matrix, and let 𝑟 = rank(𝑨). If 𝑟 > 0, interchange columns of 𝑨 if
necessary to obtain a partitioning of the form 𝑨 = [𝑨1|𝑨2], where 𝑨1 is an 𝑚 × 𝑟 matrix of
rank 𝑟. Now, because 𝑨1 is of full column rank, there is an 𝑟 × (𝑛 − 𝑟) matrix 𝑩 such that
𝑨2 = 𝑨1𝑩; hence we have 𝑨 = 𝑨1[𝑰𝑟|𝑩] and 𝑨^𝑇𝑨 = [𝑰𝑟 ; 𝑩^𝑇] 𝑨1^𝑇𝑨1 [𝑰𝑟|𝑩]. Because 𝑨1 is of full
rank, rank(𝑨1^𝑇𝑨1) = 𝑟. Now let 𝑻 = [𝑰𝑟 𝟎 ; −𝑩^𝑇 𝑰𝑛−𝑟]. It is clear that 𝑻 is of full rank, and so

rank(𝑨^𝑇𝑨) = rank(𝑻𝑨^𝑇𝑨𝑻^𝑇) = rank([𝑨1^𝑇𝑨1 𝟎 ; 𝟎 𝟎]) = rank(𝑨1^𝑇𝑨1) = 𝑟

that is, rank(𝑨) = rank(𝑨^𝑇𝑨). Applying the same argument to 𝑨^𝑇 gives rank(𝑨^𝑇) = rank(𝑨𝑨^𝑇),
and because 𝑨 admits the equivalent canonical form 𝑨 = 𝑷^{-1}[𝑰𝑟 𝟎 ; 𝟎 𝟎]𝑸^{-1}, so that
𝑨^𝑇 = (𝑸^{-1})^𝑇[𝑰𝑟 𝟎 ; 𝟎 𝟎](𝑷^{-1})^𝑇, we obtain rank(𝑨^𝑇) = rank(𝑰𝑟) = 𝑟 = rank(𝑨). Therefore

rank(𝑨) = rank(𝑨^𝑇) = rank(𝑨^𝑇𝑨) = rank(𝑨𝑨^𝑇)

The rank of the sum of two


matrices is less than or equal to the sum of their ranks; that is,

rank(𝑨 + 𝑩) ≤ rank(𝑨) + rank(𝑩)

We can see this by observing that 𝑨 + 𝑩 = [𝑨|𝑩][𝑰 ; 𝑰] and so rank(𝑨 + 𝑩) ≤ rank([𝑨|𝑩]),
which implies that rank(𝑨 + 𝑩) ≤ rank(𝑨) + rank(𝑩).

Theorem: If 𝑨 is 𝑛 × 𝑛 (that is, square) and 𝑩 is a matrix with 𝑛 rows, then

rank(𝑨𝑩) ≥ rank(𝑨) + rank(𝑩) − 𝑛.

Proof: The inequality rank(𝑨𝑩) ≤ min(rank(𝑨), rank(𝑩)) gives an upper bound on the rank of the
product of two matrices; the rank cannot be greater than the rank of either of the
factors. Now, using the equivalent canonical form 𝑨 = 𝑷^{-1}[𝑰𝑟 𝟎 ; 𝟎 𝟎]𝑸^{-1}, we develop a lower
bound on the rank of the product of two matrices when one of them is square. We see this by
first letting 𝑟 = rank(𝑨), letting 𝑷 and 𝑸 be matrices that form an equivalent canonical form of 𝑨,
and then forming 𝑪 = 𝑷^{-1}[𝟎 𝟎 ; 𝟎 𝑰𝑛−𝑟]𝑸^{-1}, so that 𝑨 + 𝑪 = 𝑷^{-1}𝑸^{-1}. Because 𝑷^{-1} and 𝑸^{-1} are of full
rank, rank(𝑪) = rank(𝑰𝑛−𝑟) = 𝑛 − rank(𝑨). We now develop an upper bound on rank(𝑩):

rank(𝑩) = rank(𝑷^{-1}𝑸^{-1}𝑩)
        = rank(𝑨𝑩 + 𝑪𝑩)
        ≤ rank(𝑨𝑩) + rank(𝑪𝑩)
        ≤ rank(𝑨𝑩) + rank(𝑪)
        = rank(𝑨𝑩) + 𝑛 − rank(𝑨),

and rearranging gives rank(𝑨𝑩) ≥ rank(𝑨) + rank(𝑩) − 𝑛.

The inverse of the Cayley product of


two nonsingular matrices of the same size is particularly easy to form. If 𝑨 and 𝑩 are
square full rank matrices of the same size, (𝑨𝑩)−1 = 𝑩−1 𝑨−1 We can see this by
multiplying 𝑩−1 𝑨−1 and (𝑨𝑩). Often in linear regression analysis we need inverses of
various sums of matrices. This may be because we wish to update regression estimates
based on additional data or because we wish to delete some observations. If 𝑨 and 𝑩 are
full rank matrices of the same size, the following relationships are easy to show

𝑨(𝑰 + 𝑨)−1 = (𝑰 + 𝑨−1 )−1 ,


(𝑨 + 𝑩𝑩𝑇 )−1 𝑩 = 𝑨−1 𝑩(𝑰 + 𝑩𝑇 𝑨−1 𝑩)−1
(𝑨−1 + 𝑩−1 )−1 = 𝑨(𝑨 + 𝑩)−1 𝑩,
𝑨 − 𝑨(𝑨 + 𝑩)−1 𝑨 = 𝑩 − 𝑩(𝑨 + 𝑩)−1 𝑩,
𝑨−1 + 𝑩−1 = 𝑨−1 (𝑨 + 𝑩)𝑩−1 ,
(𝑰 + 𝑨𝑩)−1 = 𝑰 − 𝑨(𝑰 + 𝑩𝑨)−1 𝑩,
(𝑰 + 𝑨𝑩)−1 𝑨 = 𝑨(𝑰 + 𝑩𝑨)−1
𝑨 + 𝑩 = 𝑨 (𝑨−1 + 𝑩−1 )𝑩 = 𝑩 (𝑨−1 + 𝑩−1 )𝑨
= 𝑨 (𝑨−1 + 𝑨−1 𝑩𝑨−1 ) 𝑨 = 𝑩 (𝑩−1 + 𝑩−1 𝑨𝑩−1 ) 𝑩

When 𝑨 and/or 𝑩 are not of full rank, the inverses may not exist, but in that case these
equations hold for a generalized inverse.

The next identities are useful because they say how the inverse of a matrix changes when a
low-rank term is added to it. They are variously called the Matrix Inversion Lemma, the
Sherman-Morrison formula and the Sherman-Morrison-Woodbury formula.

▪ [(𝑰 + 𝑽^𝐻𝑨𝑼) non-singular]: (𝑨^{-1} + 𝑼𝑽^𝐻)^{-1} = 𝑨 − 𝑨𝑼(𝑰 + 𝑽^𝐻𝑨𝑼)^{-1}𝑽^𝐻𝑨
▪ [(𝑰 + 𝑽^𝐻𝑨^{-1}𝑼) non-singular]: (𝑨 + 𝑼𝑽^𝐻)^{-1} = 𝑨^{-1} − 𝑨^{-1}𝑼(𝑰 + 𝑽^𝐻𝑨^{-1}𝑼)^{-1}𝑽^𝐻𝑨^{-1}
▪ [𝐯^𝐻𝑨𝐮 ≠ −1]: (𝑨^{-1} + 𝐮𝐯^𝐻)^{-1} = 𝑨 − 𝑨𝐮𝐯^𝐻𝑨/(1 + 𝐯^𝐻𝑨𝐮)
▪ [(𝑪 + 𝑽^𝐻𝑨𝑼) non-singular]: (𝑨^{-1} + 𝑼𝑪^{-1}𝑽^𝐻)^{-1} = 𝑨 − 𝑨𝑼(𝑪 + 𝑽^𝐻𝑨𝑼)^{-1}𝑽^𝐻𝑨
▪ [(𝑪 + 𝑽^𝐻𝑨𝑽) non-singular]: (𝑨^{-1} + 𝑽𝑪^{-1}𝑽^𝐻)^{-1} = 𝑨 − 𝑨𝑽(𝑪 + 𝑽^𝐻𝑨𝑽)^{-1}𝑽^𝐻𝑨
▪ [(𝑰 + 𝑽^𝐻𝑨𝑽) non-singular]: (𝑨^{-1} + 𝑽𝑽^𝐻)^{-1} = 𝑨 − 𝑨𝑽(𝑰 + 𝑽^𝐻𝑨𝑽)^{-1}𝑽^𝐻𝑨
▪ [(𝑰 + 𝑽^𝐻𝑨𝑼) non-singular]: (𝑨^{-1} + 𝑼𝑽^𝐻)^{-1}𝑼 = 𝑨𝑼(𝑰 + 𝑽^𝐻𝑨𝑼)^{-1}
▪ [(𝑰 + 𝑽^𝐻𝑨𝑼) non-singular]: 𝑽^𝐻(𝑨^{-1} + 𝑼𝑽^𝐻)^{-1} = (𝑰 + 𝑽^𝐻𝑨𝑼)^{-1}𝑽^𝐻𝑨
▪ [(𝑰 + 𝑽^𝐻𝑨𝑼) non-singular]: 𝑽^𝐻(𝑨^{-1} + 𝑼𝑽^𝐻)^{-1}𝑼 = 𝑰 − (𝑰 + 𝑽^𝐻𝑨𝑼)^{-1}
▪ [𝐯^𝐻𝑨^{-1}𝐮 ≠ −1]: (𝑨 + 𝐮𝐯^𝐻)^{-1} = 𝑨^{-1} − (𝑨^{-1}𝐮)(𝐯^𝐻𝑨^{-1})/(1 + 𝐯^𝐻𝑨^{-1}𝐮)
▪ det(𝑨^{-1} + 𝑼𝑽^𝐻) = det(𝑰 + 𝑽^𝐻𝑨𝑼) × det(𝑨^{-1}), sometimes called the Matrix Determinant
Lemma; in the rank-one case det(𝑨^{-1} + 𝐮𝐯^𝐻) = (1 + 𝐯^𝐻𝑨𝐮) × det(𝑨^{-1})
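A quick numerical check of the Sherman-Morrison-Woodbury formula and of the matrix determinant lemma (an illustrative sketch; inv is used only for readability, backslash would be preferred in real computations):

clear all, clc, n=6; k=2; A=rand(n)+n*eye(n);   % a (generically) nonsingular matrix
U=rand(n,k); V=rand(n,k); u=rand(n,1); v=rand(n,1);
lhs=inv(A+U*V');
rhs=inv(A)-inv(A)*U*inv(eye(k)+V'*inv(A)*U)*V'*inv(A);
e1=norm(lhs-rhs)                                              % Woodbury identity (≈ 0)
e2=norm(inv(A+u*v')-(inv(A)-inv(A)*u*v'*inv(A)/(1+v'*(A\u)))) % Sherman-Morrison (≈ 0)
e3=det(A+u*v')/((1+v'*(A\u))*det(A))-1                        % matrix determinant lemma (≈ 0)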
A matrix 𝜬 is idempotent if 𝑷2 = 𝑷. A symmetric idempotent
matrix is called a projection matrix.

Theorem:

▪ A symmetric matrix 𝑷 is idempotent (𝑷^2 = 𝑷) and of rank 𝑟 if and only if it has 𝑟
eigenvalues equal to unity and 𝑛 − 𝑟 eigenvalues equal to zero.

▪ A projection matrix 𝑷 satisfies 𝑡𝑟(𝑷) = rank(𝑷).

▪ If 𝑷 is idempotent, so is 𝑰 − 𝑷.

▪ Projection matrices are positive-semidefinite. Proof: 𝐱 𝑇 𝑷𝐱 = 𝐱 𝑇 𝑷2 𝐱 = (𝑷𝐱)𝑇 (𝑷𝐱) ≥ 0.
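These facts are easy to observe numerically: an orthogonal projector onto the column space of a matrix can be built from an orthonormal basis of that space (an illustrative sketch using the economy-size QR factorization):

clear all, clc, A=rand(6,3);   % full column rank almost surely
[Q,~]=qr(A,0);                 % Q has orthonormal columns spanning the column space of A
P=Q*Q';                        % orthogonal projector onto R(A)
e1=norm(P*P-P)                 % idempotent (≈ 0)
e2=norm(P-P')                  % symmetric (≈ 0)
e3=trace(P)-rank(P)            % tr(P) = rank(P) = 3 (≈ 0)
s=svd(P)'                      % singular values: three values ≈ 1 and three values ≈ 0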

In mathematics, a norm is a function from a vector


space over the real or complex numbers to the nonnegative real numbers that satisfies
certain properties pertaining to scalability and additivity, and takes the value zero only if
the input vector is zero.

Given a vector space 𝓥 over a field 𝔽 of the real numbers ℝ or complex numbers ℂ, a
norm on 𝓥 is a nonnegative-valued function Norm : 𝓥 → ℝ with the following properties:

▪ If Norm(𝐯) = 0 then 𝐯 = 𝟎 being positive definite or being point-separating.


▪ Norm(𝛼𝐯) = |𝛼|Norm(𝐯) being absolutely homogeneous or absolutely scalable.
▪ Norm(𝐯 + 𝐮) ≤ Norm(𝐯) + Norm(𝐮) being subadditive or satisfying the triangle inequality.

✔ The p-norm: There are many norms that could be defined for vectors. One type of
norm is called an 𝐿𝑝 norm, often denoted as ‖ . ‖𝑝 . For 𝑝 ≥ 1, it is defined as
‖𝐱‖𝑝 = (∑𝑖 |𝑥𝑖|^𝑝)^{1/𝑝}
• ‖𝐱‖1 = ∑𝑖 |𝑥𝑖 |, also called the Manhattan norm because it corresponds to sums of
distances along coordinate axes, as one would travel along the rectangular street plan of
Manhattan.

• ‖𝐱‖2 = √(∑𝑖 𝑥𝑖^2), also called the Euclidean norm, the Euclidean length, or just the length
of the vector. The 𝐿2 norm is the square root of the inner product of the vector with itself:
‖𝐱‖2 = √〈𝐱, 𝐱〉.

• ‖𝐱‖∞ = max𝑖 |𝑥𝑖 | , also called the max norm or the Chebyshev norm. The 𝐿∞ norm is
defined by taking the limit in an 𝐿𝑝 norm, and we see that it is indeed max𝑖 |𝑥𝑖 | by
expressing it as
‖𝐱‖∞ = lim_{𝑝→∞} ‖𝐱‖𝑝 = lim_{𝑝→∞} (∑𝑖 |𝑥𝑖|^𝑝)^{1/𝑝} = 𝑚 · lim_{𝑝→∞} (∑𝑖 |𝑥𝑖/𝑚|^𝑝)^{1/𝑝}

With 𝑚 = max𝑖 |𝑥𝑖 |. Because the quantity of which we are taking the 𝑝𝑡ℎ root is bounded
above by the number of elements in 𝐱 and below by 1, that factor goes to 1 as 𝑝 → ∞. An
𝐿𝑝 norm is also called a p-norm, or 1-norm, 2-norm, or ∞-norm in those special cases. It
is easy to see that, for any n-vector 𝐱, the 𝐿𝑝 norms have the relationships ‖𝐱‖∞ ≤ ‖𝐱‖2 ≤
‖𝐱‖1. More generally, for given 𝐱 and for 𝑝 ≥ 1, we see that ‖𝐱‖𝑝 is a non-increasing
function of 𝑝.
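The vector p-norms and the relationships between them can be checked with the built-in norm function (an illustrative sketch):

clear all, clc, n=8; x=randn(n,1);
n1=norm(x,1), n2=norm(x,2), ninf=norm(x,inf)
check=(ninf<=n2) & (n2<=n1) & (n1<=sqrt(n)*n2) & (n2<=sqrt(n)*ninf)  % returns 1 (true)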

✔ Cauchy-Schwarz inequality: the Cauchy–Schwarz inequality, also known as the


Cauchy–Bunyakovsky–Schwarz inequality, is a useful inequality encountered in many
different settings, such as linear algebra, analysis, probability theory, vector algebra and
other areas. It is considered to be one of the most important inequalities in all of
mathematics.
For all 𝐱, 𝐲 ∈ ℝ^𝑛 : |∑_{𝑖=1}^{𝑛} 𝑥𝑖 𝑦𝑖| ≤ (∑_{𝑖=1}^{𝑛} 𝑥𝑖^2)^{1/2} (∑_{𝑖=1}^{𝑛} 𝑦𝑖^2)^{1/2}, which can be rewritten as |𝐱^𝑇𝐲| ≤ ‖𝐱‖·‖𝐲‖

✔ The A-norm: Given a positive definite matrix 𝑨 ∈ ℝ𝑛×𝑛 , define the A-norm on ℝ𝑛 by
‖𝐱‖𝑨 = (𝐱^𝑇𝑨𝐱)^{1/2}

Note: When 𝑨 = 𝑰, the A-norm is just the Euclidean norm. All p-norms on ℝ𝑛 are
equivalent to each other. In particular,
‖𝐱‖2 ≤ ‖𝐱‖1 ≤ √𝑛‖𝐱‖2
‖𝐱‖∞ ≤ ‖𝐱‖2 ≤ √𝑛‖𝐱‖∞
‖𝐱‖∞ ≤ ‖𝐱‖1 ≤ 𝑛‖𝐱‖∞

Note: For all 𝐱 ∈ ℝ𝑛 , ‖𝐱‖∞ ≤ ‖𝐱‖2 ≤ ‖𝐱‖1 ≤ √𝑛‖𝐱‖2 ≤ 𝑛‖𝐱‖∞

✔ Matrix norms: Norms on matrices are scalar functions of matrices with the three
properties that define a norm in general. Matrix norms are often required to have
another property, called the consistency property, in addition to the properties listed
before, which we repeat here for convenience. Assume 𝑨 and 𝑩 are matrices conformable
for the operations shown.

1. Nonnegativity and mapping of the identity:


if 𝑨 ≠ 𝟎, then ‖𝑨‖ > 0, and ‖𝟎‖ = 0.
2. Relation of scalar multiplication to real multiplication:
‖𝛼𝑨‖ = |𝛼|‖𝑨‖ for real 𝛼
3. Triangle inequality:
‖𝑨 + 𝑩 ‖ ≤ ‖𝑨‖ + ‖𝑩‖.
4. Consistency property:
‖𝑨𝑩 ‖ ≤ ‖𝑨‖ ‖𝑩‖.

Some people do not require the consistency property for a matrix norm. Most useful
matrix norms have the property, however, and we will consider it to be a requirement in
the definition. We note that the four properties of a matrix norm do not imply that it is
invariant to transposition of a matrix, and in general, ‖𝑨𝑻 ‖ ≠ ‖𝑨‖. For a square matrix 𝑨,
the consistency property for a matrix norm yields ‖𝑨𝑘 ‖ ≤ ‖𝑨‖𝑘 for any positive integer k.
A matrix norm ‖. ‖ is orthogonally invariant if 𝑨 and 𝑩 = 𝑷𝑨𝑸 being orthogonally similar
implies ‖𝑨‖ = ‖𝑷𝑨𝑸‖ where 𝑷 and 𝑸 are orthogonal, i.e., 𝑷𝑇 𝑷 = 𝑰 & 𝑸𝑇 𝑸 = 𝑰.
✔ Matrix Norms Induced from Vector Norms Some matrix norms are defined in terms
of vector norms. The matrix norm induced by a given vector norm ‖.‖ is defined by

‖𝑨𝐱‖
‖𝑨‖ = max
𝐱≠0 ‖𝐱‖

It is easy to see that an induced norm (subordinate norm) is indeed a matrix norm. The
first three properties of a norm are immediate, and the consistency property can be
verified by applying the definition. For any vector norm and its induced matrix norm, we
see that ‖𝑨𝐱 ‖ ≤ ‖𝑨‖ ‖𝐱‖

The matrix norms that correspond to the 𝐿𝑝 vector norms are defined for the 𝑚 × 𝑛
matrix 𝑨 as
‖𝑨‖𝑝 = max ‖𝑨𝐱‖𝑝
‖𝐱‖𝑝 =1

The 𝐿1 and 𝐿∞ norms have interesting simplifications: ‖𝑨‖1 = max𝑗 ∑𝑖|𝑎𝑖𝑗 | so the 𝐿1 is also
called the column-sum norm; and ‖𝑨‖∞ = max𝑖 ∑𝑗|𝑎𝑖𝑗 | so the 𝐿∞ is also called the row-sum
norm. we see that ‖𝑨𝑇 ‖∞ = ‖𝑨‖1 . Alternative formulations of the 𝐿2 norm of a matrix are
not so obvious. It is related to the eigenvalues (or the singular values) of the matrix. The
𝐿2 matrix norm is related to the spectral radius. ‖𝑨‖2 = √𝜆max (𝑨𝑇 𝑨). Because of this
relationship, the 𝐿2 matrix norm is also called the spectral norm.

For 𝑨 orthogonal, the 𝐿2 vector norm has the important property ‖𝑨𝐱‖2 = ‖𝐱‖2. For this
reason, an orthogonal matrix is sometimes called an isometric matrix. By the proper
choice of 𝐱, it is easy to see that ‖𝑨‖2 = 1. (See Algebra Book by Bekhiti 2020)
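The simplified formulas for the induced matrix norms, and the isometry property of orthogonal matrices, can be verified numerically (an illustrative sketch):

clear all, clc, A=randn(4,6);
e1=norm(A,1)-max(sum(abs(A),1))     % column-sum norm (≈ 0)
e2=norm(A,inf)-max(sum(abs(A),2))   % row-sum norm (≈ 0)
e3=norm(A,2)-sqrt(max(eig(A'*A)))   % spectral norm = sqrt of largest eigenvalue of A'A (≈ 0)
Q=orth(randn(4,4)); x=randn(4,1);
e4=norm(Q*x)-norm(x)                % isometry: the 2-norm is invariant under an orthogonal matrix (≈ 0)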

✔ The Frobenius Norm: Frobenius norm (or Schur norm) or the usual Norm is defined as

‖𝑨‖𝐹 = √(∑_{𝑖,𝑗} 𝑎𝑖𝑗^2)

The Frobenius norm is sometimes called the Euclidean matrix norm and denoted by ‖. ‖𝐸 ,
although the 𝐿2 matrix norm is more directly based on the Euclidean vector norm, as we
mentioned above. We will usually use the notation ‖𝑨‖𝐹 to denote the Frobenius norm.
The Frobenius norm is also often called the “usual norm”, which emphasizes the fact
that it is one of the most useful matrix norms. Other names sometimes used to refer to
the Frobenius norm are Hilbert-Schmidt norm and Schur norm.

A useful property of the Frobenius norm that is obvious from the definition is

‖𝑨‖𝐹 = √𝑡𝑟(𝑨𝑇 𝑨) = √〈𝑨, 𝑨〉

From the commutativity of an inner product, we have ‖𝑨𝑇 ‖𝐹 = ‖𝑨‖𝐹

If 𝑨 and 𝑩 are orthogonally similar, then ‖𝑨‖𝐹 = ‖𝑩‖𝐹 that is, the Frobenius norm is an
orthogonally invariant norm. To see this, let 𝑨 = 𝑸𝑇 𝑩𝑸, where 𝑸 is an orthogonal matrix.
Then
‖𝑨‖2𝐹 = 𝑡𝑟(𝑨𝑇 𝑨)
= 𝑡𝑟(𝑸𝑇 𝑩𝑇 𝑸𝑸𝑇 𝑩𝑸)
= 𝑡𝑟(𝑩𝑇 𝑩𝑸𝑸𝑇 )
= 𝑡𝑟(𝑩𝑇 𝑩)
= ‖𝑩‖2𝐹

If 𝑨 is an 𝑚 × 𝑛 real matrix, we have the following specific relationships among the norms defined above:

‖𝑨‖∞ ≤ √𝑛‖𝑨‖𝐹
‖𝑨‖𝐹 ≤ √min(𝑚, 𝑛) ‖𝑨‖2
‖𝑨‖2 ≤ √𝑛‖𝑨‖1
‖𝑨‖1 ≤ √𝑚‖𝑨‖2
‖𝑨‖2 ≤ ‖𝑨‖𝐹
‖𝑨‖𝐹 ≤ √𝑚‖𝑨‖∞

✔ Condition numbers: Numerical analysis involving the condition number is a useful


tool in scientific computing. The condition number has been involved in analyses of

⦁ Accuracy, due to the error involved in the data


⦁ Stability of algebraic systems
⦁ Convergence speed of iterative algebraic solvers
The condition number is one of the most frequently used measurements for matrices.

Definition: Let 𝑨 ∈ ℝ𝑛×𝑛 . Then 𝜅(𝑨) = ‖𝑨−1 ‖‖𝑨‖ is called the condition number of A,
associated to the matrix norm.

Lemma Let 𝑨 ∈ ℝ𝑛×𝑛 . Then

❶ 𝜅(𝑨) = 𝜅(𝑨−1 ) ❷ 𝜅(𝑐𝑨) = 𝜅(𝑨) for any 𝑐 ≠ 0. ❸ 𝜅(𝑰) = 1 and 𝜅(𝑨) ≥ 1, for any induced
matrix norm.

If 𝑨 is nonsingular and 𝜅(𝑨)‖𝛿𝑨‖ < ‖𝑨‖ then 𝑨 + 𝛿𝑨 is nonsingular

Theorem: Let 𝑨 ∈ ℝ𝑛×𝑛 be nonsingular, and x and 𝐱̂ = 𝐱 + 𝛿𝐱 be the solutions of 𝑨𝐱 = 𝒃


and 𝑨𝐱̂ = 𝒃 + 𝛿𝒃; respectively. Then
‖𝛿𝐱‖/‖𝐱‖ ≤ 𝜅(𝑨) ‖𝛿𝒃‖/‖𝒃‖

Proof: The equations 𝑨𝐱 = 𝒃 and 𝑨(𝐱 + 𝛿𝐱 ) = 𝒃 + 𝛿𝒃 imply that 𝑨(𝛿𝐱 ) = 𝛿𝒃, that is,
𝛿𝐱 = 𝑨−1 𝛿𝒃. Whatever vector norm we have chosen, we will use the induced matrix norm
to measure matrices. Thus ‖𝛿𝐱‖ ≤ ‖𝑨^{-1}‖‖𝛿𝒃‖. Similarly, the equation 𝑨𝐱 = 𝒃 implies
‖𝒃‖ ≤ ‖𝑨‖‖𝐱‖. Combining the two bounds,

‖𝒃‖ ≤ ‖𝑨‖‖𝐱‖ and ‖𝛿𝐱‖ ≤ ‖𝑨^{-1}‖‖𝛿𝒃‖ ⟹ ‖𝛿𝐱‖/‖𝐱‖ ≤ ‖𝑨^{-1}‖‖𝑨‖ ‖𝛿𝒃‖/‖𝒃‖ = 𝜅(𝑨) ‖𝛿𝒃‖/‖𝒃‖
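The role of the condition number is easy to observe on an ill-conditioned system (an illustrative sketch using the Hilbert matrix; cond returns the 2-norm condition number):

clear all, clc, A=hilb(8); kappa=cond(A)   % a famously ill-conditioned matrix
x=ones(8,1); b=A*x;                        % the exact solution is known
db=1e-10*randn(8,1); xh=A\(b+db);          % perturb the right-hand side and re-solve
relerr=norm(xh-x)/norm(x)                  % relative error in the solution
bound=kappa*norm(db)/norm(b)               % the bound kappa(A)*||db||/||b|| of the theorem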
There are two important subspaces associated with 𝑨 ∈ ℝ^{𝑚×𝑛}. The first one is called the
range of 𝑨 and is defined by Rang(𝑨) = {𝐲 ∈ ℝ^𝑚 : 𝐲 = 𝑨𝐱 for some 𝐱 ∈ ℝ^𝑛}, and the second
one is called the null space of 𝑨 and is defined by Null(𝑨) = {𝐱 ∈ ℝ^𝑛 : 𝑨𝐱 = 𝟎}. If
𝑨 = [𝒂1, . . . , 𝒂𝑛] is a column partitioning, then Rang(𝑨) = span{𝒂1, . . . , 𝒂𝑛}. The rank of a
matrix 𝑨 is defined by rank(𝑨) = dim(Rang(𝑨)). It can be shown that rank(𝑨) = rank(𝑨^𝑇).
We say that 𝑨 ∈ ℝ^{𝑚×𝑛} is rank deficient if rank(𝑨) < min{𝑚, 𝑛}. If 𝑨 ∈ ℝ^{𝑚×𝑛} then:
dim(Null(𝑨)) + rank(𝑨) = 𝑛.

Fundamental Spaces by QR Factorizations: Let 𝑨 be an ℝ𝑚×𝑛 matrix with rank 𝑛. 𝑄𝑅


decomposition finds orthonormal matrix 𝑸 ∈ ℝ𝑚×𝑚 and upper triangular matrix 𝑹 ∈ ℝ𝑚×𝑛
such that 𝑨 = 𝑸𝑹. If we define matrix 𝑸 = [𝑸1 𝑸2 ], where 𝑸1 ∈ ℝ𝑚×𝑛 and 𝑸2 ∈ ℝ𝑚×(𝑚−𝑛) ,
then the columns of 𝑸2 form a basis for the null space of 𝑨^𝑇. The columns of 𝑸1 form a basis for the
range space of 𝑨.

𝑨 = 𝑸𝑹 = [𝑸1 𝑸2](𝑹1 ; 𝟎) ⟺ 𝑸^𝑇𝑨 = 𝑹 ⟺ (𝑸1^𝑇𝑨 ; 𝑸2^𝑇𝑨) = (𝑹1 ; 𝟎) ⟺ {𝑸1^𝑇𝑨 = 𝑹1 and 𝑸2^𝑇𝑨 = 𝟎} ⟺ {𝑨 = 𝑸1𝑹1 and 𝑨^𝑇𝑸2 = 𝟎}

If 𝑨 has rank 𝑟 < 𝑛, the 𝑄𝑅 factorization does not necessarily yield an orthonormal basis
for range(𝑨). The 𝑄𝑅 decomposition is computed either by Gram-Schmidt, Givens
rotations, or Householder reflections. They have different stability properties and
operation counts. For a square matrix 𝑨, the dimension of the null space equals the geometric
multiplicity of the zero eigenvalue (which equals the number of zero eigenvalues when 𝑨 is diagonalizable).

% Case: m>n
clear all, clc, A=10*rand(7,5); m=size(A,1); n=size(A,2);
r= rank(A); [Q R]=qr(A);
if r==n
Rspace=10*Q(:,1:n) % columns span the range space of 𝑨
Nspace=10*Q(:,n+1:end) % columns span the null space of 𝑨^𝑇
else
return
end
A'*Nspace % verification: should be ≈ 0
%-----------------------------------------%
% Case: m<n
clear all, clc, A=10*rand(5,7); m=size(A,1); n=size(A,2); p=min(m,n);
r= rank(A); [Q R]=qr(A'); % The QR factorization of 𝑨^𝑇
if r==p
Rspace=10*Q(:,1:m) % columns span the range space of 𝑨^𝑇
Nspace=10*Q(:,m+1:end) % columns span the null space of 𝑨
else
return
end
A*Nspace % verification: should be ≈ 0
✔ Null-Space by Complete QR Factorizations: A complete 𝑄𝑅 factorization of 𝑨 ∈ ℝ^{𝑚×𝑛}
of rank 𝑟 is a factorization of the form 𝑨 = 𝑸(𝑹 ; 𝟎)𝑽^𝑇 = [𝑸1 𝑸2](𝑹 ; 𝟎)𝑽1^𝑇 = 𝑸1𝑹𝑽1^𝑇, where 𝑸
and 𝑽 are orthogonal matrices, 𝑹 ∈ ℝ^{𝑟×𝑛} is an upper (or lower) triangular matrix,
𝑸1 ∈ ℝ^{𝑚×𝑟}, and 𝑽1 ∈ ℝ^{𝑛×𝑛}. Then 𝑸1 gives an orthogonal basis for the range space of 𝑨;
similarly, 𝑽1 and 𝑸2 give orthogonal bases for the range and null spaces of 𝑨^𝑇.

Note: An advantage of the complete 𝑄𝑅 factorization of 𝑨𝑇 is that 𝑸2 gives an orthogonal


basis for the null space 𝒩(𝑨).

𝑨^𝑇 = 𝑸(𝑹 ; 𝟎)𝑽^𝑇 ⟺ [𝑸1 𝑸2]^𝑇𝑨^𝑇 = (𝑹 ; 𝟎)𝑽^𝑇 ⟺ (𝑸1^𝑇𝑨^𝑇 ; 𝑸2^𝑇𝑨^𝑇) = (𝑹𝑽^𝑇 ; 𝟎) ⟺ {𝑸1^𝑇𝑨^𝑇 = 𝑹𝑽^𝑇 and 𝑸2^𝑇𝑨^𝑇 = 𝟎} ⟺ {𝑨 = 𝑽𝑹^𝑇𝑸1^𝑇 and 𝑨𝑸2 = 𝟎}

clear all, clc, M1=10*rand(11,10);


A=[M1(:,1:3) M1(:,1:5)]; m=size(A,1); n=size(A,2); r= rank(A);
[Q,R,V]=qr(A'); % QR with column pivoting: A'*V = Q*R (V is a permutation matrix)
Zero=A'-Q*R*V'; % verification (should be ≈ 0)
Q1=Q(1:n,1:r); Q2=Q(1:n,r+1:end); V1=V(1:m,1:m);
A*Q2 % null space of A: 𝒩(𝑨).

✔ Null-Space by SVD Factorizations: Let 𝒁 = null(𝑨) be an orthonormal basis for the


null space of 𝑨 obtained from the singular value decomposition. That is, 𝑨𝒁 has
negligible elements, size(𝒁, 2) is the nullity of 𝑨, and 𝒁𝑇 𝒁 = 𝑰.

%-----------------------------------------------------%
% Null-Space by SVD
%-----------------------------------------------------%
clear all, clc
A=100*rand(6,12); m=size(A,1); n=size(A,2); % create m-by-n matrix
[U,S,V] = svd(A,0); % run svd
s_d=diag(S); s = diag(S);
column_basis = U(:,logical(s));
r=nnz(find(abs(s)>1e-12)) % rank of A
nullity=n-r % nullity of A
null_basis=V(:,r+1:n)
Zero= A*null_basis

There are a number of ways to compute the rank of a matrix. MATLAB uses the method
based on the singular value decomposition, or SVD. The SVD algorithm is the most time
consuming, but also the most reliable. The rank algorithm is

⦁ s = svd(A);
⦁ tol = max(size(A))*s(1)*eps;
⦁ r = sum(s > tol);
In MATLAB you can use 𝑟 = rank(𝑨, tol) returns the number of singular values of 𝑨 that
are larger than tol.

Remark: Everything related to the QR and SVD algorithms will be detailed later.
✔ Intersection of Subspaces: The intersection 𝑨 ∩ 𝑩 of two sets 𝑨 and 𝑩 contains only
elements which are in both subsets. So, the intersection is the set of all vectors that are
in both subspaces.

First method to find intersection of two subspaces: Consider the inner product space
(𝑽, 〈·|·〉). We want to show: (𝑾1 + 𝑾2)^⊥ = 𝑾1^⊥ ∩ 𝑾2^⊥

Let's begin. First note that if 𝑾 ⊂ 𝑽 then 𝑽^⊥ ⊂ 𝑾^⊥. This is trivial and left as an exercise to
the reader. We know that 𝑾1 + 𝑾2 is the smallest subspace containing 𝑾1 and 𝑾2. This
is a pretty basic result. Hence 𝑾1, 𝑾2 ⊂ 𝑾1 + 𝑾2, and so we get 𝑾1^⊥, 𝑾2^⊥ ⊃ (𝑾1 + 𝑾2)^⊥.

Now take note that (𝑾1 + 𝑾2)^⊥ is contained in both 𝑾1^⊥ and 𝑾2^⊥. So it is also contained in the
intersection of these two sets. Hence,

(𝑾1 + 𝑾2)^⊥ ⊂ 𝑾1^⊥ ∩ 𝑾2^⊥

Now we need to show the other direction of the inclusion. Let 𝜶 ∈ 𝑾1^⊥ ∩ 𝑾2^⊥ and let 𝜷 be an
arbitrary vector in 𝑾1 + 𝑾2. Then we can write 𝜷 = 𝒘1 + 𝒘2, with 𝒘1 ∈ 𝑾1, 𝒘2 ∈ 𝑾2. Then
〈𝜶|𝜷〉 = 〈𝜶|𝒘1〉 + 〈𝜶|𝒘2〉 = 0. This follows because 𝜶 is in the orthogonal complement of
both subspaces. Hence 𝜶 ∈ (𝑾1 + 𝑾2)^⊥. This proves that (𝑾1 + 𝑾2)^⊥ ⊃ 𝑾1^⊥ ∩ 𝑾2^⊥.

{(𝑾1 + 𝑾2)^⊥ ⊂ 𝑾1^⊥ ∩ 𝑾2^⊥ and (𝑾1 + 𝑾2)^⊥ ⊃ 𝑾1^⊥ ∩ 𝑾2^⊥} ⟺ (𝑾1 + 𝑾2)^⊥ = 𝑾1^⊥ ∩ 𝑾2^⊥ ■

Alternatively, we can prove the formula (𝑾1 + 𝑾2)^⊥ = 𝑾1^⊥ ∩ 𝑾2^⊥ by using

𝐱 ∈ (𝑾1 + 𝑾2)^⊥ ⇔ ∀𝐲 ∈ 𝑾1 + 𝑾2 : 〈𝐱|𝐲〉 = 0
⇔ ∀𝐳 ∈ 𝑾1, ∀𝒕 ∈ 𝑾2 : 〈𝐱|𝐳〉 = 〈𝐱|𝒕〉 = 0
⇔ 𝐱 ∈ 𝑾1^⊥ ∩ 𝑾2^⊥

Algorithm:
⦁ 𝑿 = Null(𝑨^𝑇); % 𝓧^⊥, the orthogonal complement of 𝓧 = Rang(𝑨)
⦁ 𝒀 = Null(𝑩^𝑇); % 𝓨^⊥, the orthogonal complement of 𝓨 = Rang(𝑩)
⦁ 𝑿𝒀 = [𝑿 𝒀]; % a spanning set for 𝓧^⊥ + 𝓨^⊥
⦁ 𝑻1 = Null(𝑿𝒀^𝑇) % basis for (𝓧^⊥ + 𝓨^⊥)^⊥ = 𝓧 ∩ 𝓨

Second method to find intersection of two subspaces: Assume that we have two
matrices 𝑨 ∈ ℝ𝑛×𝑚 & 𝑩 ∈ ℝ𝑛×𝑠 where the sets 𝓨1 & 𝓨2 are two vector spaces corresponding
to the ranges of 𝑨 & 𝑩 that is 𝐲1 ∈ 𝓨1 and 𝐲2 ∈ 𝓨2

Rang(𝑨) = {𝐲1 | 𝐲1 = 𝑨𝐱1 ∀𝐱1 }; Rang(𝑩) = {𝐲2 | 𝐲2 = 𝑩𝐱 2 ∀𝐱 2 }

The intersection of 𝓨1 & 𝓨2 is characterized by 𝐲 ∈ 𝓨1 ∩ 𝓨2 ⟺ 𝐲 = 𝑨𝐱1 = 𝑩𝐱2 ⟺ [𝑨 −𝑩][𝐱1 ; 𝐱2] = 𝟎
Algorithm:
⦁ 𝑼1 ∈ ℝ𝑛×𝑚 ; 𝑼2 ∈ ℝ𝑛×𝑠 ; % It is necessarily that n<=min{m,s}
⦁ 𝑵 = Null([𝑼1 𝑼2 ]); % N = basis for nullspace of [U1 U2]
⦁ 𝑰 = 𝑼1 ⋆ 𝑵(1: 𝑚, : ); % I = basis for intersection of U1 and U2
⦁ 𝑰 = orth(𝑰); % returns an orthonormal basis for the range of 𝑰

clear all, clc, n=4; m=10; s=7; % It is necessarily that n<=min{m,s}


% Matlab code for the intersection of 2 subspaces U1 and U2
% First solve [U1 U2]*N=0 where N=[N1;N2]=basis of null space of [U1 U2]
% Then I=U1*N1=-U2*N2 is a basis for the intersection of U1 and U2
% Basis I may not be a minimal size basis, so use orth(I)
U1=10*rand(n,m); U2=10*rand(n,s);
N=null([U1 U2]); % N = basis for nullspace of [U1 U2]
I=U1*N(1:m,:); % I = basis for intersection of U1 and U2
I=orth(I) % I = orthonormal and minimal size basis
Zero=rank(I)+ rank([U1 U2])- rank(U1)- rank(U2)

Remark: The function 𝑸 = orth(𝑨) returns an orthonormal basis for the range of 𝑨. The
columns of 𝑸 are vectors, which span the range of 𝑨. The number of columns in 𝑸 is
equal to the rank of 𝑨.

Exercise: Determine the controllable but unobservable subspace of a system defined by


its state-space {𝑨, 𝑩, 𝑪, 𝑫}

clear all, clc, % Application in Control Engineering


A = [5 9 2 1; -5 -6 -1 0; -7 -13 -5 -2; 14 5 2 -4];
B = [2; -1; -3; 4]; C = [7 16 3 3]; n = size (A,1);
P = ctrb(A,B); % controllability matrix
Q = obsv(A,C); % observability matrix
Wc = orth(P); % basis for controllable subspace
Wo = null(Q); % basis for unobservable subspace
%------------------ First method ----------------------%
X=null(Wc'); Y=null(Wo'); XY=[X Y]; T1=null(XY')
%----------------- Second method ---------------------%
U1=Wc; U2=Wo; N=null([U1 U2]); m=size(U1,2);
T1=U1*N(1:m,:); T1=orth(T1)
Zero=rank(T1) + rank([U1 U2]) - rank(U1) - rank(U2)

✔ General Rank Additive Matrix Decompositions: In this section we investigate matrix


decompositions 𝑸 = 𝑹 + 𝑺 where rank(𝑸) = rank(𝑹) + rank(𝑺) that satisfy the property that
the rank of the sum is equal to the sum of the ranks.

Lemma: Let 𝑨 ∈ ℝ^{𝑛×𝑛}, 𝑩 ∈ ℝ^{𝑛×𝑘}, 𝑪 ∈ ℝ^{𝑘×𝑛} & 𝑫 ∈ ℝ^{𝑘×𝑘} with rank(𝑩) = rank(𝑪) = rank(𝑫) = 𝑘.
Then rank(𝑨 − 𝑩𝑫^{-1}𝑪) = rank(𝑨) − rank(𝑩𝑫^{-1}𝑪) if and only if there exist matrices 𝑿 ∈ ℝ^{𝑛×𝑘}
and 𝒀 ∈ ℝ^{𝑘×𝑛} such that 𝑩 = 𝑨𝑿, 𝑪 = 𝒀𝑨 and 𝑫 = 𝒀𝑨𝑿.
Proof: First recall the identity

(𝑨 𝑩 ; 𝑪 𝑫) = (𝑰𝑛 𝑩𝑫^{-1} ; 𝟎 𝑰𝑘)(𝑨 − 𝑩𝑫^{-1}𝑪 𝟎 ; 𝟎 𝑫)(𝑰𝑛 𝟎 ; 𝑫^{-1}𝑪 𝑰𝑘)

therefore rank(𝑨 𝑩 ; 𝑪 𝑫) = rank(𝑨 − 𝑩𝑫^{-1}𝑪) + rank(𝑫), where 𝑨 − 𝑩𝑫^{-1}𝑪 is called the
Schur complement of 𝑫 in the block matrix in question. If the conditions 𝑩 = 𝑨𝑿, 𝑪 = 𝒀𝑨 and
𝑫 = 𝒀𝑨𝑿 hold, then

(𝑨 𝑩 ; 𝑪 𝑫) = (𝑨 𝑨𝑿 ; 𝒀𝑨 𝒀𝑨𝑿) = (𝑰 ; 𝒀) 𝑨 (𝑰 𝑿)

Therefore, the rank of this matrix is equal to the rank of 𝑨, and this implies

rank(𝑨) = rank(𝑨 − 𝑩𝑫−1 𝑪) + rank(𝑫)

and since rank(𝑫) = rank(𝑩𝑫−1 𝑪), then rank(𝑨 − 𝑩𝑫−1 𝑪) = rank(𝑨) − rank(𝑩𝑫−1 𝑪) this
proves the sufficiency of 𝑩 = 𝑨𝑿, 𝑪 = 𝒀𝑨 and 𝑫 = 𝒀𝑨𝑿.

Conversely, let rank(𝑨 − 𝑩𝑫−1 𝑪) = rank(𝑨) − rank(𝑩𝑫−1 𝑪) be satisfied. From the above
analysis it follows that
𝑨 𝑩
rank ( ) = rank(𝑨)
𝑪 𝑫

Then the fact that the rank of the first block row and the first block column of this
matrix both equal the rank of 𝑨 implies the existence of 𝑿, 𝒀, such that the first two
conditions 𝑩 = 𝑨𝑿, 𝑪 = 𝒀𝑨 are satisfied; the third condition 𝑫 = 𝒀𝑨𝑿 follows then from
the fact that the whole matrix should have rank equal to that of 𝑨. ■

The following result for the rank-one case was discovered by Wedderburn in the early
1930s.

Corollary: Rank-one reduction. The rank of the difference 𝑨 − 𝛼𝐯𝐰 𝐻 , 𝐯, 𝐰 ∈ ℝ𝑛 , is one


less than the rank of 𝑨 ∈ ℝ𝑛×𝑛 if and only if 𝐯 = 𝑨𝐱, 𝐰 𝐻 = 𝐲 𝐻 𝑨, and 𝛼 −1 = 𝐲 𝐻 𝑨𝐱 for some
vectors 𝐱, 𝐲 of appropriate dimension.

Problem: (Rank Update) Let 𝑨 ∈ ℝ𝑛×𝑛 be a matrix defined by 𝑟 = 𝑟𝑎𝑛𝑘(𝑨) with 𝑟 ≤ 𝑛 and
𝑸 a matrix defined by 𝑸 = null(𝑨𝑇 ), that is, a matrix whose columns form a basis of the
null space of 𝑨𝑇 . Let 𝐮 be a column of 𝑸, compute the rank of 𝑨 + 𝐮𝐮𝑇 . Prove the observed
result. Proof: (See the Algebra book by BEKHITI B 2020)

clear all, clc, M=10*rand(6,6); A=M*diag([-2 -1 0 1 0 3])*inv(M);


r1=rank(A)
Q=null(A')
u=Q(:,2)
r2=rank(A+u*u')

Making a “rank-one update” to a matrix 𝑨 means adding to the matrix another matrix of
rank one: 𝑨 + ∆𝑨 = 𝑨 + 𝐮𝐮𝑇 . This actually occurs quite a bit in statistics, optimization and
in sequential regression problems.
In many applications it is
necessary to “decompose” any n-dimensional vector 𝐱 into a sum of
two terms 𝐱 = 𝐮 + 𝐯, one term being a scalar multiple of a specified
nonzero vector a and the other term being orthogonal to a.

𝐮 = Proj_𝐚 𝐱 = (𝐚^𝑇𝐱/‖𝐚‖^2) 𝐚 = (〈𝐱, 𝐚〉/〈𝐚, 𝐚〉) 𝐚,    𝐯 = 𝐱 − Proj_𝐚 𝐱 = 𝐱 − (𝐚^𝑇𝐱/‖𝐚‖^2) 𝐚 = 𝐱 − (〈𝐱, 𝐚〉/〈𝐚, 𝐚〉) 𝐚

In this context, 𝐮 is the projection of 𝐱 onto 𝐚; and 𝐯 is the orthogonal complement. In


term of projector matrix we can write

𝐮 = 𝐏𝐱 = 𝐚(𝐚^𝑇𝐱/‖𝐚‖^2) = (𝐚𝐚^𝑇/‖𝐚‖^2)𝐱 ⟹ 𝐏 = 𝐚𝐚^𝑇/‖𝐚‖^2

Properties of the orthogonal projection

▪ 𝐏^2 = 𝐏   ▪ 𝐏^𝑇 = 𝐏   ▪ rank(𝐏) = rank(𝐚𝐚^𝑇) = 1   ▪ Range(𝐏) ⊥ Range(𝑰 − 𝐏)
▪ (𝑰 − 𝐏)^2 = (𝑰 − 𝐏)   ▪ (𝑰 − 𝐏)^𝑇 = (𝑰 − 𝐏)   ▪ Range(𝐏) = Null(𝑰 − 𝐏)

Remark: If we let 𝓧 be a subspace of ℝ𝑚 and let 𝑸 = [𝒒1 , … , 𝒒𝑟 ] ∈ ℝ𝑚×𝑟 be an orthonormal


basis of 𝓧 (i.e. 𝑸𝑇 𝑸 = 𝑰𝒓×𝒓), then 𝐏 = 𝑸𝑸𝑇 ∈ ℝ𝑚×𝑚 is an orthogonal projector of ℝ𝑚 onto 𝓧.
Now if we define 𝓧⊥ to be the orthogonal complement subspace such that

𝓧⊥ = {𝐲|𝐲 ⊥ 𝐱, ∀𝐱 ∈ 𝓧}

Then  Range(𝐏) = 𝓧  Range(𝑰 − 𝐏) = 𝓧⊥ and  Range(𝑰 − 𝐏) = Null(𝐏) = 𝓧⊥

Proof:
❶ It is well-known that 𝓧 = Range(𝐏) ⟺ {𝓧 ⊆ Range(𝐏) and Range(𝐏) ⊆ 𝓧}. We start with the
second inclusion: it is clear that Range(𝐏) = {𝐱 | 𝐱 = 𝑸𝑸^𝑇𝐲, 𝐲 ∈ ℝ^𝑚} ⊆ 𝓧. Now let us prove the
first inclusion: any 𝐱 ∈ 𝓧 is of the form 𝐱 = 𝑸𝐯, 𝐯 ∈ ℝ^𝑟 ⟹ 𝐏𝐱 = 𝑸𝑸^𝑇(𝑸𝐯) = 𝑸𝐯 = 𝐱;
since 𝐱 = 𝐏𝐱, 𝐱 ∈ Range(𝐏). So 𝓧 ⊆ Range(𝐏). In the end 𝓧 = Range(𝐏).

❷ 𝐱 ∈ 𝓧^⊥ ⟺ 𝐱^𝑇𝐲 = 0 ∀𝐲 ∈ 𝓧 ⟺ 𝐱^𝑇𝑸𝐳 = 0 ∀𝐳 ∈ ℝ^𝑟 ⟺ 𝐳^𝑇𝑸^𝑇𝐱 = 0 ∀𝐳 ∈ ℝ^𝑟 ⟺ 𝑸^𝑇𝐱 = 𝟎 ⟺
𝑸𝑸^𝑇𝐱 = 𝟎 ⟺ 𝐏𝐱 = 𝟎; hence Null(𝐏) = 𝓧^⊥.

❸ 𝐱 ∈ Null(𝐏) ⟺ 𝐏𝐱 = 𝟎 ⟺ (𝑰 − 𝐏)𝐱 = 𝐱 ⟹ 𝐱 ∈ Range(𝑰 − 𝐏), so Range(𝑰 − 𝐏) = Null(𝐏) = 𝓧^⊥.

Result: Any 𝐱 ∈ ℝ𝑚 can be written in a unique way as 𝐱 = 𝐱1 + 𝐱 2 ; 𝐱1 ∈ 𝓧; 𝐱 2 ∈ 𝓧⊥

Proof: Just set 𝐱1 = 𝐏𝐱 ; 𝐱 2 = (𝑰 − 𝐏)𝐱 those are called orthogonal complements.

𝓧⊥ + 𝓧 = ℝ𝑚 and 𝓧⊥ ⋂𝓧 = {𝟎} ⟺ 𝓧⊥ ⨁𝓧 = ℝ𝑚

This is called the Orthogonal Decomposition. In other words

ℝ𝑚 = 𝐏ℝ𝑚 ⨁(𝑰 − 𝐏)ℝ𝑚 or


ℝ𝑚 = Range(𝐏)⨁Range(𝑰 − 𝐏) or
ℝ𝑚 = Range(𝐏)⨁Null(𝐏) or
𝑚 ⊥
ℝ = Range(𝐏)⨁Range(𝐏)
One can complete the orthonormal basis {𝒒1 , … , 𝒒𝑟 } by the set of linearly independent
vectors {𝒒𝑟+1 , … , 𝒒𝑚 } to form an orthonormal basis of ℝ𝑚 , where {𝒒𝑟+1 , … , 𝒒𝑚 } = basis of
𝓧⊥ , and dim(𝓧⊥ ) = 𝑚 − 𝑟.

✔ The Four Fundamental Subspaces: Central to the study of linear algebra are four
fundamental subspaces associated with a matrix. These subspaces are intricately linked
to the rows and columns of a matrix as well as the solution set of the homogeneous
linear system associated with that matrix. These four subspaces will keep surfacing as
we explore properties of matrices.

Theorem: If we let 𝑨 ∈ ℝ𝑚×𝑛 and consider the following four subspaces


ℛ(𝑨), ℛ(𝑨𝑇 ), 𝒩(𝑨𝑇 ) 𝑎𝑛𝑑 𝒩(𝑨) then

{ℛ(𝑨)^⊥ = 𝒩(𝑨^𝑇) and ℛ(𝑨^𝑇) = 𝒩(𝑨)^⊥} ⟺ {ℝ^𝑚 = ℛ(𝑨)⨁𝒩(𝑨^𝑇) and ℝ^𝑛 = ℛ(𝑨^𝑇)⨁𝒩(𝑨)}

Proof: (See the Algebra book by BEKHITI B 2020)

✔ Projection Onto a Subspace: The direct decomposition of an inner product space into
a subspace and its orthogonal complement afforded before leads to wide generalization of
the elementary notion of projection of one vector on another.

Suppose we have a vector space ℛ(𝑨) spanned by columns of matrix 𝑨 and we want to
project a point 𝒃 onto the space ℛ(𝑨), using the linear transformation P.

𝐲 = 𝑷𝒃 = 𝑨𝐱 and 𝐫 = 𝑨𝐱 − 𝒃

The residual 𝐫 must be orthogonal to every vector of ℛ(𝑨), i.e. 〈𝑨𝐳, 𝐫〉 = (𝑨𝐳)^𝑻(𝑨𝐱 − 𝒃) =
𝐳^𝑻𝑨^𝑻(𝑨𝐱 − 𝒃) = 0 for all 𝐳; in other words (𝑨𝐱 − 𝒃) ∈ 𝒩(𝑨^𝑻), which gives 𝑨^𝑻(𝑨𝐱 − 𝒃) = 𝟎, that is

(𝑨^𝑻𝑨)𝐱 − 𝑨^𝑻𝒃 = 𝟎

The solution of this least squares problem comes down to solving the 𝑛 × 𝑛 linear
system of equations (𝑨𝑻 𝑨)𝐱 − 𝑨𝑻 𝒃 = 𝟎. These equations are called the normal equations of
the least squares problem 𝑨𝐱 = 𝒃.

The problem solution depends on the non-singularity of the matrix (𝑨^𝑻𝑨) ∈ ℝ^{𝑛×𝑛}, so
under what condition is (𝑨^𝑻𝑨) full rank?

Theorem: rank(𝑨) = rank(𝑨^𝑻) = rank(𝑨𝑨^𝑻) = rank(𝑨^𝑻𝑨)

We can deduce that 𝑨^𝑻𝑨 is full rank if and only if rank(𝑨) = 𝑛, hence the normal system
(𝑨^𝑻𝑨)𝐱 − 𝑨^𝑻𝒃 = 𝟎 is exactly solvable with the following best (optimal) solution

𝐱 = (𝑨𝑻 𝑨)−𝟏 𝑨𝑻 𝒃

Now we can write 𝑷𝒃 = 𝑨𝐱 = 𝑨(𝑨𝑻 𝑨)−𝟏 𝑨𝑻 𝒃 ⟹ 𝑷 = 𝑨(𝑨𝑻 𝑨)−𝟏 𝑨𝑻


The operator 𝑷 is called a projector, and satisfies the following properties

▪ 𝑷 = 𝑷^𝑻 (symmetry): 𝑷 projects onto ℛ(𝑨)
▪ 𝑷 = 𝑷^𝟐 (idempotency): (𝑰 − 𝑷) projects onto 𝒩(𝑨^𝑻)
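A short MATLAB sketch of the least squares projection (illustrative only): it forms the projector 𝑨(𝑨^𝑻𝑨)^{-1}𝑨^𝑻, verifies its properties, and checks that the normal-equations solution agrees with MATLAB's backslash operator.

clear all, clc, A=rand(8,3); b=rand(8,1);  % an overdetermined system A*x = b
x=(A'*A)\(A'*b);                           % solution of the normal equations
P=A*((A'*A)\A');                           % projector onto R(A)
e1=norm(P*P-P)                             % idempotency (≈ 0)
e2=norm(P-P')                              % symmetry (≈ 0)
e3=norm(A'*(b-A*x))                        % the residual is orthogonal to R(A) (≈ 0)
e4=norm(x-A\b)                             % agrees with MATLAB's least squares backslash (≈ 0)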

A basis that is suitable for one problem may not be suitable


for another, so it is a common process in the study of vector spaces to change from one
basis to another. Because a basis is the vector space generalization, changing bases is
akin to changing coordinate axes in any coordinate system. In this section we shall study
problems related to change of basis. Orthonormal bases are nice because several
formulas become much simpler when vectors are written in terms of an orthonormal basis.

Theorem: Every orthogonal (orthonormal) sequence of nonzero vectors is linearly


independent.

Proof: (See the Algebra book by BEKHITI B 2020)

Orthonormalization processes play a key role in many iterative methods such as :


Array Signal Processing, the Kalman Filtering problem, Data Mining and
Bioinformatics, among others. Besides, it has a very important field of applications in
communications, filtering and compression in Digital Signal and Image Processing.
While there are many methods for carrying out this process, the most popular one is the
Gram–Schmidt algorithm.

Orthonormal Sets and Matrices: The set 𝕭 = {𝒖𝟏 , 𝒖𝟐 , … , 𝒖𝒏 } is called an orthonormal set
whenever ‖𝒖𝒊 ‖ = 1 for each 𝑖, and 𝒖𝒊 ⊥𝒖𝒋 for all 𝑖 ≠ 𝑗. In other words,
〈𝒖𝒊|𝒖𝒋〉 = 𝒖𝒊^T𝒖𝒋 = { 1 when 𝑖 = 𝑗 ; 0 when 𝑖 ≠ 𝑗 }
• A square matrix is said to be orthogonal if its columns form an orthonormal set.
• Every orthonormal set is linearly independent.
• Every orthonormal set of n vectors from an n-dimensional space 𝓥 is an orthonormal
basis for 𝓥.

✔ Gram–Schmidt Orthogonalization Procedure: As discussed before, orthonormal


bases possess significant advantages over bases that are not orthonormal. The spaces
ℜ3 and ℂ3 clearly possess orthonormal bases (e.g.,the standard basis),but what about
other spaces? Does every finite dimensional space possess an orthonormal basis, and, if
so, how can one be produced? The Gram–Schmidt orthogonalization procedure
developed below answers these questions.

Given an arbitrary basis 𝓢 = {𝐱 𝟏 , 𝐱 𝟐 , … , 𝐱 𝒏 } , the sequence 𝕭 = {𝒖𝟏 , 𝒖𝟐 , … , 𝒖𝒏 } is the required


system of orthogonal vectors.

Idea: If 𝕭 = {𝒖𝟏 , 𝒖𝟐 , … , 𝒖𝒏 } is an orthonormal basis for ℝ𝑛 then any vector 𝐯 ∈ ℝ𝑛 can be


written uniquely in terms of the 𝒖𝒊, meaning that 𝐯 = 𝑘1𝒖𝟏 + 𝑘2𝒖𝟐 + ⋯ + 𝑘𝑛𝒖𝒏. Based on this
notion let us start the development of the algorithm.
First step: Let we define 𝒖𝟏 = 𝐱 𝟏 and 𝒖𝟐 = 𝐱 𝟐 + 𝛽1 𝒖𝟏 , our objective is to find 𝛽1 such
that 〈𝒖𝟏 , 𝒖𝟐 〉 = 0.

𝒖𝟏 ⊥ 𝒖𝟐 ⟺ 0 = 〈𝒖𝟏 , 𝒖𝟐 〉 = 〈𝐱 𝟏 , 𝐱 𝟐 + 𝛽1 𝒖𝟏 〉 = 〈𝐱 𝟏 , 𝐱 𝟐 〉 + 𝛽1 〈𝐱 𝟏 , 𝒖𝟏 〉 = 〈𝐱 𝟏 , 𝐱 𝟐 〉 + 𝛽1 〈𝐱𝟏 , 𝐱 𝟏 〉

𝛽1 = −〈𝐱𝟏, 𝐱𝟐〉/〈𝒖𝟏, 𝒖𝟏〉

Second step: Let we define 𝒖𝟑 = 𝐱 𝟑 + 𝛾1 𝒖𝟏 + 𝛾2 𝒖𝟐 , our objective is to find scalers


𝛾1 & 𝛾2 such that 〈𝒖𝟏 , 𝒖𝟑 〉 = 0 and 〈𝒖𝟐 , 𝒖𝟑 〉 = 0.

𝒖𝟏 ⊥ 𝒖𝟑 ⟺ 0 = 〈𝒖𝟏, 𝒖𝟑〉 = 〈𝐱𝟏, 𝐱𝟑 + 𝛾1𝒖𝟏 + 𝛾2𝒖𝟐〉 = 〈𝐱𝟏, 𝐱𝟑〉 + 𝛾1〈𝐱𝟏, 𝒖𝟏〉 + 𝛾2〈𝐱𝟏, 𝒖𝟐〉, where 〈𝐱𝟏, 𝒖𝟐〉 = 0, so that

𝛾1 = −〈𝐱𝟏, 𝐱𝟑〉/〈𝒖𝟏, 𝒖𝟏〉

𝒖𝟐 ⊥ 𝒖𝟑 ⟺ 0 = 〈𝒖𝟐, 𝒖𝟑〉 = 〈𝒖𝟐, 𝐱𝟑 + 𝛾1𝒖𝟏 + 𝛾2𝒖𝟐〉 = 〈𝒖𝟐, 𝐱𝟑〉 + 𝛾1〈𝒖𝟐, 𝒖𝟏〉 + 𝛾2〈𝒖𝟐, 𝒖𝟐〉, where 〈𝒖𝟐, 𝒖𝟏〉 = 0, so that

𝛾2 = −〈𝒖𝟐, 𝐱𝟑〉/〈𝒖𝟐, 𝒖𝟐〉

𝒖𝟑 = 𝐱𝟑 − (〈𝐱𝟏, 𝐱𝟑〉/〈𝒖𝟏, 𝒖𝟏〉)𝒖𝟏 − (〈𝒖𝟐, 𝐱𝟑〉/〈𝒖𝟐, 𝒖𝟐〉)𝒖𝟐

Following the same procedure, we generate all the remaining vectors.

The sequence {𝒖𝟏 , 𝒖𝟐 , … , 𝒖𝒏 } is the required system of orthogonal vectors, and the
normalized vectors 𝒆𝒊 = 𝒖𝒊 /‖𝒖𝒊 ‖ form an orthonormal set. The calculation of the sequence
{𝒖𝟏 , 𝒖𝟐 , … , 𝒖𝒏 } is known as Gram–Schmidt orthogonalization, while the calculation of the
sequence 𝒆𝒊 is known as Gram–Schmidt orthonormalization as the vectors are
normalized.

Algorithm:
for 𝑘 = 1:  𝒖1 ← 𝐱1⁄‖𝐱1‖
for 𝑘 > 1:  𝒖𝑘 ← 𝐱𝑘 − ∑_{𝑖=1}^{𝑘−1}(𝒖𝑖^T𝐱𝑘)𝒖𝑖 ;  𝒖𝑘 ← 𝒖𝑘⁄‖𝒖𝑘‖
end

Where: ∑_{𝑖=1}^{𝑘−1}(𝒖𝑖^T𝐱𝑘)𝒖𝑖 = ∑_{𝑖=1}^{𝑘−1} Proj_{𝒖𝑖}𝐱𝑘 and Proj_{𝒖𝑖}𝐱𝑘 = (〈𝒖𝒊|𝐱𝑘〉/〈𝒖𝒊|𝒖𝒊〉)𝒖𝒊 (here the 𝒖𝑖 are
already normalized, so 〈𝒖𝒊|𝒖𝒊〉 = 1). The general proof proceeds by mathematical induction.

The application of the Gram–Schmidt process to the column vectors of a full column
rank matrix yields the QR decomposition (it is decomposed into an orthogonal and a
triangular matrix).

✔ The QR decomposition: is one of the more powerful numerical methods developed for
computing eigenvalues of real matrices. In contrast to the power methods, which
converge only to a single dominant real eigenvalue of a matrix, the QR-algorithm
generally locates all eigenvalues, both real and complex, regardless of multiplicity.
Question: What is the relationship between 𝐀 = [𝒂𝟏 , 𝒂𝟐 , … , 𝒂𝒏 ] and 𝑸 = [𝒒𝟏 , 𝒒𝟐 , … , 𝒒𝒏 ] when
the Gram–Schmidt process is applied to the matrix columns?

When Gram–Schmidt is applied to the columns of 𝐀 ∈ ℝ𝑚×𝑛 , the result is an orthonormal


basis {𝒒𝟏 , 𝒒𝟐 , … , 𝒒𝒏 } for ℛ(𝐀) where

𝒒1 = 𝐚1 ⁄𝜂1    and    𝒒𝑘 = (𝐚𝑘 − ∑_{𝑖=1}^{𝑘−1} 〈𝒒𝒊 |𝐚𝑘 〉 𝒒𝑖 ) ⁄ 𝜂𝑘    for 𝑘 = 2,3, … , 𝑛

with:  𝜂1 = ‖𝐚𝟏 ‖   and   𝜂𝑘 = ‖𝐚𝑘 − ∑_{𝑖=1}^{𝑘−1} 〈𝒒𝒊 |𝐚𝑘 〉 𝒒𝑖 ‖   for 𝑘 > 1

The above relationships can be rewritten as 𝐚1 = 𝜂1 𝒒1 and 𝐚𝑘 = 𝜂𝑘 𝒒𝑘 + ∑_{𝑖=1}^{𝑘−1} 〈𝒒𝒊 |𝐚𝑘 〉 𝒒𝑖 ,
which in turn can be expressed in matrix form by writing

[𝒂𝟏 , 𝒂𝟐 , … , 𝒂𝒏 ] = [𝒒𝟏 , 𝒒𝟐 , … , 𝒒𝒏 ] ( 𝜂1   〈𝒒𝟏 |𝐚2 〉   〈𝒒𝟏 |𝐚3 〉   ⋯   〈𝒒𝟏 |𝐚𝑛 〉
                                        0     𝜂2       〈𝒒𝟐 |𝐚3 〉   ⋯   〈𝒒𝟐 |𝐚𝑛 〉
                                        0     0          𝜂3       ⋯   〈𝒒𝟑 |𝐚𝑛 〉
                                        ⋮     ⋮          ⋮        ⋱    ⋮
                                        0     0          0        ⋯    𝜂𝑛 )

This says that it’s possible to factorize (decompose) a matrix with independent
columns as [𝑨]𝑚×𝑛 = [𝑸]𝑚×𝑛 [𝑹]𝑛×𝑛 , where the columns of 𝑸 are an orthonormal basis for
ℛ(𝐀) and 𝑹 is an upper-triangular matrix with positive diagonal elements.
The factorization (i.e. decomposition) 𝐀 = 𝑸𝑹 is called the 𝑸𝑹-factorization for 𝐀, and it
is uniquely determined by 𝐀.

If 𝐀 ∈ ℝ𝑛×𝑛 is nonsingular, then 𝑸𝑇 = 𝑸−𝟏 (because 𝑸 has orthonormal columns), so


𝑨𝐱 = 𝒃 ⟺ 𝑸𝑹𝐱 = 𝐛 ⟺ 𝑹𝐱 = 𝑸𝑇 𝐛, which is also a triangular system that is solved by back
substitution.

Consider again the least-squares solution 𝐱 = (𝑨𝑻 𝑨)−𝟏 𝑨𝑻 𝒃. Since 𝑨 has 𝑛 linearly
independent columns, 𝑨 has a 𝑸𝑹-factorization 𝐀 = 𝑸𝑹, hence

𝐀𝑇 𝐀 = (𝑸𝑹)𝑇 𝑸𝑹 = 𝑹𝑇 𝑸𝑇 𝑸𝑹 = 𝑹𝑇 𝑹

𝐱 = (𝑹𝑇 𝑹)−𝟏 𝑹𝑇 𝑸𝑇 𝒃 = 𝑹−𝟏 (𝑹𝑇 )−𝟏 𝑹𝑇 𝑸𝑇 𝒃

𝐱 = 𝑹−𝟏 𝑸𝑇 𝒃
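
As a quick numerical illustration (a minimal sketch using MATLAB's built-in qr; the matrix and right-hand side below are randomly generated test data), the triangular solve 𝑹𝐱 = 𝑸𝑇 𝒃 reproduces the normal-equations solution:

% Least squares via the economy-size QR factorization (illustrative sketch)
A = rand(20,5); b = rand(20,1);   % random full-column-rank test problem
[Q,R] = qr(A,0);                  % A = Q*R, Q (20x5) orthonormal, R (5x5) upper triangular
x_qr = R\(Q'*b);                  % back substitution on R*x = Q'*b
x_ne = (A'*A)\(A'*b);             % normal equations, for comparison
norm(x_qr - x_ne)                 % of the order of machine precision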

Remarks:
The matrix 𝑹 is nonsingular (it is triangular with positive diagonal entries).

The following statements are equivalent to saying that a real matrix 𝑷 ∈ ℝ𝑛×𝑛 is
orthogonal.

 𝑷 has orthonormal columns.


 𝑷 has orthonormal rows.
 𝑷−𝟏 = 𝑷𝑻
 ‖𝑷𝐱‖𝟐 = ‖𝐱‖𝟐 for every 𝐱 ∈ ℝ𝑛×1 . (isometry property)
 ‖𝑷𝐀‖𝑭 = ‖𝐀‖𝑭 for every 𝐀 ∈ ℝ𝑛×𝑝 . (isometry property)
In general, a linear operator 𝑻 on a vector space 𝓥 with the property that ‖𝑻𝐱‖𝟐 = ‖𝐱‖𝟐
for every 𝐱 ∈ 𝓥 is called an isometry on 𝓥 (isometries preserve length). The isometries on ℝ𝑛 are
precisely the orthogonal matrices, and the isometries on ℂ𝑛 are the unitary matrices.
% Classical Gram-Schmidt algorithm

clear all, clc, A =10*rand(4,4); [m,n]=size(A); R=zeros(n); Q=A;

for k=1:n,
for i=1:k-1,
R(i,k)=Q(:,i)'*Q(:,k);
end
for i=1:k-1,
Q(:,k)=Q(:,k)-R(i,k)*Q(:,i);
end
R(k,k)=norm(Q(:,k)); Q(:,k)=Q(:,k)/R(k,k);
end
R, Q,

Unfortunately, the Classical Gram–Schmidt (CGS) method has very poor numerical
properties, in that there is typically a severe loss of orthogonality among the computed 𝒒𝑖 .
Interestingly, a rearrangement of the calculation, known as Modified Gram–Schmidt
(MGS), yields a much sounder computational procedure. In the 𝑘 𝑡ℎ step of MGS, the
𝑘 𝑡ℎ column of 𝑸 (denoted by 𝒒𝑘 ) and the 𝑘 𝑡ℎ row of 𝑹 (denoted by 𝒓𝑘 𝑇 ) are determined
(see Gene H. Golub and Charles F. Van Loan 1996).

% Modified Gram-Schmidt Orthogonalization

clear all, clc, A =10*rand(4,4); [m,n]=size(A); R=zeros(n); Q=A;


for k=1:n,
for i=1:k-1,
R(i,k)=Q(:,i)'*Q(:,k);
Q(:,k)=Q(:,k)-R(i,k)*Q(:,i);
end
R(k,k)=norm(Q(:,k)); Q(:,k)=Q(:,k)/R(k,k);
end
R, Q,

✔ Linearly Independent Search Algorithm Given a set of 𝑛-dimensional


rows 𝒓1 , 𝒓2 , … , 𝒓𝑚 , an 𝑛 × 𝑛 matrix 𝐏(𝑘) is determined recursively for 𝑘 = 1, 2, … , 𝑚:

Initialize 𝐏(0) = 𝑰𝑛 (the 𝑛 × 𝑛 identity matrix).
For 𝑘 = 1, 2, … , 𝑚 do
    If 𝒓𝑘 𝐏(𝑘 − 1)𝒓𝑘 𝑇 ≠ 0, then
        𝐖(𝑘) = 𝐏(𝑘 − 1)𝒓𝑘 𝑇
        𝐏(𝑘) = 𝐏(𝑘 − 1) − 𝐖(𝑘)𝐖(𝑘)𝑇 ⁄(𝐖(𝑘)𝑇 𝐖(𝑘))
        and 𝒓𝑘 is linearly independent of the previous rows (see A.W. Naylor 1982).
    Else 𝐏(𝑘) = 𝐏(𝑘 − 1) and 𝒓𝑘 is linearly dependent.

Properties of this 𝐏(𝑘):
▪ 𝐏(𝑘) = 𝐏 𝟐 (𝑘).
▪ 𝐏(𝑘) = 𝐏 𝑻 (𝑘).
▪ 𝐏(𝑘)𝐱 = 𝐱 ∀𝐱 ⊥ {𝒓1 , 𝒓2 , … , 𝒓𝑘 }.
▪ 𝐏(𝑘)𝒓𝑖 𝑇 = 𝟎 ∀𝒓𝑖 ∈ {𝒓1 , 𝒓2 , … , 𝒓𝑘 }.
Proof: Given a real matrix 𝑨 ∈ ℝ𝑚×𝑛 whose column vectors are the following set 𝓢 =
{𝐱 𝟏 , 𝐱 𝟐 , … , 𝐱 𝒏 }. When Gram–Schmidt algorithm is applied to the columns of 𝐀 , the result is
an orthogonal basis {𝒖𝟏 , 𝒖𝟐 , … , 𝒖𝒏 } for ℛ(𝐀) where
𝐮𝑘 = 𝐱 𝒌 − ∑_{𝑖=1}^{𝑘−1} (〈𝒖𝒊 |𝐱 𝑘 〉 ⁄ 〈𝒖𝒊 |𝒖𝒊 〉) 𝒖𝒊    with  𝐏𝒌−𝟏 𝐱 𝒌 = 𝐮𝑘  where  𝐏𝟎 = 𝑰,   𝑘 = 1,2, … , 𝑛

We will use the fact that:


(〈𝐱 𝒌 , 𝐮𝑖 〉 ⁄ 〈𝐮𝑖 , 𝐮𝑖 〉) 𝐮𝑖 = (𝐱 𝒌 𝑻 𝐮𝑖 ⁄ 𝐮𝑖 𝑇 𝐮𝑖 ) 𝐮𝑖 = 𝐮𝑖 (𝐮𝑖 𝑇 𝐱 𝒌 ⁄ 𝐮𝑖 𝑇 𝐮𝑖 ) = (𝐮𝑖 𝐮𝑖 𝑇 ⁄ 𝐮𝑖 𝑇 𝐮𝑖 ) 𝐱 𝒌

Let us expand the 𝐮𝑘 formula and obtain the corresponding projecting matrix 𝐏𝒌 in
terms of 𝐱 𝒌 and the previous 𝐏𝒌−𝟏 .

𝐒𝐭𝐞𝐩 𝟎:  𝐏𝟎 𝐱 𝟏 = 𝐮1 = 𝐱 𝟏  ⟹  𝐏𝟎 = 𝑰

𝐒𝐭𝐞𝐩 𝟏:  𝐏𝟏 𝐱 𝟐 = 𝐮2 = 𝐱 𝟐 − (〈𝐱 𝟐 , 𝐮1 〉⁄〈𝐮1 , 𝐮1 〉) 𝐮1 = (𝑰 − 𝐮1 𝐮1 𝑇 ⁄𝐮1 𝑇 𝐮1 ) 𝐱 𝟐
         ⟹  𝐏𝟏 = 𝐏𝟎 − (𝐏𝟎 𝐱 𝟏 )(𝐏𝟎 𝐱 𝟏 )𝑇 ⁄((𝐏𝟎 𝐱 𝟏 )𝑇 (𝐏𝟎 𝐱 𝟏 ))

𝐒𝐭𝐞𝐩 𝟐:  𝐏𝟐 𝐱 𝟑 = 𝐮3 = 𝐱 𝟑 − (〈𝐱 𝟑 , 𝐮1 〉⁄〈𝐮1 , 𝐮1 〉) 𝐮1 − (〈𝐱 𝟑 , 𝐮2 〉⁄〈𝐮2 , 𝐮2 〉) 𝐮2 = (𝑰 − 𝐮1 𝐮1 𝑇 ⁄𝐮1 𝑇 𝐮1 − 𝐮2 𝐮2 𝑇 ⁄𝐮2 𝑇 𝐮2 ) 𝐱 𝟑
         = (𝐏𝟎 − (𝐏𝟎 𝐱 𝟏 )(𝐏𝟎 𝐱 𝟏 )𝑇 ⁄((𝐏𝟎 𝐱 𝟏 )𝑇 (𝐏𝟎 𝐱 𝟏 )) − (𝐏𝟏 𝐱 𝟐 )(𝐏𝟏 𝐱 𝟐 )𝑇 ⁄((𝐏𝟏 𝐱 𝟐 )𝑇 (𝐏𝟏 𝐱 𝟐 ))) 𝐱 𝟑
         ⟹  𝐏𝟐 = 𝐏𝟏 − (𝐏𝟏 𝐱 𝟐 )(𝐏𝟏 𝐱 𝟐 )𝑇 ⁄((𝐏𝟏 𝐱 𝟐 )𝑇 (𝐏𝟏 𝐱 𝟐 ))

⋮

𝐒𝐭𝐞𝐩 𝒌:  𝐏𝒌 𝐱 𝒌+𝟏 = 𝐮𝑘+1 = 𝐱 𝒌+𝟏 − ∑_{𝑖=1}^{𝑘} (〈𝒖𝒊 |𝐱 𝑘+1 〉⁄〈𝒖𝒊 |𝒖𝒊 〉) 𝒖𝒊  ⟹  𝐏𝒌 = 𝐏𝒌−𝟏 − (𝐏𝒌−𝟏 𝐱 𝒌 )(𝐏𝒌−𝟏 𝐱 𝒌 )𝑇 ⁄((𝐏𝒌−𝟏 𝐱 𝒌 )𝑇 (𝐏𝒌−𝟏 𝐱 𝒌 ))

Notice that when 𝐱 𝒌+𝟏 is linearly dependent on the previous vectors, say 𝐱 𝒌+𝟏 = 𝛽𝐱 𝒌 , then
𝐏𝒌 𝐱 𝒌+𝟏 = 𝛽𝐏𝒌 𝐱 𝒌 = 𝟎, so no new direction is produced and we keep 𝐏𝒌 = 𝐏𝒌−𝟏 ; this completes
the proof.

clear all, clc, % See the Algebra Book by BEKHITI Belkacem 2020
T=[2 0 8 2 4 1;0 2 10 7 6 3;0 0 -6 -2 -4 -1;0 0 -12 -4 -8 -2;...
0 0 8 2 6 1;0 0 16 4 12 2];
V1=T(1,:)';V2=T(2,:)';V3=T(3,:)';V4=T(4,:)';V5=T(5,:)';V6=T(6,:)';
V=[V1,V2,V3,V4,V5,V6];P=eye(6,6); X1=[]; epsi=1.0e-10; %Tolerance
for i=0:5
if abs((V(:,i+1)'*P*V(:,i+1)))>epsi
L=abs((V(:,i+1)'*P*V(:,i+1)))
X2=V(:,i+1); X2=[X1,X2]; X1=X2;
P=P-(((P*V(:,i+1))*(V(:,i+1)'*P'))/(V(:,i+1)'*P*V(:,i+1)));
else
X2=X2; P=P; L=abs((V(:,i+1)'*P*V(:,i+1)))
end
end
X2', r1=rank(T) , r2=rank(X2')
Orthogonal transformations are one of the most
important tools in numerical linear algebra. The types of orthogonal transformations that
will be introduced in this section are easy to work with and do not require much storage.
Most important, processes that involve orthogonal transformations are inherently stable.
For example, let 𝐱 ∈ ℝ𝑛 and 𝐱̂ = 𝐱 + 𝒆 be an approximation to 𝐱: If 𝑸 is an orthogonal
matrix, then 𝑸𝐱̂ = 𝑸𝐱 + 𝑸𝒆 The error in 𝑸𝐱̂ is 𝑸𝒆. With respect to the 2-norm, the vector
𝑸𝒆 is the same size as 𝒆;
‖𝑸𝐱̂‖2 = ‖𝑸𝐱 + 𝑸𝒆‖2 ≤ ‖𝑸𝐱‖2 + ‖𝑸𝒆‖2 = ‖𝐱‖2 + ‖𝒆‖2   ⟺   ‖𝐱̂‖2 ≤ ‖𝐱‖2 + ‖𝒆‖2

̅ = 𝑨 + 𝑬, then ‖𝑸𝑨
Similarly, if 𝑨 ̅ ‖2 = ‖𝑸𝑨 + 𝑸𝑬‖2 ≤ ‖𝑸𝑨‖2 + ‖𝑸𝑬‖2 ≤ ‖𝑨‖2 + ‖𝑬‖2 which
implies that ‖𝑨̅ ‖2 ≤ ‖𝑨‖2 + ‖𝑬‖2. When an orthogonal transformation is applied to a
vector or matrix, the error will not grow with respect to the 2-norm.
By an elementary orthogonal matrix, we mean a matrix of the form 𝑸 = 𝑰 − 2𝐮𝐮𝑇 where
𝐮 ∈ ℝ𝑛 and ‖𝐮‖2 = 1. To see that 𝑸 is orthogonal, note that 𝑸𝑇 = (𝑰 − 2𝐮𝐮𝑇 )𝑇 = 𝑰 − 2𝐮𝐮𝑇 = 𝑸,
and 𝑸𝑇 𝑸 = 𝑸2 = (𝑰 − 2𝐮𝐮𝑇 )(𝑰 − 2𝐮𝐮𝑇 ) = 𝑰 − 4𝐮𝐮𝑇 + 4𝐮(𝐮𝑇 𝐮)𝐮𝑇 = 𝑰 since 𝐮𝑇 𝐮 = 1. Thus, if 𝑸 is an elementary
orthogonal matrix, then 𝑸𝑇 = 𝑸 = 𝑸−1 The matrix 𝑸 = 𝑰 − 2𝐮𝐮𝑇 is completely determined
by the unit vector 𝐮. Rather than store all 𝑛2 entries of 𝑸, we need store only the vector
𝐮. To compute 𝑸𝐱, note that 𝑸𝐱 = (𝑰 − 2𝐮𝐮𝑇 )𝐱 = 𝐱 − 2𝛼𝐮 where 𝛼 = 𝐮𝑇 𝐱. The matrix
product 𝑸𝑨 is computed as
𝑸𝑨 = [𝑸𝒂1 ⋮ 𝑸𝒂2 ⋮ ⋯ ⋮ 𝑸𝒂𝑛 ]
= [𝒂1 ⋮ 𝒂2 ⋮ ⋯ ⋮ 𝒂𝑛 ] − 2[𝛼1 𝐮 ⋮ 𝛼2 𝐮 ⋮ ⋯ ⋮ 𝛼𝑛 𝐮]
= 𝑨 − 2𝐮[𝛼1 ⋮ 𝛼2 ⋮ ⋯ ⋮ 𝛼𝑛 ]

Elementary orthogonal transformations can be used to obtain a 𝑄𝑅 factorization of 𝑨,


and this in turn can be used to solve a linear system 𝑨𝐱 = 𝒃. As with Gaussian
elimination, the elementary matrices are chosen so as to produce zeros in the coefficient
matrix. To see how this is done, let us consider the problem of finding a unit vector 𝐮
such that: (𝑰 − 2𝐮𝐮𝑇 )𝐱 = (𝛼 0 0 … 0)𝑇 = 𝛼𝒆1 for a given vector 𝐱 ∈ ℝ𝑛 .

The Householder reflection 𝑯 = (𝑰 − 2𝐮𝐮𝑇 ) is also called “reflector”, if 𝑯𝐱 = 𝛼𝒆1 then, since
𝑯 is orthogonal, it follows that |𝛼| = ‖𝛼𝒆1 ‖2 = ‖𝑯𝐱‖2 = ‖𝐱‖2. If we take 𝛼 = +‖𝐱‖2 or we
take 𝛼 = −‖𝐱‖2, then since 𝑯𝐱 = 𝛼𝒆1, and 𝑯 is its own inverse, we have

𝑯𝐱 = 𝛼𝒆1 ⟺ 𝐱 = 𝛼𝑯𝒆1 ⟺ 𝐱 = 𝛼(𝑰 − 2𝐮𝐮𝑇 )𝒆1 = 𝛼𝒆1 − 2𝛼𝐮𝐮𝑇 𝒆1 = 𝛼𝒆1 − 2𝛼𝑢1 𝐮

𝐱 = 𝛼𝒆1 − 2𝛼𝑢1 𝐮  ⟺  (𝑥1 , 𝑥2 , 𝑥3 , … , 𝑥𝑛 )𝑇 = (𝛼 − 2𝛼𝑢1 ², −2𝛼𝑢1 𝑢2 , −2𝛼𝑢1 𝑢3 , … , −2𝛼𝑢1 𝑢𝑛 )𝑇
Solving for the 𝑢𝑖 ’s, we get

𝑢1 = ± ((𝛼 − 𝑥1 )⁄2𝛼)^{1/2}   and   𝑢𝑘 = −𝑥𝑘 ⁄(2𝛼𝑢1 )   for 𝑘 = 2,3, … , 𝑛

If we let 𝑢1 = − ((𝛼 − 𝑥1 )⁄2𝛼)^{1/2} and set 𝛽 = 𝛼(𝛼 − 𝑥1 ), then −2𝛼𝑢1 = [2𝛼(𝛼 − 𝑥1 )]^{1/2} = [2𝛽]^{1/2}.
It follows that 𝑢1 = −[𝛽]^{1/2}⁄(√2 𝛼) and

𝐮 = (1⁄√2𝛽) (𝑥1 − 𝛼, 𝑥2 , 𝑥3 , … , 𝑥𝑛 )𝑇
If we set 𝐯 = (𝑥1 − 𝛼, 𝑥2 , 𝑥3 , … , 𝑥𝑛 )𝑇 , then ‖𝐯‖22 = (𝑥1 − 𝛼)2 + ∑𝑛𝑘=2 𝑥𝑘2 = 2𝛼(𝛼 − 𝑥1 ) = 2𝛽

‖𝐯‖2 = √2𝛽   ⟹   𝐮 = 𝐯⁄√2𝛽 = 𝐯⁄‖𝐯‖2   ⟹   𝑯 = 𝑰 − 2𝐮𝐮𝑇 = 𝑰 − 2𝐯𝐯 𝑇 ⁄‖𝐯‖2 ² = 𝑰 − (1⁄𝛽)𝐯𝐯 𝑇

In theory the formula 𝑯 = 𝑰 − 𝛽 −1 𝐯𝐯 𝑇 is valid for either choice 𝛼 = ±‖𝐱‖2 ; however, in finite precision
arithmetic it does matter how the sign is chosen. Since the first entry of 𝐯 is 𝑣1 = 𝑥1 − 𝛼,
one could possibly lose significant digits of accuracy if 𝑥1 and 𝛼 are nearly equal and
have the same sign. To avoid this situation, the scalar 𝛼 should be defined by

𝛼 = −‖𝐱‖2 if 𝑥1 > 0,   and   𝛼 = +‖𝐱‖2 if 𝑥1 ≤ 0

Algorithm: given a vector 𝐱 ∈ ℝ𝑛

    𝛼 = −‖𝐱‖2 if 𝑥1 > 0,   𝛼 = +‖𝐱‖2 if 𝑥1 ≤ 0
    𝛽 = 𝛼(𝛼 − 𝑥1 )
    𝐯 = (𝑥1 − 𝛼, 𝑥2 , 𝑥3 , … , 𝑥𝑛 )𝑇
    𝐮 = 𝐯⁄√2𝛽 = 𝐯⁄‖𝐯‖2
    𝑯 = 𝑰 − 2𝐯𝐯 𝑇 ⁄‖𝐯‖2 ² = 𝑰 − (1⁄𝛽)𝐯𝐯 𝑇

Proposition: Let 𝑯 = 𝑰 − 2𝐮𝐮𝑇 with ‖𝐮‖2 = 1. Then
1. 𝑯𝐮 = −𝐮 (reflector).
2. 𝑯𝐯 = 𝐯, if 𝐮𝑇 𝐯 = 0.
3. 𝑯 = 𝑯𝑇 (𝑯 is symmetric).
4. 𝑯𝑇 = 𝑯−1 (𝑯 is orthogonal).
5. 𝑯−1 = 𝑯 (𝑯 is an involution).
6. 𝑯𝑇 𝑯 = 𝑯2 = 𝑰 (𝑯 is involutory).

The matrix 𝑯 formed in this way is called a Householder transformation. The matrix 𝑯 is
determined by the vector 𝐯 and the scalar 𝛽.

Theorem (Existence and Uniqueness Theorem) Let 𝐱, 𝐲 be two vectors in ℝ𝑛 with 𝐱 ≠ 𝐲


but ‖𝐱‖2 = ‖𝐲‖2 . Then there exists a unique reflector 𝑯 such that 𝑯𝐱 = 𝐲.

Proof: (Existence) Let 𝐯 = 𝐱 − 𝐲 and 𝑯 = 𝑰 − 𝛾 𝐯𝐯 𝑇 , where 𝛾 = 2/‖𝐯‖22 . Note that

𝐱 = (1⁄2)(𝐱 + 𝐲) + (1⁄2)(𝐱 − 𝐲)   and   𝑯(𝐱 − 𝐲) = −(𝐱 − 𝐲)   (property 1 of the Proposition, since 𝐯 = 𝐱 − 𝐲)
Since (𝐱 + 𝐲)𝑇 (𝐱 − 𝐲) = ‖𝐱‖22 − ‖𝐲‖22 = 0, by Proposition, we have 𝑯(𝐱 + 𝐲) = (𝐱 + 𝐲)

𝑯(𝐱 − 𝐲) = −(𝐱 − 𝐲)
{ ⟹ 𝑯𝐱 = 𝐲
𝑯(𝐱 + 𝐲) = (𝐱 + 𝐲)
(Uniqueness) Assume 𝑯1 ≠ 𝑯2 are two Householder transformations and 𝐱, 𝐲 are two
vectors in ℝ𝑛 with 𝐱 ≠ 𝐲 but ‖𝐱‖2 = ‖𝐲‖2 , such that 𝑯1 𝐱 = 𝐲 and 𝑯2 𝐱 = 𝐲. By the
Proposition, we have
{𝑯1 𝐱 = 𝐲, 𝑯2 𝐱 = 𝐲}  ⟹  {𝑯1 𝐲 = 𝐱, 𝑯2 𝐱 = 𝐲}  ⟹  𝑯1 𝑯2 𝐱 = 𝑯1 𝐲 = 𝐱  ⟹  𝑯1 𝑯2 = 𝑰  ⟹  𝑯1 = 𝑯2 (since 𝑯2 −1 = 𝑯2 )

This contradiction proves the uniqueness of Householder transformation. ■
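
A quick numerical check of the existence part (a minimal sketch; the vectors are random test data and 𝐲 is rescaled so that ‖𝐱‖2 = ‖𝐲‖2 ):

% Construct the reflector H = I - (2/(v'*v))*v*v' with v = x - y and verify H*x = y
x = rand(5,1); y = rand(5,1); y = y*(norm(x)/norm(y));   % enforce ||x|| = ||y||
v = x - y;
H = eye(5) - (2/(v'*v))*(v*v');
norm(H*x - y), norm(H'*H - eye(5))    % both ~ 0: H*x = y and H is orthogonal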


Application of reflectors to a matrix 𝑨 ∈ ℝ𝑚×𝑛

If 𝑯 = 𝑰 − 𝛾 𝐯𝐯 𝑇 ∈ ℝ𝑚×𝑚 , then 𝑯𝑨 = (𝑰 − 𝛾 𝐯𝐯 𝑇 )𝑨 = 𝑨 − 𝐯𝐰 𝑇 with 𝐰 = 𝛾𝑨𝑇 𝐯


If 𝑯 = 𝑰 − 𝛾 𝐯𝐯 𝑇 ∈ ℝ𝑛×𝑛 , then 𝑨𝑯 = 𝑨(𝑰 − 𝛾 𝐯𝐯 𝑇 ) = 𝑨 − 𝐰𝐯 𝑇 with 𝐰 = 𝛾𝑨𝐯

Notes: An 𝑚 × 𝑛 Householder update cost: 4𝑚𝑛 flops = {a matrix-vector multiplication} +


{an outer product update}. Householder updates never require the explicit formation of
the Householder matrix.
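
The following minimal MATLAB sketch illustrates this point (the sizes and the vector 𝐯 are arbitrary test data): the update 𝑨 − 𝐯𝐰 𝑇 agrees with the explicit product 𝑯𝑨 while never forming 𝑯.

% Implicit Householder update H*A = A - v*w' with w = gamma*A'*v (illustrative sketch)
m = 6; n = 4; A = rand(m,n);
v = rand(m,1); gamma = 2/(v'*v);       % so that H = I - gamma*(v*v') is a reflector
w  = gamma*(A'*v);                     % matrix-vector product, ~2mn flops
HA = A - v*w';                         % rank-one (outer product) update, ~2mn flops
norm(HA - (eye(m) - gamma*(v*v'))*A)   % agreement with the explicit product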

A Householder matrix is a special form of linear transformation 𝑯 which was suggested by
Alston Scott Householder in 1958. If the transformation matrices 𝑯𝑖 are chosen to be unitary,
then the condition numbers associated with the systems 𝑯𝑖 𝑨𝐱 = 𝑯𝑖 𝒃 do not change (so they
certainly don't get worse).
Furthermore, the matrices 𝑯𝑖 , should be chosen so that the matrices 𝑨𝑖 become simpler.
As was suggested by Householder, this can be accomplished in the following manner.
The unitary matrix 𝑯 is chosen to be 𝑯 = 𝑰 − 2𝒘𝒘𝑇 with 𝒘𝑇 𝒘 = 1 𝒘 ∈ ℂ𝒏 .

This matrix is Hermitian: 𝑯𝑇 = (𝑰 − 2𝒘𝒘𝑇 )𝑇 = 𝑰 − 2𝒘𝒘𝑇 = 𝑯 and unitary, that is 𝑯𝑇 𝑯 = 𝑰


and therefore involutory: 𝑯2 = 𝑰. As any vector 𝐯 can be normalized 𝐯 = 𝐯/‖𝐯‖ to have a
unit norm, the Householder matrix defined above can be written as:

𝐯𝐯 𝑇
𝑯 = 𝑰 − 2𝒘𝒘𝑇 = 𝑰 − 2
‖𝐯‖2

We specifically define the vector 𝐯 = 𝐱 − ‖𝐱‖𝐞1 , where 𝐱 is any 𝑛-dimensional vector and 𝐞1 is
the first standard basis vector, with all elements equal to zero except the first one. The
squared norm of this vector is found to be

‖𝐯‖2 = 𝐯 𝑇 𝐯 = (𝐱 − ‖𝐱‖𝐞1 )𝑻 (𝐱 − ‖𝐱‖𝐞1 ) = 2‖𝐱‖(‖𝐱‖ − 𝑥1 )

Notice that 𝑯𝐱 = 𝑑𝐞1 with 𝑑 = ‖𝐱‖. Let us prove this claim:

𝑯𝐱 = (𝑰 − 2𝐯𝐯 𝑇 ⁄‖𝐯‖² ) 𝐱 = (𝑰 − 2𝐯(𝐱 − ‖𝐱‖𝐞1 )𝑇 ⁄‖𝐯‖² ) 𝐱 = 𝐱 − 2𝐯(‖𝐱‖² − ‖𝐱‖𝑥1 )⁄(2‖𝐱‖(‖𝐱‖ − 𝑥1 )) = 𝐱 − 𝐯 = ‖𝐱‖𝐞1

We see that all elements in 𝐱 except the first one are eliminated to zero. This feature of
the Householder transformation is the reason why it is widely used.

Let 𝐜𝑖 be the columns of the matrix 𝑨 = [𝐜1 𝐜2 … 𝐜𝑛 ]; it is very easy to observe that if we let
𝐯 = 𝐜1 − ‖𝐜1 ‖𝐞1 then

(𝑰 − 2𝐯𝐯 𝑇 ⁄‖𝐯‖² ) 𝐜1 = ‖𝐜1 ‖𝐞1   ⟹   𝑯1 𝑨 = [‖𝐜1 ‖𝐞1 ⋮ 𝐜̂ 2 ⋮ ⋯ ⋮ 𝐜̂ 𝑛 ],

that is, the first column of 𝑯1 𝑨 is zero below its first entry, while the trailing submatrix
(rows 2 to 𝑚, columns 2 to 𝑛) is denoted 𝑨′.

We then construct a second transformation matrix 𝑯2 acting on (𝑯1 𝑨):

𝑯2 = ( 1   𝟎𝑇
       𝟎   𝑰 − 2𝐯2 𝐯2 𝑇 ⁄‖𝐯2 ‖² )

with 𝐯2 built from the first column of the submatrix 𝑨′ (i.e. the second column of 𝑯1 𝑨 below the first row).
If we redo the process 𝑘 times we obtain 𝑨𝑘 = 𝑯𝑘 𝑨𝑘−1 , i.e. 𝑨𝑘 = 𝑯𝑘 … 𝑯2 𝑯1 𝑨0 . A matrix
𝑨 = 𝑨0 can thus be reduced step by step using these unitary "Householder matrices" 𝑯𝑘 into
an upper triangular matrix 𝑨𝑛−1 = 𝑯𝑛−1 … 𝑯2 𝑯1 𝑨0 = 𝑹.

The Householder reduction of a matrix to triangular form requires about 2𝑛3 /3


operations. In this process an 𝑛 × 𝑛 unitary matrix 𝑯 = 𝑯𝑛−1 … 𝑯2 𝑯1 consisting of
Householder matrices 𝑯, and an 𝑛 × 𝑛 upper triangular matrix 𝑹 are determined so that

𝑯𝑨 = 𝑹 or 𝑨 = 𝑯−1 𝑹 = 𝑸𝑹

An upper triangular matrix 𝑹, such that 𝑨𝑹−1 = 𝑸 is a matrix with orthonormal columns, can
also be produced directly by applying Gram–Schmidt orthogonalization to the
columns of 𝑨 = [𝐜1 𝐜2 … 𝐜𝑛 ].

Remark: Since 𝑯 is orthogonal we have ‖𝑯𝐱‖2 ² = ‖𝐱‖2 ² = 𝑑² ⟹ 𝑑 = ±‖𝐱‖2 . We can still
choose the sign of 𝑑, and we choose it such that no cancellation occurs in computing
𝐱 − 𝑑𝐞1 : {𝑑 = +‖𝐱‖ if 𝑥1 < 0} and {𝑑 = −‖𝐱‖ if 𝑥1 ≥ 0}.

Algorithm: At the start, let 𝑨 = 𝑸1 𝑹1 with 𝑹1 = 𝑨 and 𝑸1 = 𝑰

for 𝑘 = 1: 𝑚 − 1
    𝐱 = 𝐳𝐞𝐫𝐨𝐬(𝑚, 1);  𝐱(𝑘: 𝑚) = 𝑹𝑘 (𝑘: 𝑚, 𝑘);  d = −sign(𝐱(𝑘))‖𝐱‖ ;
    𝐯 = 𝐱;  𝐯(𝑘) = 𝐱(𝑘) − d;  𝐫 = ‖𝐯‖;
    𝑹𝑘+1 = (𝑰 − 2𝐯𝐯 𝑇 ⁄𝐫² ) 𝑹𝑘 ;     𝑸𝑘+1 = (𝑰 − 2𝐯𝐯 𝑇 ⁄𝐫² ) 𝑸𝑘
end

Example:
× × × × × × × × × × × × × × × × × × × ×
× × × × × 𝑯1 0 × × × × 𝑯2 0 × × × × 𝑯3 0 × × × ×
× × × × × → 0 × × × × → 0 0 × × × → 0 0 × × ×
× × × × × 0 × × × × 0 0 × × × 0 0 0 × ×
(× × × × ×) (0 × × × ×) (0 0 × × ×) (0 0 0 × ×)

If Householder matrices are used to compute the 𝑄𝑅 factorization, the operation count is
approximately 2𝑛3 /3 multiplications and 2𝑛3 /3 additions.

% QR factorization by Householder reflections.


% Input: A is m-by-n matrix
% Output: Q,R A=QR, Q m-by-m orthogonal, R m-by-n upper triangular
clear all, clc, A=randi(10,8);[m,n] = size(A); Q = eye(m); R=A;
for k = 1:n
z = R(k:m,k);
v = [-sign(z(1))*norm(z) - z(1); -z(2:end)];
P = eye(m-k+1) - 2*(v*v')/(v'*v); % HH reflection
R(k:m,:) = P*R(k:m,:);
Q(k:m,:) = P*Q(k:m,:);
end
Q = Q'; R = triu(R); % enforce exact triangularity
Zero=A-Q*R
An alternative way of programming the Householder Transformations
% QR Factorization Using Householder Transformations

clear all, clc, A=10*rand(4,4) + 10*eye(4,4); [m,n]=size(A);


R=A; %Start with R=A
Q=eye(m); %Set Q as the identity matrix

for k=1:m-1
x=zeros(m,1); x(k:m,1)=R(k:m,k);
d=-sign(x(k))*norm(x); v=x; v(k)=x(k)-d; r=norm(v);
if r~=0, w=v/r; u=2*R'*w;
R=R-w*u'; %Product PR
Q=Q-2*Q*w*w'; %Product QP
end
end

Zero=A-Q*R, Identity=Q*Q'

% Compact Form in programming the Householder Transformations


%-----------------------------------------------------%
% Householder Transformations to the upper QR Form
%-----------------------------------------------------%
% By BEKHITI Belkacem 2020

clear all, clc, M=10*rand(6,6); D=diag([-1 -2 0 -4 -5 -6]);


A=M*D*inv(M); n=size(A,1); AA=A; % save a copy of A

for k=1:n-1
A1=A(k:n,k:n);
v=A1(:,1)-norm(A1(:,1))*eye(n-k+1,1);
P1=eye(n-k+1,n-k+1)-2*(v*v')/(v'*v);
H(:,:,k)=blkdiag(eye(k-1,k-1),P1);
A=H(:,:,k)*A;
end

H1=H(:,:,1),
H2=H(:,:,2),
H3=H(:,:,3),
H4=H(:,:,4),
H5=H(:,:,5),

R=A; R = triu(R)
H=H5*H4*H3*H2*H1;
Zero=AA-H'*R
r=nnz(find(abs(diag(R))>1e-10)) % rank of A
%-----------------------------------------------------%
% Householder Transformations to the lower QR Form
%-----------------------------------------------------%
% By BEKHITI Belkacem 2020
clear all, clc, M=10*rand(6,6); D=diag([-1 -2 0 0 -5 -6]);
A=M*D*inv(M); n=size(A,1); AA=A; % save a copy of A
for k=1:n-1
A2=A(k:n,k:n);
c=A2(1,:)-norm(A2(1,:))*eye(1,n-k+1);
P2=eye(n-k+1,n-k+1)-2*(c'*c)/(c*c');
Q(:,:,k)=blkdiag(eye(k-1,k-1),P2);
A=A*Q(:,:,k);
end
Q1=Q(:,:,1), Q2=Q(:,:,2), Q3=Q(:,:,3), Q4=Q(:,:,4), Q5=Q(:,:,5),
Q=Q1*Q2*Q3*Q4*Q5;
L=A; L = tril(L)
Zero=AA -L*Q'
r=nnz(find(abs(diag(L))>1e-10)) % rank of A

clear all, clc, A=10*rand(6,6);


[Q,R] = qrhouseholder(A)

function [Q,R] = qrhouseholder (A)


n = size(A,1); Q=eye(n); R=A; I = eye(n);
for j=1:n-1
x=R(j:n,j);
v=-sign(x(1))*norm(x)*eye(n-j+1,1)-x;
if norm(v)>0,v=v/norm(v);
P=I; P(j:n,j:n)=P(j:n,j:n)-2*v*v';
R=P*R; Q=Q*P;
end
end
Q, R,
Zero=A-Q*R

For finding the eigenvalues of matrix we may use


the QR algorithms; however, all QR algorithms are computationally expensive: one
iteration of the QR decomposition costs 𝒪(𝑛3 ) FLOPS. Assume that we can do only one
iteration to find one eigenvalue. Then in this case, the cost will be 𝒪(𝑛4 ). The goal of this
section is to present one more technique for reducing computations. It turns out that if
we first reduce the original matrix 𝑨 to upper Hessenberg form and then apply the
method of QR iteration without computing Q, we dramatically reduce computations, and
instead of 𝒪(𝑛4 ) FLOPS, we perform our computations in 𝒪(𝑛3 ) FLOPS. A Hessenberg
matrix is a special kind of square matrix, one that is “almost” triangular. More precisely,
an upper Hessenberg matrix has zero entries below the first subdiagonal, and a lower
Hessenberg matrix has zero entries above the first superdiagonal. They are named after
Karl Hessenberg.
𝑨𝐻 = 𝑸𝑻 𝑨𝑸 = ( ⋆  ⋆  ⋆  ⋯  ⋆
               ⋆  ⋆  ⋆  ⋯  ⋆
                  ⋆  ⋆  ⋯  ⋆
                     ⋱  ⋱  ⋮
                        ⋆  ⋆ )     (upper Hessenberg: zero below the first subdiagonal)

Again, The QR factorization should be computed by the Householder algorithm rather


than by the Gram–Schmidt ortho-normalization process. A priori, the QR factorization of
a matrix of order 𝑛 requires on the order of 𝒪(𝑛3 ) operations. Such a complexity can be
drastically reduced by first reducing the original matrix 𝑨 to its upper Hessenberg form.

It remains for us to show how the Hessenberg decomposition 𝑼𝑇 𝑨𝑼 = 𝑯 with 𝑼𝑇 𝑼 = 𝑰


can be computed. The transformation 𝑼 can be computed as a product of Householder
matrices 𝑷1 , … , 𝑷𝑛−2 The role of 𝑷𝑘 is to zero the 𝑘 𝑡ℎ column below the subdiagonal. In
the 𝑛 = 6 case, we have
× × × × × × × × × × × × × × × × × × × × × × × × × × × × × ×
× × × × × × × × × × × × × × × × × × × × × × × × × × × × × ×
× × × × × × 𝑷1 0 × × × × × 𝑷2 0 × × × × × 𝑷3 0 × × × × × 𝑷4 0 × × × × ×
→ → → →
× × × × × × 0 × × × × × 0 0 × × × × 0 0 × × × × 0 0 × × × ×
× × × × × × 0 × × × × × 0 0 × × × × 0 0 0 × × × 0 0 0 × × ×
[× × × × × ×] [0 × × × × ×] [0 0 × × × ×] [0 0 0 × × ×] [0 0 0 0 × ×]

Note: The reduction to Hessenberg form of a square matrix 𝑨 via an orthonormal basis
change is not unique. However, if the first unit vector of the basis change matrix is
specified then the Hessenberg matrix becomes essentially unique. This was first
observed by Francis (1961/1962).

The reduction to Hessenberg form is especially helpful when the


matrix is Hermitian. Since unitary similarity transformations
preserve the Hermitian property, the reduced matrix is not merely
Hessenberg, it is tridiagonal; that is, its only nonzero entries lie on the main diagonal and on the first sub- and superdiagonals.

Algorithm (Householder Reduction to Hessenberg Form) Given 𝑨 ∈ ℝ𝑛×𝑛 , the following


algorithm overwrites A with 𝑯 = 𝑼𝑇 𝑨𝑼 where 𝑯 is upper Hessenberg and 𝑼 is product of
Householder matrices.

for 𝑘 = 1: 𝑛 − 2
𝑨1 = 𝑨(𝑘 + 1: 𝑛, 𝑘);
𝐯 = 𝑨1 − norm(𝑨1 ) ⋆ eye(𝑛 − 𝑘, 1);
𝑟 = length(𝐯);
𝑨(𝑘 + 1: 𝑛, 𝑘: 𝑛) = (eye(𝑟, 𝑟) − (2/(𝐯 T ⋆ 𝐯)) ⋆ 𝐯 ⋆ 𝐯 T ) ⋆ 𝑨(𝑘 + 1: 𝑛, 𝑘: 𝑛);
𝑨(1: 𝑛, 𝑘 + 1: 𝑛) = 𝑨(1: 𝑛, 𝑘 + 1: 𝑛) ⋆ (eye(𝑟, 𝑟) − (2/(𝐯 T ⋆ 𝐯)) ⋆ 𝐯 ⋆ 𝐯 T );
end
This algorithm requires 10𝑛3 /3 flops. If 𝑼 is explicitly formed, an additional 4𝑛3 /3 flops
are required. The 𝑘 𝑡ℎ Householder matrix can be represented in 𝑨(𝑘 + 2: 𝑛, 𝑘). See Martin
and Wilkinson 1968 for a detailed description. The roundoff properties of this method
for reducing A to Hessenberg form are very desirable. (Wilkinson 1965, p.351)

clear all, clc,


M=10*rand(6,6); D=diag([-1 -2 -3 -4 -5 -6]); A=M*D*inv(M); %A=(A'+ A)/2;
n=size(A,1); AA=A; % save a copy of A

for k = 1:n-2
A1=A(k+1:n,k); v=A1-norm(A1)*eye(n-k,1); r=length(v);
P=(eye(r,r)-(2/(v'*v))*v*v');
A(k+1:n,k:n) = P*A(k+1:n,k:n); A(1:n,k+1:n) = A(1:n, k+1:n)*P;
U(:,:,k)=blkdiag(eye(k,k),P);
end
H=triu(A,-1), eig(AA), eig(H),

% Verification by similarity transformation

U1=U(:,:,1); U2=U(:,:,2); U3=U(:,:,3); U4=U(:,:,4); U=U1*U2*U3*U4;


H=U'*AA*U; H=triu(H,-1), eig(H)
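
As noted in the remark above, when the matrix is symmetric (Hermitian) the reduced matrix is tridiagonal; here is a small sketch (using MATLAB's built-in hess for brevity, on a random symmetric test matrix):

% Hessenberg reduction of a symmetric matrix yields a tridiagonal matrix
S = rand(6); S = (S + S')/2;               % symmetric test matrix
T = hess(S);                               % unitary similarity to Hessenberg form
norm(triu(T,2),1), norm(tril(T,-2),1)      % both negligible: T is tridiagonal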

Householder algorithm uses unitary similarity transformations to transform a matrix to


upper Hessenberg form in 10𝑛3 /3 flops. This algorithm does not of itself solve the
eigenvalue problem, but it is extremely important nevertheless because it reduces the
problem to a form that can be manipulated inexpensively.

Why are Hessenberg forms useful?

▪ Many linear algebra algorithms require significantly less computational effort when
applied to triangular matrices, and this improvement often carries over to Hessenberg
matrices as well. The Hessenberg decomposition of a matrix is not unique.

▪ If the constraints of a linear algebra problem do not allow a general matrix to be


conveniently reduced to a triangular one, reduction to Hessenberg form is often the next
best thing. In fact, reduction of any matrix to a Hessenberg form can be achieved in a
finite number of steps, for example, through Householder's transformation or Arnoldi
iteration etc…

▪ In the QR algorithm and eigenvalue problems the reduction of general matrix to a


Hessenberg form often economizes the arithmetic involved.

▪ Hessenberg decomposition is the first step in Schur algorithm. The reduction to Schur
Form is done in two steps. First a reduction to upper Hessenberg Form is performed, and
then a further reduction to Schur Form is performed.
Givens rotations (also known as Jacobi rotations) are
named after Wallace Givens, who introduced them to numerical analysts in the 1950s
while he was working at Argonne National Laboratory. The elementary Givens matrices
are orthogonal rotation matrices which make it possible to cancel certain coefficients of a
vector or a matrix. For a given pair of indices 𝑖 and 𝑘, and an angle 𝜃, these matrices
are defined by 𝑮(𝑖, 𝑘, 𝜃) = 𝑰𝑛 − 𝑾 where 𝑾 ∈ ℝ𝑛×𝑛 is the matrix whose all coefficients are
zero except 𝑤𝑖𝑖 = 𝑤𝑘𝑘 = 1 − cos(𝜃), 𝑤𝑖𝑘 = −sin(𝜃) = −𝑤𝑘𝑖 . A Givens matrix is of the form

𝑖 𝑘
1 𝚶
1

cos(𝜃) ⋯ sin(𝜃) 𝑖
𝑮(𝑖, 𝑘, 𝜃) =
⋮ ⋱ ⋮
− sin(𝜃) … cos(𝜃) 𝑘

1
1
(𝚶 )

For a given vector 𝐱 ∈ ℝ𝑛 , the product 𝐲 = 𝑮𝑇 (𝑖, 𝑘, 𝜃)𝐱 is nothing more than a rotation of 𝐱 by
the angle 𝜃 (counterclockwise) in the coordinate plane (𝑥𝑖 , 𝑥𝑘 ). By setting 𝑐 = cos(𝜃) and
𝑠 = sin(𝜃), we obtain
𝑥𝑗 𝑗 ≠ 𝑖, 𝑘
𝑦𝑗 = {𝑐𝑥𝑖 − 𝑠𝑥𝑘 𝑗=𝑖
𝑠𝑥𝑖 + 𝑐𝑥𝑘 𝑗=𝑘

Let 𝛼𝑖𝑘 = √(𝑥𝑖² + 𝑥𝑘²); notice that if 𝑐 and 𝑠 satisfy 𝑐 = 𝑥𝑖 /𝛼𝑖𝑘 , 𝑠 = −𝑥𝑘 /𝛼𝑖𝑘 (in this case,
𝜃 = arctan(−𝑥𝑘 /𝑥𝑖 ) ), we get 𝑦𝑘 = 0, 𝑦𝑖 = 𝛼𝑖𝑘 and 𝑦𝑗 = 𝑥𝑗 for 𝑗 ≠ 𝑖, 𝑘. Likewise, if 𝑐 = 𝑥𝑘 /𝛼𝑖𝑘 ,
𝑠 = 𝑥𝑖 /𝛼𝑖𝑘 (i.e. 𝜃 = arctan(𝑥𝑖 /𝑥𝑘 ) ), then 𝑦𝑖 = 0, 𝑦𝑘 = 𝛼𝑖𝑘 and 𝑦𝑗 = 𝑥𝑗 for 𝑗 ≠ 𝑖, 𝑘. The Givens
matrices can be used to perform the 𝑄𝑅 factorization and Schur decomposition.
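
For instance (a tiny numeric sketch with 𝑥𝑖 = 3, 𝑥𝑘 = 4, so 𝛼𝑖𝑘 = 5), the rotation built from the first choice of 𝑐 and 𝑠 annihilates the second component:

% Zeroing one component of a 2-vector with a Givens rotation
xi = 3; xk = 4; alpha = hypot(xi,xk);   % alpha = 5
c = xi/alpha; s = -xk/alpha;
G = [c s; -s c];                        % the (i,k) block of G(i,k,theta)
y = G'*[xi; xk]                         % gives [alpha; 0]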

Algorithm (givens): Given scalars 𝑥𝑖 and 𝑥𝑘 , this function computes 𝑐 = cos(𝜃) and 𝑠 = sin(𝜃) so that
( 𝑐  𝑠 ; −𝑠  𝑐 )𝑇 (𝑥𝑖 ; 𝑥𝑘 ) = (𝜏 ; 0).

function [𝑐, 𝑠] = givens(𝑥𝑖 , 𝑥𝑘 )
    if 𝑥𝑘 = 0
        𝑐 = 1; 𝑠 = 0;
    else
        if |𝑥𝑘 | > |𝑥𝑖 |
            𝜏 = − 𝑥𝑖 ⁄𝑥𝑘 ; 𝑠 = 1⁄√(1 + 𝜏²); 𝑐 = 𝑠𝜏;
        else
            𝜏 = − 𝑥𝑘 ⁄𝑥𝑖 ; 𝑐 = 1⁄√(1 + 𝜏²); 𝑠 = 𝑐𝜏;
        end
    end

This algorithm requires 5 flops and a single square root. Note that it does not compute 𝜃,
and so it does not involve inverse trigonometric functions.

Algorithm (garow): Given a matrix 𝑯 ∈ ℝ𝑛×𝑛 , 𝑐 = cos(𝜃) and 𝑠 = sin(𝜃), this function computes
𝑮𝑇 (𝑖, 𝑘, 𝜃)𝑯, which affects just two rows of 𝑯:

function [𝑯] = garow(𝑯, 𝑐, 𝑠, 𝑖, 𝑘, 𝑗1 , 𝑗2 )
    for 𝑗 = 𝑗1 : 𝑗2
        𝑡1 = 𝑯(𝑖, 𝑗); 𝑡2 = 𝑯(𝑘, 𝑗);
        𝑯(𝑖, 𝑗) = 𝑐 ⋆ 𝑡1 − 𝑠 ⋆ 𝑡2 ; 𝑯(𝑘, 𝑗) = 𝑠 ⋆ 𝑡1 + 𝑐 ⋆ 𝑡2 ;
    end

Algorithm (gacol): Given 𝑯 ∈ ℝ𝑛×𝑛 , 𝑐 = cos(𝜃) and 𝑠 = sin(𝜃), this function computes
𝑯𝑮(𝑖, 𝑘, 𝜃), which affects just two columns of 𝑯:

function [𝑯] = gacol(𝑯, 𝑐, 𝑠, 𝑗1 , 𝑗2 , 𝑖, 𝑘)
    for 𝑗 = 𝑗1 : 𝑗2
        𝑡1 = 𝑯(𝑗, 𝑖); 𝑡2 = 𝑯(𝑗, 𝑘);
        𝑯(𝑗, 𝑖) = 𝑐 ⋆ 𝑡1 − 𝑠 ⋆ 𝑡2 ; 𝑯(𝑗, 𝑘) = 𝑠 ⋆ 𝑡1 + 𝑐 ⋆ 𝑡2 ;
    end
The Givens rotations can be used to reduce a matrix to upper triangular form in order to
compute the QR decomposition. The QR factorization of an 𝑚 × 𝑛 matrix 𝑨 is then
computed as follows.

Algorithm: Given a matrix 𝑨 ∈ ℝ𝑚×𝑛 , the following function computes the QR factorization
of 𝑨 by applying Givens rotations 𝑮(𝑖, 𝑗, 𝑐, 𝑠), each of which affects just two rows of 𝑹:

function [𝑸, 𝑹] = qrgivens(𝑨)
    𝑸 = 𝑰; 𝑹 = 𝑨;
    for 𝑗 = 1: 𝑛
        for 𝑖 = 𝑚: −1: 𝑗 + 1
            [𝑐, 𝑠] = givens(𝑟(𝑖 − 1, 𝑗), 𝑟(𝑖, 𝑗))
            𝑹 = 𝑮𝑇 (𝑖, 𝑗, 𝑐, 𝑠)𝑹;
            𝑸 = 𝑸𝑮(𝑖, 𝑗, 𝑐, 𝑠);
        end
    end

% MATLAB implementation (QR by Givens rotations)
clear all, clc, A=10*rand(6,6);
[Q,R] = qrgivens(A)
Zero=A-Q*R

function [Q,R] = qrgivens(A)
[m,n]=size(A); Q=eye(m); R=A;
for j=1:n
    for i=m:-1:j+1
        x=R(:,j);
        if norm([x(i-1),x(i)])>0
            c=x(i-1)/norm([x(i-1),x(i)]);
            s=-x(i)/norm([x(i-1),x(i)]);
            G=eye(m);
            G([i-1,i],[i-1,i])=[c,s;-s,c];
            R=G'*R; Q=Q*G;
        end
    end
end
end

In numerical linear algebra, an


orthogonal matrix decomposition or orthogonal matrix factorization is a factorization of a
matrix into a product of two or three matrices. There are many different matrix
decompositions; each finds use among a particular class of problems. Here in this
section we are going to introduce some important classes of decompositions that are not
mentioned in most textbooks.

The new form of QR factorization procedure proposed by F. Rotella (1999) is basically based
on a generalization of the Householder transformation. This extension is a block matrix
form of the usual Householder procedure, which leads to a dichotomic algorithm that
allows parallel implementation. Let us consider a full column rank matrix 𝑽; if we
introduce the matrix defined by 𝑯 = 𝑰𝑛 − 2𝑽(𝑽𝑇 𝑽)−1 𝑽𝑇 , which appears as a matrix
extension of the usual Householder transform, we have the following result.

Theorem For every (𝑛 × 𝑟) matrix 𝑽, such that rank(𝑽) = 𝑟, then 𝑯 = 𝑰𝑛 − 2𝑽(𝑽𝑇 𝑽)−1 𝑽𝑇 is
symmetric and orthogonal.

Proof: As rank(𝑽) = 𝑟, 𝑽𝑇 𝑽 is nonsingular and 𝑯 is well defined. We can notice here that
(𝑽𝑇 𝑽)−1 𝑽𝑇 = 𝑽+ , the Moore-Penrose pseudo-inverse of the matrix 𝑽 (see Bekhiti B Algebra
Book 2020), thus we can write 𝑯 = 𝑰 − 2𝑽𝑽+ . Consequently, as by definition of the
pseudo-inverse, we have 𝑽𝑽+ = (𝑽𝑽+ )𝑇 and 𝑽𝑽+ 𝑽 = 𝑽, it is a trivial trick to verify that
𝑯𝑇 = 𝑯 and 𝑯2 = 𝑯𝑇 𝑯 = 𝑰𝑛 . 

After this preliminary step, let us state now the main result
Theorem For any full column rank matrix 𝑨 = (𝑨1 𝑇  𝑨2 𝑇 )𝑇 ∈ ℝ𝑚×𝑟 where 𝑨1 is an (𝑟 × 𝑟)
nonsingular matrix, if we choose 𝑽 = (𝑨1 + 𝑿 ; 𝑨2 ), where 𝑿 is given by 𝑿 = 𝑷𝑇 𝚲𝑷𝑨1 , the
matrix 𝚲 is 𝚲 = 𝑫1/2 = diag 𝑟𝑖=1 (√𝑑𝑖 ), and the nonnegative scalars 𝑑𝑖 and the orthogonal
matrix 𝑷 are defined by 𝑰𝑟 + (𝑨2 𝑨1 −1 )𝑇 (𝑨2 𝑨1 −1 ) = 𝑷𝑇 (diag 𝑟𝑖=1 (𝑑𝑖 ))𝑷, then

𝑯𝑨 = (𝑰𝑛 − 2𝑽(𝑽𝑇 𝑽)−1 𝑽𝑇 )𝑨 = 𝑯 (𝑨1 ; 𝑨2 ) = (−𝑿 ; 𝑶((𝑚−𝑟)×𝑟) )

where 𝑰𝑟 is the (𝑟 × 𝑟) identity matrix and 𝑶((𝑚−𝑟)×𝑟) is the ((𝑚 − 𝑟) × 𝑟) null matrix.

Proof: F. Rotella 1999

clear all, clc, A=10*rand(12,12); tol=0.001;


n=size(A,1); r=2; % The Block dimension
I=eye(r,r); QT=eye(n,n); A1=A; AA=A; % save a copy of A
%-----------------------------------------
if mod(n,r)==0, m=n/r; else, m=(n+1)/r; end
%-----------------------------------------
for k=1:m-1
B=(A(r+1:end,1:r)*inv(A(1:r,1:r)));
Z=I+B'*B; [P D]=svd(Z); %[P D]=eig(Z)
X=P'*(D^0.5)*P*A(1:r,1:r);
V=[A(1:r,1:r)+X;A(r+1:end,1:r)];
H=eye(n-k*r+2,n-k*r+2)-2*V*inv(V'*V)*V';
T(:,:,k)=blkdiag(eye(k*r-2,k*r-2),H);
A1=T(:,:,k)*A1; QT=T(:,:,k)*QT;
A=A1(k*r+1:end,k*r+1:end);
end
R=A1; Q=QT';
%-----------------------------------------
for i=1:n,
for j=1:n
if abs(R(i,j))<tol,
R(i,j)=0;
end
end;
end;
%-----------------------------------------
R, L=eig(R),

The Bidiagonalization is one


of the unitary (orthogonal) matrix decompositions, of the form 𝑼𝐻 𝑨𝑽 = 𝑩, where 𝑼 and 𝑽 are
unitary (orthogonal) matrices; the superscript 𝐻 denotes the Hermitian transpose; and 𝑩 is upper
bidiagonal. 𝑨 is allowed to be rectangular. For large-scale problems, the unitary matrices 𝑼 and 𝑽,
are calculated iteratively by using Lanczos method, referred to as Golub-Kahan-Lanczos
method. Bidiagonalization has a very similar structure to the singular value
decomposition (SVD). However, it is computed within finite operations, while SVD
requires iterative schemes to find singular values. 𝑩 = 𝑼𝐻 𝑨𝑽 = (𝑼1 𝑼2 … 𝑼𝑛 )𝐻 𝑨(𝑽1 𝑽2 … 𝑽𝑛 ).
Notice that the matrix 𝑩 can be easily decomposed into the form of SVD, in other words
𝑩 can be viewed as an eigenvalue problem of a larger matrix H in the following way.
𝑯 = (𝟎  𝑩𝐻 ; 𝑩  𝟎)

and use the SVD of 𝑩 in the form 𝑩 = 𝑼𝑩 Σ 𝑽𝑩 𝐻 to arrive at

(𝟎  𝑩𝐻 ; 𝑩  𝟎) = (𝑽𝑩  𝑽𝑩 ; 𝑼𝑩  −𝑼𝑩 ) (Σ  𝟎 ; 𝟎  −Σ) (𝑽𝑩  𝑽𝑩 ; 𝑼𝑩  −𝑼𝑩 )−1   ⟺   𝑯 = 𝑻 (Σ  𝟎 ; 𝟎  −Σ) 𝑻−1

% Golub-Kahan Algorithm (Golub-Kahan Bidiagonalization)

clear all, clc, M=10*rand(6,6); A=M*diag([6 5 4 3 2 1])*inv(M);


m=size(A,1); n=size(A,2); AA=A;
for k= 1:n
x=A(k:m, k); e1=[1;zeros(length(x)-1,1)];
uk=x + sign(x(1))*norm(x)*e1; uk=uk/norm(uk);
A(k:m,k:n)=A(k:m,k:n)-2*uk*((uk)'*A(k:m,k:n));
if k<=n-2
x=(A(k,k+1:n))'; e1=[1;zeros(length(x)-1,1)];
vk=x+sign(x(1))*norm(x)*e1; vk=vk/norm(vk);
A(k:m,k+1:n)=A(k:m,k+1:n)-2*(A(k:m,k+1:n)*vk)*vk';
end
end
B=A % bidiagonal form

H=[zeros(n,n) B';B zeros(n,n)];


[VH,DH]=eig(H);
V=VH(1:n,1:n); U=VH(n+1:2*n,1:n);
Sigma=2*DH(1:n,1:n);
Zero=B-U*Sigma*V' % Zero=B-U*Sigma*inv(V)
A linear equation in the variables
𝑥1 , … , 𝑥𝑛 is an equation that can be written in the form ∑𝑛𝑖=1 𝛼𝑖 𝑥𝑖 = 𝑏 where 𝑏 and the
coefficients 𝛼𝑖 are known real or complex numbers. A system of linear equations (or a
linear system) is a collection of one or more linear equations involving the same
variables, they are recorded compactly in a rectangular array called a matrix, 𝑨𝐱 = 𝒃. A
solution of the system is a list (𝑠1 , … , 𝑠𝑛 ) of values for 𝑥1 , … , 𝑥𝑛 , respectively, which make
the equations satisfied. (For more detail we refer the reader to the lecture notes of Prof. Yousef Saad.)

Typical Large-scale problem: In many situations, e.g., boundary value problems for
ordinary and partial differential equations, matrices arise where a large proportion of the
elements are equal to zero (such matrices are called sparse). If the nonzero elements are
concentrated around the main diagonal, then the matrix is called a band matrix.

Definition: If 𝑨 ∈ ℝ𝑚×𝑛 is an 𝑚 × 𝑛 matrix, with columns 𝒂1 , 𝒂2 , … , 𝒂𝑛 , and if 𝐱 is in ℝ𝑛 ,


then the product of 𝑨 and 𝐱, denoted by 𝑨𝐱 is the linear combination of the columns of 𝑨
using the corresponding entries in 𝐱 as weights; that is, 𝑨𝐱 = ∑𝑛𝑖=1 𝒂𝑖 𝑥𝑖 with 𝒂𝑖 ∈ ℝ𝑚 .

⦁ The set of all possible solutions is called the solution set of the linear system.
⦁ Two linear systems are called equivalent if they have the same solution set.
⦁ A linear system can have: no solution, or exactly one solution, or infinitely many
solutions. 𝑨𝐱 = 𝒃 is called a matrix equation.
⦁ 𝑨𝐱 is defined only if the number of columns of 𝑨 equals the number of entries in 𝐱

Definition: A system of linear equations is said to be inconsistent if it has no solution


(Case 1 above). It is consistent if it has at least one solution (Case 2 or Case 3 above).

Theorem: (Prof: Yousef Saad Courses) Let 𝑨 ∈ ℝ𝑚×𝑛 be an 𝑚 × 𝑛 matrix. Then the
following four statements are all mathematically equivalent.
1. For each 𝒃 in ℝ𝑚 , the equation 𝑨𝐱 = 𝒃 has a solution.
2. Each 𝒃 in ℝ𝑚 is a linear combination of the columns of 𝑨. (i.e. 𝒃 ∈ ℛ(𝑨))
3. The columns of 𝑨 span ℝ𝑚 (i.e. ℛ(𝑨) = ℝ𝑚 )
4. 𝑨 has a pivot position in every row. (Later we will see what does this mean)
Matrix Operations and structures:

⦁ An augmented matrix of a system consists of the coefficient matrix with the right-hand
side added as a last column [𝑨 𝒃].

⦁ Basic Strategy: To solve systems of equations we manipulate these "rows" to get


equivalent equations that are easier to solve.

⦁ We do not change the solution set of a linear system if we


▪ Permute two equations
▪ Multiply a whole equation by a nonzero scalar
▪ Add an equation to another.
⦁ Two systems are row-equivalent if one is obtained from the other by a succession of the
above operations.

⦁ Eliminating an unknown consists of combining rows so that the


coefficients for that unknown in the equations become zero.

⦁ Gaussian Elimination: performs eliminations to reduce the system to a


"triangular form".

The Algebra of Triangular Matrices: For future reference we list a few properties about
products and inverses of triangular and unit triangular matrices.

• The inverse of an upper (lower) triangular matrix is upper (lower) triangular,


• The product of two upper (lower) triangular matrices is upper (lower) triangular.
• The inverse of a unit upper (lower) triangular matrix is unit upper (lower) triangular.
• The product of 2 unit upper/lower triangular matrices is unit upper/lower triangular.

Consider the 𝑛 × 𝑛 system 𝑳𝐱 = 𝒃 where 𝑳 is a


nonsingular, lower-triangular matrix (ℓ𝑖𝑖 ≠ 0). It is easy to solve this system by Forward
Substitutions:

𝐟𝐨𝐫 𝑖 = 1: 𝑛
𝐟𝐨𝐫 𝑗 = 1: 𝑖 − 1
𝑏(𝑖) = 𝑏(𝑖) − 𝐿(𝑖, 𝑗) ⋆ 𝑏(𝑗)
𝐞𝐧𝐝
𝐢𝐟 𝐿(𝑖, 𝑖) == 0, set error flag, exit
𝑏(𝑖) = 𝑏(𝑖)/𝐿(𝑖, 𝑖)
𝐞𝐧𝐝
𝐞𝐧𝐝

The result is 𝐱, and as for the computational complexity for each 𝑖, the forward
substitution requires 2(𝑖 − 1) + 1 flops. Thus the total number of flops becomes 𝐶(𝑛) =
∑𝑛𝑖=1(2(𝑖 − 1) + 1) = ∑𝑛𝑖=1(2𝑖 − 1) = 𝑛2 .
Consider the system 𝑼𝐱 = 𝒃 where 𝑼 is a nonsingular,
upper-triangular matrix (𝑢𝑖𝑖 ≠ 0). It is clear that we should solve the system from bottom
to top by Back Substitutions:

𝐟𝐨𝐫 𝑖 = 𝑛: −1: 1
𝐢𝐟 𝑈(𝑖, 𝑖) == 0, error(′U: singular! ′);
𝐞𝐧𝐝
𝑥(𝑖) = 𝑏(𝑖)/𝑈(𝑖, 𝑖)
𝑏(1: 𝑖 − 1) = 𝑏(1: 𝑖 − 1) − 𝑈(1: 𝑖 − 1, 𝑖) ⋆ 𝑥(𝑖);
𝐞𝐧𝐝
Computational complexity: 𝐶(𝑛) = 𝑛2 + 𝒪(𝑛) flops
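
The two substitution schemes translate directly into MATLAB (a minimal sketch; the triangular test matrices below are generated randomly with a dominant diagonal so that they are nonsingular):

% Forward substitution (L*y = b) and back substitution (U*x = y)
n = 5; L = tril(rand(n)) + n*eye(n); U = triu(rand(n)) + n*eye(n); b = rand(n,1);
y = zeros(n,1);
for i = 1:n                                    % forward: top to bottom
    y(i) = (b(i) - L(i,1:i-1)*y(1:i-1))/L(i,i);
end
x = zeros(n,1);
for i = n:-1:1                                 % backward: bottom to top
    x(i) = (y(i) - U(i,i+1:n)*x(i+1:n))/U(i,i);
end
norm(L*y - b), norm(U*x - y)                   % both ~ 0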

Here you are given the most popular and exist methods for solving linear systems.
Direct Methods to Solve Equations
▪ Gauss Elimination Method
▪ Gauss–Jordan Elimination Method
▪ Matrix Inverse Method (Polynomial method or Faddeev–LeVerrier algorithm)
▪ Decomposition (Factorization)
⦁ LU Decomposition (Factorization): Triangularization
⦁ Other Decomposition (Factorization): Cholesky, QR, and SVD
Iterative Methods to Solve Equations
▪ The Jacobi method
▪ The Gauss–Seidel method and relaxation technique
▪ The Conjugate gradient method
▪ The Free inversion Free derivative method (proposed by the author)
▪ Krylov subspace methods (Lanczos and Arnoldi)

--------------------------Direct Methods to Solve Equations---------------------------


Theorem: (Yousef Saad Courses and BEKHITI B algebra Book 2020) Let 𝑨 ∈ ℝ𝑛×𝑛 Then
the following 4 statements are equivalent
(1) 𝑨 is invertible
(2) The columns of 𝑨 are linearly independent
(3) The Span of the columns of 𝑨 is ℝ𝑛
(4) rref(𝑨) is the identity matrix. rref stands for Reduced Row Echelon Form.

Gauss Elimination is a very basic algorithm for solving


𝑨𝐱 = 𝒃. The algorithms developed here produce (in the absence of rounding errors) the
unique solution of 𝑨𝐱 = 𝒃 whenever 𝑨 ∈ ℝ𝑛×𝑛 is nonsingular.

Our strategy: Transform the system 𝑨𝐱 = 𝒃 to an equivalent system 𝑼𝐱 = 𝒃̅, where 𝑼 is


upper-triangular. It is convenient to represent 𝑨𝐱 = 𝒃 by an augmented matrix [𝑨 𝒃];
each equation in 𝑨𝐱 = 𝒃 corresponds to a row of the augmented matrix. The method is
based on the following very interesting theorem

Theorem: Every matrix 𝑨 ∈ ℝ𝑛×𝑛 is similar to an upper-triangular matrix 𝑼 = 𝑷−1 𝑨𝑷


whose diagonal entries are the eigenvalues of 𝑨.

Proof: Later on we will see the proof of this Theorem.


✔ The Gauss Elimination Process: Since triangular systems are easy to solve, we will
transform a linear system into one that is triangular. Main operation: combine rows so
that zeros appear in the required locations to make the system triangular.

By means of three elementary row operations applied on the augmented matrix.

▪ Replacement: 𝑅𝑖 ⟵ 𝑅𝑖 + 𝛼𝑅𝑗 (𝑖 ≠ 𝑗)
▪ Interchange: 𝑅𝑖 ⟷ 𝑅𝑗
▪ Scaling: 𝑅𝑖 ⟵ 𝛼𝑅𝑖 (𝛼 ≠ 0)

We obtain the required matrix form.

⦁ If [𝑨̅ 𝒃̅] is obtained from [𝑨 𝒃] by elementary row operations (EROs), then the systems
[𝑨 𝒃] and [𝑨̅ 𝒃̅] have the same solution set.

⦁ Suppose 𝑨̅ is obtained from 𝑨 by EROs; then 𝑨̅ is nonsingular if and only if 𝑨 is.

⦁ Each ERO corresponds to left-multiple of an elementary matrix (i.e. 𝑬𝑖 𝑨).

⦁ Each elementary matrix 𝑬𝑖 is nonsingular.

⦁ The elementary matrices corresponding to “Replacement” and “Scaling” operations are


lower triangular.

Algorithm: (Gauss Elimination Method)

% Elimination phase
clear all, clc, A=10*rand(10,10)+10*eye(10,10);
b=100*rand(10,1); n=length(b);
A1=A; b1=b; % storing A & b before the elimination process go on
for k=1:n-1
for i=k+1:n
if A(i,k)==0
break
else
lambda = A(i,k)/A(k,k); % multiplier (A(k,k) is the pivot)
A(i,k+1:n) = A(i,k+1:n) - lambda*A(k,k+1:n);
b(i)= b(i) - lambda*b(k);
end
end
end
% Back substitution phase
for k=n:-1:1
b(k) = (b(k) - A(k,k+1:n)*b(k+1:n))/A(k,k);
end
x = b, zero=A1*x-b1

The number of operations required to solve a linear system with 𝑛 unknowns by


Gaussian elimination is 𝐶(𝑛) = (2⁄3)𝑛3 + 𝒪(𝑛) flops.
Exercise: work through the Gaussian elimination steps by hand on a small example.

✔ Matrix Inversion by Elimination: Computing the inverse of a matrix and solving


simultaneous equations are related tasks: solving 𝑨𝐱 = 𝒃 amounts to forming 𝐱 = 𝑨−1 𝒃, since
𝑨𝐱 = 𝒃 ⟹ 𝑨−1 𝑨𝐱 = 𝑨−1 𝒃. Explicit inversion of large matrices should be avoided whenever
possible due to its high cost; when the inverse is really needed we should use a numerical
method such as the augmented (elimination) method, i.e. [𝑨 𝑰] ⟶ [𝑰 𝑩] where 𝑩 = 𝑨−1 ,
which computes the inverse of 𝑨 while avoiding the calculation of det(𝑨).
This is a fun way to find the inverse of a matrix, play around with the rows (adding,
multiplying or swapping) until we make matrix 𝑨 into the identity matrix 𝑰. In linear
algebra you know that by elementary operation (Row Echelon Form) we can get:

𝑬1 𝑬2 … 𝑬𝑘 𝑨 = 𝑰  ⟺  𝑨−1 = 𝑬1 𝑬2 … 𝑬𝑘 𝑰,   i.e. the same sequence of elementary operations that reduces 𝑨 to 𝑰 turns 𝑰 into 𝑨−1 :

[𝑨 𝑰] ⟶ [𝑰 𝑨−1 ]

clear all, clc, A1=10*rand(10,10) + 10*eye(10,10); [m,n] = size(A1);


A = [A1, eye([m, n])]; % augmented matrix
for k = 1:m
A(k,k:end) = A(k,k:end)/A(k,k);
A([1:k-1,k+1:end],k:end) = ...
A([1:k-1,k+1:end],k:end) - A([1:k-1,k+1:end],k)*A(k,k:end);
end
IA1 = A(:,n+1:end); Identity=IA1*A1

Example of application: (𝐂𝐥𝐚𝐬𝐬𝐢𝐜𝐚𝐥 𝐂𝐫𝐲𝐩𝐭𝐨) Since humans invented the written


language, they have tried to share information secretly. This is basically, the objective of
Cryptography, the study of the techniques to protect sensitive communications by means
of data encryption and its posterior decryption. Encryption is the transformation of data
into some unreadable form, so, even those who can see the encrypted data cannot
understand the hidden information. Decryption is the reverse of encryption; it is the
transformation of encrypted data back into some intelligible form.

The idea of cryptography is an application of a mapping from some space to itself. The
direct path of application is called Encoding and the reverse one is called Decoding.

Encoding = applying the mapping, while Decoding = applying the inverse mapping.
How does Hill's cipher method work? Suppose we have an invertible matrix 𝑨 (the
encoding matrix) and a text we want to encrypt.

Transform the text to a sequence of numbers by giving each character a unique


numerical value, then split the numbers to form a matrix by grouping the numbers into
columns according to the order of the matrix 𝑨 (the amount of elements in each column
must be equal to the order of the matrix).

Let's call this matrix 𝑩 (the plain matrix). Multiply 𝑨 by the matrix 𝑩: 𝑪 = 𝑨 • 𝑩. The
matrix 𝑪 is the cipher matrix. To decrypt the message, just multiply Inv(𝑨) • 𝑪, where
Inv(𝑨) is the inverse matrix of 𝑨.

The original plaintext can be found again by taking the resulting matrix and splitting it
back up into its separate vectors, making them a sequence, and then converting the
numbers back into their letter forms. In many articles, authors use only the 26 letters of
English alphabet, sometimes 29 when including space, question mark and period. In
these cases a simple mapping of the letters and additional symbols to the first integer
numbers, and the use of modular arithmetic, allow this method to obtain an encrypted
message composed only by the same integer numbers, which can be mapped again to
their corresponding letters and symbols.

Let's see an example: let the message be Prepare To Negotiate and the encoding matrix be

𝑨 = [ −3  −3  −4
       0   1   1
       4   3   4 ]
We assign a number for each letter of the alphabet. For simplicity, let us associate each
letter with its position in the alphabet: 𝐴 is 1, 𝐵 is 2, and so on. Also, we assign the
number 27 (remember we have only 26 letters in the alphabet) to a space between two
words. Thus the message becomes:

P R E P A R E ⋆ T O ⋆ N E G O T I A T E
16 18 5 16 1 18 5 27 20 15 27 14 5 7 15 20 9 1 20 5

Since we are using a 3 by 3 matrix, we break the enumerated message above into a
sequence of 3 by 1 vectors:

[16; 18; 5], [16; 1; 18], [5; 27; 20], [15; 27; 14], [5; 7; 15], [20; 9; 1], [20; 5; 27]   ⟹   𝑩 = [ 16 16  5 15  5 20 20
                                                                                                  18  1 27 27  7  9  5
                                                                                                   5 18 20 14 15  1 27 ]
Note that it was necessary to add a space at the end of the message to complete the last
vector. We now encode the message by multiplying each of the above new matrix 𝑩 by
the encoding matrix.
𝑪 = 𝑨𝑩 = [ −3 −3 −4     [ 16 16  5 15  5 20 20
            0  1  1       18  1 27 27  7  9  5
            4  3  4 ]      5 18 20 14 15  1 27 ]

       = [ −122 −123 −176 −182  −96  −91 −183
             23   19   47   41   22   10   32
            138  139  181  197  101  111  203 ]
The columns of this matrix give the encoded message. The message is transmitted in the
following linear form
−122 23 138  −123 19 139  −176 47 181  −182 41 197  −96 22 101  −91 10 111  −183 32 203
To decode the message, the receiver writes this string as a sequence of 3 by 1 column
matrices and repeats the technique using the inverse of the encoding matrix. The inverse
of this encoding matrix, the decoding matrix, is:

𝑨−1 = [  1   0   1
         4   4   3
        −4  −3  −3 ]

Thus, to decode the message, perform the matrix multiplication 𝑩 = Inv(𝑨)𝑪 and recover the plain matrix:

16 16 5 15 5 20 20
[18 1 27 27 7 9 5]
5 18 20 14 15 1 27
The columns of this matrix, written in linear form, give the original message:

16 18 5 16 1 18 5 27 20 15 27 14 5 7 15 20 9 1 20 5
P R E P A R E ⋆ T O ⋆ N E G O T I A T E
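
The whole encode/decode cycle of this example can be scripted in a few MATLAB lines (a sketch; the letter-to-number mapping and the padding with the space symbol 27 follow the convention used above):

% Hill cipher: encode and decode the message of the worked example
A   = [-3 -3 -4; 0 1 1; 4 3 4];              % encoding matrix
msg = 'PREPARE TO NEGOTIATE';
num = double(msg) - double('A') + 1;         % 'A'..'Z' -> 1..26
num(msg == ' ') = 27;                        % space -> 27
num = [num, 27*ones(1, mod(-numel(num),3))]; % pad with spaces to a multiple of 3
B  = reshape(num, 3, []);                    % plain matrix (3-by-7 here)
C  = A*B;                                    % cipher matrix
B2 = round(A\C);                             % decoding: inv(A)*C
isequal(B, B2)                               % returns true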

We will now transform the system into one that is


even easier to solve than a triangular system, namely a diagonal system. The method is
very similar to Gaussian Elimination. It is just a bit more expensive.

clear all, clc, A1=10*rand(10,10)+10*eye(10,10);
b=100*rand(10,1); n=size(A1,1); A=[A1,b];    % augmented matrix [A b]

for k=1:n
    A(k,k:n+1) = A(k,k:n+1)/A(k,k);          % normalize the pivot row
    for i=1:n
        if (i~=k)
            piv = A(i,k) ;
            A(i,k:n+1)=A(i,k:n+1)-piv*A(k,k:n+1);   % eliminate column k
        end
    end
end
x = A(:,n+1), zero=A1*x-b

The number of operations required to solve a linear system with 𝑛 unknowns by Gauss-
Jordan elimination is 𝐶(𝑛) = 𝑛3 + 𝒪(𝑛) flops.

Note: remember that Gauss-Jordan costs 50% more than Gauss.


As we have just seen, triangular systems are "easy" to
solve. The idea behind Gaussian elimination is to convert a given system 𝑨𝐱 = 𝒃 to an
equivalent triangular system. The conversion is achieved by taking appropriate linear
combinations of the equations. In Gaussian elimination we have seen (𝑬𝓵 … 𝑬2 𝑬1 )𝑨 = 𝑼 is
upper triangular, it is easy to check that 𝑬𝑘 −1 is an elementary lower triangular matrix,
and therefore 𝑳 = (𝑬𝓵 … 𝑬2 𝑬1 )−1 is a unit lower triangular matrix because each 𝑬𝑘 −1 is
unit lower triangular (𝑬𝓵 … 𝑬2 𝑬1 )𝑨 = 𝑼 ⟺ 𝑨 = 𝑳𝑼.

To take a close look at the 𝑳𝑼 decomposition, we consider a 3 × 3 nonsingular matrix:

𝑨 = ( 𝑎11  𝑎12  𝑎13            1    0    0      𝑢11  𝑢12  𝑢13
      𝑎21  𝑎22  𝑎23 ) = 𝑳𝑼 = ( ℓ21  1    0 ) (   0   𝑢22  𝑢23 )   with
      𝑎31  𝑎32  𝑎33           ℓ31  ℓ32  1        0    0   𝑢33

𝑳𝑼 = ( 𝑢11       𝑢12              𝑢13
       ℓ21 𝑢11   ℓ21 𝑢12 + 𝑢22    ℓ21 𝑢13 + 𝑢23
       ℓ31 𝑢11   ℓ31 𝑢12 + ℓ32 𝑢22    ℓ31 𝑢13 + ℓ32 𝑢23 + 𝑢33 )

From this we deduce the initial values 𝑢1𝑖 = 𝑎1𝑖 , 𝑖 = 1,2, … , 𝑛, and

ℓ𝑖𝑗 = (𝑎𝑖𝑗 − ∑_{𝑘=1}^{𝑗−1} ℓ𝑖𝑘 𝑢𝑘𝑗 ) ⁄ 𝑢𝑗𝑗 ,    𝑗 < 𝑖
𝑢𝑖𝑗 = 𝑎𝑖𝑗 − ∑_{𝑘=1}^{𝑖−1} ℓ𝑖𝑘 𝑢𝑘𝑗 ,           𝑗 ≥ 𝑖

It is usual practice to store the multipliers in the lower triangular portion of the
coefficient matrix, replacing the coefficients as they are eliminated (ℓ𝑖𝑗 replacing 𝑎𝑖𝑗 ). The
diagonal elements of 𝑳 do not have to be stored, since it is understood that each of them
is unity. The final form of the coefficient matrix would thus be the following mixture of 𝑳
and 𝑼:
𝑢11 𝑢12 𝑢13
𝑬1 𝑬2 … 𝑬𝑘 𝑨 = [𝑳\𝑼] = (ℓ21 𝑢22 𝑢23 )
ℓ31 ℓ32 𝑢33

clear all, clc, A=10*rand(4,4)+10*eye(4,4); n=size(A,1);


A1=A; % storing A before the decomposition process go on
L = eye(n); % Start L off as identity
for k = 1:n
L(k+1:n,k) = A(k+1:n,k)/A(k,k);
for l=k+1:n
A(l,:) = A(l,:) - L(l,k)*A(k,:);
end
end
U = A, Zero=A1-L*U

Alternative derivation: In the LU decomposition method we convert the matrix 𝑨 to
echelon form using the Gauss elimination method. The code starts from the first row and,
for each pivot row, finds the factor by which we need to multiply it before subtracting it
from the rows below, so that the corresponding entries of the lower triangular part become zero.
clear all, clc, A=10*rand(4,4) + 10*eye(4,4); n = size(A,1);
A1=A; % storing A before the decomposition process go on
for j = 1:n
for i =j:n-1
t = A(i+1,j)/A(j,j); A(i+1,:) = A(i+1,:)-t*A(j,:); F(i+1,j) = t;
end
end
U= A
L=F; L(:,n)=zeros(n,1);
for i = 1:n
L(i,i)=1;
end
L, Zero=A1-L*U

Example: Write a MATLAB code which carries out the solution phase (forward and back
substitutions). It is assumed that the original coefficient matrix has been decomposed,
so that the input is 𝑨 = [𝑳\𝑼].

The forward and back substitution algorithm:


⦁ let 𝑷𝑨 = 𝑳𝑼 so that 𝑷𝑨𝐱 = 𝑳𝑼𝐱 = 𝑷𝒃 with 𝑷 = 𝑬1 𝑬2 … 𝑬𝑘
⦁ solve 𝑳𝐲 = 𝑷𝒃, store 𝐲
⦁ solve 𝑼𝐱 = 𝐲, store 𝐱

clear all, clc, A=10*rand(4,4) + 10*eye(4,4); n = size(A,1);


b=100*rand(4,1);
A1=A; b1=b; % storing A before the decomposition process go on
A = [A, eye([n, n])]; % augmented matrix
for j = 1:n-1
for i = j+1:n
A(i,j) = A(i,j)/ A(j,j) ;
A(i,j+1:n) = A(i,j+1:n)- A(i,j)* A(j,j+1:n);
end
end
for i = 1:n
for j= 1:n
if i==j
L(i,i)=1; U(i,i) = A(i,i);
elseif i>j
L(i,j)= A(i,j); U(i,j)=0;
else
L(i,j)= 0; U(i,j)= A(i,j);
end
end
end
L, U
Zero=A1-L*U
% Forward substitution (solve L*y = b, unit diagonal L)
for k = 2:n
b(k)= b(k) - L(k,1:k-1)*b(1:k-1);
end
% Back substitution (solve U*x = y)
for k = n:-1:1
b(k) = (b(k) - U(k,k+1:n)*b(k+1:n))/U(k,k);
end
x = b, Zero=A1*x-b1

Numerical Notes: For an 𝑨 ∈ ℝ𝑛×𝑛 dense matrix (with most entries nonzero) with 𝑛
moderately large.

⦁ Computing an 𝐿𝑈 factorization of 𝑨 takes about 2𝑛3 /3 flops (~ row reducing [𝑨 𝒃]), while
finding 𝑨−1 requires about 2𝑛3 flops.
⦁ Solving 𝑳𝐲 = 𝒃 and 𝑼𝐱 = 𝐲 requires about 2𝑛2 flops, because any 𝑛 × 𝑛 triangular
system can be solved in about 𝑛2 flops.
⦁ Multiplying 𝒃 by 𝑨−1 also requires about 2𝑛2 flops, but the result may not as accurate
as that obtained from 𝑳 and 𝑼 (due to round-off errors in computing 𝑨−1 & 𝑨−1 𝒃).
⦁ If 𝑨 is sparse (with mostly zero entries), then 𝑳 and 𝑼 may be sparse, too. On the other
hand, 𝑨−1 is likely to be dense. In this case, a solution of 𝑨𝐱 = 𝒃 with 𝐿𝑈 factorization is
much faster than using 𝑨−1 .
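
The following small MATLAB sketch (using the built-in lu on a random diagonally dominant test matrix) illustrates these notes: two triangular solves replace the explicit inverse.

% Solving A*x = b via LU factorization versus explicit inversion
n = 500; A = rand(n) + n*eye(n); b = rand(n,1);
[L,U,P] = lu(A);                 % P*A = L*U
x1 = U\(L\(P*b));                % forward then back substitution
x2 = inv(A)*b;                   % explicit inverse: more flops, usually less accurate
norm(A*x1 - b), norm(A*x2 - b)   % residuals of the two solutions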

The Cholesky decomposition is one of the


most useful tools in applied linear algebra. It can be efficiently computed and is an
important tool for both theoretical and computational considerations. For an overview of
the role of matrix decompositions in computing, see Stewart, which lists the big six
matrix decompositions that form the foundations of matrix computations:

1. the Cholesky decomposition, 4. the spectral decomposition,


2. the pivoted LU decomposition, 5. the Schur decomposition, and
3. the QR algorithm, 6. the SVD.

The Cholesky decomposition is sometimes called the symmetric positive definite system
factorization, and it deals with systems whose matrices are symmetric positive definite. So
it is of great importance to introduce some aspects of those types of matrices.

Definition: 𝑨 ∈ ℝ𝑛×𝑛 is symmetric positive definite. if 𝑨 = 𝑨𝑇 and 𝐱 𝑇 𝑨𝐱 > 0, ∀𝐱 ≠ 0.

Proposition: Let 𝑨 ∈ ℝ𝑛×𝑛 be a real matrix.


1. 𝑨 is symmetric p.d. if and only if 𝑨 = 𝑨𝑇 and all its eigenvalues are positive.
2. If 𝑨 is symmetric p.d and 𝑯 is any principal submatrix of 𝑨 (𝑯 = 𝑨(𝑗: 𝑘, 𝑗: 𝑘) for some
𝑗 ≤ 𝑘), then 𝑯 is symmetric p.d.
3. If 𝑿 is nonsingular, then 𝑨 is symmetric p.d. if and only if 𝐗 𝑇 𝑨𝐗 is symmetric p.d.
4. If 𝑨 is symmetric p.d., then all 𝑎𝑖𝑖 > 0, and max𝑖𝑗 |𝑎𝑖𝑗 | = max𝑖 𝑎𝑖𝑖 > 0.
5. 𝑨 is symmetric p.d. if and only if there is a unique lower triangular nonsingular
matrix 𝑳, with positive diagonal entries, such that 𝑨 = 𝑳𝑳𝑇 (called Cholesky
decomposition theorem see the proof at the beginning of the chapter).
The decomposition 𝑨 = 𝑳𝑳𝑇 is called the Cholesky factorization of 𝑨, and 𝑳 is called the
Cholesky factor of 𝑨. (For more detail see Gene H Golub 1996)

Cholesky algorithm:

for 𝑗 = 1 to 𝑛
    𝑙𝑗𝑗 = (𝑎𝑗𝑗 − ∑_{𝑘=1}^{𝑗−1} 𝑙𝑗𝑘² )^{1/2}
    for 𝑖 = 𝑗 + 1 to 𝑛
        𝑙𝑖𝑗 = (𝑎𝑖𝑗 − ∑_{𝑘=1}^{𝑗−1} 𝑙𝑖𝑘 𝑙𝑗𝑘 ) ⁄ 𝑙𝑗𝑗
    end for
end for

Remark: The Cholesky factorization is mainly used for the numerical solution of linear systems 𝑨𝐱 = 𝒃.
⦁ For symmetric linear systems, the Cholesky decomposition (or its 𝑳𝑫𝑳𝑇 variant) is the
method of choice, for superior efficiency and numerical stability.
⦁ Compared with the 𝐿𝑈-decomposition, it is roughly twice as efficient (𝑂(𝑛³⁄3) flops).

(Cholesky: Gaxpy Version) We next give an implementation of Cholesky that is rich
in the gaxpy operation (see Gene H. Golub 1996).

𝐟𝐨𝐫 𝑗 = 1 𝑡𝑜 𝑛
    𝐢𝐟 𝑗 > 1
        𝑨(𝑗: 𝑛, 𝑗) = 𝑨(𝑗: 𝑛, 𝑗) − 𝑨(𝑗: 𝑛, 1: 𝑗 − 1)𝑨𝑇 (𝑗, 1: 𝑗 − 1)
    𝐞𝐧𝐝
    𝑨(𝑗: 𝑛, 𝑗) = 𝑨(𝑗: 𝑛, 𝑗)⁄√𝑨(𝑗, 𝑗)
𝐞𝐧𝐝

(Cholesky: Outer Product Version)


𝐟𝐨𝐫 𝑘 = 1 𝑡𝑜 𝑛
𝑨(𝑘, 𝑘) = √𝑨(𝑘, 𝑘)
𝑨(𝑘 + 1: 𝑛, 𝑘) = 𝑨(𝑘 + 1 ∶ 𝑛, 𝑘)/𝑨(𝑘, 𝑘)
𝐟𝐨𝐫 𝑗 = 𝑘 + 1 𝑡𝑜 𝑛
𝑨(𝑗 ∶ 𝑛, 𝑗) = 𝑨(𝑗: 𝑛, 𝑗) − 𝑨(𝑗 ∶ 𝑛, 𝑘)𝑨(𝑗, 𝑘)
𝐞𝐧𝐝
𝐞𝐧𝐝
Total cost of the Cholesky algorithms: 𝑂(𝑛³⁄3) flops.
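
A direct MATLAB transcription of the outer product version above (a minimal sketch; the test matrix is built as 𝑴𝑴𝑇 plus a diagonal shift so that it is symmetric positive definite):

% Outer product Cholesky factorization A = L*L'
n = 5; M = rand(n); A = M*M' + n*eye(n); A0 = A;   % s.p.d. test matrix
for k = 1:n
    A(k,k) = sqrt(A(k,k));
    A(k+1:n,k) = A(k+1:n,k)/A(k,k);
    for j = k+1:n
        A(j:n,j) = A(j:n,j) - A(j:n,k)*A(j,k);
    end
end
L = tril(A);                  % the Cholesky factor is stored in the lower triangle
Zero = A0 - L*L'              % ~ 0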

Definition: 𝑨 ∈ ℝ𝑛×𝑛 is symmetric positive Semidefinite. if 𝑨 = 𝑨𝑇 and 𝐱 𝑇 𝑨𝐱 ≥ 0, ∀𝐱 ≠ 0.

Cholesky Algorithm for Symmetric Positive Semidefinite Matrices

𝐟𝐨𝐫 𝑘 = 1 𝑡𝑜 𝑛
𝐢𝐟 𝑨(𝑘, 𝑘) > 0
𝑨(𝑘, 𝑘) = √𝑨(𝑘, 𝑘)
𝑨(𝑘 + 1: 𝑛, 𝑘) = 𝑨(𝑘 + 1 ∶ 𝑛, 𝑘)/𝑨(𝑘, 𝑘)
𝐟𝐨𝐫 𝑗 = 𝑘 + 1 𝑡𝑜 𝑛
𝑨(𝑗 ∶ 𝑛, 𝑗) = 𝑨(𝑗: 𝑛, 𝑗) − 𝑨(𝑗 ∶ 𝑛, 𝑘)𝑨(𝑗, 𝑘)
𝐞𝐧𝐝
𝐞𝐧𝐝
𝐞𝐧𝐝
In mathematics (linear algebra), the Faddeev–LeVerrier
algorithm is a recursive method to calculate the coefficients of the characteristic
polynomial 𝑝(𝜆) = det(𝜆𝑰𝑛 − 𝑨) of a square matrix, 𝑨, named after Dmitry Konstantinovich
Faddeev and Urbain Le Verrier. Calculation of this polynomial yields the eigenvalues of 𝑨
as its roots; as a matrix polynomial in the matrix 𝑨 itself, it vanishes by the fundamental
Cayley–Hamilton theorem.

This method can be used to compute all the Eigenvalues, vectors, inverse, etc., of any
given matrix.
𝑩1 = 𝑨                        and   𝑃1 = Trace(𝑩1 )
𝑩2 = 𝑨(𝑩1 − 𝑃1 𝑰)             and   𝑃2 = (1⁄2) Trace(𝑩2 )
𝑩3 = 𝑨(𝑩2 − 𝑃2 𝑰)             and   𝑃3 = (1⁄3) Trace(𝑩3 )
⋮
𝑩𝑛−1 = 𝑨(𝑩𝑛−2 − 𝑃𝑛−2 𝑰)       and   𝑃𝑛−1 = (1⁄(𝑛 − 1)) Trace(𝑩𝑛−1 )
𝑩𝑛 = 𝑨(𝑩𝑛−1 − 𝑃𝑛−1 𝑰)         and   𝑃𝑛 = (1⁄𝑛) Trace(𝑩𝑛 )

If 𝑨 is nonsingular then the inverse of 𝑨 can be determined by 𝑨−1 = (𝑩𝑛−1 − 𝑃𝑛−1 𝑰)/𝑃𝑛 .

clear all, clc, A=10*rand(4,4) + 10*eye(4,4); [m,n]=size(A);


b=100*rand(4,1); t=0; B=A; I=eye(n,n);

p=trace(B) % Initial value of trace to start algorithm

% Calculations of (Bi & pi) from B2 to Bn-1 & from p2 to pn-1


for k=2:n-1
B=A*(B-p*I); p=trace(B)/k;
end
Bn1=B; pn1=p;

Bn=A*(Bn1 - pn1*I); pn=trace(Bn)/n;

%invers of A & a solution x

disp('your Matrix invers is:')


IA=(Bn1-pn1*I)/pn
disp('your solution is:')
x= IA*b
disp('verification Zero=A*x-b:')
Zero=A*x-b

Remark: This method is very limited and concerns only matrices with distinct real
eigenvalues. Also, from the computational point of view it is a poor and very costly
method. Even with all the above constraints, it provides a solution to a very hard problem
in some particular situations.
--------------------------Iterative Methods to Solve Equations---------------------------

Iterative methods formally yield the solution x of a linear system after an infinite number
of steps. At each step they require the computation of the residual of the system. In the
case of a full matrix, their computational cost is therefore of the order of 𝑛² operations for
each iteration, to be compared with an overall cost of the order of (2⁄3)𝑛³ operations needed
by direct methods. Iterative methods can therefore become competitive with direct
methods provided the number of iterations that are required to converge (within a
prescribed tolerance) is either independent of 𝑛 or scales sub-linearly with respect to 𝑛.
So far we have considered direct methods for solution of a system of linear equations.
For sparse matrices, it may not be possible to take advantage of sparsity while using
direct methods, since the process of elimination can make the zero elements nonzero,
unless the zero elements are in a certain well defined pattern. Hence, the number of
arithmetic operations as well as the storage requirement may be the same for sparse and
filled matrices. This requirement may be prohibitive for large matrices and in those
cases, it may be worthwhile to consider the iterative methods.

The convergence condition for iterative algorithms: In general we assume that the matrix
can be split as 𝑨 = 𝑯 + 𝑮 with 𝑯 invertible, so the system 𝑨𝐱 = 𝒃 is equivalent to
𝐱 = 𝑯−1 (𝒃 − 𝑮𝐱). If we define 𝑸 = −𝑯−1 𝑮 and 𝒓 = 𝑯−1 𝒃, the iterative process can be
defined as
𝐱 𝑘+1 = 𝒓 + 𝑸𝐱 𝑘

The recursive evaluation of this formula with an initial starting 𝐱 0 will give
𝐱1 = 𝒓 + 𝑸𝐱 0
𝐱 2 = (𝑰 + 𝑸)𝒓 + 𝑸2 𝐱 0

𝑘−1
𝐱 𝑘 = (∑ 𝑸𝑖 ) 𝒓 + 𝑸𝑘 𝐱 0
𝑖=0

If (𝑰 − 𝑸)−1 exists then we can write ∑_{𝑖=0}^{𝑘−1} 𝑸𝑖 = (𝑰 − 𝑸)−1 (𝑰 − 𝑸𝑘 ), therefore

𝐱 𝑘 = (𝑰 − 𝑸)−1 (𝑰 − 𝑸𝑘 )𝒓 + 𝑸𝑘 𝐱 0

And as nontrivial solution we want 𝐱 𝑘 to converge independently on the initial state 𝐱 0 ,


this means that lim𝑘→∞ 𝑸𝑘 = 𝟎 or |𝜆𝑚𝑎𝑥 (𝑸)| < 1 hence

lim 𝐱 𝑘 = lim (𝑰 − 𝑸)−1 (𝑰 − 𝑸𝑘 )𝒓 + 𝑸𝑘 𝐱 0


𝑘→∞ 𝑘→∞
= lim (𝑰 − 𝑸)−1 (𝑰 − 𝑸𝑘 )𝒓 = lim (𝑰 − 𝑸)−1 𝒓
𝑘→∞ 𝑘→∞
= lim (𝑰 + 𝑯−1 𝑮)−1 𝑯−1 𝒃 = (𝑯 + 𝑮)−1 𝒃 = 𝑨−1 𝒃
𝑘→∞

The convergence condition is lim𝑘→∞ 𝑸𝑘 = 𝟎, i.e. the spectral radius satisfies |𝜆𝑚𝑎𝑥 (𝑸)| < 1; a convenient sufficient condition is max1≤𝑖≤𝑛 {∑𝑛𝑗=1 |𝑸(𝑖, 𝑗)|} < 1 (row-sum norm less than one).
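
A quick MATLAB check of this condition for the splitting 𝑯 = 𝑫, 𝑮 = 𝑨 − 𝑫 used by the Jacobi method below (a sketch on a random diagonally dominant test matrix):

% Spectral radius of the Jacobi iteration matrix Q = -inv(D)*(A - D)
A = rand(10) + 10*eye(10);        % diagonally dominant test matrix
H = diag(diag(A)); G = A - H;     % splitting A = H + G with H = D
Q = -H\G;                         % iteration matrix
rho = max(abs(eig(Q)))            % rho < 1 guarantees convergence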

Iterative, or indirect methods, start with an initial guess of the solution x and then
repeatedly improve the solution until the change in x becomes negligible. Since the
required number of iterations can be very large, the indirect methods are, in general,
slower than their direct counterparts.
A serious drawback of iterative methods is that they do not always converge to the
solution. It can be shown that convergence is guaranteed if the coefficient matrix is
diagonally dominant. The initial guess for 𝐱 plays no role in determining whether
convergence takes place: if the procedure converges for one starting vector, it will do so
for any starting vector. The initial guess affects only the number of iterations that are
required for convergence.

We can write the matrix 𝑨 in the form 𝑨 = 𝑫 + 𝑳 + 𝑼,


where 𝑫 is a diagonal matrix, and 𝑳 and 𝑼 are respectively, lower and upper triangular
matrices with zeros on the diagonal. Then the system of equations can be written as

𝑫𝐱 = −(𝑳 + 𝑼)𝐱 + 𝒃 or 𝐱 = −𝑫−1 (𝑳 + 𝑼)𝐱 + 𝑫−1 𝒃

Here we have assumed that all diagonal elements of A are nonzero. This last equation
can be used to define an iterative process, which generates the next approximation 𝐱 𝑘+1
using the previous one on the right-hand side 𝐱 𝑘+1 = −𝑫−1 (𝑳 + 𝑼)𝐱 𝑘 + 𝑫−1 𝒃.

In numerical linear algebra, the Jacobi method is an iterative algorithm for determining
the solutions of a strictly diagonally dominant system of linear equations. A sufficient
(but not necessary) condition for the method to converge is that the matrix 𝑨 is strictly or
irreducibly diagonally dominant. Strict row diagonal dominance means that for each row,
the absolute value of the diagonal term is greater than the sum of absolute values of
other terms. The Jacobi method sometimes converges even if these conditions are not
satisfied.

This iterative process is known as the Jacobi iteration or the method of simultaneous
displacements. The latter name follows from the fact that every element of the solution
vector is changed before any of the new elements are used in the iteration. Hence, both
𝐱 𝑘+1 and 𝐱 𝑘 need to be stored separately. The iterative procedure can be easily expressed
in the component form as

𝑥𝑘+1 (𝑖) = (1⁄𝑎(𝑖, 𝑖)) (𝑏(𝑖) − ∑_{𝑗=1, 𝑗≠𝑖}^{𝑛} 𝑎(𝑖, 𝑗) 𝑥𝑘 (𝑗))
Algorithm1: The Matrix-based formula

clear all, clc, A =rand(10,10); b=20*rand(10,1);


A = 0.5*(A+A'); A = A + 10*eye(10); D= diag(A); M= diag(D);
n=max(size(A)); m=min(size(b)); x0=rand(n,m);
for k=1: 10
% x1= inv(M)*(b-(A-M)*x0); % Jacobi method
x1= x0 + inv(M)*(b-A*x0); % Jacobi method 'Alternative writing'
x0=x1;
end
x1, ZERO1=A*x1-b

Important note: the matrix form of the Jacobi method is given only for explanation; it is not a practical implementation, since it involves the matrix inverse, which is exactly what we are trying to avoid.
Algorithm2: The element-based formula

clear all, clc,


A =rand(10,10); b=100*rand(10,1);
A = 0.5*(A+A'); A = A + 10*eye(10);
n=max(size(A)); m=min(size(b)); x0=rand(n,1); tol=1e-5;
% the first iteration
for j = 1:n
x(j)=((b(j)-A(j,[1:j-1,j+1:n])*x0([1:j-1,j+1:n]))/A(j,j));
end
x1 = x';
% the next iterations
k = 1;
while norm(x1-x0,1)>tol
for j = 1:n
x_ny(j)=((b(j)-A(j,[1:j-1,j+1:n])*x1([1:j-1,j+1:n]))/A(j,j));
end
x0 = x1; x1 = x_ny';
k = k + 1;
end
k, x = x1,
ZERO1=A*x-b

In numerical linear algebra, the Gauss–Seidel method,


also known as the Liebmann method or the method of successive displacement, is an
iterative method used to solve a system of linear equations. It is named after the German
mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel, and is similar to
the Jacobi method. Though it can be applied to any matrix with non-zero elements on
the diagonals, convergence is only guaranteed if the matrix is either strictly diagonally
dominant, or symmetric and positive definite.

We can write the matrix 𝑨 in the form 𝑨 = 𝑳 + 𝑼, where 𝑳 and 𝑼 are respectively, lower
and strictly upper triangular matrices. Then the system of equations can be written as

𝑳𝐱 = 𝒃 − 𝑼𝐱 or 𝐱 = 𝑳−1 𝒃 − 𝑳−1 𝑼𝐱

This last equation can be used to define an iterative process, which generates the next
approximation 𝐱 𝑘+1 using the previous one on the right-hand side

𝐱 𝑘+1 = 𝑳−1 (𝒃 − 𝑼𝐱 𝑘 )

The convergence properties of the Gauss–Seidel method are dependent on the matrix 𝑨.
Namely, the procedure is known to converge if either:

𝑨 is symmetric positive-definite, or
𝑨 is strictly or irreducibly diagonally dominant.

The Gauss–Seidel method sometimes converges even if these conditions are not satisfied.
Algorithm1: (Gauss–Seidel) The Matrix-based formula

clear all, clc,


A =rand(10,10); b=100*rand(10,1);
A = 0.5*(A+A');
A = A + 10*eye(10);

L = tril(A); U=A-L; % L1 = tril(A,-1) strictly lower


n=max(size(A)); m=min(size(b)); x0=rand(n,m);
for k=1: 10
x1= inv(L)*(b-(U)*x0); % Gauss–Seidel method
x0=x1;
end
x1
ZERO1=A*x1-b

Important note: the matrix form of the Gauss–Seidel method is given only for explanation; it is not a practical method, since it uses the explicit inverse, which is exactly what we set out to avoid.

However, by taking advantage of the triangular form of 𝑳, the elements of 𝐱 𝑘+1 can be
computed sequentially using forward substitution:

$$x_{k+1}(i) = \frac{1}{a(i,i)}\Bigg(b(i) - \sum_{j=1}^{i-1} a(i,j)\,x_{k+1}(j) - \sum_{j=i+1}^{n} a(i,j)\,x_k(j)\Bigg)$$

The procedure is generally continued until the changes made by an iteration are below
some tolerance, such as a sufficiently small residual.

The element-wise formula for the Gauss–Seidel method is extremely similar to that of the
Jacobi method. The computation of 𝑥𝑘+1 (𝑖) uses the elements of 𝐱 𝑘+1 that have already
been computed, and only the elements of 𝐱 𝑘 that have not been computed in the 𝑘 + 1
iteration. This means that, unlike the Jacobi method, only one storage vector is required
as elements can be overwritten as they are computed, which can be advantageous for
very large problems.

Algorithm2: (Gauss–Seidel) The element-based formula

clear all, clc, A =rand(10,10); b=100*rand(10,1);


A = 0.5*(A+A'); A = A + 10*eye(10);
n=max(size(A)); m=min(size(b)); x=rand(n,m); iters=20;

for i=1:iters
for j = 1:size(A,1)
x(j) = (1/A(j,j)) * (b(j) - A(j,:)*x + A(j,j)*x(j));
end
end
x
ZERO1=A*x-b
The successive over-relaxation method
(SOR) is derived from the Gauss-Seidel method by introducing an “extrapolation”
parameter ω, resulting in faster convergence. It was devised simultaneously by David M.
Young, Jr. and by Stanley P. Frankel in 1950 for the purpose of automatically solving
linear systems on digital computers. Over-relaxation methods had been used before the
work of Young and Frankel.

The component 𝑥𝑖 is computed as for Gauss-Seidel but then averaged with its previous
value.
$$x_{k+1}(i) = (1-\omega)\,x_k(i) + \frac{\omega}{a(i,i)}\Bigg(b(i) - \sum_{j=1}^{i-1} a(i,j)\,x_{k+1}(j) - \sum_{j=i+1}^{n} a(i,j)\,x_k(j)\Bigg)$$

We first mention that one cannot choose the relaxation parameter arbitrarily if one
wants to obtain a convergent method. This general and very elegant result is due to
Kahan (PhD thesis, 1958): for convergence of SOR it is necessary to choose 0 < ω < 2.

clear all, clc,


A =rand(10,10); b=100*rand(10,1);
A = 0.5*(A+A'); A = A + 10*eye(10); w=2/3;

n=max(size(A)); m=min(size(b)); x=rand(n,m); iters=20;

for i=1:iters

for j = 1:size(A,1)
x(j) = (1-w)*x(j) + (w/A(j,j))*(b(j) - A(j,:)*x + A(j,j)*x(j));
end

end
x
ZERO1=A*x-b

Remark: The optimal choice of the relaxation parameter ω was given by Young in his 1950 thesis.
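
For reference, a hedged statement of the classical result (not reproduced in the text): for consistently ordered matrices, Young's analysis expresses the optimal parameter through the spectral radius $\rho_J$ of the Jacobi iteration matrix,

$$\omega_{\mathrm{opt}} = \frac{2}{1+\sqrt{1-\rho_J^{\,2}}}\,, \qquad 1 \le \omega_{\mathrm{opt}} < 2 .$$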

Remark: The Gauss–Seidel method 𝐱 𝑘+1 = 𝐟(𝐱𝑘+1 , 𝐱 𝑘 ) is very close to the simple iterative
method 𝐱 𝑘+1 = 𝐟(𝐱𝑘 ); it differs only in that, within an iteration, the components that have
already been updated are used immediately. This method can be used for both linear and
nonlinear systems, and its convergence depends on the choice of the starting point and on
the Jacobian of the mapping defined by the system.

Gauss − Seidel method 𝐱 𝑘+1 = 𝐟(𝐱 𝑘+1 , 𝐱 𝑘 ) Simple iterative method (Jacobi) 𝐱𝑘+1 = 𝐟(𝐱 𝑘 )
𝑥1 (𝑘 + 1) = f1 (𝑥1 (𝑘), 𝑥2 (𝑘), 𝑥3 (𝑘)) 𝑥1 (𝑘 + 1) = f1 (𝑥1 (𝑘), 𝑥2 (𝑘), 𝑥3 (𝑘))
𝑥2 (𝑘 + 1) = f2 (𝑥1 (𝑘 + 1), 𝑥2 (𝑘), 𝑥3 (𝑘)) 𝑥2 (𝑘 + 1) = f2 (𝑥1 (𝑘), 𝑥2 (𝑘), 𝑥3 (𝑘))
𝑥3 (𝑘 + 1) = f3 (𝑥1 (𝑘 + 1), 𝑥2 (𝑘 + 1), 𝑥3 (𝑘)) 𝑥3 (𝑘 + 1) = f3 (𝑥1 (𝑘), 𝑥2 (𝑘), 𝑥3 (𝑘))
Example: Solve the following set of nonlinear equations by the Gauss-Seidel method

$$\begin{cases} 27x + e^{x}\cos(y) - 0.12z = 3 \\ -0.2x^2 + 37y + 3xz = 6 \\ x^2 - 0.2y\sin(x) + 29z = -4 \end{cases}
\;\Longleftrightarrow\;
\begin{cases} x_{k+1} = \frac{1}{27}\big(3 + 0.12z_k - e^{x_k}\cos(y_k)\big) \\ y_{k+1} = \frac{1}{37}\big(6 - 3x_{k+1}z_k + 0.2(x_{k+1})^2\big) \\ z_{k+1} = \frac{1}{29}\big(-4 + 0.2y_{k+1}\sin(x_{k+1}) - (x_{k+1})^2\big) \end{cases}$$
Start by 𝑥 = 𝑦 = 𝑧 = 1

clear all, clc, x0=1; y0=1; z0=1; s=1; k=1;

while s>0.01
x1=(1/27)*(3+0.12*z0-exp(x0)*cos(y0));
y1=(1/37)*(6-3*x1*z0+ 0.2*x1*x1);
z1=(1/29)*(-4+0.2*y1*sin(x1)-x1*x1); % divide by 29, the coefficient of z in the third equation

v1= x1-x0; v2= y1-y0; v3= z1-z0; v=[v1; v2; v3]; s=norm(v);
x0=x1; y0=y1; z0=z1;
k=k+1;
end
k
x=x1, y=y1, z=z1,

The conjugate gradient method is a mathematical technique that can be useful
for the optimization of both linear and non-linear systems. This technique is generally
used as an iterative algorithm; however, it can also be used as a direct method, and it will
produce a numerical solution. Generally this method is used for very large systems
where it is not practical to solve with a direct method. The method was developed by
Magnus Hestenes and Eduard Stiefel. The method is designed for, and is guaranteed to
converge when, the coefficient matrix 𝑨 is symmetric and positive definite.

The preconditioned conjugate gradient method is used nowadays as a standard method


in many software libraries, and CG is the starting point of Krylov methods, which are
listed among the top ten algorithms of the last century.

Consider the problem of finding the vector 𝐱 that minimizes the scalar function (which is
called the energy of system)

$$f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T A\,\mathbf{x} - \boldsymbol{b}^T\mathbf{x}
\qquad\text{with}\qquad
\nabla f(\mathbf{x}) = \tfrac{1}{2}(A^T + A)\mathbf{x} - \boldsymbol{b} = A\mathbf{x} - \boldsymbol{b}$$
where the matrix 𝑨 is symmetric and positive definite. Because f(𝐱) is minimized when
its gradient ∇f = 𝑨𝐱 − 𝒃 is zero, we see that minimization is equivalent to solving 𝑨𝐱 = 𝒃.

Gradient methods accomplish the minimization by iteration, starting with an initial


vector 𝐱 0 . Each iterative cycle 𝑘 computes a refined solution

𝐱 𝑘+1 = 𝐱 𝑘 + 𝛼𝑘 𝐬𝑘
The step length 𝛼𝑘 is chosen so that 𝐱 𝑘+1 minimizes f(𝐱 𝑘+1 ) in the search direction 𝐬𝑘 . That
is, 𝐱 𝑘+1 must satisfy 𝑨𝐱 = 𝒃 so
𝑨(𝐱 𝑘 + 𝛼𝑘 𝐬𝑘 ) = 𝒃

Introducing the residual 𝐫𝑘 = 𝒃 − 𝑨𝐱 𝑘 this last equation becomes 𝛼𝑘 𝑨𝐬𝑘 = 𝐫𝑘 .


Premultiplying both sides by 𝐬𝑘 𝑇 and solving for 𝛼𝑘 , we obtain
$$\alpha_k = \frac{\mathbf{s}_k^T\mathbf{r}_k}{\mathbf{s}_k^T A\,\mathbf{s}_k}
= \frac{\big(\nabla f(\mathbf{x}_k)\big)^T\nabla f(\mathbf{x}_k)}{\big(\nabla f(\mathbf{x}_k)\big)^T A\,\nabla f(\mathbf{x}_k)}$$

We are still left with the problem of determining the search direction 𝐬𝑘 . Intuition tells us
to choose 𝐬𝑘 = −∇𝑓 = 𝐫𝑘 , since this is the direction of the largest negative change in 𝑓(𝐱).
The resulting procedure is known as the method of steepest descent or a gradient
method. Summarizing, the gradient method can be described as follows: given 𝐱 0 ∈ ℝ𝑛 ,
for 𝑘 = 0, 1, … until convergence, compute

▪ $\mathbf{r}_k = \boldsymbol{b} - A\mathbf{x}_k$
▪ $\alpha_k = \dfrac{\mathbf{s}_k^T\mathbf{r}_k}{\mathbf{s}_k^T A\,\mathbf{s}_k} = \dfrac{\mathbf{r}_k^T\mathbf{r}_k}{\mathbf{r}_k^T A\,\mathbf{r}_k}$
▪ $\mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\mathbf{r}_k$

It is not a popular algorithm, due to its slow convergence. In order to increase the speed of
convergence, several authors have proposed a correction of the search direction.
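
A minimal MATLAB sketch of the steepest-descent iteration just described (the test matrix, tolerance and iteration cap are illustrative assumptions, not taken from the text):

clear all, clc, A = rand(10,10); A = 0.5*(A+A') + 10*eye(10);  % SPD test matrix
b = rand(10,1); x = zeros(10,1);
for k = 1:500
    r = b - A*x;                    % residual = negative gradient
    if norm(r) < 1e-10, break; end  % stop when the residual is small
    alpha = (r'*r)/(r'*A*r);        % exact line search along r
    x = x + alpha*r;                % steepest-descent update
end
x, residual = norm(A*x-b)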

Now the improved version of the steepest descent algorithm is based on the so called A-
conjugacy. A-conjugacy means that a set of nonzero vectors {𝐬0 , 𝐬1 , … , 𝐬𝑛−1 } are conjugate
with respect to the symmetric positive definite matrix A. That is 𝐬𝑖𝑇 𝑨𝐬𝑗 = 0 ∀ 𝑖 ≠ 𝑗. A set
of 𝑛 such vectors are linearly independent and hence span the whole space ℝ𝑛 . The
reason why such A-conjugate sets are important is that we can minimize our quadratic
function f(𝐱) in 𝑛 steps by successively minimizing it along each of the directions. Since
the set of A-conjugate vectors acts as a basis for ℝ𝑛 .

There are several ways to choose such a set. The eigenvectors of 𝑨 form a A-conjugate
set, but finding the eigenvectors is a task requiring a lot of computation, so we had better find
another strategy. A second alternative is to modify the usual Gram-Schmidt
orthogonalization process. This is also not optimal, as it requires storing all the
directions. In order to search optimally a complete set of linearly independent vectors we
use the more efficient algorithm " conjugate gradient method " which is based on the
following recursive formula 𝐬𝑘+1 = 𝐫𝑘+1 + 𝛽𝑘 𝐬𝑘 , and we try to find the constant 𝛽𝑘 for
which any two successive search directions are conjugate (noninterfering) to each other,
meaning $\mathbf{s}_{k+1}^T A\,\mathbf{s}_k = 0$.
Substituting $\mathbf{s}_{k+1} = \mathbf{r}_{k+1} + \beta_k\mathbf{s}_k$ into $\mathbf{s}_{k+1}^T A\,\mathbf{s}_k = 0$ we get $(\mathbf{r}_{k+1}^T + \beta_k\mathbf{s}_k^T)\,A\,\mathbf{s}_k = 0$, which yields

$$\beta_k = -\frac{\mathbf{r}_{k+1}^T A\,\mathbf{s}_k}{\mathbf{s}_k^T A\,\mathbf{s}_k}$$
Here is the outline of the conjugate gradient algorithm:

▪ Choose 𝐱 0 (any vector will do, but one close to solution results in fewer iterations)
▪ 𝐫0 ← 𝒃 − 𝑨𝐱 0 & 𝐬0 ← 𝐫0
do with 𝑘 = 0, 1, 2, …
▪ $\alpha_k \leftarrow \dfrac{\mathbf{s}_k^T\mathbf{r}_k}{\mathbf{s}_k^T A\,\mathbf{s}_k}$
▪ $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha_k\mathbf{s}_k$
▪ $\mathbf{r}_{k+1} \leftarrow \boldsymbol{b} - A\mathbf{x}_{k+1}$; if $\|\mathbf{r}_{k+1}\| \le \varepsilon$ exit the loop (convergence criterion; $\varepsilon$ is the error tolerance)
▪ $\beta_k \leftarrow -\dfrac{\mathbf{r}_{k+1}^T A\,\mathbf{s}_k}{\mathbf{s}_k^T A\,\mathbf{s}_k}$
▪ $\mathbf{s}_{k+1} \leftarrow \mathbf{r}_{k+1} + \beta_k\mathbf{s}_k$
end do

clear all, clc, A =100*rand(10,10); A = 0.5*(A+A'); A = A + 5*eye(10);


b=100*rand(10,1); n=max(size(A)); m=min(size(b));
x=10*rand(n,m); r=b-A*x; s=r; % A=[4 -1 1;-1 4 -2;1 -2 4]; b=[12;-1;5];
for k=1: length(b)
alpha = (s'*r)/(s'*A*s);
x = x + alpha*s; r = b - A*x;
if sqrt(dot(r,r))< 1.0e-15
break   % exit the loop once the residual is small enough
else
beta=-(r'*A*s)/(s'*A*s); s = r + beta*s;
end
end
x, ZERO1=A*x-b

Many current problems of


research are reduced to large systems of linear differential equations which, in turn,
must be solved. Among the most common industrial problems which give rise to the
need for solving such a system are the vibration problem in which the corresponding
matrix is real and symmetric; more generally, however, we come across problems in which the matrix is
usually nonsymmetric and complex. Among the iterative methods for solving large linear
systems 𝑨𝐱 = 𝒃 with a sparse or possibly structured non-symmetric matrix 𝑨, those that
are based on the Lanczos process feature short recurrences for the generation of the
Krylov space. This means low cost and low memory requirement. A Lanczos, conjugate
gradient-like, method for the iterative solution of the equation is one of the most popular
and well known.

■ How KRYLOV Subspaces Come Into Play! In 1931 A. N. KRYLOV published a paper
entitled "On the numerical solution of the equation by which the frequency of small
oscillations is determined in technical problems", of course in this work KRYLOV was not
thinking in terms of projection processes, and he was not interested in solving a linear
system. Motivated by an application in the analysis of oscillations of mechanical
systems, he constructed a method for computing the minimal polynomial of a matrix.
Algebraically, his method is based on the following important fact.
Given 𝑨 ∈ 𝔽𝑛×𝑛 and a nonzero vector 𝐯 ∈ 𝔽𝑛 , consider the Krylov sequence {𝐯, 𝑨𝐯, 𝑨2 𝐯, … }
generated by 𝑨 and 𝐯. There then exists a uniquely defined integer 𝑑 = 𝑑(𝑨, 𝐯), so that
the vectors {𝐯, 𝑨𝐯, 𝑨2 𝐯, … 𝑨𝑑−1 𝐯} are linearly independent, and the vectors
$\{\mathbf{v}, A\mathbf{v}, A^2\mathbf{v}, \ldots, A^{d}\mathbf{v}\}$ are linearly dependent. By construction, there exist scalars
$\gamma_0, \ldots, \gamma_{d-1}$ with $A^d\mathbf{v} = \sum_{k=0}^{d-1}\gamma_k A^k\mathbf{v}$. In polynomial notation we can write:
$A^d\mathbf{v} = \sum_{k=0}^{d-1}\gamma_k A^k\mathbf{v} \;\Longleftrightarrow\; p(A)\mathbf{v} = \mathbf{0}$ where $p(\lambda) = \lambda^d - \sum_{k=0}^{d-1}\gamma_k\lambda^k$. The polynomial $p(\lambda)$ is
called the minimal polynomial of 𝐯 with respect to 𝑨. Krylov's observation can be
rephrased in the following way. For each matrix 𝑨 and vector 𝐯, the Krylov subspaces is
defined by 𝒦𝑑 (𝑨, 𝐯) = span{𝐯, 𝑨𝐯, 𝑨2 𝐯, … 𝑨𝑑−1 𝐯}. Most important iterative techniques for
solving large scale linear systems 𝑨𝐱 = 𝒃 are based on the projection processes onto the
Krylov subspace. In short this technique approximate 𝑨−1 𝒃 by 𝑝(𝑨)𝒃 where 𝑝 is a good
polynomial. A general projection method for solving linear system 𝑨𝐱 = 𝒃 is a method
which seek an approximate solution from an affine subspace 𝐱 𝑚 = 𝐱 0 + 𝒦𝑚 of dimension
𝑚 by imposing the Petrov–Galerkin condition 𝒓 = (𝒃 − 𝑨𝐱 𝑚 ) ⊥ ℒ𝑚 where ℒ𝑚 is
another 𝑚 dimensional subspace. Here, 𝐱 0 represents an arbitrary initial guess to the
solution. Krylov subspace method is a method for which 𝒦𝑚 is a Krylov subspace.

𝒦𝑚 (𝑨, 𝒓0 ) = span{𝒓0 , 𝑨𝒓0 , 𝑨2 𝒓0 , … 𝑨𝑚−1 𝒓0 }

With 𝒓0 = (𝒃 − 𝑨𝐱 0 ). The different versions of Krylov subspace methods arise from


different choices of the subspace ℒ𝑚 and from the ways in which the system is
preconditioned.
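
To make Krylov's observation concrete, the grade d = d(𝑨, 𝐯) can be estimated numerically as the rank of the Krylov matrix [𝐯, 𝑨𝐯, …, 𝑨ⁿ⁻¹𝐯]; a small illustrative sketch (the test matrix is an assumption, built with repeated eigenvalues so that d < n):

clear all, clc, n = 6; M = rand(n); A = M*diag([1 1 2 2 3 4])/M;  % repeated eigenvalues => d < n
v = rand(n,1);
K = zeros(n,n); K(:,1) = v;
for j = 2:n, K(:,j) = A*K(:,j-1); end    % Krylov matrix [v, A*v, ..., A^(n-1)*v]
d = rank(K)                              % grade of v with respect to A (here 4 for a generic v)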

Example: Two broad choices for ℒ𝑚 give rise to some of the best-known techniques
⦁ ℒ𝑚 = 𝒦𝑚 (𝑨, 𝒓0 ) Full Orthogonalization Method (FOM)
⦁ ℒ𝑚 = 𝑨𝒦𝑚 (𝑨, 𝒓0 ) Generalized minimal residual method (GMRES)

An important fact: Though the non-symmetric linear system is often the motivation for
applying the Lanczos algorithm, the operation the algorithm primarily performs is
tridiagonalization of a matrix, but what about the tridiagonalization process?

For every real square matrix 𝑨 ∈ ℝ𝑛×𝑛 there exists a nonsingular real matrix 𝑿, for which
𝑻 = 𝑿−1 𝑨𝑿 ∈ ℝ𝑛×𝑛 is a tridiagonal matrix, and under certain conditions 𝑿 is uniquely
determined. Simple proofs for the existence and uniqueness of this transformation are
presented in a paper by Angelika Bunse-Gerstner (1982).

However, the use of Krylov subspace method does not guarantee the non-singularity of
the matrix 𝑿, since "𝑚 ≤ 𝑛" the dimension of Krylov subspace is less or equal to the
dimension of the matrix 𝑨. Therefore, we can write 𝑻 = 𝐖 𝑇 𝑨𝐕 ∈ ℝ𝑚×𝑚 with 𝐖, 𝐕 ∈ ℝ𝑛×𝑚 .

Theorem If the minimal polynomial of the nonsingular matrix 𝑨 has degree 𝑚, then the
solution to 𝑨𝐱 = 𝒃 lies in the space $K_m(A,\boldsymbol{b}) = \mathrm{span}\{\boldsymbol{b}, A\boldsymbol{b}, \ldots, A^{m-1}\boldsymbol{b}\}$.

(Existence of a Krylov Solution)


A square linear system 𝑨𝐱 = 𝒃 has a Krylov solution if and only if 𝒃 ∈ ℛ(𝑨𝑘 ), where 𝑘 is
the index of the zero eigenvalue of 𝑨.

Proof: Carl D. Meyer and Ilse C. F. Ipsen 1997-1998


■ The Nonsymmetric Lanczos Algorithm: For a nonsymmetric matrix 𝑨, the Lanczos
algorithm constructs, starting with two vectors 𝐯1 and 𝐰1, a pair of biorthogonal bases
{𝐯1 , 𝐯2 , … , 𝐯𝑚 } & {𝐰1 , 𝐰2 , … , 𝐰𝑚 } for the two Krylov subspaces, 𝐾𝑚 (𝑨, 𝐯1 ) & ℒ𝑚 (𝑨𝑇 , 𝐰1 ) where

𝐾𝑚 (𝑨, 𝐯1 ) = span{𝐯1 , 𝑨𝐯1 , . . . , 𝑨𝑚−1 𝐯1 } and ℒ𝑚 (𝑨𝑇 , 𝐰1 ) = span{𝐰1 , 𝑨𝑇 𝐰1 , . . . , (𝑨𝑇 )𝑚−1 𝐰1 }.

Two sets {𝐯𝑖 } & {𝐰𝑖 } of vectors satisfying (𝐰𝑖 𝑇 𝐯𝑗 = 0 ∀ 𝑖 ≠ 𝑗) are said to be biorthogonal
and can be obtained through the following algorithm: (Biswa Nath Datta 2003)

Algorithm
Initialization: Scale the vectors 𝐯 and 𝐰 to get the vectors 𝐯1 and 𝐰1 such that 𝐰1 𝑇 𝐯1 = 1.
Set 𝛽1 = 0, 𝛾1 = 0, 𝐰0 = 𝐯0 = 0.
begin: For 𝑘 = 1, 2, . . . , 𝑚 do
𝛼𝑘 = 𝐰𝑘 𝑇 𝑨𝐯𝑘
𝝁𝑘+1 = 𝑨𝐯𝑘 − 𝛼𝑘 𝐯𝑘 − 𝛽𝑘 𝐯𝑘−1
𝜼𝑘+1 = 𝑨𝑇 𝐰𝑘 − 𝛼𝑘 𝐰𝑘 − 𝛾𝑘 𝐰𝑘−1
$$\gamma_{k+1} = \sqrt{\big|\boldsymbol{\eta}_{k+1}^T\boldsymbol{\mu}_{k+1}\big|}\;;\qquad \beta_{k+1} = \frac{\boldsymbol{\eta}_{k+1}^T\boldsymbol{\mu}_{k+1}}{\gamma_{k+1}}$$
$$\mathbf{w}_{k+1} = \frac{\boldsymbol{\eta}_{k+1}}{\beta_{k+1}}\;;\qquad \mathbf{v}_{k+1} = \frac{\boldsymbol{\mu}_{k+1}}{\gamma_{k+1}}$$
end

■ Proof of the Nonsymmetric Lanczos Algorithm: Suppose 𝑨 ∈ ℝ𝑛×𝑛 and that a


biorthogonal rectangular matrices 𝐖𝑘 , 𝐕𝑘 ∈ ℝ𝑛×𝑘 exists so

$$\begin{cases} A\mathbf{V}_k = \mathbf{V}_k T_k \\ \mathbf{W}_k^T A = T_k\mathbf{W}_k^T \end{cases}
\qquad\text{where}\qquad
T_k = \begin{pmatrix} \alpha_1 & \beta_2 & & 0 \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ 0 & & \gamma_k & \alpha_k \end{pmatrix}
\qquad\text{and}\qquad \mathbf{W}_k^T\mathbf{V}_k = I_k$$

With the column partitioning 𝐕𝑘 = [𝐯1 , 𝐯2 , … , 𝐯𝑘 ] and 𝐖𝑘 = [𝐰1 , 𝐰2 , … , 𝐰𝑘 ], we find upon


comparing columns in 𝑨𝐕𝑘 = 𝐕𝑘 𝑻𝑘 and 𝑨𝑇 𝐖𝑘 = 𝐖𝑘 𝑻𝑇𝑘 that

$$\begin{aligned} A\mathbf{v}_k &= \beta_k\mathbf{v}_{k-1} + \alpha_k\mathbf{v}_k + \gamma_{k+1}\mathbf{v}_{k+1}, &\quad \beta_1\mathbf{v}_0 &= 0 \\ A^T\mathbf{w}_k &= \gamma_k\mathbf{w}_{k-1} + \alpha_k\mathbf{w}_k + \beta_{k+1}\mathbf{w}_{k+1}, &\quad \gamma_1\mathbf{w}_0 &= 0 \end{aligned}
\qquad\text{with } k = 1,2,\ldots,m$$

These equations together with the biorthogonality condition 𝐖𝑘 𝑇 𝐕𝑘 = 𝑰

$$\begin{cases} \mathbf{w}_i^T\mathbf{v}_j = 0, & i \neq j \\ \mathbf{w}_i^T\mathbf{v}_j = 1, & i = j \end{cases}$$

imply 𝐰𝑘 𝑇 𝑨𝐯𝑘 = 𝛽𝑘 𝐰𝑘 𝑇 𝐯𝑘−1 + 𝛼𝑘 𝐰𝑘 𝑇 𝐯𝑘 + 𝛾𝑘+1 𝐰𝑘 𝑇 𝐯𝑘+1 ⟹ 𝛼𝑘 = 𝐰𝑘 𝑇 𝑨𝐯𝑘 and

𝛾𝑘+1 𝐯𝑘+1 = 𝝁𝑘+1 = 𝑨𝐯𝑘 − 𝛼𝑘 𝐯𝑘 − 𝛽𝑘 𝐯𝑘−1


𝛽𝑘+1 𝐰𝑘+1 = 𝜼𝑘+1 = 𝑨𝑇 𝐰𝑘 − 𝛼𝑘 𝐰𝑘 − 𝛾𝑘 𝐰𝑘−1

There is some flexibility in choosing the scale factors 𝛽𝑘 and 𝛾𝑘 . Note that

$$1 = \mathbf{w}_{k+1}^T\mathbf{v}_{k+1} = \frac{\boldsymbol{\eta}_{k+1}^T\boldsymbol{\mu}_{k+1}}{\beta_{k+1}\gamma_{k+1}}$$
It follows that once 𝛾𝑘 is specified 𝛽𝑘 is given by 𝛽𝑘 = (𝜼𝑇𝑘 𝝁𝑘 )/𝛾𝑘 with the "canonical" choice
𝛾𝑘 = √|𝜼𝑇𝑘 𝝁𝑘 | we obtain the above algorithm.
If $T_k = \begin{pmatrix} \alpha_1 & \beta_2 & & 0 \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ 0 & & \gamma_k & \alpha_k \end{pmatrix}$, then the situation at the bottom of the loop is summarized by the equations

$$\begin{cases} A[\mathbf{v}_1,\ldots,\mathbf{v}_k] = [\mathbf{v}_1,\ldots,\mathbf{v}_k]\,T_k + [0\;\;0\;\cdots\;\gamma_{k+1}\mathbf{v}_{k+1}] = \mathbf{V}_k T_k + \gamma_{k+1}\mathbf{v}_{k+1}\mathbf{e}_k^T \\ A^T[\mathbf{w}_1,\ldots,\mathbf{w}_k] = [\mathbf{w}_1,\ldots,\mathbf{w}_k]\,T_k^T + [0\;\;0\;\cdots\;\beta_{k+1}\mathbf{w}_{k+1}] = \mathbf{W}_k T_k^T + \beta_{k+1}\mathbf{w}_{k+1}\mathbf{e}_k^T \end{cases}$$

If 𝛾𝑘+1 = 0 , then the iteration terminates and span{𝐯1 , 𝐯2 , … , 𝐯𝑘 } is an invariant subspace


for 𝑨. If 𝛽𝑘+1 = 0, then the iteration also terminates and span{𝐰1 , 𝐰2 , … , 𝐰𝑘 } is an
invariant subspace for 𝑨𝑇 . However, if neither of these conditions are true and 𝜼𝑇𝑘 𝝁𝑘 = 0,
then the tridiagonalization process ends without any invariant subspace information.
This is called serious breakdown. See Wilkinson (1965, p. 389) for an early discussion of
the matter.

■ Derivation of the solution: Assuming that the real general matrix 𝑨 ∈ ℝ𝑛×𝑛 is
transformed to its equivalent form 𝑻 = 𝑾𝑇 𝑨𝑽 such that 𝑻 ∈ ℝ𝑚×𝑚 is presented in Krylov
space of dimension 𝑚, 𝑽 = [𝐯1 , 𝐯2 , … 𝐯𝑚 ] ∈ ℝ𝑛×𝑚 is a matrix whose column-vectors form a
basis of 𝒦𝑚 and 𝑾 = [𝐰1 , 𝐰2 , … 𝐰𝑚 ] ∈ ℝ𝑛×𝑚 is a matrix whose column-vectors form a basis
of ℒ𝑚 with 𝑽𝑾𝑇 = 𝑰.

Let us start the generation of the Krylov space with the following initial vector

$$\mathbf{v}_1 = \frac{\boldsymbol{b} - A\mathbf{x}_0}{\|\boldsymbol{b} - A\mathbf{x}_0\|_2} = \frac{\boldsymbol{r}_0}{\|\boldsymbol{r}_0\|_2} = V\boldsymbol{e}_1 \quad\text{with}\quad \boldsymbol{e}_1 = [1,0,\ldots,0]^T \;\Longrightarrow\; \boldsymbol{r}_0 = \|\boldsymbol{r}_0\|_2\,V\boldsymbol{e}_1$$

The exact solution is given by 𝐱 = 𝑨−1 𝒃 = (𝑨−1 𝒃 − 𝑨−1 𝒓0 ) + 𝑨−1 𝒓0 = 𝐱 0 + 𝑨−1 𝒓0 . The idea is
to approximate 𝑨−1 𝒓0 by 𝑝(𝑨)𝒓0 where 𝑝 is a good polynomial.

𝐱 = 𝐱 0 + 𝑨−1 𝒓0
= 𝐱 0 + 𝑨−1 ‖𝒓0 ‖2 𝑽𝒆1
= 𝐱 0 + ‖𝒓0 ‖2 𝑽𝑾𝑇 𝑨−1 𝑽𝒆1
= 𝐱 0 + ‖𝒓0 ‖2 𝑽𝑻−1 𝒆1

Let us define a new vector 𝐲 such that 𝐲 is a solution of the system 𝑻𝐲 = ‖𝒓0 ‖2 𝒆1

𝐱 = (𝐱 0 + 𝑽𝐲) ∈ 𝐱 0 + 𝒦𝑚

Once the vector 𝐲 is obtained from the system 𝑻𝐲 = ‖𝒓0 ‖2 𝒆1 we can construct the
solution.

■ Termination Criterion: The above algorithm depends on a parameter 𝑚 which is the


dimension of the Krylov subspace. In practice it is desirable to select 𝑚 in a dynamic
fashion. This would be possible if the residual norm of the solution 𝐱 𝑚 is available
inexpensively (without having to compute 𝐱 𝑚 itself). Then the algorithm can be stopped
at the appropriate step using this information. The following proposition gives a result in
this direction.
The residual vector of the approximate solution 𝐱 𝑘 computed by the Lanczos Algorithm is
such that
𝒃 − 𝑨𝐱 𝑘 = 𝒃 − 𝑨(𝐱 0 + 𝑽𝑘 𝐲𝑘 )
= 𝒓0 − 𝑨𝑽𝑘 𝐲𝑘
= ‖𝒓0 ‖2 𝑽𝑘 𝒆1 − (𝑽𝑘 𝑻𝑘 𝐲𝑘 + 𝛾𝑘+1 𝐯𝑘+1 𝒆𝑇𝑘 𝐲𝑘 )
= −𝛾𝑘+1 𝐯𝑘+1 𝒆𝑇𝑘 𝐲𝑘

Then the tolerance error 𝜀 is defined by ‖𝒓0 ‖2 𝜀 = ‖𝒓𝑘 ‖2 = 𝛾𝑘+1 |𝒆𝑇𝑘 𝐲𝑘 | × ‖𝐯𝑘+1 ‖2

clear all, clc, M=10*rand(7,7); D=diag([1 1 1 2 2 7 6]);


A=M*D*inv(M); b=100*rand(7,1); n =length(b);
x0= rand(n,1); toll=0.0001; r0=b-A*x0; nres0=norm(r0,2);
V=r0/nres0; W=V; gamma(1)=0; beta(1)=0; k=1; nres=1;
while k <= n && nres > toll
vk=V(:,k); wk=W(:,k);
if k==1, vk1=0*vk; wk1=0*wk;
else, vk1=V(:,k-1); wk1= W(:,k-1);
end
alpha(k)=wk'*A*vk;
tildev=A*vk-alpha(k)*vk-beta(k)*vk1;
tildew=A'*wk-alpha(k)*wk-gamma(k)*wk1;
gamma(k+1)=sqrt(abs(tildew'*tildev)); % gamma(k+1)=tildez'*tildev;
if gamma(k+1) == 0, k=n+2;
else
beta(k+1)=tildew'*tildev/gamma(k+1);
W=[W,tildew/beta(k+1)];
V=[V,tildev/gamma(k+1)];
end
if k<n+2
if k==1
Tk = alpha;
else
Tk=diag(alpha)+diag(beta(2:k),1)+diag(gamma(2:k),-1);
end
yk=Tk\(nres0*[1,0*[1:k-1]]');
xk=x0+V(:,1:k)*yk;
nres=abs(gamma(k+1)*[0*[1:k-1],1]*yk)*norm(V(:,k+1),2)/nres0;
k=k+1;
else
return
end
end
m=k-1; % The Krylov space dimension
A, Tk, eig(A), eig(Tk), xk,
Zero1=Tk-W(:,1:m)'*A*V(:,1:m) % verification1
Zero2= A*xk-b % verification2
[X1 D]=eig(Tk); X=V(:,1:m)*X1;
A*X-X*D
Example: Let us execute the program with the following matrix
A =
1.250321 3.209004 4.593262 2.748218 9.173651 1.236518
4.137617 5.958772 7.751354 5.337032 6.935770 3.988718
3.362769 1.916382 9.758420 3.546427 6.875781 3.475287
2.461339 5.068413 2.549812 7.292116 6.241738 2.997823
3.688499 7.626944 8.882880 5.214819 6.937188 3.948593
8.288752 0.028778 8.265965 4.578493 0.596674 8.944608

The result will be

Tk =
25.6608 9.7823 0 0 0 0
9.7823 8.8551 2.3272 0 0 0
0 2.3272 -4.8674 -4.8879 0 0
0 0 4.8879 3.4774 -0.7855 0
0 0 0 0.7855 5.5885 -0.8947
0 0 0 0 0.8947 1.4270

Remark:1 It is of great importance to know how to solve 𝑻𝑘 𝐲𝑘 = 𝒛𝑘 with 𝒛𝑘 = ‖𝒓0 ‖2 𝒆1 ?

$$T_k = \begin{pmatrix} \alpha_1 & \beta_2 & & 0 \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ 0 & & \gamma_k & \alpha_k \end{pmatrix}
= \begin{pmatrix} 1 & & & 0 \\ \ell_2 & 1 & & \\ & \ddots & \ddots & \\ 0 & & \ell_k & 1 \end{pmatrix}
\begin{pmatrix} d_1 & \mu_2 & & 0 \\ & d_2 & \ddots & \\ & & \ddots & \mu_k \\ 0 & & & d_k \end{pmatrix}
= L_k R_k$$

Hence the equation $T_k\mathbf{y}_k = L_k R_k\mathbf{y}_k = L_k\boldsymbol{\vartheta}_k = \boldsymbol{z}_k$ can be solved recursively by a forward substitution ($L_k\boldsymbol{\vartheta}_k = \boldsymbol{z}_k$) followed by a back substitution ($R_k\mathbf{y}_k = \boldsymbol{\vartheta}_k$).
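
A hedged MATLAB sketch of this two-sweep solution (an illustration only: the vectors alpha, beta, gamma holding the diagonal, superdiagonal and subdiagonal of 𝑻_k and the right-hand side z are assumed to be given, and no zero pivot is assumed to occur):

k = numel(alpha);                         % order of the tridiagonal system
d = zeros(k,1); mu = zeros(k,1); ell = zeros(k,1);
d(1) = alpha(1);
for i = 2:k                               % LU factorization T = L*R (no pivoting)
    mu(i)  = beta(i);                     % superdiagonal of R
    ell(i) = gamma(i)/d(i-1);             % multiplier stored in L
    d(i)   = alpha(i) - ell(i)*mu(i);
end
theta = zeros(k,1); theta(1) = z(1);      % forward substitution  L*theta = z
for i = 2:k, theta(i) = z(i) - ell(i)*theta(i-1); end
y = zeros(k,1); y(k) = theta(k)/d(k);     % back substitution     R*y = theta
for i = k-1:-1:1, y(i) = (theta(i) - mu(i+1)*y(i+1))/d(i); end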

Remark:2 If the algorithm does not break down before completion of 𝑛 steps, then,
defining 𝐕𝑛 = [𝐯1 , 𝐯2 , … , 𝐯𝑛 ] and 𝐖𝑛 = [𝐰1 , 𝐰2 , … , 𝐰𝑛 ] with 𝐖𝑛 𝑇 𝐕𝑛 = 𝑰.

we obtain 𝑻 = 𝐖𝑛 𝑇 𝑨𝐕𝑛 , 𝑨𝐕𝑛 = 𝐕𝑛 𝑻 + 𝛾𝑛+1 𝐯𝑛+1 𝐞𝑛 𝑇 and 𝑨𝑇 𝐖𝑛 = 𝐖𝑛 𝑻 + 𝛽𝑛+1 𝐰𝑛+1 𝐞𝑛 𝑇 where


$T$ is the tridiagonal matrix built from $(\alpha_1,\ldots,\alpha_n;\;\beta_2,\ldots,\beta_n;\;\gamma_2,\ldots,\gamma_n)$:

$$T = \begin{pmatrix} \alpha_1 & \beta_2 & & 0 \\ \gamma_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_n \\ 0 & & \gamma_n & \alpha_n \end{pmatrix} \in \mathbb{R}^{n\times n}$$

■ Link between Lanczos Method and Krylov Space: From the above algorithm we
know that
$$\begin{aligned} A\mathbf{v}_k &= \beta_k\mathbf{v}_{k-1} + \alpha_k\mathbf{v}_k + \gamma_{k+1}\mathbf{v}_{k+1}, &\quad \beta_1\mathbf{v}_0 &= 0 \\ A^T\mathbf{w}_k &= \gamma_k\mathbf{w}_{k-1} + \alpha_k\mathbf{w}_k + \beta_{k+1}\mathbf{w}_{k+1}, &\quad \gamma_1\mathbf{w}_0 &= 0 \end{aligned}
\qquad\text{with } k = 1,2,\ldots,m$$

Back substitution in these recurrences (expressing each new vector in terms of the previous ones) gives

$$\mathbf{v}_k = \sum_{i=1}^{k} c_i\,A^{i-1}\mathbf{v}_1 \qquad\text{and}\qquad \mathbf{w}_k = \sum_{i=1}^{k} d_i\,(A^T)^{i-1}\mathbf{w}_1 \qquad\text{with } k = 1,2,\ldots,m$$

𝐾𝑚 (𝑨, 𝐯1 ) = span{𝐯1 , 𝐯2 , … , 𝐯𝑚 } = span{𝐯1, 𝑨𝐯1 , … , 𝑨𝑚−1 𝐯1 }


ℒ𝑚 (𝑨𝑇 , 𝐰1 ) = span{𝐰1 , 𝐰2 , … , 𝐰𝑚 } = span{𝐰1 , 𝑨𝑇 𝐰1 , . . . , (𝑨𝑇 )𝑚−1 𝐰1 }
How do you solve a system of linear equations
𝑨𝐱 = 𝒃 when your coefficient matrix 𝑨 is large and sparse (i.e., contains many zero
entries)? What if the order 𝑛 of the matrix is so large that you cannot afford to spend
about 𝑛3 operations to solve the system by Gaussian elimination? Or what if you do not
have direct access to the matrix? Perhaps the matrix 𝑨 exists only implicitly as a
subroutine that, when given a vector 𝐯, returns 𝑨𝐯. In this case you may want to use a
Krylov method. The Arnoldi method belongs to a class of linear algebra algorithms that
give a partial result after a small number of iterations, in contrast to so-called direct
methods which must complete to give any useful results. The partial result in this case
being the first few vectors of the basis the algorithm is building. When applied to
Hermitian matrices it reduces to the Lanczos algorithm. The Arnoldi iteration was
invented by W. E. Arnoldi in 1951.

One way to extend the Lanczos process to unsymmetric matrices is due to Arnoldi (1951)
and revolves around the Hessenberg reduction 𝑸𝑇 𝑨𝑸 = 𝑯. In particular, if 𝑸 = [𝒒1 , … , 𝒒𝑚 ]
and we compare columns in 𝑨𝑸 = 𝑸𝑯 , then
$$A\boldsymbol{Q} = \boldsymbol{Q}\boldsymbol{H} \;\Longleftrightarrow\; A\boldsymbol{q}_k = \sum_{i=1}^{k+1} h_{ik}\,\boldsymbol{q}_i, \qquad 1 \le k \le n-1$$

Isolating the last term in the summation gives


$$h(k+1,k)\,\boldsymbol{q}_{k+1} = A\boldsymbol{q}_k - \sum_{i=1}^{k} h_{ik}\,\boldsymbol{q}_i \;\triangleq\; \boldsymbol{r}_k \quad\text{(the residual)}$$

where ℎ𝑖𝑘 = 𝒒𝑖 𝑇 𝑨𝒒𝑘 for 𝑖 = 1: 𝑘. It follows that if 𝒓𝑘 ≠ 0, then 𝒒𝑘+1 is specified by

𝒒𝑘+1 = 𝒓𝑘 /ℎ(𝑘 + 1, 𝑘)

where ℎ(𝑘 + 1, 𝑘) = ‖𝒓𝑘 ‖2 . These equations define the Arnoldi process and in strict analogy
to the symmetric Lanczos process we obtain:

𝑟0 = 𝑞1 , ℎ10 = 1, 𝑘 = 0,
while (if ℎ(𝑘 + 1, 𝑘) ≠ 0)
𝒒𝑘+1 = 𝒓𝑘 /ℎ(𝑘 + 1, 𝑘)
𝑘 = 𝑘 + 1
𝒓𝑘 = 𝑨𝒒𝑘
for 𝑖 = 1: 𝑘
ℎ𝑖𝑘 = 𝒒𝑖 𝑇 𝒓𝑘
𝒓𝑘 = 𝒓𝑘 − ℎ𝑖𝑘 𝒒𝑖
end
ℎ(𝑘 + 1, 𝑘) = ‖𝒓𝑘 ‖2
end

The algorithm breaks down when 𝒓𝑘 is the zero vector. This happens when the minimal
polynomial of 𝑨 is of degree 𝑘. (see the Carl D. Meyer theorem 1998)
clear all, clc, M=rand(8,8); D=diag([1 2 3 4 6 6 6 6]); A=M*D*inv(M);
n=size(A,1); b=100*rand(8,1); x0= rand(n,1); r0=b-A*x0; nr0=norm(r0,2);
q(:,1) = r0/norm(r0); q1=q(:,1); r(:,1)=q1;
d = 1; k = 1; toll=0.001;
while (d>toll)
r(:,k)= A*q(:,k);

for i = 1:k
h(i,k) = (q(:,i))'*r(:,k);
r(:,k)=r(:,k) - h(i,k)*q(:,i);   % orthogonalize against the previous Arnoldi vectors
end

h(k+1,k) = norm(r(:,k),2);
q(:,k+1) = r(:,k)/h(k+1,k);
d=abs(h(k+1,k));

m=k; % The Krylov dimension


k = k+1;
end
H=h(1:m,1:m)
Q=q(:,1:m)
zero=H-triu(Q'*A*Q,-1)

ym=inv(h(1:m,1:m))*(nr0*[1,0*[1:m-1]]'); % The Krylov solution


xm=x0+q(:,1:m)*ym;
Zero= A*xm-b

Interpretation: How can Arnoldi's algorithm be interpreted as generating Krylov subspaces


and what is the relationship between them? From the algorithm we know that: 𝒒𝑘+1 =
𝒓𝑘 /ℎ(𝑘 + 1, 𝑘) and 𝒓𝑘 = 𝑨𝒒𝑘 ⟹ 𝒒𝑘+1 = 𝛼𝑘 𝑨𝒒𝑘 with 𝛼𝑘 = 1/ℎ(𝑘 + 1, 𝑘)

⟹ 𝒒𝑘 = (𝛼1 𝛼2 … 𝛼𝑘 )𝑨𝑘 𝒒1 = 𝛽𝑘 𝑨𝑘 𝒒1

We assume that 𝒒1 is a given unit 2-norm starting vector. The 𝒒𝑘 are called the Arnoldi
vectors and they define an orthonormal basis for the Krylov subspace 𝐾𝑛 (𝑨, 𝒒1 , 𝑘):

𝐾𝑛 (𝑨, 𝒒1 , 𝑘) = span{𝒒1 , 𝒒2 , . . . , 𝒒𝑘 } = span{𝒒1 , 𝑨𝒒1 , . . . , 𝑨𝑘−1 𝒒1 }.

The situation after 𝑘 𝑡ℎ step is summarized by the 𝑘 𝑡ℎ step Arnoldi factorization

𝑨𝑸𝑘 = 𝑸𝑘 𝑯𝑘 + 𝒓𝑘 𝒆𝑇𝑘

where 𝑸𝑘 = [𝒒1 , . . . , 𝒒𝑘 ], 𝒆𝑘 = 𝑰(: , 𝑘), and


$$\boldsymbol{H}_k = \begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1,k-1} & h_{1k} \\
h_{21} & h_{22} & & h_{2,k-1} & h_{2k} \\
0 & h_{32} & \ddots & \vdots & \vdots \\
\vdots & & \ddots & h_{k-1,k-1} & \vdots \\
0 & \cdots & 0 & h_{k,k-1} & h_{kk}
\end{pmatrix}$$
If 𝒓𝑘 = 0, then the columns of 𝑸𝑘 define an invariant subspace and 𝜆(𝑯𝑘 ) ⊂ 𝜆(𝑨).
Otherwise, the focus is on how to extract information about 𝑨′𝑠 eigensystem from the
Hessenberg matrix 𝑯𝑘 and the matrix 𝑸𝑘 of Arnoldi vectors.

The matrix 𝑯 = 𝑸𝑇 𝑨𝑸 can be interpreted as the representation in the basis {𝒒1 , . . . , 𝒒𝑛 } of


the orthogonal projection of 𝑨 onto 𝐾𝑛 .

The Arnoldi iteration has two roles ❶ the basis of many of the iterative algorithms of
numerical linear algebra ❷ find eigenvalues of non-Hermitian matrices (i.e. using QR).

Note:1 In particular we showed that the solution to a nonsingular linear system 𝑨𝐱 = 𝒃


lies in a Krylov space whose dimension is the degree of the minimal polynomial of 𝑨.
Therefore, if the minimal polynomial of 𝑨 has low degree then the space in which a Krylov
method searches for the solution can be small. In this case a Krylov method has the
opportunity to converge fast.

Note:2 Let 𝑯 = 𝑸𝑇 𝑨𝑸 ∈ ℝ𝑚×𝑚 be the Arnoldi factorization and 𝑻 = 𝑾𝑇 𝑨𝑽 ∈ ℝ𝑚×𝑚 be the


Lanczos factorization, then
𝜆𝑰 − 𝑯 = 𝜆𝑰 − 𝑸𝑇 𝑨𝑸 = 𝜆𝑸𝑇 𝑸 − 𝑸𝑇 𝑨𝑸 = 𝑸𝑇 (𝜆𝑰 − 𝑨)𝑸
⟹ det(𝜆𝑰 − 𝑯) = det(𝜆𝑰 − 𝑨) det(𝑸𝑇 𝑸)
⟹ det(𝜆𝑰 − 𝑯) = det(𝜆𝑰 − 𝑨)
And for the Lanczos factorization
𝜆𝑰 − 𝑻 = 𝜆𝑰 − 𝑾𝑇 𝑨𝑽 = 𝜆𝑾𝑇 𝑽 − 𝑾𝑇 𝑨𝑽 = 𝑾𝑇 (𝜆𝑰 − 𝑨)𝑽
⟹ det(𝜆𝑰 − 𝑻) = det(𝜆𝑰 − 𝑨) det(𝑾𝑇 𝑽)
⟹ det(𝜆𝑰 − 𝑻) = det(𝜆𝑰 − 𝑨)
From this result we conclude that 𝑯 ∈ ℝ𝑚×𝑚 and 𝑻 ∈ ℝ𝑚×𝑚 are minimal representation of
𝑨 ∈ ℝ𝑛×𝑛 with the same spectrum. (This is the idea behind the use of Krylov methods in
order reduction). Also the Arnoldi factorization yields to upper Hessenberg form.

In numerical optimization, the


Broyden algorithm is an iterative method for solving unconstrained nonlinear
optimization problems. The optimization problem is to minimize 𝑓(𝐱), where 𝐱 is a vector
in ℝ𝑛 , and 𝑓 is a differentiable scalar function. There are no constraints on the values
that 𝐱 can take. The algorithm begins at an initial estimate for the optimal value 𝐱 0 and
proceeds iteratively to get a better estimate at each stage. The development of this
algorithm is detailed in the fourth chapter. This method converges without requiring any
special conditions on the matrix (symmetry, positive definiteness, diagonal dominance, etc.).

Algorithm:
Data: 𝐟(𝐱) = (𝑨𝐱 − 𝒃), 𝐱 0 , 𝐟(𝐱 0 ) and 𝑩0
Result: 𝐱 𝑘
begin: for 𝑘 = 0, 1, 2, …
$$\mathbf{x}_{k+1} = (I - B_k A)\mathbf{x}_k + B_k\boldsymbol{b}$$
$$\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k\,, \qquad \mathbf{y}_k = A(\mathbf{x}_{k+1} - \mathbf{x}_k)$$
$$B_{k+1} = B_k + \left(\frac{\mathbf{s}_k - B_k\mathbf{y}_k}{\mathbf{s}_k^T B_k\mathbf{y}_k}\right)\mathbf{s}_k^T B_k$$
end
This formulation is the author's own contribution and, to the best of our knowledge, this
algorithm has not previously been used for solving linear systems.

clear all, clc, V= diag([-1 -2 -3 -4 -5 -6 -7 -8 -9 -10]);


M=rand(10,10); A=M*V*inv(M); b=10*rand(10,1); n=max(size(A));
m=min(size(b)); x0=zeros(n,m); B=10*eye(n,n); I=eye(n,n);
for k=1:50
x1=(I-B*A)*x0+B*b; % x1 = x0 - B*(A*x0-b)
y=A*(x1-x0); s=x1-x0;
B = B + ((s-B*y)*(s'*B))/(s'*B*y);
x0=x1;
end
x1, ZERO1=A*x1-b, ZERO2= eye(10) - B*A % verification

There are several other


matrix decompositions such as QR decomposition, and singular value decomposition
(SVD). Instead of looking into the details of these algorithms, we will simply survey the
MATLAB built-in functions implementing these decompositions.

Given a system of linear equations of the form 𝑨𝐱 = 𝒃 and assume that 𝑨 is decomposed
in the 𝑸𝑹 form, then 𝑸𝑹𝐱 = 𝒃 ⟺ 𝑹𝐱 = 𝑸𝑇 𝒃 = 𝒃𝑛𝑒𝑤 , so we only have to solve an upper
triangular system by back substitution.

clear all, clc, A=10*rand(10,10); b=10*rand(10,1);


n=size(A,1); x=zeros(n,1); b1=b; [Q R]=qr(A); b=Q'*b; % 𝒃𝑛𝑒𝑤
for j=n:-1:1
if (R(j,j)==0)
error('Matrix is singular!');
end
x(j)=b(j)/R(j,j); b(1:j-1)=b(1:j-1)-R(1:j-1,j)*x(j);
end
A*x-b1

The SVD (singular value decomposition) is to express an 𝑚 × 𝑛 matrix 𝑨 in the following


form 𝑨 = 𝑼𝑺𝑽𝑇 where 𝑼 is an orthogonal (unitary) 𝑚 × 𝑚 matrix, 𝑽 is an orthogonal
(unitary) 𝑛 × 𝑛 matrix, and 𝑺 is a real diagonal 𝑚 × 𝑛 matrix having the singular values of
𝑨 (the square roots of the eigenvalues of 𝑨𝑇 𝑨) in decreasing order on its diagonal.

$$A\mathbf{x} = \boldsymbol{b} \;\Longleftrightarrow\; U S V^T\mathbf{x} = \boldsymbol{b} \;\Longleftrightarrow\; \mathbf{x} = V S^{+} U^T\boldsymbol{b},
\qquad
S^{+} = \begin{pmatrix} \sigma_1^{-1} & & & \\ & \ddots & & \\ & & \sigma_r^{-1} & \\ & & & \boldsymbol{0} \end{pmatrix}$$
This is implemented by the MATLAB built-in functions svd() and pinv().

clear all, clc, A=10*rand(10,10); b=10*rand(10,1);


n=size(A,1); [U,S,V] = svd(A); x=V*pinv(S)*U'*b;
A*x-b
Many practical problems in
engineering and physics lead to eigenvalue problems. Eigenvalue problems in science
and engineering are often formulated at the level of the differential equation, but the
essential features of eigenvalue problems can be studied at the matrix level. The matrix
eigenvalue problem can be stated as 𝑨𝐱 = 𝜆𝐱 where 𝑨 is a given 𝑛 × 𝑛 matrix. The problem
is to find the scalar 𝜆 and the vector 𝐱. Rewriting the previous equation in the form
(𝑨 − 𝜆𝑰)𝐱 = 𝟎 it becomes apparent that we are dealing with a system of n homogeneous
equations. An obvious solution is the trivial one 𝐱 = 𝟎. A nontrivial solution can exist
only if the determinant of the coefficient matrix vanishes; that is, if |𝑨 − 𝜆𝑰| = 0.
Expansion of the determinant leads to the polynomial equation known as the
characteristic equation Δ(𝜆) = ∏𝑛𝑖=1(𝜆 − 𝜆𝑖 ) = ∑𝑛𝑖=0 𝑎𝑖 𝜆𝑛−𝑖 which has the roots 𝜆𝑖 with
𝑖 = 1,2, … , 𝑛, called the eigenvalues of the matrix 𝑨. The solutions 𝐱 𝑖 of (𝑨 − 𝜆𝑖 𝑰)𝐱 = 𝟎 are
known as the eigenvectors. Eigenvalue problems that originate from physical problems
often end up with a symmetric 𝑨. This is fortunate, because symmetric eigenvalue
problems are much easier to solve than their non-symmetric counterparts.
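
As a brief illustration (not part of the original text), the link between the characteristic polynomial and the eigenvalues can be checked in MATLAB with the built-in poly and roots functions; the test matrix is the companion-type matrix used in a later example:

clear all, clc, A = [3 17 -37 18 -40; eye(4,4) zeros(4,1)];
p = poly(A)              % coefficients of the characteristic polynomial det(lambda*I - A)
lambda_roots = roots(p)  % eigenvalues obtained as roots of the characteristic equation
lambda_eig = eig(A)      % eigenvalues computed directly (preferred for large n)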

For many real situations, the eigenvalue problem does not arise in the standard form
𝑨𝐱 = 𝜆𝐱 but rather in the form 𝑨𝐱 = 𝜆𝑩𝐱 Where 𝑨 & 𝑩 are two symmetric matrices. It is
much more convenient if the equation 𝑨𝐱 = 𝜆𝑩𝐱 can be converted to the standard form.

if 𝑩 is nonsingular then 𝑨𝐱 = 𝜆𝑩𝐱 ⟺ (𝑩−1 𝑨)𝐱 = 𝜆𝐱


if 𝑨 is nonsingular then 𝑨𝐱 = 𝜆𝑩𝐱 ⟺ (𝑨−1 𝑩)𝐱 = 𝛾𝐱 with 𝛾 = 1/𝜆
if 𝑩 is positive definite then 𝑩 can be written in the Cholesky decomposition 𝑩 = 𝑳𝑳𝑇

𝑨𝐱 = 𝜆𝑩𝐱 ⟺ 𝑳−1 𝑨𝐱 = 𝜆𝑳−1 𝑩𝐱 = 𝜆𝑳𝑇 𝐱


⟺ 𝑳−1 𝑨(𝑳−𝑇 𝑳𝑇 )𝐱 = 𝜆𝑳𝑇 𝐱
⟺ (𝑳−1 𝑨𝑳−𝑇 )(𝑳𝑇 𝐱) = 𝜆(𝑳𝑇 𝐱)

If we define 𝑯 = 𝑳−1 𝑨𝑳−𝑇 and 𝐳 = 𝑳𝑇 𝐱 we get: 𝑨𝐱 = 𝜆𝑩𝐱 ⟺ 𝑯𝐳 = 𝜆𝐳, which is in standard


form. Moreover the matrix 𝑯 = 𝑳−1 𝑨𝑳−𝑇 has the same eigenvalues as the original problem.

if 𝑨 is positive definite then 𝑨 can be written in Cholesky decomposition 𝑨 = 𝑳𝑳𝑇

𝑨𝐱 = 𝜆𝑩𝐱 ⟺ (𝑳−1 𝑩𝑳−𝑇 )𝐳 = 𝛾𝐳 with 𝛾 = 1/𝜆
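
A minimal MATLAB sketch of the Cholesky-based reduction described above (the matrices are illustrative assumptions; chol returns an upper triangular factor 𝑹 with 𝑩 = 𝑹ᵀ𝑹, so we take 𝑳 = 𝑹ᵀ):

clear all, clc, n = 5; M = rand(n); A = M + M';     % symmetric A
B = rand(n); B = B*B' + n*eye(n);                   % symmetric positive definite B
R = chol(B); L = R';                                % B = L*L'
H = (L\A)/L';                                       % H = inv(L)*A*inv(L')
lambda_std = sort(eig(H))                           % eigenvalues of the standard problem H*z = lambda*z
lambda_gen = sort(eig(A,B))                         % MATLAB generalized eigenvalues, for comparison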

The main algorithms for actually computing eigenvalues and eigenvectors are the 𝑄𝐷
algorithm (by Heinz Rutishauser ), the power method, the 𝐿𝑅 algorithm of Rutishauser
and the powerful 𝑄𝑅 algorithm of Francis using the Householder Transformations. In
this Chapter, we are concerned with numerical methods for computing the eigenvalues
and eigenvectors of an 𝑛 × 𝑛 matrix 𝑨. The first method we study is called the power
method. The power method is an iterative method for finding the dominant eigenvalue of
a matrix and a corresponding eigenvector. By the dominant eigenvalue, we mean an
eigenvalue λ1 satisfying |𝜆1 | > |𝜆𝑖 | for 𝑖 = 2, . . . , 𝑛 . If the eigenvalues of 𝑨 satisfy |𝜆1 | >
|𝜆2 | > ⋯ > |𝜆𝑛 | then the power method can be used to compute the eigenvalues one at a
time. The QR algorithm, is an iterative method involving orthogonal similarity
transformations. It has many advantages over the power method. It will converge
whether or not 𝑨 has a dominant eigenvalue, and it calculates all the eigenvalues at the
same time.
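
A minimal sketch of the power method just described (the matrix, tolerance and iteration cap are illustrative assumptions; a dominant eigenvalue is assumed to exist):

clear all, clc, A = [3 17 -37 18 -40; eye(4,4) zeros(4,1)];
x = rand(5,1); lambda_old = 0;
for k = 1:500
    x = A*x; x = x/norm(x);                 % normalize to avoid overflow
    lambda = x'*A*x;                        % Rayleigh-quotient estimate of the dominant eigenvalue
    if abs(lambda - lambda_old) < 1e-10, break; end
    lambda_old = lambda;
end
lambda, check = norm(A*x - lambda*x)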
Before introducing the numerical algorithms it is of great importance to give some basic
elements of eigensystems.

Definition: The set of all solutions of (𝑨 − 𝜆𝑰)𝐱 = 𝟎 is called the eigenspace of 𝑨


corresponding to eigenvalue 𝜆. We determine 𝜆 by solving the characteristic equation
det(𝑨 − 𝜆𝑰) = 𝟎.

Theorem: (Fundamental Theorem of Algebra) Every non-constant polynomial has at


least one root (possibly, in the complex field). If 𝑝(𝜆) is a non-constant polynomial of real
coefficients and complex variable, then it can be factorized as a product of linear and
quadratic factors of which coefficients are all real.

Remark: Eigenspace is a subspace of ℝ𝑛 and the eigenspace of 𝑨 corresponding to 𝜆 is


Null(𝑨 − 𝜆𝑰). If 𝜆 = 0 is an eigenvalue of 𝑨 then 𝑨 is not invertible. If 𝑛 × 𝑛 matrices 𝑨 and
𝑩 are similar, then they have the same characteristic polynomial, and hence the same
eigenvalues (with the same multiplicities)

Theorem: The eigenvalues of a triangular matrix are the entries on its main diagonal.

An 𝑛 × 𝑛 matrix 𝑨 is said to be diagonalizable if there exists an invertible matrix 𝑷 and a


diagonal matrix 𝑫 such that 𝑨 = 𝑷𝑫𝑷−1 and 𝑷 is called modal matrix.

Remark: Any 𝑛 × 𝑛 matrix with 𝑛 distinct eigenvalues is diagonalizable, but it is not


always true that: any matrix 𝑨 ∈ ℝ𝑛×𝑛 is diagonalizable.

Theorem: (Matrices Whose Eigenvalues Are Not Distinct) Let 𝑨 ∈ ℝ𝑛×𝑛 be a matrix
whose distinct eigenvalues are 𝜆1 , 𝜆2 , … , 𝜆𝑝 . Let 𝑬𝑨 (𝜆𝑘 ) be the eigenspace for 𝜆𝑘 .
1. dim(𝑬𝑨 (𝜆𝑘 )) ≤ (the multiplicity of the eigenvalue 𝜆𝑘 ), for 1 ≤ 𝑘 ≤ 𝑝
2. The matrix 𝑨 is diagonalizable ⟺ the sum of the dimensions of the eigenspaces
equals 𝑛 ⟺ dim(𝑬𝑨 (𝜆𝑘 )) = the multiplicity of 𝜆𝑘 , for each 1 ≤ 𝑘 ≤ 𝑝 (and the
characteristic polynomial factors completely into linear factors)
3. If 𝑨 is diagonalizable and ℬ𝑘 is a basis for 𝑬𝑨 (𝜆𝑘 ), then the total collection of vectors in
the sets {ℬ1 , ℬ2 , … , ℬ𝑝 } forms an eigenvector basis for ℝ𝑛 .

Proof: See the Algebra book by BEKHITI B 2020

Theorem: If 𝜆 = 𝜎 + 𝜔𝑗 is an eigenvalue of 𝑨 ∈ ℝ𝑛×𝑛 with corresponding eigenvector 𝐱,


then the complex conjugate of 𝜆, 𝜆̅ = 𝜎 − 𝜔𝑗, is also an eigenvalue with eigenvector 𝐱̅.

Proof: Since $A \in \mathbb{R}^{n\times n}$ we have $\overline{A} = A$, hence $A\bar{\mathbf{x}} = \overline{A\mathbf{x}} = \overline{\lambda\mathbf{x}} = \bar{\lambda}\,\bar{\mathbf{x}}$.  ■

We now turn the spotlight to the problem of finding the


simplest structures to which any square matrix can be reduced via similarity
transformations. To make things more clear, let we start by exemplification of the
problem. Assume that a matrix $A \in \mathbb{R}^{4\times 4}$ is non-diagonalizable (i.e. the algebraic and
geometric multiplicities of some eigenvalue do not coincide, $q_i < m_i$ for some $i$) and assume
that $A_c = T_c A T_c^{-1}$ is the controllable canonical form.
Before starting the development it is recommended to ask the following question: how are
the eigenvectors of 𝑨𝑐 and 𝑨 related to each other?

$$\begin{cases} A_c\mathbf{x} = \lambda\mathbf{x} \\ A\mathbf{v} = \lambda\mathbf{v} \end{cases}
\;\Longleftrightarrow\;
(T_c A T_c^{-1})\mathbf{x} = \lambda\mathbf{x}
\;\Longleftrightarrow\;
A(T_c^{-1}\mathbf{x}) = \lambda(T_c^{-1}\mathbf{x})
\;\Longleftrightarrow\;
\mathbf{x} = T_c\mathbf{v}$$

As a second question what is the relationship between eigenvectors 𝐱1 , 𝐱 2 , … , 𝐱 𝑚 of the


matrix 𝑨 corresponding to the eigenvalue 𝜆?

To answer this question we consider characteristic equation of 𝑨 ∈ ℝ4×4 which is given by

∆(𝜆) = det(𝑨 − 𝜆𝑰) = 𝜆4 + 𝛼1 𝜆3 + 𝛼2 𝜆2 + 𝛼3 𝜆 + 𝛼4

Assume that 𝜆1 is repeated three times, that is 𝑚1 = 3 so 𝜆2 is not repeated.

∆(𝜆) = (𝜆 − 𝜆1 )3 (𝜆 − 𝜆2 )

Observe that 𝜆1 is a solution to the following set of equations

𝑑∆(𝜆) 𝑑 2 ∆(𝜆)
∆(𝜆) = 0, =0 & =0
𝑑𝜆 𝑑𝜆2

But 𝜆2 is only the solution of ∆(𝜆) = 0, starting from this fact we can say that
$$\begin{cases}
\Delta(\lambda_1) = 0 \\[2pt]
\left.\dfrac{d\Delta(\lambda)}{d\lambda}\right|_{\lambda_1} = 0 \\[2pt]
\left.\dfrac{d^2\Delta(\lambda)}{d\lambda^2}\right|_{\lambda_1} = 0 \\[2pt]
\Delta(\lambda_2) = 0
\end{cases}
\qquad\Longleftrightarrow\qquad
\begin{cases}
\lambda_1^4 + \alpha_1\lambda_1^3 + \alpha_2\lambda_1^2 + \alpha_3\lambda_1 + \alpha_4 = 0 \\[2pt]
4\lambda_1^3 + 3\alpha_1\lambda_1^2 + 2\alpha_2\lambda_1 + \alpha_3 = 0 \\[2pt]
6\lambda_1^2 + 3\alpha_1\lambda_1 + \alpha_2 = 0 \\[2pt]
\lambda_2^4 + \alpha_1\lambda_2^3 + \alpha_2\lambda_2^2 + \alpha_3\lambda_2 + \alpha_4 = 0
\end{cases}$$

In matrix form we can write 𝑨𝑐 𝐕𝑐 = 𝐕𝑐 𝚲 with 𝐕𝑐 = [𝐱11 ⋮ 𝐱12 ⋮ 𝐱13 ⋮ 𝐱 2 ] 𝚲 = 𝑱1 ⊕ 𝑱2 or

$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -\alpha_4 & -\alpha_3 & -\alpha_2 & -\alpha_1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 1 \\ \lambda_1 & 1 & 0 & \lambda_2 \\ \lambda_1^2 & 2\lambda_1 & 1 & \lambda_2^2 \\ \lambda_1^3 & 3\lambda_1^2 & 3\lambda_1 & \lambda_2^3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 1 \\ \lambda_1 & 1 & 0 & \lambda_2 \\ \lambda_1^2 & 2\lambda_1 & 1 & \lambda_2^2 \\ \lambda_1^3 & 3\lambda_1^2 & 3\lambda_1 & \lambda_2^3 \end{pmatrix}
\begin{pmatrix} \lambda_1 & 1 & 0 & 0 \\ 0 & \lambda_1 & 1 & 0 \\ 0 & 0 & \lambda_1 & 0 \\ 0 & 0 & 0 & \lambda_2 \end{pmatrix}$$

In vector form we write


$$\begin{cases}
A_c\mathbf{x}_{11} = \lambda_1\mathbf{x}_{11} \\
A_c\mathbf{x}_{12} = \lambda_1\mathbf{x}_{12} + \mathbf{x}_{11} \\
A_c\mathbf{x}_{13} = \lambda_1\mathbf{x}_{13} + \mathbf{x}_{12} \\
A_c\mathbf{x}_{2} = \lambda_2\mathbf{x}_{2}
\end{cases}
\;\Longleftrightarrow\;
\begin{cases}
\mathbf{x}_{11} \in \mathrm{Null}(A_c - \lambda_1 I) \\
\mathbf{x}_{12} \in \mathrm{Null}(A_c - \lambda_1 I)^2 \\
\mathbf{x}_{13} \in \mathrm{Null}(A_c - \lambda_1 I)^3 \\
\mathbf{x}_{2} \in \mathrm{Null}(A_c - \lambda_2 I)
\end{cases}
\;\Longleftrightarrow\;
\begin{cases}
\mathbf{v}_{11} \in \mathrm{Null}(A - \lambda_1 I) \\
\mathbf{v}_{12} \in \mathrm{Null}(A - \lambda_1 I)^2 \\
\mathbf{v}_{13} \in \mathrm{Null}(A - \lambda_1 I)^3 \\
\mathbf{v}_{2} \in \mathrm{Null}(A - \lambda_2 I)
\end{cases}$$

Definition: A Jordan chain of length 𝑚 is a sequence of nonzero vectors 𝐮1 , 𝐮2 , … 𝐮𝑚 in ℂ𝑛


such that: {𝑨𝐮1 = 𝜆𝐮1 and 𝑨𝐮𝑘 = 𝜆𝐮𝑘 + 𝐮𝑘−1 𝑘 = 2,3, … , 𝑚} for some eigenvalue 𝜆 of 𝑨.

Observe that 𝐮1 is an eigenvector of 𝑨 associated with the eigenvalue 𝜆. Therefore, Jordan


chains are necessarily associated with eigenvalues of 𝑨. The vectors 𝐮2 , 𝐮3 , … 𝐮𝑚 are called
generalized eigenvectors, which we define below.
Definition: A nonzero vector 𝐱 is said to be generalized eigenvector associated with
eigenvalue 𝜆 of 𝑨 if $(A - \lambda I)^k\mathbf{x} = \mathbf{0}$ for some positive integer 𝑘.

If 𝐱 is an eigenvector associated with eigenvalue 𝜆, then it satisfies $(A - \lambda I)^k\mathbf{x} = \mathbf{0}$ with


𝑘 = 1. Therefore, an eigenvector of 𝑨 is also a generalized eigenvector of 𝑨. The converse
is false: a generalized eigenvector is not necessarily an eigenvector. The above
development can be generalized as
$$\begin{aligned}
A_c\mathbf{V}_c = \mathbf{V}_c\boldsymbol{\Lambda}
&\;\Longleftrightarrow\; A\,T_c^{-1}\mathbf{V}_c = T_c^{-1}\mathbf{V}_c\,(\oplus_{k=1}^{\ell} J_k) \\
&\;\Longleftrightarrow\; A P = P\,(\oplus_{k=1}^{\ell} J_k) \\
&\;\Longleftrightarrow\; A[\mathbf{P}_1\;\mathbf{P}_2\;\ldots\;\mathbf{P}_\ell] = [\mathbf{P}_1\;\mathbf{P}_2\;\ldots\;\mathbf{P}_\ell]\begin{pmatrix} J_1 & & \boldsymbol{0} \\ & \ddots & \\ \boldsymbol{0} & & J_\ell \end{pmatrix} \\
&\;\Longleftrightarrow\; A\mathbf{P}_k = \mathbf{P}_k J_k, \qquad k = 1,\ldots,\ell
\end{aligned}$$

Where $\mathbf{P}_k = [\mathbf{p}_{k1}\;\mathbf{p}_{k2}\;\ldots\;\mathbf{p}_{km_k}]$ and

$$J_k = \begin{pmatrix} \lambda_k & 1 & & 0 \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_k \end{pmatrix} \in \mathbb{R}^{m_k\times m_k}
\quad\text{with}\quad \sum_{k=1}^{\ell} m_k = n.\;\text{ The}$$
matrix 𝐏 = [𝐏1 𝐏2 … 𝐏ℓ ] is called the generalized modal matrix, and 𝚲 = (⊕ℓ𝑘=1 𝑱𝑘 ) is called
the Jordan normal form.
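
As a small illustration (a toy example assumed here, not taken from the text), a Jordan chain of length 2 for a defective matrix can be built by first computing an eigenvector and then solving (𝑨 − 𝜆𝑰)𝐮₂ = 𝐮₁:

clear all, clc, A = [3 1; -1 1]; lambda = 2; I = eye(2);   % defective: lambda = 2 with multiplicity 2
u1 = null(A - lambda*I);          % eigenvector, spans Null(A - lambda*I)
u2 = pinv(A - lambda*I)*u1;       % a particular solution of (A - lambda*I)*u2 = u1
P = [u1 u2];                      % generalized modal matrix
J = P\A*P                         % recovers the Jordan block [lambda 1; 0 lambda]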

Theorem: Every 𝑛 × 𝑛 matrix (even if non-diagonalizable) is similar to a Jordan matrix.

Remark: If some pairs of Eigenvalues are complex conjugate then the Jordan matrix 𝚲
will contain complex entries on the main diagonal, the problem here is: how to avoid this
representation? To answer such a question let us introduce an example of a 2 × 2 matrix
$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ whose eigenvalues are complex conjugate. Solving its characteristic equation,

$$\Delta(\lambda) = \det(A - \lambda I) = (\lambda - \alpha)^2 + \beta^2 \;\Longrightarrow\; \lambda = \alpha \pm j\beta$$

The pair of complex conjugate eigenvalues 𝜆 = 𝛼 ± 𝑗𝛽 corresponding to complex conjugate


eigenvectors 𝐮 ± 𝑗𝐯 such that
$$\begin{cases} A(\mathbf{u} + j\mathbf{v}) = (\alpha + j\beta)(\mathbf{u} + j\mathbf{v}) \\ A(\mathbf{u} - j\mathbf{v}) = (\alpha - j\beta)(\mathbf{u} - j\mathbf{v}) \end{cases}
\;\Longleftrightarrow\;
A\,(\mathbf{u} + j\mathbf{v} \,|\, \mathbf{u} - j\mathbf{v}) = (\mathbf{u} + j\mathbf{v} \,|\, \mathbf{u} - j\mathbf{v})\begin{pmatrix} \alpha + j\beta & 0 \\ 0 & \alpha - j\beta \end{pmatrix}$$

To avoid the complex representation let us consider $A(\mathbf{u} + j\mathbf{v}) = (\alpha + j\beta)(\mathbf{u} + j\mathbf{v})$, which can
be written as $A\mathbf{u} = \alpha\mathbf{u} - \beta\mathbf{v}$ and $A\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{u}$, i.e.

$$A[\mathbf{u}\;\;\mathbf{v}] = [\mathbf{u}\;\;\mathbf{v}]\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}$$

In general, if a matrix 𝑨 has complex eigenvalues, it may be similar to a block-diagonal matrix 𝑩, i.e.,
there exists an invertible matrix 𝑷 such that 𝑨𝑷 = 𝑷𝑩, where 𝑩 has the form

$$B = \begin{pmatrix} B_1 & \boldsymbol{0} & \cdots & \boldsymbol{0} \\ \boldsymbol{0} & B_2 & & \vdots \\ \vdots & & \ddots & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} & \cdots & B_\ell \end{pmatrix},
\qquad P = [\mathbf{P}_1\;\mathbf{P}_2\;\ldots\;\mathbf{P}_\ell]$$

Blocks 𝑩𝑘 are either a real eigenvalue 𝜆𝑘 (1 × 1 matrix), or 𝑩𝑘 is a 2 × 2 matrix (called a


block), corresponding to a complex eigenvalue 𝜆𝑘 = Re(𝜆𝑘 ) + 𝑖Im(𝜆𝑘 ) in the form

$$B_k = \begin{pmatrix} \mathrm{Re}(\lambda_k) & \mathrm{Im}(\lambda_k) \\ -\mathrm{Im}(\lambda_k) & \mathrm{Re}(\lambda_k) \end{pmatrix}$$
and 𝑷 is assembled accordingly: if 𝑩𝑘 is an real eigenvalue, the corresponding column 𝐏𝑘
in 𝑷 is an eigenvector (real); if 𝑩𝑘 is a 2 × 2 block associated with a complex eigenvalue
𝜆𝑘 ; then 𝐏𝑘 consists of two columns: 𝐏𝑘 = [Re(𝐮𝑘 ), Im(𝐮𝑘 )] where 𝐮𝑘 is a complex
eigenvector for 𝜆𝑘 (note that both Re(𝐮𝑘 ) and Im(𝐮𝑘 ) are real vectors).
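
A short MATLAB sketch of this construction (the 2 × 2 test matrix is an assumption; the resulting block may differ by the ordering/sign convention of the conjugate pair). MATLAB also offers cdf2rdf to convert the complex diagonal form into this real block form:

clear all, clc, A = [0 -2; 1 2];       % eigenvalues 1 +/- i
[V,D] = eig(A);                        % complex eigenvalues and eigenvectors
u = real(V(:,1)); v = imag(V(:,1));    % real and imaginary parts of one eigenvector
P = [u v];                             % real similarity transformation
B = P\A*P                              % real 2x2 block built from Re/Im of the eigenvalue
[Vr,Dr] = cdf2rdf(V,D); Dr             % same real block form via the built-in cdf2rdf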

Remark: If some pairs of Eigenvalues are 𝑚 times repeated complex conjugate then the
Jordan matrix 𝚲 will be of the following form

$$\boldsymbol{\Lambda} = \begin{pmatrix} J_1 & & \boldsymbol{0} \\ & \ddots & \\ \boldsymbol{0} & & J_\ell \end{pmatrix}
\quad\text{with}\quad
J_k = \begin{pmatrix} C_k & I & & \boldsymbol{0} \\ \boldsymbol{0} & C_k & \ddots & \\ & & \ddots & I \\ \boldsymbol{0} & & & C_k \end{pmatrix},
\qquad
C_k = \begin{pmatrix} a_k & -b_k \\ b_k & a_k \end{pmatrix}$$

This real Jordan form is a consequence of the complex Jordan form. For a real matrix
the nonreal eigenvectors and generalized eigenvectors can always be chosen to form
complex conjugate pairs. Taking the real and imaginary part (linear combination of the
vector and its conjugate), the matrix has this form with respect to the new basis.

One important goal of linear algebra is to find the


simplest forms of matrices to which any square matrix can be reduced via similarity
transformations. This is given by the Jordan Canonical Form, which are special
triangular matrices that have zeroes everywhere except on the diagonal and the diagonal
immediately above the main diagonal. If we relax our requirement from Jordan forms to
triangular forms, then we have a remarkable result: every square matrix is similar to a
triangular matrix with the understanding that we allow complex entries for all matrices
we consider here. This result, which is fundamentally important in theory and
applications of linear algebra, is named after the mathematician Issai Schur and is
presented below.

Schur's Triangularization Theorem: Every matrix 𝑨 ∈ ℝ𝑛×𝑛 is similar to an upper-


triangular matrix 𝑹 = 𝑷−1 𝑨𝑷 whose diagonal entries are the eigenvalues of 𝑨.

Proof: Use induction on 𝑛, the size of the matrix. For 𝑛 = 1, there is nothing to prove.
For 𝑛 > 1, assume that all (𝑛 − 1) × (𝑛 − 1) matrices are unitarily similar to an upper-
triangular matrix, and consider an 𝑛 × 𝑛 matrix 𝑨. Suppose that (𝜆, 𝐱) is an eigenpair
for 𝑨, and suppose that 𝐱 has been normalized so that ‖𝐱‖2 = 1. We can construct an
elementary reflector 𝑹2 = 𝑰 or 𝑹 = 𝑹𝐻 = 𝑹−1 with the property that 𝑹𝐱 = 𝒆1 or,
equivalently, 𝐱 = 𝑹𝒆1

$$R = I - 2\,\frac{\mathbf{u}\mathbf{u}^{H}}{\mathbf{u}^{H}\mathbf{u}} \quad\text{with}\quad \mathbf{u} = \mathbf{x} \pm \mu\,\|\mathbf{x}\|\,\mathbf{e}_1 \quad\text{and}\quad \mu = \begin{cases} 1 & \text{if } x_1 \text{ is real} \\ x_1/|x_1| & \text{if } x_1 \text{ is not real} \end{cases}$$

This implies that $2\mathbf{u}^H\mathbf{x} = \mathbf{u}^H\mathbf{u} \Rightarrow R\mathbf{x} = \mathbf{e}_1$. Thus $\mathbf{x}$ is the first column of $R$, so $R = (\mathbf{x}\,|\,\mathbf{V})$, and

$$RAR = RA(\mathbf{x}\,|\,\mathbf{V}) = R(\lambda\mathbf{x}\,|\,A\mathbf{V}) = (\lambda\mathbf{e}_1\,|\,RA\mathbf{V}) = \begin{pmatrix} \lambda & \mathbf{x}^H A\mathbf{V} \\ \boldsymbol{0} & \mathbf{V}^H A\mathbf{V} \end{pmatrix}$$

Since $\mathbf{V}^H A\mathbf{V} \in \mathbb{C}^{(n-1)\times(n-1)}$, the induction hypothesis ensures that there exists a unitary matrix $Q \in \mathbb{C}^{(n-1)\times(n-1)}$ such that $T_1 = Q^H(\mathbf{V}^H A\mathbf{V})Q$ is upper triangular. If $U = R\begin{pmatrix} 1 & \boldsymbol{0} \\ \boldsymbol{0} & Q \end{pmatrix}$, then $U$ is unitary (because $U^H = U^{-1}$), and

$$U^H A U = \begin{pmatrix} \lambda & \mathbf{x}^H A\mathbf{V}Q \\ \boldsymbol{0} & Q^H\mathbf{V}^H A\mathbf{V}Q \end{pmatrix} = \begin{pmatrix} \lambda & \mathbf{x}^H A\mathbf{V}Q \\ \boldsymbol{0} & T_1 \end{pmatrix} = T$$

is upper triangular. Since
similar matrices have the same eigenvalues, and since the eigenvalues of a triangular
matrix are its diagonal entries, the diagonal entries of 𝑻 must be the eigenvalues of 𝑨. ■

Notably, Schur realized that the similarity transformation to triangularize the matrix can
be made unitary. Unitary matrices are analogues of orthogonal matrices in that their
columns form an orthonormal set but the columns may now have complex entries.

The Real Schur Form: Schur's triangularization theorem insures that every square
matrix 𝑨 is unitarily similar to an upper-triangular matrix say, 𝑼𝐻 𝑨𝑼 = 𝑻. But even
when 𝑨 is real, 𝑼 and 𝑻 may have to be complex if 𝑨 has some complex eigenvalues.
However, the matrices (and the arithmetic) can be constrained to be real by settling for a
block-triangular result with 2 × 2 or scalar entries on the diagonal. Also it can be proved
that for each 𝑨 ∈ ℝ𝑛×𝑛 there exists an orthogonal matrix 𝑷 ∈ ℝ𝑛×𝑛 and real matrices 𝑩𝑖𝑗
such that

$$B = P^T A P = \begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1k} \\
\boldsymbol{0} & B_{22} & & B_{2k} \\
\vdots & & \ddots & \vdots \\
\boldsymbol{0} & \boldsymbol{0} & \cdots & B_{kk}
\end{pmatrix}
\qquad\text{where each } B_{ii} \text{ is } 1\times 1 \text{ or } 2\times 2.$$

For a real Matrix 𝑨, the Schur Form is an upper quasi-triangular Matrix 𝑻 with 1 × 1 or
2 × 2 blocks on its diagonal. The blocks correspond to either eigenvalues, or complex
conjugate pairs of eigenvalues, of the Matrix argument 𝑨.
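
For reference, a hedged check with MATLAB's built-in schur: for a real input it returns the real (quasi-triangular) Schur form by default, while the 'complex' option requests the triangular one. The test matrix is the one used in the example below:

clear all, clc, A = [3 17 -37 18 -40; eye(4,4) zeros(4,1)];
[P,B] = schur(A)             % real Schur form: 1x1 and 2x2 blocks on the diagonal
[U,T] = schur(A,'complex')   % complex Schur form: genuinely upper triangular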

The Real Schur Form by Givens Rotations: Now We are going to develop the QR
method in order to construct the real Schur decomposition of the matrix 𝑨 starting from
its upper Hessenberg form. The algorithm we will give now is summarized by the
function [𝑻, 𝑸, 𝑹] = SchurForm(𝑨, nmax) and starts by reducing the matrix 𝑨 to its upper
Hessenberg form 𝑯; so at each iteration 𝑄𝑅-factorization is performed by using Givens
rotations on the Hessenberg form. The global efficiency of the algorithm is ensured by
the use of the Givens matrices and by the construction of the orthonormal matrix
$Q^{(k)} = G_1^{(k)}\cdots G_{n-1}^{(k)}$ in the function prodgiv, with a cost of $n^2 - 2$ flops, without explicitly
calculating the Givens matrices $G_j^{(k)}$ for $j = 1,\ldots,n-1$.

Again the Real Schur Form decompositions can also be computed with a series of Givens
rotations. Each rotation zeroes an element in the subdiagonal of the matrix, forming the
𝑹 matrix. The concatenation of all the Givens rotations forms the orthogonal 𝑸 matrix.

In practice, Givens rotations are not actually performed by building a whole matrix and
doing a matrix multiplication. A Givens rotation procedure is used instead which does
the equivalent of the sparse Givens matrix multiplication, without the extra work of
handling the sparse elements. The Givens rotation procedure is useful in situations
where only a relatively few off diagonal elements need to be zeroed, and is more easily
parallelized than Householder transformations.
Algorithm: (QR and Real Schur Form)

function [𝑻, 𝑸, 𝑹] = SchurForm(𝑨, nmax)
[𝑛, 𝑚] = 𝑠𝑖𝑧𝑒(𝑨);
if 𝑛~ = 𝑚,
    error('The matrix is not square');
end
[𝑯, 𝑸ℎ𝑒𝑠𝑠 ] = hess(𝑨);
for 𝑗 = 1: nmax
    [𝑸, 𝑹, 𝑐, 𝑠] = qrgivens(𝑯);
    𝑯 = 𝑹;
    for 𝑘 = 1: 𝑛 − 1,
        𝑻 = gacol(𝑯, 𝑐(𝑘), 𝑠(𝑘),1, 𝑘 + 1, 𝑘, 𝑘 + 1);
    end
end
---------------------------------
function [𝑸, 𝑹, 𝑐, 𝑠] = qrgivens(𝑯)
[𝑚, 𝑛] = 𝑠𝑖𝑧𝑒(𝑯);
for 𝑘 = 1: 𝑛 − 1
    [𝑐(𝑘), 𝑠(𝑘)] = givcos(𝑯(𝑘, 𝑘), 𝑯(𝑘 + 1, 𝑘));
    𝑯 = garow(𝑯, 𝑐(𝑘), 𝑠(𝑘), 𝑘, 𝑘 + 1, 𝑘, 𝑛);
end
𝑹 = 𝑯; 𝑸 = prodgiv(𝑐, 𝑠, 𝑛);
---------------------------------
function [𝑯] = gacol(𝑯, 𝑐, 𝑠, 𝑗1 , 𝑗2 , 𝑖, 𝑘)
for 𝑗 = 𝑗1 : 𝑗2
    𝑡1 = 𝑯(𝑗, 𝑖); 𝑡2 = 𝑯(𝑗, 𝑘);
    𝑯(𝑗, 𝑖) = 𝑐 ⋆ 𝑡1 − 𝑠 ⋆ 𝑡2 ;
    𝑯(𝑗, 𝑘) = 𝑠 ⋆ 𝑡1 + 𝑐 ⋆ 𝑡2 ;
end
---------------------------------
function [𝑯] = garow(𝑯, 𝑐, 𝑠, 𝑖, 𝑘, 𝑗1 , 𝑗2 )
for 𝑗 = 𝑗1 : 𝑗2
    𝑡1 = 𝑯(𝑖, 𝑗); 𝑡2 = 𝑯(𝑘, 𝑗);
    𝑯(𝑖, 𝑗) = 𝑐 ⋆ 𝑡1 − 𝑠 ⋆ 𝑡2 ; 𝑯(𝑘, 𝑗) = 𝑠 ⋆ 𝑡1 + 𝑐 ⋆ 𝑡2 ;
end
---------------------------------
function [𝑐, 𝑠] = givcos(𝑥𝑖 , 𝑥𝑘 )
if 𝑥𝑘 = 0
    𝑐 = 1; 𝑠 = 0
else
    if |𝑥𝑘 | > |𝑥𝑖 |
        𝜏 = − 𝑥𝑖 ⁄𝑥𝑘 ; 𝑠 = 1⁄√(1 + 𝜏²) ; 𝑐 = 𝑠𝜏;
    else
        𝜏 = − 𝑥𝑘 ⁄𝑥𝑖 ; 𝑐 = 1⁄√(1 + 𝜏²) ; 𝑠 = 𝑐𝜏;
    end
end
---------------------------------
function 𝑸 = prodgiv(𝑐, 𝑠, 𝑛)
𝑛1 = 𝑛 − 1; 𝑛2 = 𝑛 − 2; 𝑸 = 𝑒𝑦𝑒(𝑛);
𝑸(𝑛1 , 𝑛1 ) = 𝑐(𝑛1 ); 𝑸(𝑛, 𝑛) = 𝑐(𝑛1 );
𝑸(𝑛1 , 𝑛) = 𝑠(𝑛1 ); 𝑸(𝑛, 𝑛1 ) = −𝑠(𝑛1 );
for 𝑘 = 𝑛2 : −1: 1,
    𝑘1 = 𝑘 + 1;
    𝑸(𝑘, 𝑘) = 𝑐(𝑘); 𝑸(𝑘1 , 𝑘) = −𝑠(𝑘);
    𝑞 = 𝑸(𝑘1 , 𝑘1 : 𝑛); 𝑸(𝑘, 𝑘1 : 𝑛) = 𝑠(𝑘) ⋆ 𝑞;
    𝑸(𝑘1 , 𝑘1 : 𝑛) = 𝑐(𝑘) ⋆ 𝑞;
end

Remark: This QR-Schur iterations do not always converge to the real Schur form of a
given matrix 𝑨. An effective technique to improve the converge results is to introduce a
translation (shift) technique in the QR method. This leads to the so called: shifted QR
method, which is used to speed up the convergence of the QR iterations when the eigenvalues of
𝑨 are close to one another.

Example:
A =
3 17 -37 18 -40
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0

T =
4.9997 18.9739 -34.2570 32.8760 -28.4604
0.0002 -3.9997 6.7693 -6.4968 5.6216
0 0.0000 2.0000 -1.4557 1.1562
0 0 0.0000 0.3129 -0.8709
0 0 0 1.2607 -0.3129
clear all,clc, A=[3 17 -37 18 -40;eye(4,4) zeros(4,1)] %A=10*rand(8,8);
nmax=40; [n,m]=size(A); tol=0.001;
if n~=m, error('The matrix is not square'); end
%--------------------------------------%
[Qhess T]=hess(A); [m,n]=size(T); % you can use the code given early
%--------------------------------------%
for j=1:nmax
H=T;
for k=1:n-1
if H(k+1,k)==0
c(k)=1; s(k)=0;
else
if abs(H(k+1,k))>abs(H(k,k))
t=-H(k,k)/H(k+1,k);
s(k)=1/sqrt(1+t^2); c(k)=s(k)*t;
else
t=-H(k+1,k)/H(k,k);
c(k)=1/sqrt(1+t^2); s(k)=c(k)*t;
end
end
for j=k:n
t1=H(k,j); t2=H(k+1,j);
H(k,j)=c(k)*t1-s(k)*t2;
H(k+1,j)=s(k)*t1+c(k)*t2;
end
end
%--------------------------------------%
R=H; n1=n-1; n2=n-2;
Q=eye(n); Q(n1,n1)=c(n1); Q(n,n)=c(n1);
Q(n1,n)=s(n1); Q(n,n1)=-s(n1);
for k=n2:-1:1,
k1=k+1; Q(k,k)=c(k); Q(k1,k)=-s(k);
q=Q(k1,k1:n); Q(k,k1:n)=s(k)*q;
Q(k1,k1:n)=c(k)*q;
end
T=R;
for k=1:n-1,
for j=1:k+1
t1=T(j,k); t2=T(j,k+1);
T(j,k)=c(k)*t1-s(k)*t2;
T(j,k+1)=s(k)*t1+c(k)*t2;
end
end
end
%--------------------------------------%
T, T_MATLAB=schur(A)
eig(T), eig(A)
The next step in order to achieve a competitive algorithm is to improve the convergence
speed of the QR-method. We will now see that the convergence to the Schur form can be
dramatically improved by considering a QR-step applied to the matrix formed by
subtracting a multiple of the identity matrix. This type of acceleration is called
shifting.

Algorithm: (The Shifted QR Iteration)

function [𝑻] = ShiftedQR(𝑨, nmax)


[𝑛, 𝑚] = 𝑠𝑖𝑧𝑒(𝑨);
if 𝑛~ = 𝑚, error('The matrix is not square'); end
[𝑸ℎ𝑒𝑠𝑠 , 𝑻] = hess(𝑨);
for 𝑘 = 𝑛: −1: 2
𝑰 = eye(𝑘, 𝑘);
while abs(𝑻(𝑘, 𝑘 − 1)) > tol ⋆ (abs(𝑻(𝑘, 𝑘)) + abs(𝑻(𝑘 − 1, 𝑘 − 1)))
iter = iter + 1; if iter > nmax, return; end
𝜇 = 𝑻(𝑘, 𝑘);
[𝑸, 𝑹] = qr(𝑻(1: 𝑘, 1: 𝑘) − 𝜇𝑰);
𝑻(1: 𝑘, 1: 𝑘) = 𝑹𝑸 + 𝜇𝑰;
end
𝑻(𝑘, 𝑘 − 1) = 0;
end
---------------------------------

clear all,clc, A=[3 17 -37 18 -40;eye(4,4) zeros(4,1)];% A=10*rand(8,8);


iter=0; nmax=40; tol=0.001; [n,m]=size(A);
if n~=m, error('The matrix is not square'); end
%--------------------------------------%
[Qhess T]=hess(A); [m,n]=size(T); % you can use the code given early
%--------------------------------------%
for k=n:-1:2
I=eye(k);
while abs(T(k,k-1))>tol*(abs(T(k,k))+abs(T(k-1,k-1)))
iter=iter+1;
if iter>nmax, return; end
mu=T(k,k);
[Q,R]=qr(T(1:k,1:k)-mu*I);
T(1:k,1:k)=R*Q+mu*I;
end
T(k,k-1)=0;
end
%--------------------------------------%
T,
TMATLAB=schur(A),
eig(T),
eig(A)
Up to now, we worked in the setting of Single shift QR-algorithm and hence required that
𝑨 is a non-defective matrix with real eigenvalues. In order to extend this to more general
matrices, we have to change our strategy for the choice of shifts.

The simple shift strategy we described in the previous section gives local quadratic
convergence, but it is not globally convergent. As a particular example, consider what
happens if we want to compute a complex conjugate pair of eigenvalues of a real matrix.
With our simple shifting strategy, the iteration will never produce a complex iterate, a
complex shift, or a complex eigenvalue. The best we can hope for is that our initial shift
is closer to both eigenvalues in the conjugate pair than it is to anything else in the
spectrum;

Algorithm: (The Double Shift Explicit QR Iteration)

function [𝑻] = DoubleShiftQR(𝑨, nmax)
[𝑛, 𝑚] = 𝑠𝑖𝑧𝑒(𝑨);
if 𝑛~ = 𝑚,
    error('The matrix is not square');
end
[𝑸ℎ𝑒𝑠𝑠 , 𝑻] = hess(𝑨);
for 𝑘 = 1: … 𝑑𝑜
    Choose the two shifts 𝜇1 , 𝜇2
    [𝑸1 , 𝑹1 ] = qr(𝑻 − 𝜇1 𝑰);   𝑻1 = 𝑹1 𝑸1 + 𝜇1 𝑰;
    [𝑸2 , 𝑹2 ] = qr(𝑻1 − 𝜇2 𝑰);  𝑻2 = 𝑹2 𝑸2 + 𝜇2 𝑰;
    𝑻 = (𝑸1 𝑸2 )𝑻2 (𝑸1 𝑸2 )𝑇 ;
end
---------------------------------
clear all; clc, A=10*rand(8,8);
H=hess(A); tol=0.001;
n=size(A,1); L1=eig(A)
mu1=(i+1); mu2=(2*i+1);
for k=1:80
    [Q1 R1]=qr(H-(i+1)*eye(n,n));
    H1=R1*Q1+(i+1)*eye(n,n);
    [Q2 R2]=qr(H1-(2*i+1)*eye(n,n));
    H=R2*Q2+(2*i+1)*eye(n,n);
end
%----------------------------------%
for i=1:n,
    for j=1:n
        if norm(H(i,j))<tol
            H(i,j)=0;
        end;end;end;
H;
L2=diag(H)

The shifts 𝜇1 and 𝜇2 at each iteration are chosen as the eigenvalues of the 2 × 2 trailing
principal submatrix at that iteration. The process is called the explicit double-shift QR
iteration process.

The above explicit scheme requires complex arithmetic (since 𝜇1 and 𝜇2 are complex) to
implement, and furthermore, the matrices 𝑻 − 𝜇1 𝑰 and 𝑻1 − 𝜇2 𝑰 need to be formed
explicitly. In practice, an equivalent implicit version, known as the double shift implicit
QR iteration scheme, is used. We state one step of this process in the following.

Algorithm: (The Double Shift Implicit QR Iteration)

■ Compute the first column 𝒏1 = 𝑵(: ,1) of 𝑵 = (𝑻 − 𝜇1 𝑰)(𝑻 − 𝜇2 𝑰) = 𝑻2 − (𝜇1 + 𝜇2 )𝑻 + 𝜇1 𝜇2 𝑰.


■ Find a Householder matrix 𝑸𝑜 such that 𝑸𝑜 𝒏1 is a multiple of 𝒆1 .
■ Find Householder matrices 𝑸1 through 𝑸𝑛−2 such that: 𝑻2 = (𝑸0 𝑸1 … 𝑸𝑛−2 )𝐻 𝑻𝑸0 𝑸1 … 𝑸𝑛−2
is an upper Hessenberg matrix.
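
Since 𝜇₁ + 𝜇₂ and 𝜇₁𝜇₂ are the (real) trace and determinant of the trailing 2 × 2 submatrix, the first column 𝒏₁ can be formed with real arithmetic only; a hedged sketch (𝑻 is assumed to be an 𝑛 × 𝑛 upper Hessenberg matrix already available):

n = size(T,1);
s = T(n-1,n-1) + T(n,n);                      % mu1 + mu2 = trace of the trailing 2x2 block
p = T(n-1,n-1)*T(n,n) - T(n-1,n)*T(n,n-1);    % mu1 * mu2 = det of the trailing 2x2 block
e1 = eye(n,1);
n1 = T*(T*e1) - s*(T*e1) + p*e1;              % N(:,1) = (T^2 - s*T + p*I)*e1, computed without forming N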
Transformation from Schur to Jordan Forms: Let the matrix 𝚲 be the Jordan form of
𝑨 so that 𝑨 = 𝑽𝚲𝑽−1. Now we apply the QR algorithm on the matrix (𝑽−1 )𝐻 so
that (𝑽−1 )𝐻 = 𝑸𝑹 and define a new matrix 𝑻 = 𝑸𝐻 𝑨𝑸, this new matrix is a lower
triangular form.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% RATIONAL TRANSFORMATION FROM SCHUR TO JORDAN FORM
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear all, clc,
% M1=10*rand(5,5); A=M1*diag([-1 -2 -3 -4 -5])*inv(M1);
A=[3 5 4 6 1;4 5 6 7 7;3 3 2 7 8;4 2 8 7 7;3 6 9 5 1];
[V D]=eig(A); W=(inv(V))'; % inv(V)*A*V
[Q R]=qr(W); T=Q'*A*Q;
T = tril(T), Q,

Remark: If we select 𝑽 = 𝑸𝑹 then 𝑻 = 𝑸𝐻 𝑨𝑸, is an upper triangular form.

clear all, clc, M1=10*rand(5,5); A=M1*diag([-1 -2 -3 -4 -5])*inv(M1);


[V D]=eig(A); [Q R]=qr(V); T=Q'*A*Q; T = triu(T)
[V1 D1]=eig(T); [Vnew,Dnew] = cdf2rdf(V,D);
T= inv(Vnew)*Dnew*Vnew, Q
%---------------- verifications by MATLAB instructions ----------------%
[Q2 T2]=schur(A); T2, Q2,

This is perhaps the most widely used


factorization in modern applications of linear algebra and, because of its widespread use,
is sometimes referred to as the “Singularly Valuable Decomposition.” See Kalman (1996)
for an excellent expository article of the same name. In linear algebra, the singular value
decomposition (SVD) is a factorization of a real or complex matrix that generalizes the
Eigen-decomposition of a square normal matrix to any 𝑚 × 𝑛 matrix via an extension of
the polar decomposition. SVD may be the most important matrix decomposition of all,
for both theoretical and computational purposes.

The SVD considers a rectangular 𝑚 × 𝑛 matrix 𝑨 with real entries. If the rank of 𝑨 is 𝑟,
then the SVD says that there are unique positive constants 𝜎1 ≥ 𝜎2 ≥ ⋯ ≥ 𝜎𝑟 and
orthogonal matrices 𝑼 and 𝑽 of order 𝑚 × 𝑚 and 𝑛 × 𝑛, respectively, such that

$$A = UDV^T = U\begin{pmatrix} \Sigma & O \\ O & O \end{pmatrix}V^T
= [\,U_1\;\;U_2\,]\begin{pmatrix} \Sigma & O \\ O & O \end{pmatrix}\begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}
= U_1\,\Sigma\,V_1^T$$

where ∑ is an 𝑟 × 𝑟 diagonal matrix whose 𝑖 𝑡ℎ diagonal element is 𝜎𝑖 . The 𝑶 matrices have


compatible numbers of rows and columns for the above partition to make sense. These
𝜎𝑖 ’s are called the singular values of 𝑨.

𝑼 is 𝑚 × 𝑚 orthogonal (the left singular vectors of 𝑨).
𝑽 is 𝑛 × 𝑛 orthogonal (the right singular vectors of 𝑨).
𝑫 is 𝑚 × 𝑛 block diagonal (carrying the singular values of 𝑨):

$$D = \begin{pmatrix} \Sigma & O \\ O & O \end{pmatrix}
= \begin{pmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix} & O \\ O & O \end{pmatrix}$$

$$A = UDV^T \;\Longleftrightarrow\; \begin{cases} A\mathbf{v}_j = \sigma_j\mathbf{u}_j & j = 1,2,\ldots,r \\ A\mathbf{v}_j = \mathbf{0} & j = r+1,\ldots,n \end{cases}
\;;\qquad
A^T = VD^TU^T \;\Longleftrightarrow\; \begin{cases} A^T\mathbf{u}_j = \sigma_j\mathbf{v}_j & j = 1,2,\ldots,r \\ A^T\mathbf{u}_j = \mathbf{0} & j = r+1,\ldots,m \end{cases}$$

Summary: It follows that (See the Algebra book by BEKHITI B 2020)

■ {𝜎𝑗2 }, 𝑗 = 1, 2, … , 𝑟, are positive eigenvalues of 𝑨𝑇 𝑨. 𝑨𝑇 𝑨𝐯𝑗 = 𝜎𝑗 𝑨𝑇 𝐮𝑗 = 𝜎𝑗2 𝐯𝑗 𝑗 = 1, 2, … , 𝑟


So, the singular values play the role of eigenvalues.
■ Equation 𝑨 = 𝑼𝑫𝑽𝑇 gives how to find the singular values and the right singular vectors,
while 𝑨𝑇 = 𝑽𝑫𝑼𝑇 shows a way to compute the left singular vectors.
■ (Dyadic decomposition) The matrix 𝑨 ∈ ℝ𝑚×𝑛 , with rank(𝑨) = 𝑟 ≤ 𝑛, can be expressed
as 𝑨 = ∑𝑟𝑗=1 𝜎𝑗 𝐮𝑗 𝐯𝑗𝑇 . This property has been utilized for various approximations and
applications, e.g., by dropping singular vectors corresponding to small singular values.
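
A small sketch of the dyadic (low-rank) use of the SVD mentioned above, keeping only the k largest singular values (the sizes are illustrative); by the Eckart–Young result the 2-norm error equals the first discarded singular value:

clear all, clc, A = rand(8,6); k = 2;
[U,S,V] = svd(A);
Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';     % rank-k approximation A_k = sum_{j<=k} sigma_j*u_j*v_j'
err = norm(A - Ak, 2)                   % equals S(k+1,k+1), the largest discarded singular value
sigma_next = S(k+1,k+1)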

Geometric Interpretation of the SVD: The matrix 𝑨 maps an orthonormal basis


𝛽1 = {𝐯1 , 𝐯2 , … , 𝐯𝑛 } of ℝ𝑛 onto a new “scaled” orthogonal basis 𝛽2 = {𝜎1 𝐮1 , 𝜎2 𝐮2 , … , 𝜎𝑟 𝐮𝑟 } for a
subspace of ℝ𝑚 .
$$\beta_1 = \{\mathbf{v}_1,\mathbf{v}_2,\ldots,\mathbf{v}_n\} \;\xrightarrow{\;A\;}\; \beta_2 = \{\sigma_1\mathbf{u}_1,\sigma_2\mathbf{u}_2,\ldots,\sigma_r\mathbf{u}_r\}$$

Consider the unit sphere $\mathcal{S}^{n-1}$ in $\mathbb{R}^n$:

$$\mathcal{S}^{n-1} = \Big\{\mathbf{x} \;\Big|\; \sum_{j=1}^{n} x_j^2 = 1\Big\}$$
Then every $\mathbf{x} \in \mathcal{S}^{n-1}$ can be written as a linear combination of the orthonormal basis $\beta_1$:

$$\mathbf{x} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_n\mathbf{v}_n
\;\Longleftrightarrow\;
A\mathbf{x} = x_1 A\mathbf{v}_1 + x_2 A\mathbf{v}_2 + \cdots + x_n A\mathbf{v}_n
= \sigma_1 x_1\mathbf{u}_1 + \sigma_2 x_2\mathbf{u}_2 + \cdots + \sigma_r x_r\mathbf{u}_r$$

Let us define $y_j = \sigma_j x_j$; then $A\mathbf{x} = y_1\mathbf{u}_1 + y_2\mathbf{u}_2 + \cdots + y_r\mathbf{u}_r$. So we have

$$\sum_{j=1}^{n} x_j^2 = 1 \;\text{(sphere)}
\;\Longleftrightarrow\;
\sum_{j=1}^{r}\Big(\frac{y_j}{\sigma_j}\Big)^2 = \alpha \le 1 \;\text{(ellipsoid)}$$

Remark: the singular values play the role of eigenvalues where

𝜎𝑗 = √𝜆𝑗 (𝑨𝐻 𝑨) 𝑗 = 1, … , 𝑛 if 𝑛 ≤ 𝑚 & 𝜎𝑗 = √𝜆𝑗 (𝑨𝑨𝐻 ) 𝑗 = 1, … , 𝑚 if 𝑚 < 𝑛

Theorem: Let $A \in \mathbb{R}^{m\times n}$ and $p = \min(m,n)$, and let $A = U\Sigma V^T$ be the SVD of $A$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_p = 0$. Then:

$$\mathrm{rank}(A) = r, \qquad \mathrm{Null}(A) = \mathrm{span}\{\mathbf{v}_{r+1},\mathbf{v}_{r+2},\ldots,\mathbf{v}_n\}, \qquad \mathrm{Range}(A) = \mathrm{span}\{\mathbf{u}_1,\mathbf{u}_2,\ldots,\mathbf{u}_r\}$$
$$A = \sum_{j=1}^{r}\sigma_j\,\mathbf{u}_j\mathbf{v}_j^T, \qquad \|A\|_2 = \sigma_1, \qquad \|A\|_F^2 = \sigma_1^2+\sigma_2^2+\cdots+\sigma_r^2$$
$$\min_{\mathbf{x}\neq 0}\big(\|A\mathbf{x}\|_2/\|\mathbf{x}\|_2\big) = \sigma_r \;\;(\text{when } r = n), \qquad \kappa_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \sigma_1/\sigma_r \;\;(\text{for square nonsingular } A)$$

{𝐮1 , 𝐮2 , … , 𝐮𝑟 } is an orthonormal basis for col(𝑨).


{𝐮𝑟+1 , 𝐮𝑟+2 , … , 𝐮𝑚 } is an orthonormal basis for null(𝑨𝑇 ).
{𝐯1 , 𝐯2 , … , 𝐯𝑟 } is an orthonormal basis for row(𝑨).
{𝐯𝑟+1 , 𝐯𝑟+2 , … , 𝐯𝑛 } is an orthonormal basis for null(𝑨).
Proof: (See the Algebra book by BEKHITI B 2020)
From the four fundamental subspaces we conclude that

$$\mathbb{R}^m = \mathrm{span}\{\underbrace{\mathbf{u}_1,\ldots,\mathbf{u}_r}_{\mathcal{R}(A)},\;\underbrace{\mathbf{u}_{r+1},\ldots,\mathbf{u}_m}_{\mathcal{N}(A^T)}\} = \mathcal{R}(A)\oplus\mathcal{N}(A^T)$$

$$\mathbb{R}^n = \mathrm{span}\{\underbrace{\mathbf{v}_1,\ldots,\mathbf{v}_r}_{\mathcal{R}(A^T)},\;\underbrace{\mathbf{v}_{r+1},\ldots,\mathbf{v}_n}_{\mathcal{N}(A)}\} = \mathcal{R}(A^T)\oplus\mathcal{N}(A)$$

Observe that $\mathbf{u}_i^T A\,\mathbf{v}_j = 0$ for $i > r$ or $j > r$. Therefore

$$U^T A V = R = \begin{pmatrix} C & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} \end{pmatrix},\; C\in\mathbb{R}^{r\times r}
\qquad\Longrightarrow\qquad
A = URV^T = U\begin{pmatrix} C & \boldsymbol{0} \\ \boldsymbol{0} & \boldsymbol{0} \end{pmatrix}V^T
\quad\text{(general class of URV decompositions)}$$
Remark:1 Singular value decomposition is a special case of URV where 𝑹 =diagonal=∑.
Remark:2 For a matrix 𝑨 ∈ ℝ𝑚×𝑛 the singular value decomposition algorithm is extremely
stable, and is of computation cost 4𝑚2 𝑛 + 8𝑛2 𝑚 + 9𝑛3 .

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or


complex matrix that generalizes the Eigen-decomposition of a square normal matrix to
any 𝑚 × 𝑛 matrix via an extension of the polar decomposition. Here is an algorithm that performs this decomposition. To write an SVD code, one basic scheme is to bidiagonalize the matrix with Householder transformations and then apply further rotations to annihilate the off-diagonal elements. Another idea is to use the QR decomposition iteratively; if you want to avoid the cost of repeatedly calling QR, the QR calls can easily be replaced with explicit Householder transformations.

The idea is to use the QR decomposition on 𝑨 to gradually "pull" 𝑼 out from the left and
then use QR on 𝑨 transposed to "pull" 𝑽 out from the right. This process makes 𝑨 lower
triangular and then upper triangular alternately. Eventually, 𝑨 becomes both upper and
lower triangular at the same time, (i.e. Diagonal) with the singular values on the
diagonal.

Algorithm:
Initialization: Given 𝑨 ∈ ℝ𝑚×𝑛 , set 𝑼 = 𝑰𝑚 , 𝑽 = 𝑰𝑛 , a tolerance 𝜀 and 𝛼 = ∞
While 𝛼 > 𝜀
⦁ The QR decomposition of 𝑨 gives 𝑨 = 𝑸𝑢 𝑹𝑢 and define 𝑼 = 𝑼 ⋆ 𝑸𝑢
⦁ and the QR decomposition of 𝑹𝑇𝑢 gives 𝑹𝑢 = 𝑹𝑇𝑣 𝑸𝑇𝑣 and define 𝑽 = 𝑽 ⋆ 𝑸𝑣
⦁ Thus, at every iteration, we have 𝑨 = 𝑸𝑢 𝑹𝑇𝑣 𝑸𝑇𝑣 ,
⦁ update 𝑨 ⇐ 𝑹𝑇𝑣 and repeat the orthogonalizations
⦁ 𝛼 = ‖𝑡𝑟𝑖𝑙(𝑨, −1)‖∞
end
𝑼 = 𝑼(: ,1: 𝑛); 𝑺 = triu(𝑨(1: 𝑛, : ));
𝑽 = 𝐕.⋆ (sign(diag(𝑺))) .𝑇 ; 𝑺 = abs(𝑺); % correct negative singular values
𝑍𝑒𝑟𝑜 = 𝑨 − 𝑼 ⋆ 𝑺 ⋆ 𝑽𝑇
clear all, clc, A=10*rand(5,3); A1=A; % m>n
[m,n] = size(A); U = eye(m); V = eye(n); tol = max(abs(A(:)))*1.e-15;
Arem = inf;
while Arem > tol
[Qu,Ru] = qr(A);
U = U*Qu;
[Qv,Rv] = qr(Ru');
V = V*Qv;
A = Rv';
Arem = norm(tril(A,-1),inf); % exit when we get "close"
end
U = U(:,1:n); S = triu(A(1:n,:));
V = V.*sign(diag(S)).'; S = abs(S); % correct negative singular values
Zero=U*S*V'-A1

clear all, clc, A=10*rand(5,8); A1=A; % n>m


A=A'; [m,n]=size(A); U=eye(m); V=eye(n); tol=max(abs(A(:)))*1.e-15;
Arem = inf;
while Arem > tol
[Qu,Ru] = qr(A);
U = U*Qu;
[Qv,Rv] = qr(Ru');
V = V*Qv;
A = Rv';
Arem = norm(tril(A,-1),inf); % exit when we get "close"
end
U = U(:,1:n); S = triu(A(1:n,:));
% correct any negative singular values
V = V.*sign(diag(S)).'; S = abs(S);
U1=V; V1=U; S1=S';
Zero=U1*S1*V1'-A1

Applications of the SVD:


■ Range, null space and rank computations
■ Solving (rank-deficient) LS problems (pseudoinverse 𝑨+ = 𝑽∑+ 𝑼𝑇 , when 𝑨 = 𝑼∑𝑽𝑇 ; see the sketch after this list)
■ Low-rank matrix approximation
■ Control engineering (Model Order reduction and minimal realization etc…)
■ Denoising by Least squares
■ Signal processing Image/Data compression
■ Principal component analysis (e.g., signal processing and pattern recognition)
■ Numerical weather prediction
■ Reduced order modeling
■ Inverse problem theory
■ Recognition of Handwritten Digits. And Text Mining
■ Page Ranking for a Web Search Engine
■ Automatic Key Word and Key Sentence Extraction
■ Face Recognition Using Tensor SVD
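As an illustration of the pseudoinverse application listed above, here is a minimal sketch of building 𝑨+ = 𝑽∑+𝑼𝑇 from the SVD and using it for a rank-deficient least-squares problem; the test matrix, right-hand side and tolerance are our own choices:

clear all, clc, A = randn(8,4)*randn(4,5); b = randn(8,1);   % rank-deficient LS problem
[U,S,V] = svd(A); s = diag(S); tol = max(size(A))*eps(s(1));
r = sum(s > tol);                        % numerical rank
Splus = zeros(size(A'));                 % pseudoinverse of the diagonal factor
Splus(1:r,1:r) = diag(1./s(1:r));
Aplus = V*Splus*U';                      % A+ = V*Sigma+*U'
x = Aplus*b;                             % minimum-norm least-squares solution
Zero = norm(Aplus - pinv(A))             % agrees with MATLAB's pinv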
In numerical analysis, one of the most
important problems is designing efficient and stable algorithms for finding the
eigenvalues of a matrix. These eigenvalue algorithms may also find eigenvectors.

Any monic polynomial is the characteristic polynomial of its companion matrix.


Therefore, a general algorithm for finding eigenvalues could also be used to find the roots
of polynomials. The Abel–Ruffini theorem shows that any such algorithm for dimensions
greater than 4 must either be infinite, or involve functions of greater complexity than
elementary arithmetic operations and fractional powers. For this reason algorithms that
exactly calculate eigenvalues in a finite number of steps only exist for a few special
classes of matrices. For general matrices, algorithms are iterative, producing better
approximate solutions with each iteration.

The eigenvalues can be determined by forming the characteristic polynomial and finding
its roots. However, this procedure is generally not recommended for numerical
computations. The difficulty is that often a small change in one or more of the
coefficients of the characteristic polynomial can result in a relatively large change in the
computed zeros of the polynomial. For example, consider the polynomial 𝑝(𝜆) = 𝜆10. The
lead coefficient is 1 and the remaining coefficients are all 0. If the constant term is
altered by adding −2−10 , we obtain the polynomial 𝑞(𝜆) = 𝜆10 − 2−10 . Although the
coefficients of 𝑝(𝜆) and 𝑞(𝜆) differ only by 2−10 , the roots of 𝑞(𝜆) all have absolute value
1⁄2 , whereas the roots of 𝑝(𝜆) are all 0. Thus, even when the coefficients of the
characteristic polynomial have been determined accurately, the computed eigenvalues
may involve significant error. For this reason, the methods presented in this section do
not involve the characteristic polynomial. To see that there is some advantage to working
directly with the matrix 𝑨, we must determine the effect that small changes in the entries
of 𝑨 have on the eigenvalues. This is done in the next theorem.
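As a quick aside, the 𝑝(𝜆) = 𝜆¹⁰ example above can be reproduced directly in MATLAB (a small sketch):

clear all, clc,
p = [1 zeros(1,10)];          % coefficients of p(lambda) = lambda^10
q = p; q(end) = -2^(-10);     % q(lambda) = lambda^10 - 2^(-10)
abs(roots(p))                 % the ten roots of p are all 0
abs(roots(q))                 % the ten roots of q all have modulus exactly 1/2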

Theorem: (Bauer-Fike) Let 𝑨 be an 𝑛 × 𝑛 matrix with 𝑛 linearly independent


eigenvectors, and let 𝑿 be a matrix that diagonalizes 𝑨. That is, 𝑿−1 𝑨𝑿 = 𝑫 = diag(𝜆𝑖 )𝑛𝑖=1. If
𝑨1 = 𝑨 + 𝑬 and 𝜇 is an eigenvalue of 𝑨1 , then min1≤𝑖≤𝑛 |𝜇 − 𝜆𝑖 | ≤ cond(𝑿)‖𝑬‖2 .

Proof: We may assume that 𝜇 is unequal to any of the 𝜆𝑖 ’s (otherwise there is nothing to
prove). Thus, if we set 𝑫1 = 𝑫 − 𝜇𝑰 then 𝑫1 is a nonsingular diagonal matrix. Since 𝜇 is an
eigenvalue of 𝑨1 it is also an eigenvalue of 𝑿−1 𝑨1 𝑿 Therefore, 𝑿−1 𝑨1 𝑿 − 𝜇𝑰 is singular, and
hence 𝑫1−1 (𝑿−1 𝑨1 𝑿 − 𝜇𝑰) is also singular. But

𝑫1−1 (𝑿−1 𝑨1 𝑿 − 𝜇𝑰) = 𝑫1−1 𝑿−1 (𝑨1 − 𝜇𝑰) 𝑿


= 𝑫1−1 𝑿−1 (𝑨 + 𝑬 − 𝜇𝑰) 𝑿
= 𝑫1−1 𝑿−1 𝑬𝑿 + 𝑰

Therefore, −1 is an eigenvalue of 𝑫1−1 𝑿−1 𝑬𝑿. It follows that

\[
|-1|\le\|\mathbf{D}_{1}^{-1}\mathbf{X}^{-1}\mathbf{E}\mathbf{X}\|_{2}\le\|\mathbf{D}_{1}^{-1}\|_{2}\,\big(\|\mathbf{X}^{-1}\|_{2}\|\mathbf{X}\|_{2}\big)\,\|\mathbf{E}\|_{2}
\le\|\mathbf{D}_{1}^{-1}\|_{2}\,\operatorname{cond}(\mathbf{X})\,\|\mathbf{E}\|_{2}
\]

The 2-norm of 𝑫1−1 is given by max1≤𝑖≤𝑛 |𝜇 − 𝜆𝑖 |−1 . The index 𝑖 that maximizes |𝜇 − 𝜆𝑖 |−1 is
the same index that minimizes |𝜇 − 𝜆𝑖 |. Thus, min1≤𝑖≤𝑛 |𝜇 − 𝜆𝑖 | ≤ cond(𝑿)‖𝑬‖2 
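The Bauer-Fike bound can be illustrated numerically; a minimal sketch with a random diagonalizable matrix and a small random perturbation (both arbitrary choices made here):

clear all, clc, X = randn(5); A = X*diag([1 2 3 4 5])/X;   % diagonalizable: A = X*D*inv(X)
E = 1e-6*randn(5);                                         % small perturbation
lam = eig(A); mu = eig(A + E);
gap = zeros(5,1);
for i = 1:5
    gap(i) = min(abs(mu(i) - lam));   % distance of each perturbed eigenvalue to the spectrum of A
end
bound = cond(X)*norm(E,2);            % Bauer-Fike bound cond(X)*||E||_2
[max(gap) bound]                      % the bound holds: max(gap) <= bound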
Because the eigenvalues of a triangular matrix are its diagonal elements, for general
matrices there is no finite method like Gaussian elimination to convert a matrix to
triangular form while preserving eigenvalues. But it is possible to reach something close
to triangular (upper/lower Hessenberg matrix). Hessenberg and tridiagonal matrices are
the starting points for many eigenvalue algorithms because the zero entries reduce the
complexity of the problem. Several methods are commonly used to convert a general
matrix into a Hessenberg matrix with the same eigenvalues. If the original matrix was
symmetric or Hermitian, then the resulting matrix will be tridiagonal.
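For instance, a small sketch using MATLAB's built-in reduction (the random matrices are arbitrary):

clear all, clc, A = randn(6);
H = hess(A)                                          % upper Hessenberg, similar to A
B = A + A';                                          % a symmetric matrix
T = hess(B)                                          % for symmetric input the result is tridiagonal (up to roundoff)
Zero = norm(sort(abs(eig(A))) - sort(abs(eig(H))))   % similarity preserves the spectrum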

Here is an overview of the most well-known algorithms for eigenvalue problems:

The Unsymmetric Eigenvalue Problem:
⦁ Power Iterations
⦁ Inverse Iteration
⦁ Orthogonal Iteration
⦁ The QR Iteration
⦁ LR Iterations
⦁ QR with Hessenberg Form
⦁ Shifted QR Iteration
⦁ Deflation Technique
⦁ The QZ Method for 𝑨𝐱 = 𝜆𝑩𝐱

The Symmetric Eigenvalue Problem:
⦁ The Power Method
⦁ Inverse Iteration
⦁ Rayleigh Quotient Iteration
⦁ Orthogonal Iteration
⦁ The QR Iteration (Householder Tridiagonalization)
⦁ Deflation Technique
⦁ Cholesky Decomposition Method

The power method as written here will only converge to a


single eigenvalue of the matrix 𝑨. There are several approaches to modifying the power
method to find the 𝑘 dominant eigenvalues of 𝑨. One technique, deflation, is reasonably
straightforward: once the eigenpair (𝐱1 , 𝜆1 ) is computed, a transformation is applied to
the matrix 𝑨 to move 𝜆1 to the interior of the spectrum, so that the second largest
eigenvalue 𝜆2 becomes the dominant eigenvalue of the transformed matrix. This process
is repeated until the 𝑘 dominant eigenvalues have been found.

Let 𝑨 ∈ ℂ𝑛×𝑛 be a diagonalizable matrix and let 𝑿 = [𝐱1 , … , 𝐱 𝑛 ] ∈ ℂ𝑛×𝑛 be the matrix of its
eigenvectors 𝐱 𝑖 , for 𝑖 = 1, . . . , 𝑛. Let us also suppose that the eigenvalues of 𝑨 are ordered
as |𝜆1 | > |𝜆2 | > ⋯ > |𝜆𝑛 | where 𝜆1 has algebraic multiplicity equal to 1. Under these
assumptions, 𝜆1 is called the dominant eigenvalue of matrix 𝑨.

Given an arbitrary initial vector 𝒒0 ∈ ℂ𝑛 of unit Euclidean norm, consider for 𝑘 = 1, 2, … the following iteration based on the computation of powers of matrices, commonly known as the power method:

𝐟𝐨𝐫 𝑘 = 1: 𝑛
    𝒛𝑘 = 𝑨𝒒𝑘−1
    𝒒𝑘 = 𝒛𝑘 /‖𝒛𝑘 ‖2
    𝜂𝑘 = 𝒒𝑘 𝐻 𝑨𝒒𝑘
𝐞𝐧𝐝

Let us analyze the convergence properties of this method. By induction on 𝑘 one can
check that 𝒒𝑘 = (𝑨𝑘 𝒒0 )⁄‖𝑨𝑘 𝒒0 ‖ with 𝑘 ≥ 1. This relation explains the role played by the
powers of A in the method. Because A is diagonalizable, its eigenvectors 𝐱 𝑖 form a basis
of ℂ𝑛 ; it is thus possible to represent 𝒒0 as

\[
\mathbf{q}_{0}=\sum_{i=1}^{n}\alpha_{i}\mathbf{x}_{i},\qquad \alpha_{i}\in\mathbb{C},\ i=1,2,\dots,n
\]

Moreover, since 𝑨𝐱𝑖 = 𝜆𝑖𝐱𝑖, we have 𝑨𝑘𝒒0 = ∑𝑛𝑖=1 𝛼𝑖𝑨𝑘𝐱𝑖 = ∑𝑛𝑖=1 𝛼𝑖𝜆𝑖𝑘𝐱𝑖, so that (assuming 𝛼1 ≠ 0)

\[
\mathbf{A}^{k}\mathbf{q}_{0}=\alpha_{1}\lambda_{1}^{k}\Big(\mathbf{x}_{1}+\sum_{i=2}^{n}\frac{\alpha_{i}}{\alpha_{1}}\Big(\frac{\lambda_{i}}{\lambda_{1}}\Big)^{k}\mathbf{x}_{i}\Big)
\]

Since |𝜆𝑖 /𝜆1 | < 1 for 𝑖 = 2, … , 𝑛, as 𝑘 increases the vector 𝑨𝑘 𝒒0 tends to assume an
increasingly significant component in the direction of the eigenvector 𝐱1

For large 𝑘 the term 𝛼1𝜆1𝑘𝐱1 dominates, i.e. 𝑨𝑘𝒒0 ≈ 𝛼1𝜆1𝑘𝐱1, and therefore

\[
\lim_{k\to\infty}\mathbf{q}_{k}=\lim_{k\to\infty}\frac{\mathbf{A}^{k}\mathbf{q}_{0}}{\|\mathbf{A}^{k}\mathbf{q}_{0}\|_{2}}
=\frac{\alpha_{1}\lambda_{1}^{k}\mathbf{x}_{1}}{\alpha_{1}\lambda_{1}^{k}\|\mathbf{x}_{1}\|_{2}}
=\frac{\mathbf{x}_{1}}{\|\mathbf{x}_{1}\|_{2}}
\]

As 𝑘 → ∞, the vector 𝒒𝑘 thus aligns itself along the direction of eigenvector 𝐱1 . Therefore
the sequence of Rayleigh quotients 𝜂𝑘 will converge to 𝜆1 .

\[
\lim_{k\to\infty}\eta_{k}=\lim_{k\to\infty}\mathbf{q}_{k}^{H}\mathbf{A}\mathbf{q}_{k}
=\Big(\frac{\mathbf{x}_{1}}{\|\mathbf{x}_{1}\|_{2}}\Big)^{\!H}\mathbf{A}\Big(\frac{\mathbf{x}_{1}}{\|\mathbf{x}_{1}\|_{2}}\Big)
=\lambda_{1}\frac{\mathbf{x}_{1}^{H}\mathbf{x}_{1}}{\|\mathbf{x}_{1}\|_{2}^{2}}=\lambda_{1}
\]

and the convergence will be faster when the ratio |𝜆2 /𝜆1 | is smaller.

Example: the matrix 𝐴 = \begin{pmatrix}4 & 5\\ 6 & 5\end{pmatrix} (eigenvalues 10 and −1) needs only 6 steps, since the ratio |𝜆2/𝜆1| is 0.1, while the matrix 𝐴 = \begin{pmatrix}-4 & 10\\ 7 & 5\end{pmatrix} (eigenvalues 10 and −9) needs 68 steps, since the ratio is 0.9.

clear all, clc,

% Power iteration for the dominant eigen-pair


% Input: A, square matrix not necessarily symmetric
% N: number of iterations
% Output: lambda, sequence of eigenvalue approximations (vector)
% x final eigenvector approximation

A=randi(10,4); A=0.5*(A'+A); A1=A; N=10; L=eig(A)


n = length(A); x = randn(n,1); x = x/norm(x,2);

for k = 1:N
q = A*x;
x = q/norm(q,2);
lambda(k) = x'*A*x;
end
lambda(N)
x

Similarly, the left eigenvector can be obtained using the following algorithm

\[
\begin{cases}\mathbf{w}_{k}=\mathbf{A}^{T}\mathbf{p}_{k-1}\\ \mathbf{p}_{k}=\mathbf{w}_{k}/\|\mathbf{w}_{k}\|_{2}\\ \eta_{k}=\mathbf{p}_{k}^{T}\mathbf{A}^{T}\mathbf{p}_{k}\end{cases}
\iff
\begin{cases}\mathbf{v}_{k}=\mathbf{p}_{k-1}^{T}\mathbf{A}\\ \mathbf{p}_{k}^{T}=\mathbf{v}_{k}/\|\mathbf{v}_{k}\|_{2}\\ \eta_{k}=\mathbf{p}_{k}^{T}\mathbf{A}\mathbf{p}_{k}\end{cases}
\]
clear all, clc, A=randi(10,4); A=0.5*(A'+A); A1=A; N=20; L=eig(A)
n = length(A); y = randn(1,n); y = y/norm(y,inf);
for k = 1:N
y = (y*A)/norm(y*A,2);
lambda(k) = y*A*y';
end
lambda(N)
y

The power method converges if 𝜆1 is dominant and if 𝒒0 has a component in the direction
of the corresponding dominant eigenvector 𝐱1 . The behavior of the iteration without these
assumptions is discussed in (Wilkinson 1965, p.570 and Parlett and Poole 1973).

Finding Other Eigenvectors If the matrix 𝑨 is symmetric then we can just remove the
dominant direction from the matrix and repeat the process.

Assume that the matrix 𝑨 is diagonalizable and let 𝐱 𝑖 , 𝐲𝑖 𝑇 be the right and left
eigenvectors respectively, means that 𝑨𝐱 𝑖 = 𝜆𝑖 𝐱 𝑖 , 𝐲𝑖 𝑇 𝑨 = 𝜆𝑖 𝐲𝑖 𝑇 , then by using the spectral
decomposition theorem
\[
\mathbf{A}=\mathbf{X}\boldsymbol{\Lambda}\mathbf{Y}^{T}
=[\mathbf{x}_{1}\ \mathbf{x}_{2}\ \dots\ \mathbf{x}_{n}]\,\boldsymbol{\Lambda}\begin{pmatrix}\mathbf{y}_{1}^{T}\\ \mathbf{y}_{2}^{T}\\ \vdots\\ \mathbf{y}_{n}^{T}\end{pmatrix}
=[\lambda_{1}\mathbf{x}_{1}\ \lambda_{2}\mathbf{x}_{2}\ \dots\ \lambda_{n}\mathbf{x}_{n}]\begin{pmatrix}\mathbf{y}_{1}^{T}\\ \mathbf{y}_{2}^{T}\\ \vdots\\ \mathbf{y}_{n}^{T}\end{pmatrix}
=\sum_{i=1}^{n}\lambda_{i}\mathbf{x}_{i}\mathbf{y}_{i}^{T}
\qquad\text{with}\qquad \sum_{i=1}^{n}\mathbf{x}_{i}\mathbf{y}_{i}^{T}=\mathbf{I}
\]

Now use power iteration to find 𝐱1 and 𝜆1 then let 𝑨2 ← 𝑨 − 𝜆1 𝐱1 𝐲1 𝑇 repeat power iteration
on 𝑨2 to find 𝐱 2 and 𝜆2 continue like this for 𝜆3 , . . . , 𝜆𝑛 .

Remark: This method gives a good approximation only when the matrix 𝑨 is symmetric, i.e. 𝑨 = 𝑿𝜦𝑿𝑇. When 𝑨 is not symmetric the method will fail.

In order to obtain also the inverse of the symmetric matrix 𝑨 we use the following
iteration 𝑨 ← 𝑨 − 𝜆𝑖 𝐱 𝑖 𝐱 𝑖 𝑇 + (1/𝜆𝑖 )𝐱 𝑖 𝐱 𝑖 𝑇 . (Proposed by the author)

clear all, clc, M=10*rand(4,4);


A=M*diag([-10 -20 -30 -40])*inv(M) ; A=0.5*(A+A'); B=A; L1=eig(A),
n = length(A); N=100; L2=[]; X=[];
for i = 1:n
x = rand(n,1); x=x/norm(x,2);
for k = 1:N
x = (B*x)/norm((B*x),2);
lambda(k) = x'*B*x;
end
x1= x; X=[X,x1]; % zero= B*x - lambda(N)*x;
s=lambda(N); L2=[L2;s];
B = B - s*x1*x1'+ (1/s)*x1*x1';
end
L2, X, % All the eigenvalues and all eigenvectors of A
IA=A*B % B return to the inverse of A
Remark: this proposed method works only when 𝑨 is a symmetric matrix with at most one eigenvalue of modulus less than one; it can be improved by adding conditional statements.

Inverse iteration is an algorithm to compute the


smallest eigenvalue (in modulus) of a symmetric matrix 𝑨:

Choose 𝐱 0
for 𝑘 = 1,2, … , 𝑚 (until convergence)
solve 𝑨𝐱 𝑘+1 = 𝐱 𝑘
normalize 𝐱 𝑘+1 ∶= 𝐱 𝑘+1 /‖𝐱 𝑘+1 ‖
𝑇 𝑇
𝜆𝑘 = 𝐱 𝑘+1 𝑨𝐱 𝑘+1 /(𝐱 𝑘+1 𝐱 𝑘+1 )
end
For large matrices, one can save operations if we compute the 𝐿𝑈 decomposition of the
matrix 𝑨 only once. The iteration is performed using the factors 𝑳 and 𝑼. This way, each
iteration needs only 𝑂(𝑛2 ) operations, instead of 𝑂(𝑛3 ) with the program above.
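A minimal MATLAB sketch of this plain (unshifted) inverse iteration, factoring 𝑨 once and reusing the LU factors at every step; the random symmetric test matrix is arbitrary and the smallest-modulus eigenvalue is assumed simple:

clear all, clc, A = randi(10,5); A = 0.5*(A+A');   % symmetric test matrix
L0 = eig(A)                                        % reference eigenvalues
n = length(A); x = randn(n,1); x = x/norm(x);
[L,U,P] = lu(A);                                   % factor A once: P*A = L*U
for k = 1:100
    x = U\(L\(P*x));                               % one step solves A*x_new = x_old in O(n^2)
    x = x/norm(x);
end
lambda = x'*A*x        % approximates the eigenvalue of A of smallest modulus (assumed simple)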

clear all, clc, A=randi(10,4); A=0.5*(A'+A); L=eig(A)


% Shifted inverse iteration for the closest Eigenpair.
% Input: A is square matrix
% s value close to desired eigenvalue (complex scalar)
% Output: gamma sequence of eigenvalue approximations (vector)
n = length(A); x = randn(n,1); x = x/norm(x,inf); s=max(L);
B = A - s*eye(n); [L,U] = lu(B); N=20; % N number of iterations
for k = 1:N
y = U\(L\x); [normy,m] = max(abs(y));
gamma(k) = x(m)/y(m) + s;
x = y/normy; % x final eigenvector approximation
end

The power method can be used to compute the eigenvalue


𝜆1 of largest magnitude and a corresponding eigenvector 𝐯1 . What about finding
additional eigenvalues and eigenvectors? If we could reduce the problem of finding
additional eigenvalues of 𝑨 to that of finding the eigenvalues of some (𝑛 − 1) × (𝑛 − 1)
matrix 𝑨1 , then the power method could be applied to 𝑨1 . This can actually be done by a
process called deflation.

The idea behind deflation is to find a nonsingular matrix 𝑯 such that 𝑯𝑨𝑯−1 is a matrix
of the form
\[
\mathbf{H}\mathbf{A}\mathbf{H}^{-1}=\begin{pmatrix}\lambda_{1} & \times & \cdots & \times\\ 0 & & & \\ \vdots & & \mathbf{A}_{1} & \\ 0 & & & \end{pmatrix}
\]

Since 𝑨 and 𝑯𝑨𝑯−1 are similar, they have the same characteristic polynomials. Thus, if
𝑯𝑨𝑯−1 is of the proposed form, then: det(𝑨 − 𝜆𝑰) = det(𝑯𝑨𝑯−1 − 𝜆𝑰) = (𝜆1 − 𝜆) det(𝑨1 − 𝜆𝑰)
and it follows that the remaining 𝑛 − 1 eigenvalues of 𝑨 are the eigenvalues of 𝑨1 . The
question remains: How do we find such a matrix 𝑯? Note that the proposed form requires that the first column of 𝑯𝑨𝑯−1 be 𝜆1𝒆1. The first column of 𝑯𝑨𝑯−1 is 𝑯𝑨𝑯−1 𝒆1.
Thus, 𝑯𝑨𝑯−1 𝒆1 = 𝜆1 𝒆1 or, equivalently, 𝑨(𝑯−1 𝒆1 ) = 𝜆1 (𝑯−1 𝒆1 ) So (𝑯−1 𝒆1 ) is in the
eigenspace corresponding to 𝜆1 . Thus, for some eigenvector 𝐱1 belonging to 𝜆1 ,

𝑯−1 𝒆1 = 𝐱1 or 𝑯𝐱1 = 𝒆1

We must find a matrix 𝑯 such that 𝑯𝐱1 = 𝒆1 for some eigenvector 𝐱1 belonging to 𝜆1 . This
can be done by means of a Householder transformation. Because 𝑯 is a Householder
transformation, it follows that 𝑯−1 = 𝑯 , and hence 𝑯𝑨𝑯 is the desired similarity
transformation.
𝑨𝐱1 = 𝜆1 𝐱1 ⟺ 𝑨𝑯−1 𝑯 𝐱1 = 𝜆1 𝑯−1 𝑯𝐱1
⟺ 𝑨𝑯−1 𝒆1 = 𝜆1 𝑯−1 𝒆1
⟺ 𝑯𝑨𝑯−1 𝒆1 = 𝜆1 𝒆1
Remark: if (𝐱1 , 𝜆1 ) is an Eigen-pair of 𝑨 then (𝒆1 , 𝜆1 ) is an Eigen-pair of 𝑯𝑨𝑯−1.

\[
\mathbf{H}\mathbf{A}\mathbf{H}^{-1}=\mathbf{H}\mathbf{A}\mathbf{H}^{-1}\mathbf{I}=\mathbf{H}\mathbf{A}\mathbf{H}^{-1}[\mathbf{e}_{1}\ \mathbf{e}_{2}\ \dots\ \mathbf{e}_{n}]
=\begin{pmatrix}\lambda_{1} & \mathbf{b}^{T}\\ \mathbf{0} & \mathbf{A}_{1}\end{pmatrix}
\]
We can apply the same procedure on 𝑨1 and so on until we get all eigenvalues.

clear all, clc, M=10*rand(6,6); D=diag([6 6 -3 6 -2 1]); N=200; s=[];


A=M*D*inv(M); n=size(A,1); AA=A; % save a copy of A
for k=1:n
A1=A(k:n,k:n);
x = randn(n-k+1,1);
x=x/norm(x,2);
for i = 1:N
q = A1*x;
x = q/norm(q,2);
lambda= x'*A1*x;
end
s=[s; lambda];
x1=x/norm(x,2);
if x1(1)>0
a=-norm(x1,2);
else
a=norm(x1,2);
end
v=x1-a*eye(n-k+1,1);
P1=eye(n-k+1,n-k+1)-2*(v*v')/(v'*v);
H(:,:,k)=blkdiag(eye(k-1,k-1),P1);
A=H(:,:,k)*A*H(:,:,k);
end
H1=H(:,:,1); H2=H(:,:,2); H3=H(:,:,3); H4=H(:,:,4); H5=H(:,:,5);
H=H5*H4*H3*H2*H1;
L1=H*AA*H'
s
The orthogonal iteration sometimes is called
the subspace method and it is a straightforward generalization of the power method can
be used to compute higher-dimensional invariant subspaces. Let 𝑟 be a chosen integer
satisfying 1 < 𝑟 < 𝑛. Given an 𝑛-by-𝑟 matrix 𝑸0 with orthonormal columns, the method of
orthogonal iteration generates a sequence of matrices {𝑸𝑘 } ⊆ ℂ𝑛×𝑟 as follows:

𝐟𝐨𝐫 𝑘 = 1: 𝑛
𝒀𝑘 = 𝑨𝑸𝑘−1
𝑸𝑘 𝑹𝑘 = 𝒀𝑘 (𝑄𝑅 factorization)
𝐞𝐧𝐝

% The orthogonal iteration


clear all, clc, M=10*rand(6,6); A=M*diag([6 5 -4 -3 2 -1])*inv(M);
M0=10*rand(6,6);[Q R]=qr(M0); % Create an Orthogonal Matrix Q
for k=1:100
Y=A*Q; [Q R]=qr(Y);
end
T=R
%--------------------------------------%

clear all, clc, M=10*rand(6,6); A=M*diag([6 5 -4 -3 2 -1])*inv(M);


M0=10*rand(6,6);[Q R]=qr(M0); r=3; Q= Q(:,1:r);
for k=1:100
Y=A*Q; [Q R]=qr(Y); Q= Q(:,1:r);
end
T=triu(Q'*A*Q)

Note that if 𝑟 = 1, then this is just the power method. Moreover, the sequence {𝑸𝑘 𝒆1 } is
precisely the sequence of vectors produced by the power iteration with starting vector
𝒒0 = 𝑸0 𝒆1 . In order to analyze the behavior of this iteration, suppose that

𝑻𝑟 = [𝒒1 𝒒2 … 𝒒𝑟 ]𝐻 𝑨[𝒒1 𝒒2 … 𝒒𝑟 ]

and if 𝑟 = 𝑛 then 𝑻 = 𝑸𝐻 𝑨𝑸 = (diag(𝜆𝑖 )𝑛𝑖=1 + 𝑵) with |𝜆1 | > |𝜆2 | > ⋯ > |𝜆𝑛 |. So
𝑛×𝑛 𝑛×𝑛
𝑻∈ ℂ is a Schur decomposition of 𝑨 ∈ ℂ and 𝑵 are upper triangular matrices. And
it is of great importance to know that this method will work only when the matrix 𝑨 has
only real distinct eigenvalues.

We have shown how Householder


transformations can be used to compute 𝑄𝑅 factorization. The 𝑄𝑅 factorization of an
𝑚 × 𝑛 matrix 𝑨 is given by 𝑨 = 𝑸𝑹 where 𝑸 ∈ ℝ𝑚×𝑚 is orthogonal and 𝑹 ∈ ℝ𝑚×𝑛 is upper
triangular. In this section we assume 𝑚 = 𝑛. We have seen that if 𝑨 has full column rank, then the first 𝑛 columns of 𝑸 form an orthonormal basis for range(𝑨); thus, calculation of the 𝑄𝑅 factorization is one way to compute an orthonormal basis for a set of vectors. Built on this factorization, the 𝑄𝑅 iteration in essence transforms the matrix toward its Hessenberg and then Schur form. It was developed in 1961 independently by John G.F. Francis (England) and Vera N. Kublanovskaya (USSR).
Remark: The QR factorization is different from the QR iteration, the later one is a
procedure to calculate eigenvalues.

In 1958 Rutishauser worked as a research assistant of Eduard Stiefel at ETH Zurich and
experimented with a similar algorithm that we are going to present, but based on the LR
factorization, i.e., based on Gaussian elimination without pivoting. That algorithm was
not successful as the LR factorization (nowadays called LU factorization) is not stable
without pivoting. Francis noticed that the QR factorization would be preferable and could be developed into an essential strategy for eigenvalue problems.

I. The Basic QR Iteration (Without Shift)

𝑨0 = 𝑨 (initialization)
𝐟𝐨𝐫 𝑘 = 0, 1, … , maxit
    𝑨𝑘 = 𝑸𝑘 𝑹𝑘 (𝑄𝑅 factorization)
    𝑨𝑘+1 = 𝑹𝑘 𝑸𝑘
𝐞𝐧𝐝
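A minimal MATLAB sketch of this basic iteration (the test matrix is constructed here so that it has real eigenvalues with distinct moduli, which this simple version requires):

clear all, clc, M = 10*rand(5); A = M*diag([5 4 3 2 1])/M;   % real eigenvalues, distinct moduli
L1 = eig(A)
for k = 1:300
    [Q,R] = qr(A);
    A = R*Q;               % A_{k+1} = R_k*Q_k, orthogonally similar to A_k
end
L2 = diag(A)               % the diagonal converges to the eigenvalues of A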
It can be checked that

𝑨𝑘+1 = (𝑹𝑘 … 𝑹2 𝑹1 )𝑨(𝑹𝑘 … 𝑹2 𝑹1 )−1 or 𝑨𝑘+1 = (𝑸1 𝑸2 … 𝑸𝑘 )𝑇 𝑨(𝑸1 𝑸2 … 𝑸𝑘 )

Important remark: Notice that if convergence occurs, then at some iteration (𝑘 = maxit) the algorithm stops updating (i.e. it becomes stationary), and this can happen only if 𝑸𝑘 = 𝑰, that is 𝑨𝑘+1 = 𝑹𝑘; so this basic QR iteration tends to an upper triangular matrix whose eigenvalues are its main diagonal entries. On the other hand, the matrices 𝑨𝑘+1 and 𝑨 are similar to each other, so they have the same spectrum. Iteration of this form is called the simple QR iteration, and it forms the backbone of the most effective algorithms for computing eigenvalues.

All the 𝑨𝑘 matrices are similar and hence they have the same eigenvalues. The algorithm
is numerically stable because it precedes by orthogonal similarity transforms. Under
certain conditions, the matrices 𝑨𝑘 converge to a triangular matrix, the Schur form of 𝑨.
The eigenvalues of a triangular matrix are listed on the diagonal, and the eigenvalue
problem is solved. However this algorithm has some limitations which can be improved
later on, among them:

⦁ The rate of convergence depends on the separation (distance) between eigenvalues.


⦁ The simple QR iteration is of slow convergence.
⦁ The simple QR iteration is valid only for full-rank problems with real eigenvalues.
⦁ The simple QR iteration is valid only for eigenvalues with distinct absolute values.

 Suppose you start with a matrix 𝑨. In case 𝑨 has complex eigenvalues then the
improved 𝑄𝑅 method will not give a triangular matrix as a result (no matter how many iterations you make). However, it will yield a quasi-triangular (Hessenberg-like) matrix and you can deal with that fairly easily. In a nutshell, the 𝑄𝑅 algorithm applied to a matrix 𝑨 is an iterative procedure that converges to the real Schur decomposition: an orthogonal matrix 𝑻 and a matrix 𝑹 in block upper triangular form (see below) such that 𝑨 = 𝑻𝑹𝑻𝑇. It follows that 𝑹 has the same eigenvalues as 𝑨.
The key point is the block upper triangular form, which means that

\[
\mathbf{R}=\begin{pmatrix}\mathbf{R}_{11} & \mathbf{R}_{12} & \cdots & \mathbf{R}_{1k}\\ \mathbf{0} & \mathbf{R}_{22} & \cdots & \mathbf{R}_{2k}\\ \vdots & & \ddots & \vdots\\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_{kk}\end{pmatrix}
\]

where 𝑹𝑖𝑖 are real blocks of either: size 1 × 1, in which case 𝑹𝑖𝑖 is a (real) eigenvalue of 𝑨,
or size 2 × 2, in which case 𝑹𝑖𝑖 has a pair of complex conjugate eigenvalues of 𝑨 (such as
𝑎 + 𝑏𝑖 and 𝑎 − 𝑏𝑖). Since you can compute eigenvalues of 2 × 2 matrices analytically (as
roots of a quadratic polynomial), it is a cheap step to extract the complex eigenvalues
from the computed (approximation of) 𝑹 in the end -- and this is what eigvals does.

Remark: Above basic algorithm is never used as is in practice. Two variations:

(1) Use shift of origin and


(2) Start by transforming 𝑨 into an upper Hessenberg matrix

Why the Hessenberg Form, and Why the Shift? The most expensive computation
inside the simple QR iteration is the QR decomposition, which runs in 𝒪(𝑛3 ). Since at
least one iteration per eigenvalue is needed, the operation count of finding all the
eigenvalues is, at the very least, 𝒪(𝑛4 ). The solution to this problem is to somehow
compute the QR decomposition in a cheaper way.

From, the computational point of view the QR decomposition of upper Hessenberg


matrices can be done in 𝒪(𝑛2 ). Computing the QR decomposition in 𝒪(𝑛2 ) brings down
the total operation count of the QR method to 𝒪(𝑛3 ). The important fact is that any
matrix can be reduced to upper Hessenberg form.

The convergence may be slow when some of the eigenvalues are close together. To speed
up convergence, it is customary to introduce origin shifts. At the 𝑘 𝑡ℎ step, a scalar 𝜇𝑘 is
chosen and 𝑨𝑘 − 𝜇𝑘 𝑰 (rather than 𝑨𝑘 ) is decomposed into a product 𝑸𝑘 𝑹𝑘 .

The rate of convergence depends on the separation between eigenvalues, so a practical


algorithm will use shifts, either explicit or implicit, to increase separation and accelerate
convergence. A typical symmetric QR algorithm isolates each eigenvalue (then reduces
the size of the matrix) with only one or two iterations, making it efficient as well as
robust.

The Shift Strategy:

The shifted QR iteration with 𝜇𝑘 = 𝜆 would compute an eigenvalue in one step, but the exact eigenvalue is not available; so how should the shifts be selected?

If we are close to convergence, the last diagonal element is already an approximate eigenvalue; hence it is strongly recommended to use the Rayleigh shift 𝜇𝑘 = 𝑨𝑘(𝑘, 𝑘). To reduce the numerical complexity we first transform the matrix 𝑨 to an upper Hessenberg matrix 𝑯 and then apply the shifted QR algorithm; each iteration then produces the next Hessenberg iterate.

If 𝜇𝑘 = 𝜆 is a good approximation (or a good estimate) of one eigenvalue of the matrix 𝑨 then
the matrix 𝑯 − 𝜆𝑰 is singular. Now if we factorize this matrix 𝑯 − 𝜆𝑰 into a product of 𝑸𝑹
then the upper triangular matrix 𝑹 is singular and its last diagonal element is zero.

Notice that the spectrum of the deflated matrix 𝑯̅1 (the submatrix that remains after removing the converged last row and column) is included in the spectrum of 𝑨, so we can use the deflation technique, i.e. proceed with the algorithm on the smaller matrix 𝑨𝑘+1 = 𝑯̅1.

Remember that in the power method we have proved that the sequence of Rayleigh
quotients 𝜇𝑘 = 𝑸𝑘 (: , 𝑘)𝐻 𝑨𝑸𝑘 (: , 𝑘) converge to 𝜆 where 𝑸𝑘 (: , 𝑘) is the 𝑘 𝑡ℎ column of the
orthogonal matrix 𝑸 this means that 𝜇𝑘 = 𝒆𝑘 𝐻 (𝑸𝑘 𝐻 𝑨𝑸𝑘 )𝒆𝑘 = 𝒆𝑘 𝐻 (𝑨𝑘 )𝒆𝑘 = 𝑎𝑘𝑘 .

II. The QR Step with Rayleigh Shift: The shifted QR algorithm step has the generic form

𝑨0 = 𝑨 (initialization)
𝐟𝐨𝐫 𝑘 = 0, 1, … , maxit
    𝑨𝑘 − 𝜇𝑘 𝑰 = 𝑸𝑘 𝑹𝑘 (𝑄𝑅 factorization)
    𝑨𝑘+1 = 𝑹𝑘 𝑸𝑘 + 𝜇𝑘 𝑰
𝐞𝐧𝐝

Since 𝑸𝑘 𝐻 𝑸𝑘 = 𝑰, we have 𝑹𝑘 = 𝑸𝑘 𝐻 (𝑨𝑘 − 𝜇𝑘 𝑰) and hence the iterates satisfy

𝑨𝑘+1 = 𝑹𝑘 𝑸𝑘 + 𝜇𝑘 𝑰
= 𝑸𝑘 𝐻 (𝑨𝑘 − 𝜇𝑘 𝑰)𝑸𝑘 + 𝜇𝑘 𝑰
= 𝑸𝑘 𝐻 𝑨𝑘 𝑸𝑘

In other words, 𝑨𝑘+1 and 𝑨𝑘 are orthogonally similar and thus have the same spectrum:

𝜎(𝑨𝑘 ) = 𝜎(𝑨) = {𝜆1 , 𝜆2 , … , 𝜆𝑛 } 𝑘 = 0, 1, …

For any matrices 𝑨 and suitable shifts the iterates 𝑨𝑘 converge to an upper triangular
matrix, whose diagonal entries are the eigenvalues of 𝑨. In other words, after a sufficient
number of steps, the resulting matrix 𝑨𝑘 will have very small entries below the diagonal
and hence its diagonal entries approximate the eigenvalues of 𝑨. The following Matlab
code represents a very simple implementation of this process.
clear all, clc, A=10*rand(6,6); L1=eig(A)
n = size (A,1); k=n ; it=0;

while (k>1) && (it< 100)


if norm(A(k,1:k-1),inf)<=eps
A(k,1:k-1) = 0; k=k-1;
else
it=it+1;
S=A(k,k)*eye(k); % Rayleigh shift
[Q,R] = qr(A(1:k,1:k)-S); % get QR factorization
A(1:k,1:k)=R*Q+S; % qr-algorithm step
end
end
T=A;
L2=eig(T)

T =
33.4896 2.5202 -6.9617 0.1700 2.3417 -1.5333
0.0000 +5.6272 5.6835 -2.5710 4.2918 -0.1749
0.0000 -0.7382 6.1388 -0.9157 -2.5626 -2.2032
0.0000 -0.0000 0.0000 -4.3756 1.2809 2.3749
0.0000 0.0000 -0.0000 -0.2775 -4.2481 -1.7620
0 0 0 0 0 0.6008

Iteration of this form is called the simple QR iteration with a single Rayleigh shift, and it forms the backbone of the most effective algorithm for computing the Schur decomposition. The eigenvalues of the matrix 𝑨 are the 1 × 1 diagonal entries of 𝑻 together with the eigenvalues of its 2 × 2 diagonal blocks, 𝑻 being the (real) Schur form.

The QR-algorithm requires a QR-factorization at each step involving 𝒪(𝑛3 ) arithmetic


operations. For all 𝑛 eigenvalues this totals to about 𝒪(𝑛4 ) operations, which is
unacceptably high. Fortunately, the algorithm can be made more efficient by using a
simple pre-processing step, which transforms the matrix into upper Hessenberg form.

clear all, clc, A=10*rand(6,6); L1=eig(A)


n = size (A,1); k=n ; it=0; H=hess(A);

while (k>1) && (it< 100)


if norm(H(k,1:k-1),inf)<=eps
H(k,1:k-1) = 0; k=k-1;
else
it=it+1;
S=H(k,k)*eye(k); % Rayleigh shift
[Q,R] = qr(H(1:k,1:k)-S); % get QR factorization
H(1:k,1:k)=R*Q+S; % qr-algorithm step
end
end
T=H;
L2=eig(T)
Remark: When 𝑨𝑛𝑛 = 0, the Rayleigh-shifted 𝑄𝑅 algorithm coincides with the unshifted algorithm. For example, consider the matrix

\[
\mathbf{A}=\mathbf{A}_{1}=\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}
\]

In this example 𝑨𝑘 = 𝑨1 for all 𝑘 and the unshifted 𝑄𝑅 algorithm stagnates. The reason for this failure is that the eigenvalues of 𝑨, 𝜆1,2 = ±1, are located symmetrically about the origin, and the estimate 𝜇 = 0 cannot decide which way to go. A remedy is provided by a different shift which breaks the symmetry. Below, we fix this with Wilkinson shifts. (Note that Rayleigh quotient shifts do not fix this example.)

III. The QR with Wilkinson’s Shift: Again the idea of the shift is to quickly make the entry 𝑨𝑘(𝑘, 𝑘 − 1) converge to zero. A reasonable choice of the shift is the Rayleigh quotient 𝜇𝑘 = 𝑨𝑘(𝑘, 𝑘), because we would like 𝜇𝑘 to be an estimate of an eigenvalue of 𝑨. An even better approximation of the eigenvalue, known as Wilkinson’s shift, is obtained by considering the trailing 2 × 2 block: Wilkinson's shift is defined to be the eigenvalue of the matrix

\[
\begin{bmatrix}a_{n-1} & \star\\ \varepsilon & a_{n}\end{bmatrix}
\]

that is closer to 𝑎𝑛.

% QR algorithm with Wilkinson shift for eigenvalues.


% Input: square matrix A
% Output: gamma sequence of approximations to one eigenvalue (vector)

clear all, clc, A=rand(5,5); A = hess(A); AA=A; L=eig(A)

for k = length(A):-1:1
%-----------------------------------------------------%
% QR iteration %
s = 0; I = eye(k);
while sum( abs(A(k,1:k-1)) ) > eps
[Q,R] = qr(A-s*eye(k));
A = R*Q + s*eye(k);
b = -(A(k-1,k-1)+ A(k,k));
c = A(k-1,k-1)*A(k,k) - A(k-1,k)*A(k,k-1);
mu = roots([1 b c]);
[temp,j] = min(abs(mu-A(k,k)));
s = mu(j);
end
%-----------------------------------------------------%
% Deflation %
d(k) = A(k,k);
A = A(1:k-1,1:k-1);
end

d' % all eigenvalue


% Alternative: Hessenberg QR algorithm with a trailing 2-by-2 (Wilkinson-type) shift
% The matrix is first reduced to upper Hessenberg form, then the shifted QR
% iteration with deflation is applied to it.
% Input: A is an n-by-n matrix   % Output: d contains the computed eigenvalues
clear all, clc, A=10*rand(7,7); n=size(A,1); L=eig(A), A=hess(A); % work on the Hessenberg form
for k = n:-1:1
% QR iteration
while sum(norm(A(k,1:k-1)))>0.001
s = eig(A(k-1:k,k-1:k)); % Eigenvalues of trailing 2-by-2 matrix
[Q,R] = qr(A - s(2)*eye(k));
A = R*Q + s(2)*eye(k);
end
% Deflation
d(k) = A(k,k);
A = A(1:k-1,1:k-1);
end
d = (flipud(sort(d)))'

The LR algorithm (based on triangular decompositions)


was developed in the early 1950s by Heinz Rutishauser, who worked at that time as a
research assistant of Eduard Stiefel at ETH Zurich. Stiefel suggested that Rutishauser
use the sequence of moments 𝒉 = 𝒄𝑇 𝑨𝑘 𝒃, 𝑘 = 0, 1, … (where 𝒄 and 𝒃 are arbitrary vectors)
to find the eigenvalues of 𝑨. Rutishauser took an algorithm of Alexander Aitken for this
task and developed it into the one called quotient–difference algorithm or 𝑞𝑑-algorithm.
After arranging the computation in a suitable shape, he discovered that the 𝑞𝑑-algorithm
is in fact the iteration 𝑨𝑘 = 𝑳𝑘 𝑹𝑘 (i.e. 𝐿𝑈 decomposition), 𝑨𝑘+1 = 𝑹𝑘 𝑳𝑘 , applied on a
tridiagonal matrix, from which the 𝐿𝑅 algorithm follows.

The LR (Nowadays the LU) and QR algorithms have proved to be two of the most
important general purpose methods for solving the unsymmetric eigenvalue problem.
Existing proofs of their convergence depend on rather sophisticated determinantal theory
and are to some extent incomplete (J. H. Wilkinson).

The LR decomposition is based on the fact that virtually any matrix can be factorized into a product of lower and upper triangular matrices. Thus, if we denote the original matrix 𝑨 as 𝑨1, we can write 𝑨1 = 𝑳1𝑹1, and the upper triangular matrix can be expressed as 𝑹1 = 𝑳1^{−1}𝑨1. If we now multiply this equation on the right by 𝑳1 we get 𝑹1𝑳1 = 𝑳1^{−1}𝑨1𝑳1. In other words, the reverse multiplication 𝑹1𝑳1 is a similarity transformation of 𝑨1 and thus preserves the eigenvalues of 𝑨1. Now let us define a new matrix 𝑨2 = 𝑹1𝑳1 and decompose it as was done with 𝑨1: 𝑨2 = 𝑳2𝑹2; then compute 𝑨3 = 𝑹2𝑳2 = 𝑳3𝑹3, and continue the process with 𝑨𝑘+1 = 𝑳𝑘^{−1}𝑨𝑘𝑳𝑘. Under certain conditions on the original matrix 𝑨, as 𝑘 → ∞ the matrix 𝑨𝑘 approaches an upper triangular matrix with the eigenvalues in decreasing order of magnitude on the main diagonal.

\[
\mathbf{A}_{k+1}=\mathbf{L}_{k}^{-1}\mathbf{A}_{k}\mathbf{L}_{k}
\iff
\mathbf{A}_{k+1}=(\mathbf{L}_{1}\mathbf{L}_{2}\cdots\mathbf{L}_{k})^{-1}\,\mathbf{A}_{1}\,(\mathbf{L}_{1}\mathbf{L}_{2}\cdots\mathbf{L}_{k})
\]

When 𝑘 → ∞ we have 𝑨𝑘 = 𝑨𝑘−1 = constant, which means that lim𝑘→∞ 𝑳𝑘 = 𝑰, or equivalently

\[
\lim_{k\to\infty}\mathbf{A}_{k}=\lim_{k\to\infty}\mathbf{L}_{k}\mathbf{R}_{k}=\mathbf{R}_{k}=\text{constant}
\]
Finally we deduce that 𝑨𝑘 will tend to an upper triangular matrix, and from linear
algebra we know that the eigenvalues of an upper triangular matrix are the elements of
the main diagonal. The eigenvalues of 𝑨1 appear as the diagonal terms of this upper-
triangular matrix because 𝑨𝑘 and 𝑨1 are similar to each other.

The LR Process: Starting from the orthogonal iteration method, suppose that 𝑟 = 𝑛 and define the matrices 𝑻𝑘 = 𝑳𝑘^{−1}𝑨𝑳𝑘; then the iteration 𝒀𝑘 = 𝑨𝑳𝑘−1, 𝑳𝑘𝑹𝑘 = 𝒀𝑘 (𝐿𝑅 factorization) is equivalent to

𝑻0 = 𝑳0^{−1}𝑨𝑳0
𝐟𝐨𝐫 𝑘 = 1: 𝑛
    𝑻𝑘−1 = 𝑳𝑘 𝑹𝑘 (𝐿𝑅 factorization)
    𝑻𝑘 = 𝑹𝑘 𝑳𝑘
𝐞𝐧𝐝

Under reasonable assumptions, the matrix 𝑻𝑘 converges to upper triangular form. To


successfully implement the method, it is necessary to pivot. (See Wilkinson 1965, p.602).
The process of triangular decomposition is discussed by Faddeev and Faddeeva, Ralston,
and Wilkinson. Several difficulties are encountered when attempting to find eigenvalues
by the orthodox LR algorithm. The convergence of the subdiagonal elements to zero is
often very slow if no origin shifts are used. Although the LR algorithm does not actually
break down, it becomes numerically unstable. To avoid numerical problems in the LR algorithm some modifications were introduced; unfortunately, even though this modified LR transformation is much easier to apply, its possible numerical instability and other restrictions make its value seem limited. Therefore, the double QR algorithm was
found to be the more successful method for dealing with the real unsymmetric
eigenvalue problem. See Susan Clara Hanson 1966.

%------------------------------------------------------%
% Determination of Eigenvalues and Eigenvectors Part I
%------------------------------------------------------%
clear all, clc, M=10*rand(6,6); A=M*diag([6 5 4 3 2 1])*inv(M);
A1=A; n=size(A,1); tol=1e-8;
for i=1:100;
[Q R] = lu(A1); % you can use any global algorithm such QR or others
A1=R*Q;
end
d=diag(R)
%----%determination of Eigenvectors using SVD%----%
X=[];
for k=1:n
N=d(k)*eye(n,n)-A;
[U,S,V] = svd(N,0); % run svd
s = diag(S);
r = sum(s>tol); %r=nnz(find(abs(s)>1e-10));
x=10*V(:,r+1:n); X=[X x];
end
X
Diago=inv(X)*A*X
Symmetric eigenvalue problems are posed as
follows: given an n-by-n real symmetric or complex Hermitian matrix 𝑨, find the
eigenvalues 𝜆 and the corresponding eigenvectors 𝒛 that satisfy the equation 𝑨𝒛 = 𝜆𝒛 (or,
equivalently, 𝒛𝐻 𝑨 = 𝜆𝒛𝐻 ). In such eigenvalue problems, all n eigenvalues are real not only
for real symmetric but also for complex Hermitian matrices 𝑨, and there exists an
orthonormal system of n eigenvectors. If 𝑨 is a symmetric or Hermitian positive-definite
matrix, all eigenvalues are positive.

To solve a symmetric eigenvalue problem with LAPACK or any other Numerical Linear
Algebra software, you usually need to reduce the matrix to tridiagonal form 𝑻 and then
solve the eigenvalue problem with the tridiagonal matrix obtained. LAPACK includes
routines for reducing the matrix to a tridiagonal form by an orthogonal (or unitary)
similarity transformation 𝑨 = 𝑸𝑻𝑸𝐻 as well as for solving tridiagonal symmetric
eigenvalue problems.

There are three primary algorithms for computing eigenvalues and eigenvectors of
symmetric problems:

■ Tridiagonal QR iteration: This algorithm finds all the eigenvalues, and optionally all the eigenvectors, of a symmetric matrix. It can be implemented efficiently; it is currently the fastest practical method for finding all the eigenvalues of a symmetric tridiagonalized matrix, taking 𝒪(𝑛²) flops. It is, however, the fastest algorithm overall only for small matrices, up to about 𝑛 = 25.
■ Divide-and-conquer method: This is currently the fastest method to find all the
eigenvalues and eigenvectors of symmetric tridiagonal matrices larger than 𝑛 = 25.
■ Jacobi’s method: This method is historically the oldest method for the eigenproblem,
dating to 1846. It is usually much slower than any of the above methods, taking 𝒪(𝑛3 )
flops with a large constant. But it is sometimes much more accurate than the above
methods. It does not require tridiagonalization of the matrix.

Tridiagonal QR iteration: the Strategy for finding the eigenvalues of a real symmetric
matrix 𝑨 = 𝑨𝑇 by using QR iteration is as follow:

The first step is to unitarily transform 𝑨 to tridiagonal form with a unitary matrix 𝑸1 , i.e.,
𝑨 = 𝑸1 𝑹 𝑸1𝑇 . This is what we called Hessenberg reduction, second step is to transform 𝑹
to diagonal form 𝚲 using a sequence of unitary matrices and deflation (i.e., 𝑄𝑅 iteration).

𝑹 = hess(𝑨); 𝑻 = 𝑰;
for i=1,2,… do;
[𝑸 𝜦] = qr(𝑹);
𝑹 = 𝚲 ⋆ 𝑸; 𝑻 = 𝑻 ⋆ 𝑸;
end

Thus 𝚲 = 𝑻𝑇 𝑹𝑻 with 𝑻 = 𝑻1 … 𝑻𝑘 . Note that this is equivalent to 𝑨 = (𝑻𝑸1 )𝚲(𝑻𝑸1 )𝑇 = 𝑽𝚲𝑽𝑇


where 𝑽 and 𝚲 contain accurate approximations to the eigenvectors and eigenvalues of 𝑨,
respectively.
The tridiagonalization process can be carried out by means of many algorithms, such as the Lanczos method, Householder transformations, Givens rotations, the Golub-Kahan method, etc. Here we give, without proof, a very effective algorithm.

clear all, clc, A=rand(8,8); A=(A'+A)/2; [n,n]=size(A); AA=A;


for k= 1:n-2
x=A(k+1:n,k); s=norm(x);
if (A(k+1,k)<0); s=-s; end
r= sqrt(2*s*(A(k+1,k)+s)); W(1:k)=zeros(1,k); W(k+1)= (A(k+1,k)+s)/r;
W(k+2:n)=(A(k+2:n,k))'/r;
V(1:k)=zeros(1,k); V(k+1:n)=A(k+1:n,k+1:n)*W(k+1:n)';
c=W(k+1:n)*V(k+1:n)'; Q(1:k)=zeros(1,k); Q(k+1:n)=V(k+1:n)-c*W(k+1:n);
A(k+2:n,k)=zeros(n-k-1,1); A(k,k+2:n)=zeros(1,n-k-1);
A(k+1,k)=-s; A(k,k+1)=-s;
A(k+1:n,k+1:n)=A(k+1:n,k+1:n)-2*W(k+1:n)'*Q(k+1:n)-2*Q(k+1:n)'*W(k+1:n);
end
T=A % you can verify the result by using: hess(A)

%-----------------------------------------------------%
% Symmetric Eigenvalues and Eigenvectors
%-----------------------------------------------------%
clear all, clc, A=10*rand(4,6); A1=A*A';[Q1 R]= hess(A1); n=size(A1,2);
T=eye(n);
for i=1:500;
[Q D] = qr(R);
R=D*Q; T=T*Q;
end
V=Q1*T
DD= tril(triu(V'*A1*V))

The
generalized eigenvalue problem is that of finding nontrivial solutions of the equation
𝑨𝐱 = 𝜆𝑩𝐱 where 𝑨 and 𝑩 are of order 𝑛. The problem reduces to the ordinary eigenvalue
problem when 𝑩 = 𝑰 which is why it is called a generalized eigenvalue problem. The QR
algorithm can be adapted to compute the generalized eigenvalue problem. The resulting
algorithm is called the QZ algorithm (generalized Schur decomposition). The set of all
matrices of the form 𝑨 − 𝜆𝑩 with 𝜆 ∈ ℂ is said to be a pencil. The eigenvalues of the pencil
are elements of the set 𝜆(𝑨, 𝑩) defined by 𝜆(𝑨, 𝑩) = {𝜆 ∈ ℂ: det(𝑨– 𝜆𝑩) = 0 }. If 𝜆 ∈ 𝜆(𝑨, 𝑩)
and 𝑨𝐱 = 𝜆𝑩𝐱 then 𝐱 is referred to as an eigenvector of 𝑨 − 𝜆𝑩.

The first thing to observe about the generalized eigenvalue problem is that there are 𝑛
eigenvalues if and only if rank(𝑩) = 𝑛. If 𝑩 is rank deficient then 𝜆(𝑨, 𝑩) may be finite,
empty, or infinite. Note that if 0 ≠ 𝜆 ∈ 𝜆(𝑨, 𝑩) then 1⁄𝜆 ∈ 𝜆(𝑩, 𝑨). Moreover, if 𝑩 is
nonsingular then 𝜆(𝑨, 𝑩) = 𝜆(𝑩−1 𝑨, 𝑰) = 𝜆(𝑩−1 𝑨). This last observation suggests one
method for solving the 𝑨 − 𝜆𝑩 problem when 𝑩 is nonsingular: find 𝜆(𝑪) such that 𝑨 = 𝑩𝑪.
If 𝑩 is ill-conditioned, then this can rule out the possibility of computing any generalized
eigenvalue accurately—even those eigenvalues that may be regarded as well-conditioned.
One idea to such problem is to compute well-conditioned 𝑸 and 𝒁 such that the matrices

𝑨1 = 𝑸−1 𝑨𝒁 and 𝑩1 = 𝑸−1 𝑩𝒁

are each in canonical form (e.g. quasi-triangular). Note that 𝜆(𝑨, 𝑩) = 𝜆(𝑨1 , 𝑩1 ) since

𝑨𝐱 = 𝜆𝑩𝐱 ⟺ 𝑨1 𝐲 = 𝜆𝑩1 𝐲 with 𝐱 = 𝒁𝐲

We say that the pencils 𝑨– 𝜆𝑩 and 𝑨1 – 𝜆𝑩1 are equivalent if 𝑨1 = 𝑸−1 𝑨𝒁 and 𝑩1 = 𝑸−1 𝑩𝒁 hold with nonsingular 𝑸 and 𝒁. If the matrices 𝑸 and 𝒁 are unitary (orthogonal in the real case), the pencils are said to be unitarily equivalent.
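When 𝑩 is nonsingular and well conditioned, the reduction 𝜆(𝑨, 𝑩) = 𝜆(𝑩−1𝑨) mentioned above can be checked directly; a minimal sketch with arbitrary random data (the diagonal shift is only there to keep 𝑩 well conditioned):

clear all, clc, A = randn(5); B = randn(5) + 5*eye(5);   % B nonsingular and well conditioned (assumed)
l1 = sort(eig(A,B))          % generalized eigenvalues of the pencil A - lambda*B
l2 = sort(eig(B\A))          % eigenvalues of B^{-1}*A
Zero = norm(l1 - l2)         % the two spectra coincide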

Given square matrices 𝑨 and 𝑩, the


generalized Schur decomposition factorizes both matrices as 𝑨 = 𝑸𝑺𝒁𝐻 and 𝑩 = 𝑸𝑻𝒁𝐻 .
where 𝑸 and 𝒁 are unitary, and 𝑺 and 𝑻 are upper triangular. The generalized Schur
decomposition is also sometimes called the 𝑄𝑍-decomposition.

The generalized eigenvalues 𝜆 that solve the generalized eigenvalue problem 𝑨𝐱 = 𝜆𝑩𝐱
(where 𝐱 is an unknown nonzero vector) can be calculated as the ratio of the diagonal
elements of 𝑺 to those of 𝑻. That is, using subscripts to denote matrix elements, the 𝑖 𝑡ℎ
generalized eigenvalue 𝜆𝑖 satisfies 𝜆𝑖 = 𝑺𝑖𝑖 /𝑻𝑖𝑖 .

\[
\mathbf{A}\mathbf{x}=\lambda\mathbf{B}\mathbf{x}\iff(\mathbf{A}-\lambda\mathbf{B})\mathbf{x}=\mathbf{0}
\iff\mathbf{x}\in\mathcal{N}(\mathbf{A}-\lambda\mathbf{B})
\iff\det(\mathbf{A}-\lambda\mathbf{B})=0
\iff\det(\mathbf{Q})\det(\mathbf{S}-\lambda\mathbf{T})\det(\mathbf{Z}^{H})=0
\iff\lambda_{i}=\mathbf{S}_{ii}/\mathbf{T}_{ii}\ \ \text{if}\ \mathbf{T}_{ii}\neq 0
\]

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Generalized Schur decomposition Or QZ decomposition
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
clear all, clc, M1=10*rand(5,5); M2=10*rand(5,5);
A=M1*diag([-1 -2 -3 -4 -5])*inv(M1);
B=M2*diag([1 2 3 4 5])*inv(M2);
n=size(A,1); [AA,BB,Q,Z] = qz(A,B) % MATLAB instruction
lambda=zeros(n,n);
for i=1:n
lambda(i,i)=AA(i,i)/BB(i,i);
end
lambda
%---------------- verifications by MATLAB instructions ----------------%
[V, lambda]=eig(A,B);
lambda

Theorem: (Generalized Real Schur Decomposition) If 𝑨 and 𝑩 are in ℝ𝑛×𝑛 then there
exist orthogonal matrices 𝑸 and 𝒁 such that 𝑸𝑇 𝑨𝒁 is upper quasi-triangular and 𝑸𝑇 𝑩𝒁 is
upper triangular.

Proof: (See Stewart 1972). ■


If 𝑺 is triangular, the diagonal elements of 𝑺 and 𝑻: 𝚲𝛼 = diag(diag(𝑺)), 𝚲𝛽 = diag(diag(𝑻))
are the generalized eigenvalues that satisfy 𝑨𝑽𝚲𝛽 = 𝑩𝑽𝚲𝛼 and 𝚲𝛽 𝑾𝑨 = 𝚲𝛼 𝑾𝑩. The
eigenvalues produced by 𝜆 = eig(𝑨, 𝑩) are the element-wise ratios of 𝚲𝛼 (𝑖, 𝑖) and 𝚲𝛽 (𝑖, 𝑖).

𝜆𝑖 = 𝚲𝛼 (𝑖, 𝑖)/𝚲𝛽 (𝑖, 𝑖)

If 𝑺 is not triangular, it is necessary to further reduce the 2-by-2 blocks to obtain the
eigenvalues of the full system.

clear all, clc,


A =[2 4 3 4 1;2 4 3 1 5;2 2 5 2 3;4 5 5 3 3;5 2 4 1 3];
B =[3 5 2 2 3;4 5 5 5 3;1 4 5 2 3;5 2 2 5 5;2 4 4 1 4];
[AA,BB,Q,Z,V,W] = qz(A,B);
alpha = diag(AA)
beta = diag(BB)
zero1=A*V*diag(beta)- B*V*diag(alpha)
zero2=diag(beta)*W'*A - diag(alpha)*W'*B
lambda = alpha./beta
lambda = eig(A,B) % verifications

In mathematics, in the field of control theory, a Sylvester


equation is a matrix equation of the form: 𝑨𝑿 + 𝑿𝑩 = 𝑪. Then given matrices 𝑨, 𝑩, and 𝑪,
the problem is to find the possible matrices 𝑿 that obey this equation. 𝑨 and 𝑩 must be
square matrices of sizes 𝑛 and 𝑚 respectively, and then 𝑿 and 𝑪 both have 𝑛 rows and 𝑚
columns.

A Sylvester equation has a unique solution for 𝑿 exactly when there are no common
eigenvalues of 𝑨 and −𝑩. More generally, the equation 𝑨𝑿 + 𝑿𝑩 = 𝑪 has been considered
as an equation of bounded operators on a (possibly infinite-dimensional) Banach space.

If we introduce the Sylvester operator 𝑺: ℂ𝑛×𝑚 ⟶ ℂ𝑛×𝑚 defined by 𝑺𝑿 = 𝑨𝑿 + 𝑿𝑩, then we


may write Sylvester's equation in the form 𝑺𝑿 = 𝑪.

The operator 𝑺 is linear in 𝑿. Hence, our problem reduces to that of determining when 𝑺
is nonsingular. It turns out that the nonsingularity depends on the spectra of 𝑨 and 𝑩.
Specifically, we have the following theorem.

Theorem: Given matrices 𝑨 ∈ ℂ𝑛×𝑛 and 𝑩 ∈ ℂ𝑚×𝑚 , the Sylvester equation 𝑨𝑿 + 𝑿𝑩 = 𝑪


has a unique solution 𝑿 ∈ ℂ𝑛×𝑚 for any 𝑪 ∈ ℂ𝑛×𝑚 if and only if 𝑨 and −𝑩 do not share any
eigenvalue.

Proof: To prove the necessity of the condition, let (𝜆, 𝐱) be a right eigenpair of 𝑨 and let
(𝜇, 𝐲) be a left eigenpair of −𝑩. Let 𝑿 = 𝐱𝐲 𝐻 . Then

𝑺𝑿 = 𝑨𝐱𝐲 𝐻 + 𝐱𝐲 𝐻 𝑩 = 𝜆𝐱𝐲 𝐻 − 𝜇𝐱𝐲 𝐻 = (𝜆 − 𝜇)𝑿

Hence if we can choose 𝜆 − 𝜇 = 0, that is, if 𝑨 and −𝑩 have a common eigenvalue, then 𝑿 is a "null vector" of 𝑺 and 𝑺 is singular. This proves the necessity of the condition.
To prove the converse, we use the fact that an operator is nonsingular if every linear
system in the operator has a solution. Consider the system 𝑺𝑿 = 𝑪. The first step is to
transform this system into a more convenient form. Let 𝑻 = 𝑽𝐻 𝑩𝑽 be a Schur
decomposition of 𝑩. Then Sylvester's equation can be written in the form

𝑨(𝑿𝑽) + 𝑿𝑽(𝑽𝐻 𝑩𝑽) = 𝑪𝑽

If we set 𝒀 = 𝑿𝑽 and 𝑫 = 𝑪𝑽, we may write the transformed equation in the form

𝑨𝒀 + 𝒀𝑻 = 𝑫
Let us partition this system in the form

\[
\mathbf{A}(\mathbf{y}_{1},\mathbf{y}_{2},\dots)+(\mathbf{y}_{1},\mathbf{y}_{2},\dots)\begin{pmatrix}t_{11} & t_{12} & t_{13} & \cdots\\ 0 & t_{22} & t_{23} & \cdots\\ 0 & 0 & t_{33} & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}=(\mathbf{d}_{1},\mathbf{d}_{2},\dots)
\]

In general, suppose that we have found 𝒚1, 𝒚2, … , 𝒚𝑘−1. From the 𝑘-th column of this equation we get

\[
(\mathbf{A}+t_{kk}\mathbf{I})\mathbf{y}_{k}=\mathbf{d}_{k}-\sum_{i=1}^{k-1}t_{ik}\mathbf{y}_{i}
\]

The right-hand side of this equation is well defined, and the matrix on the left is nonsingular, because 𝑡𝑘𝑘 is an eigenvalue of 𝑩 and, by hypothesis, −𝑡𝑘𝑘 is not an eigenvalue of 𝑨. Hence 𝒚𝑘 is well defined, and we obtain a solution 𝒀 of the equation 𝑨𝒀 + 𝒀𝑻 = 𝑫. Hence 𝑿 = 𝒀𝑽𝐻 is a solution of 𝑺𝑿 = 𝑪, which establishes the sufficiency of the condition. Q.E.D. ■

Alternative proof: Assume that 𝜎 is an eigenvalue of the operator 𝑺, that is, 𝑺𝑿 = 𝜎𝑿, or equivalently the Sylvester operator becomes 𝑿 ⟼ (𝑨 − 𝜎𝑰)𝑿 + 𝑿𝑩 = 𝑨𝑿 + 𝑿(𝑩 − 𝜎𝑰) = 𝟎, which means that 𝜎 = 𝜆 − 𝜇, where 𝜆 and 𝜇 are eigenvalues of 𝑨 and −𝑩 respectively:

\[
\mathbf{X}\ \longmapsto\ (\mathbf{A}-\lambda\mathbf{I})\mathbf{X}+\mathbf{X}(\mathbf{B}+\mu\mathbf{I})=\mathbf{0}
\]

Now the operator 𝑺 is singular only if 𝜎 = 0, which means that 𝑨 and −𝑩 share a common eigenvalue. ■

Corollary: The eigenvalues of 𝑺 are the sum of the eigenvalues of 𝑨 and 𝑩.
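The corollary can be verified numerically through the Kronecker (matrix) representation of the operator 𝑺; a minimal sketch with arbitrary random matrices:

clear all, clc, A = randn(3); B = randn(4);
S = kron(eye(4),A) + kron(B.',eye(3));    % Kronecker (matrix) form of X -> A*X + X*B, X in R^{3x4}
e1 = sort(eig(S));
[la,mu] = meshgrid(eig(A),eig(B));
e2 = sort(la(:) + mu(:));                 % all sums lambda_i(A) + mu_j(B)
Zero = norm(e1 - e2)                      % the eigenvalues of S are exactly these sums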

7.2.1 Numerical solutions of Sylvester's Equation: Certain matrix equations arise


naturally in linear control and system theory. Among those frequently encountered in
the analysis and design of linear systems are the Lyapunov equations

𝑨𝑿 + 𝑿𝑨𝑇 = 𝑪 continuous − time systems


𝑨𝑿𝑨𝑇 − 𝑿 = 𝑪 discrete − time system𝑠

Which are special cases of another classical matrix equation, known as the Sylvester
equations:
𝑨𝑿 + 𝑿𝑩 = 𝑪 continuous − time systems
𝑨𝑿𝑩 − 𝑿 = 𝑪 discrete − time system𝑠
The Sylvester equations also arise in a wide variety of applications. For example, the
numerical solution of elliptic boundary value problems can be formulated in terms of the
solution of the Sylvester equation (Starke and Niethammer 1991). The solution of the
Sylvester equation is also needed in the block diagonalization of a matrix by a similarity
transformation (see Datta 1995) and Golub and Van Loan (1996). Once a matrix is
transformed to a block diagonal form using a similarity transformation, the block
diagonal form can then be conveniently used to compute the matrix exponential 𝑒 𝑨𝑡 .

From the algebraic point of view it is well-known that if the matrices 𝑨 and 𝑩
are stable then the unique solution of the continuous-time Sylvester equation is given by
\[
\mathbf{X}=-\int_{0}^{\infty}e^{\mathbf{A}t}\,\mathbf{C}\,e^{\mathbf{B}t}\,dt
\iff
\mathbf{X}=-\int_{0}^{\infty}\mathbf{D}(t)\,dt=-\sum_{k=0}^{\infty}\mathbf{D}(kT_{s})\,T_{s}\cong-\sum_{k=0}^{\infty}(\mathbf{I}+\mathbf{A}T_{s})^{k}\,(\mathbf{C}T_{s})\,(\mathbf{I}+\mathbf{B}T_{s})^{k}
\]

clear all, clc, Ts=0.01; M1=randi(5,3); M2=randi(7,3); M3=randi(10,3);


A =M1*diag([-1 -2 -3])*inv(M1); B =M2*diag([-4 -5 -6])*inv(M2);
C =M3*diag([1 2 3])*inv(M3); n=size(A,1); I=eye(n,n); N=1000;
EA=I + Ts*A; EB=I + Ts*B; CT=Ts*C; X=zeros(n,n);
for k=0:N
X=X - (EA^k)*(CT)*(EB^k);
end
X
Zero=A*X+X*B-C % self-related verifications
XMATLAB = sylvester(A,B,C) % verifications using MATLAB

Another MATLAB code:

clear all, clc, Ts=0.001;


M1=0.1*randi(5,3); M2=0.1*randi(7,3); M3=0.1*randi(10,3);
A =M1*diag([-1 -2 -3])*inv(M1); B =M2*diag([-4 -5 -6])*inv(M2);
C =M3*diag([1 2 3])*inv(M3); n=size(A,1); I=eye(n,n); N=20;
t=0:Ts:N; X=zeros(n,n);
for k=1:length(t)
X=X - (expm(A*t(k)))*(Ts*C)*(expm(B*t(k)));
end
X
XMATLAB = sylvester(A,B,C) % verifications using MATLAB

Remark: This method is not practical, since it is very costly due to numerical integration
and it relies on the stability of matrices 𝑨 and 𝑩.

Alternatively, it is also possible to rearrange the equation in a linear form


using the KRON function as follows:

𝑨𝑿 + 𝑿𝑩 = 𝑪 ⟺ ((𝑰 ⊗ 𝑨) + (𝑩𝑇 ⊗ 𝑰))vec(𝑿) = vec(𝑪)

where 𝐱 = vec(𝑿) and 𝐜 = vec(𝑪) contain the concatenated columns of 𝑿 and 𝑪. The solution of this equation exists and is unique if and only if the matrix 𝑮 = (𝑰 ⊗ 𝑨) + (𝑩𝑇 ⊗ 𝑰) is nonsingular. The following lines of code provide an example in MATLAB:
clear all, clc, Ts=0.001;
A=[1 1;2 3]; B=[1 2;3 4]; C=-[1 3;1 1];
G=kron(eye(2),A)+kron(B',eye(2));
c=reshape(C,4,1); x=inv(G)*c;
X=reshape(x,2,2)
XMATLAB = sylvester(A,B,C) % verifications using MATLAB

Remark: This method remains not recommended considering the cost of the inverse
matrix and the cost of transforming the system into a linear equation.

A classical algorithm for the numerical solution of the Sylvester equation is


the Bartels–Stewart algorithm 1971, which consists of transforming 𝑨 and 𝑩 into Schur
form by 𝑄𝑅-algorithm, and then solving the resulting triangular system via back-
substitution. The Bartels–Stewart algorithm is of computational cost equal to
𝒪(𝑛3 ) arithmetical operations. It was the first numerically stable method that could be
systematically applied to solve such equations. In 1979, G. Golub, C. Van Loan and S.
Nash introduced an improved version of the algorithm, known as the Hessenberg–Schur
algorithm.

The Bartels–Stewart algorithm computes 𝑿 by applying the following steps:

▪ Compute the real Schur decompositions 𝑹 = 𝑼𝐻 𝑨𝑼 and 𝑻 = 𝑽𝐻 𝑩𝑽


▪ Matrices 𝑹 and 𝑻 are block-upper triangular matrices, with diagonal blocks of size 1 × 1
(corresponding to real eigenvalues) or 2 × 2 (corresponding to complex eigenvalues).

▪ Set 𝑫 = 𝑼𝑇 𝑪𝑽
▪ Solve the simplified system 𝑹𝒀 + 𝒀𝑻 = 𝑫 where 𝒀 = 𝑼𝐻 𝑿𝑽. This can be done using
forward substitution on the blocks. Specifically, if 𝑡𝑘+1,𝑘 = 0, then

\[
(\mathbf{R}+t_{kk}\mathbf{I})\mathbf{y}_{k}=\mathbf{d}_{k}-\sum_{i=1}^{k-1}t_{ik}\mathbf{y}_{i}
\]

▪ In the case when 𝑡𝑘+1,𝑘 ≠ 0, the columns [𝒚𝑘 ; 𝒚𝑘+1] should be concatenated and solved for simultaneously:

\[
\begin{pmatrix}\mathbf{R}+t_{kk}\mathbf{I} & t_{mk}\mathbf{I}\\ t_{km}\mathbf{I} & \mathbf{R}+t_{mm}\mathbf{I}\end{pmatrix}
\begin{pmatrix}\mathbf{y}_{k}\\ \mathbf{y}_{m}\end{pmatrix}
=\begin{pmatrix}\mathbf{d}_{k}\\ \mathbf{d}_{m}\end{pmatrix}
-\sum_{i=1}^{k-1}\begin{pmatrix}t_{ik}\mathbf{y}_{i}\\ t_{im}\mathbf{y}_{i}\end{pmatrix}
\qquad\text{with } m=k+1
\]
Algorithm: Bartels–Stewart
input: 𝑨 ∈ ℝ𝑛×𝑛 , 𝑩 ∈ ℝ𝑚×𝑚 , 𝑪 ∈ ℝ𝑛×𝑚
output: 𝑿 ∈ ℝ𝑛×𝑚 , the solution of 𝑨𝑿 + 𝑿𝑩 = 𝑪
▪ Compute the real Schur reductions 𝑹 = 𝑼𝐻𝑨𝑼 and 𝑻 = 𝑽𝐻𝑩𝑽;
▪ Compute 𝑫 = 𝑼𝐻𝑪𝑽;
▪ if 𝑡𝑘+1,𝑘 = 0 for all 𝑘 then find 𝒀 column by column from (𝑹 + 𝑡𝑘𝑘 𝑰)𝒚𝑘 = 𝒅𝑘 − ∑_{𝑖=1}^{𝑘−1} 𝑡𝑖𝑘 𝒚𝑖
▪ else find 𝒀 from
\[
\begin{pmatrix}\mathbf{R}+t_{kk}\mathbf{I} & t_{mk}\mathbf{I}\\ t_{km}\mathbf{I} & \mathbf{R}+t_{mm}\mathbf{I}\end{pmatrix}
\begin{pmatrix}\mathbf{y}_{k}\\ \mathbf{y}_{m}\end{pmatrix}
=\begin{pmatrix}\mathbf{d}_{k}\\ \mathbf{d}_{m}\end{pmatrix}
-\sum_{i=1}^{k-1}\begin{pmatrix}t_{ik}\mathbf{y}_{i}\\ t_{im}\mathbf{y}_{i}\end{pmatrix}
\qquad\text{with } m=k+1
\]
▪ end
▪ Compute 𝑿 = 𝑼𝒀𝑽𝐻 ;

Example: write a MATLAB code to solve the Sylvester equation by Bartels–Stewart


algorithm. Assume that the matrices 𝑨 and −𝑩 are of distinct real eigenvalues.

clear all, clc, % Dr. BEKHITI Belkacem 21/10/2020


M1=10*rand(5,5); A=M1*diag([-1 -2 -3 -4 -5])*inv(M1);
M2=10*rand(5,5); B=M2*diag([-1 -2 -3 -4 -5])*inv(M2);
M3=10*rand(5,5); C=M3*diag([1.5 2.5 3.5 4.5 5.5])*inv(M3);
[U R]=schur(A,'real'); [V T]=schur(B,'real'); F=U'*C*V; T
n=size(A,1); I=eye(n,n); d=zeros(n,1); Y=zeros(n,n);
for k=1:n
Y(:,k)=inv(R+T(k,k)*I)*(F(:,k)-Y(:,1:k-1)*T(1:k-1,k));
end
Zero=R*Y + Y*T - F, X=U*Y*V', Zero=A*X + X*B - C,

Example: write a MATLAB code to solve the Sylvester equation by Bartels–Stewart


algorithm. Assume that 𝑨 and −𝑩 are of distinct eigenvalues (May be complex conjugate).

clear all, clc, % Dr. BEKHITI Belkacem 21/10/2020


A=10*rand(5,5); B=10*rand(5,5); C=10*rand(5,5); n=size(A,1); I=eye(n,n);
[U R]=schur(A,'real'); [V T]=schur(B,'real'); F=U'*C*V; T
d=zeros(n,1); Y=zeros(n,n); W = [diag(T,-1);0];
for k=1:n
if W(k)==0
Y(:,k)=inv(R+T(k,k)*I)*(F(:,k)-Y(:,1:k-1)*T(1:k-1,k));
else
H=[R+T(k,k)*I T(k+1,k)*I;T(k,k+1)*I R+T(k+1,k+1)*I];
G=[Y(:,1:k-1)*T(1:k-1,k);Y(:,1:k-1)*T(1:k-1,k+1)];
Z=inv(H)*([F(:,k);F(:,k+1)]-G);
Y(:,k)=Z(1:n); Y(:,k+1)=Z(n+1:end);
k=k+2; % note: reassigning the loop index does not skip the next pass in MATLAB;
       % column k+1 is simply recomputed (harmlessly) by the scalar formula above
end, end
Zero1=R*Y + Y*T - F, X=U*Y*V', Zero2=A*X + X*B - C
(Proposed Algorithm) The best-known and very widely used numerical
method for small and dense problems is the Hessenberg-Schur method by Golub, Nash,
and Van Loan. The Hessenberg-Schur method is an efficient implementation of the
Bartels-Stewart method proposed earlier, based on the reductions of both matrices to
Schur forms. Unfortunately, these methods are not practical for large and sparse
problems. In this section, we introduce a new iterative scheme based on fixed point
iteration. The scheme requires solution of a linear systems with multiple right-hand
sides at each iteration, which can be solved by using block Krylov subspace methods for
linear systems (see C. Brezinski (2002), Y. Saad (1996)). Our method does not require the solution of a low-dimensional Sylvester equation at every iteration.

The main idea to solve the Sylvester equations is to write this equation as a block linear
system and then use some suitable iterative scheme. This can be accomplished by the
following change of variable: 𝑨𝑿 = 𝒁 where 𝒁 = 𝑪 − 𝑿𝑩. This possibility generates the
iterative method:

Algorithm: Block iterative method


input: 𝑨 ∈ ℝ𝑛×𝑛 , 𝑩 ∈ ℝ𝑚×𝑚 , 𝑪 ∈ ℝ𝑛×𝑚 𝑿0 ∈ ℝ𝑛×𝑚
output: 𝑿 ∈ ℝ𝑛×𝑚 , the solution of 𝑨𝑿 + 𝑿𝑩 = 𝑪

▪ for 𝑘 = 1,2, … until convergence do


if ‖𝑨−1 ‖‖𝑩‖ < 1 then Solve 𝑨𝑿𝑘+1 = 𝑪 − 𝑿𝑘 𝑩;
−1
elseif ‖𝑨 ‖‖𝑩‖ > 1 then Solve 𝑿𝑘+1 𝑩 = 𝑪 − 𝑨𝑿𝑘 ;
else display ('Try other method');
▪ end
Convergence: For the convergence we proceed as follows. From Algorithm we obtain that
𝑿𝑘 = 𝑨−1 (𝑪 − 𝑿𝑘−1 𝑩) and from SE we know that 𝑿 = 𝑨−1 (𝑪 − 𝑿𝑩) Combining these two
equations it follows that 𝑿 − 𝑿𝑘 = 𝑨−1 ( 𝑿 − 𝑿𝑘 )𝑩 and so,

𝑿 − 𝑿𝑘 = 𝑨−𝑘 ( 𝑿 − 𝑿0 )𝑩𝑘

Therefore ‖𝑿 − 𝑿𝑘 ‖ ≤ (‖𝑨−1 ‖ × ‖𝑩‖)𝑘 × ‖𝑿 − 𝑿0 ‖. Since ‖𝑨−1 ‖ × ‖𝑩‖ < 1 then the sequence
𝑿𝑘 converges to 𝑿 when 𝑘 tends to infinity. On the other hand if 𝑬𝑘 = 𝑿 − 𝑿𝑘 then
‖𝑬𝑘 ‖ ≤ ‖𝑨−1 ‖ × ‖𝑩‖ × ‖𝑬𝑘−1 ‖, hence the sequence 𝑿𝑘 converges q-linearly to the solution.

Now if ‖𝑨−1 ‖ × ‖𝑩‖ > 1 we should use the following iteration 𝑿𝑘+1 𝑩 = 𝑪 − 𝑨𝑿𝑘 .

clear all, clc, % Dr. BEKHITI Belkacem 21/10/2020


M1=10*rand(2,2); A=M1*diag([-1 -2])*inv(M1);
M2=10*rand(2,2); B=2*M2*diag([-3 -4])*inv(M2);
M3=10*rand(2,2); C=M3*diag([1.5 2.5])*inv(M3);
n=size(A,1); X=rand(n,n);
for k=1:10
if norm(A)>norm(B), X=inv(A)*(C-X*B); else X=(C-A*X)*inv(B); end,
end
X
Zero=A*X + X*B - C
XMATLAB = sylvester(A,B,C)
(The Best Proposed Algorithm) Here we present a new iterative scheme for
large-scale solution of the well-known Sylvester equation. The proposed scheme is based
on Broyden's method (which is a quasi-Newton method) and can make good use of the
recently developed methods for solving block linear systems. For the detail of
mathematical proof of the method see Chapter IV.

Algorithm: Broyden's method for Sylvester equations


input: 𝑨1 ∈ ℝ𝑛×𝑛 , 𝑨2 ∈ ℝ𝑚×𝑚 , 𝑨3 ∈ ℝ𝑛×𝑚 , 𝑿0 ∈ ℝ𝑛×𝑚 , 𝑩, 𝜀
output: 𝑿 ∈ ℝ𝑛×𝑚 , the solution of 𝑨1 𝑿 + 𝑿𝑨2 = 𝑨3
▪ while toll ≤ 𝜀 do
𝑭0 = 𝑨1 𝑿0 + 𝑿0 𝑨2 − 𝑨3 ;
𝐟0 = vec(𝑭0 ); 𝐱 0 = vec(𝑿0 ); 𝐱1 = 𝐱 0 − 𝑩𝐟0
𝐗1 = vec −1 (𝐱1 );
𝑭1 = 𝑨1 𝑿1 + 𝑿1 𝑨2 − 𝑨3 ;
𝐟1 = vec(𝑭1 ); 𝐱1 = vec(𝑿1 ); 𝐬 = 𝐱1 − 𝐱 0 ; 𝐲 = 𝐟1 − 𝐟0 ;
\[
\mathbf{B}=\mathbf{B}+\frac{(\mathbf{s}-\mathbf{B}\mathbf{y})(\mathbf{s}^{T}\mathbf{B})}{\mathbf{s}^{T}\mathbf{B}\mathbf{y}};
\]
𝐗 0 = 𝐗1 ; 𝑘 = 𝑘 + 1;
toll = ‖𝐬‖;
▪ end

clear all, clc, % Dr. BEKHITI Belkacem 21/10/2020


A1=10*rand(5,5); A2=10*rand(5,5); A3=10*rand(5,5); n=size(A1,1); k=1;
toll=1; I=eye(n,n); X0=I-0.1*rand(n,n);
B=inv(100*eye(n^2,n^2)); % initialization
while toll>1e-15
y0=A1*X0 + X0*A2 - A3;  % residual F(X0) = A1*X0 + X0*A2 - A3 (matches the algorithm box)
f0=reshape(y0,n^2,1); % f0=[y0(:,1);y0(:,2)]; vectorisation of y0
x0=reshape(X0,n^2,1); % x0=[X0(:,1); X0(:,2)]; vectorization of X0
x1=x0-B*f0;
X1=reshape(x1,n,n); % X1=[x1(1:2,:) x1(3:4,:)]; Inverse of vec
y1=A1*X1 + X1*A2 - A3;  % residual F(X1)
f1=reshape(y1,n^2,1); % f1=[y1(:,1);y1(:,2)]; vectorisation of y1
x1=reshape(X1,n^2,1); % x1=[X1(:,1); X1(:,2)]; vectorisation of X1
y=f1-f0; s=x1-x0;
B = B + ((s-B*y)*(s'*B))/(s'*B*y);
X0=X1; k=k+1; toll=norm(s);
end
X1, ZERO1=A1*X1 + X1*A2 - A3

Example: (Application) Considered for the computational comparison of the described


methods will be the discretization of the well-known Poisson equation on the unit
square: 𝜕²𝑢⁄𝜕𝑥² + 𝜕²𝑢⁄𝜕𝑦² = 𝑓(𝑥, 𝑦), (𝑥, 𝑦) ∈ [−1,1]², with homogeneous Dirichlet
boundary conditions, such that 𝑢(±1,0) = 𝑢(0, ±1) = 0, and the right-hand side 𝑓(𝑥, 𝑦)
chosen such that the reference solution 𝑢(𝑥, 𝑦) is given by

𝑢(𝑥, 𝑦) = (1 − 𝑥 2 )(1 − 𝑦 2 ) cos(2𝑥𝑦)


%Dr. BEKHITI Belkacem 21/10/2020

clear all, clc,
[x,y] = meshgrid([-1:.05:1]);
z=(1-x.^2).*(1-y.^2).*cos(20.*x.*y);
s=surf(x,y,z),
axis equal

as visualised in Figure. The most common approach is the discretization by a five-point


stencil on an (𝑛 + 1) × (𝑛 + 1) equi-spaced grid, therefore this will be considered first. The
discretization results in a sparse Sylvester (Lyapunov) equation.

The detailed results of the comparison can be found in Gerhardus Petrus Kirsten 2018.

Example: (Roth's removal rule) Given two square matrices 𝑨 and 𝑩, of size 𝑛 and 𝑚,
and a matrix 𝑪 of size 𝑛 × 𝑚, then one can ask when the following two square matrices of size 𝑛 + 𝑚 are similar to each other:

\[
\begin{pmatrix}\mathbf{A} & \mathbf{C}\\ \mathbf{0} & \mathbf{B}\end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix}\mathbf{A} & \mathbf{0}\\ \mathbf{0} & \mathbf{B}\end{pmatrix}.
\]

The answer is that these two
matrices are similar exactly when there exists a matrix 𝑿 such that 𝑨𝑿 − 𝑿𝑩 = 𝑪. In other
words, 𝑿 is a solution to a Sylvester equation. This is known as Roth's removal rule.

One easily checks one direction: if 𝑨𝑿 − 𝑿𝑩 = 𝑪 then

\[
\begin{pmatrix}\mathbf{I} & \mathbf{X}\\ \mathbf{0} & \mathbf{I}\end{pmatrix}
\begin{pmatrix}\mathbf{A} & \mathbf{C}\\ \mathbf{0} & \mathbf{B}\end{pmatrix}
\begin{pmatrix}\mathbf{I} & -\mathbf{X}\\ \mathbf{0} & \mathbf{I}\end{pmatrix}
=\begin{pmatrix}\mathbf{A} & \mathbf{0}\\ \mathbf{0} & \mathbf{B}\end{pmatrix}.
\]
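A quick numerical check of Roth's rule; a minimal sketch with arbitrary random data (MATLAB's sylvester(A,B,C) solves 𝑨𝑿 + 𝑿𝑩 = 𝑪, so the equation 𝑨𝑿 − 𝑿𝑩 = 𝑪 is obtained by passing −𝑩, and 𝑨 and 𝑩 are assumed to share no eigenvalue, which holds with probability one here):

clear all, clc, A = randn(4); B = randn(3); C = randn(4,3);
X = sylvester(A,-B,C);                    % solves A*X - X*B = C
M1 = [A C; zeros(3,4) B];
M2 = [A zeros(4,3); zeros(3,4) B];
T = [eye(4) X; zeros(3,4) eye(3)];        % similarity transformation built from X
Zero = norm(T*M1/T - M2)                  % T*M1*inv(T) = M2, confirming Roth's removal rule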
7.2.2 Numerical solutions of Generalized Sylvester's Equation: The term generalized
refers to a very wide class of equations, which includes systems of matrix equations,
bilinear equations and problems where the coefficient matrices are rectangular. We start
with the most common form of generalized Sylvester equation, namely 𝑨𝑿𝑩 + 𝑪𝑿𝑫 = 𝑬.
Which differs from the previous equation for the occurrence of coefficient matrices on
both sides of the unknown solution 𝑿. If 𝑩 and 𝑪 are both nonsingular, left
multiplication by 𝑪−1 and right multiplication by 𝑩−1 lead to a standard Sylvester
equation, with the same solution matrix 𝑿. In case either 𝑩 or 𝑪 are ill-conditioned, such
a transformation may lead to severe instabilities. The case of singular 𝑩 and 𝑪,
especially for 𝑩 = 𝑪⊤ and 𝑨 = 𝑫⊤ has an important role in the solution of differential-
algebraic equations and descriptor systems.

A natural extension of the Bartels-Stewart method can be implemented for numerically


solving 𝑨𝑿𝑩 + 𝑪𝑿𝑫 = 𝑬 when the dimensions are small; this was discussed in Peter Benner 2013 and T. Penzl 1998, where the starting point is a QZ decomposition of the pencils (𝑨, 𝑪) and (𝑫, 𝑩), followed by the solution of a sequence of small (1 × 1 or 2 × 2) generalized Sylvester equations, which is performed by using their Kronecker form.

The problem can be recast as a standard Sylvester equation even if ill-conditioned 𝑩 and
𝑪, one could consider using a specifically selected 𝛼 ∈ ℝ (or 𝛼 ∈ ℂ) such that the two
matrices 𝑪 + 𝛼𝑨 and 𝑩 − 𝛼𝑫 are better conditioned and the solution uniqueness is
ensured, and rewrite 𝑨𝑿𝑩 + 𝑪𝑿𝑫 = 𝑬 as the following equivalent generalized Sylvester
matrix equation, 𝑨𝑿(𝑩 − 𝛼𝑫) + (𝑪 + 𝛼𝑨)𝑿𝑫 = 𝑬 ⟺ 𝑨1 𝑿 + 𝑿𝑨2 = 𝑨3 . Other generalizations of
the Sylvester equation have attracted the attention of many researchers. In some cases
the standard procedure for their solution consists in solving a (sequence of) related
standard Sylvester equation(s), so that the computational core is the numerical solution
of the latter by means of some of the procedures discussed in previous sections. We thus
list here some of the possible generalizations more often encountered and employed in
real applications. We start by considering the case when the two coefficient matrices can
be rectangular. This gives 𝑨𝑿 + 𝒀𝑩 = 𝑪 where 𝑿, 𝒀 are both unknown, and 𝑨, 𝑩 and 𝑪 are
all rectangular matrices of conforming dimensions. Equations of this type arise in control
theory, for instance in output regulation with internal stability, where the matrices are in
fact polynomial matrices (see, e.g., H. K. Wimmer 1996).
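As a hedged illustration of the 𝛼-shift recasting described above (assumed random data and an arbitrarily chosen 𝛼), the recast equation is reduced to a standard Sylvester equation by multiplying on the left by (𝑪 + 𝛼𝑨)⁻¹ and on the right by (𝑩 − 𝛼𝑫)⁻¹:

% Sketch: generalized Sylvester A*X*B + C*X*D = E via the alpha-shift recasting
n = 5; A = rand(n); B = rand(n); C = rand(n); D = rand(n);
Xtrue = rand(n); E = A*Xtrue*B + C*Xtrue*D;      % consistent right-hand side
alpha = 0.5;                  % assumed shift; any value making C+alpha*A and B-alpha*D well conditioned
A1 = (C+alpha*A)\A;  A2 = D/(B-alpha*D);  A3 = (C+alpha*A)\E/(B-alpha*D);
X  = sylvester(A1,A2,A3);                        % standard Sylvester A1*X + X*A2 = A3
disp(norm(A*X*B + C*X*D - E))                    % residual of the original equation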

The two-sided version of previous equation is given by 𝑨𝑿𝑩 + 𝑪𝒀𝑫 = 𝑬 and this is an
example of more complex bilinear equations.

A typical generalization is given by the following bilinear equation:

𝑨𝑿𝑩 + 𝑪𝑿𝑫 = 𝑬𝒀 + 𝑭

where the pair (𝑿, 𝒀) is to be determined, and 𝑿 occurs in two different terms.

Example: write a MATLAB code to solve the generalized Sylvester equation by the
proposed Broyden's algorithm.

clear all, clc, % Dr. BEKHITI Belkacem 21/10/2020


A1=10*rand(6,6); A2=10*rand(6,6); A3=10*rand(6,6); A4=10*rand(6,6);
A5=10*rand(6,6); n=size(A1,1); I=eye(n,n); k=1; toll=1;
X0=I-0.1*rand(n,n); B=inv(100*eye(n^2,n^2)); % initialization
while toll>1e-15
y0= A1*X0*A2 + A3*X0*A4 - A5;
f0=reshape(y0,n^2,1); % f0=[y0(:,1);y0(:,2)]; vectorisation of y0
x0=reshape(X0,n^2,1); % x0=[X0(:,1); X0(:,2)]; vectorization of X0
x1=x0-B*f0;
X1=reshape(x1,n,n); % X1=[x1(1:2,:) x1(3:4,:)]; Inverse of vec
y1= A1*X1*A2 + A3*X1*A4 - A5;
f1=reshape(y1,n^2,1); % f1=[y1(:,1);y1(:,2)]; vectorisation of y1
x1=reshape(X1,n^2,1); % x1=[X1(:,1); X1(:,2)]; vectorisation of X1
y=f1-f0; s=x1-x0;
B = B + ((s-B*y)*(s'*B))/(s'*B*y);
X0=X1; k=k+1;toll=norm(s);
end
X1, ZERO1= A1*X1*A2 + A3*X1*A4 - A5
7.2.3 Numerical solutions of Lyapunov's Equation: The Lyapunov equation occurs in
many branches of control theory, such as stability analysis and optimal control. This
and related equations are named after the Russian mathematician Aleksandr Lyapunov.

The continuous Lyapunov equation: 𝑨𝑿 + 𝑿𝑨⊤ = 𝑸 or 𝑿𝑨 + 𝑨⊤ 𝑿 = 𝑸


The discrete Lyapunov equation: 𝑨𝑿𝑨⊤ − 𝑿 = 𝑸 or 𝑨⊤ 𝑿𝑨 − 𝑿 = 𝑸

where 𝑸 is a Hermitian (symmetric) matrix. Since the Lyapunov equation is a special


case of the Sylvester equation, the following corollary is immediate.

Corollary: Let 𝜆1 , 𝜆2 , . . . , 𝜆𝑛 be the eigenvalues of 𝑨. Then the Lyapunov equation


𝑿𝑨 + 𝑨⊤ 𝑿 = 𝑸 has a unique solution 𝑿 if and only if 𝜆𝑖 + 𝜆𝑗 ≠ 0, 𝑖 = 1, . . . , 𝑛; 𝑗 = 1, . . . , 𝑛.

The following result concerns the uniqueness of the solution 𝑿 of the discrete Lyapunov equation 𝑨⊤𝑿𝑨 − 𝑿 = 𝑸.

Let 𝜆1 , . . . , 𝜆𝑛 be the eigenvalues of 𝑨. Then the discrete Lyapunov equation 𝑨⊤ 𝑿𝑨 − 𝑿 = 𝑸


has a unique solution 𝑿 if and only if 𝜆𝑖 𝜆𝑗 ≠ 1, 𝑖 = 1, . . . , 𝑛; 𝑗 = 1, . . . , 𝑛.
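For small problem sizes both equations can also be solved directly through vectorization, as the uniqueness conditions above suggest; a minimal sketch (assumed test data) using only kron and the backslash operator:

% Sketch: Lyapunov equations by vectorization (small n only)
n = 4; A = -2*eye(n) + 0.5*randn(n); Q = eye(n); I = eye(n);    % assumed data, A stable
Xc = reshape((kron(I,A.') + kron(A.',I)) \ Q(:), n, n);         % X*A + A'*X = Q
Xd = reshape((kron(A.',A.') - eye(n^2)) \ Q(:), n, n);          % A'*X*A - X = Q
disp(norm(Xc*A + A.'*Xc - Q)), disp(norm(A.'*Xd*A - Xd - Q))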

If 𝑨 is stable, the analytical solution 𝑿 can be written as

Continuous Lyapunov equation:  𝑿 = −∫₀^∞ e^{𝑨𝑡} 𝑸 e^{𝑨⊤𝑡} 𝑑𝑡   or   𝑿 = −∫₀^∞ e^{𝑨⊤𝑡} 𝑸 e^{𝑨𝑡} 𝑑𝑡

Discrete Lyapunov equation:   𝑿 = −∑_{k=0}^{∞} 𝑨^k 𝑸 (𝑨⊤)^k   or   𝑿 = −∑_{k=0}^{∞} (𝑨⊤)^k 𝑸 𝑨^k

Example: write a MATLAB code to solve the Continuous Lyapunov equation by the
numerical integration algorithm.

clear all, clc,


M1=10*rand(3,3); A =M1*diag([-1 -2 -3])*inv(M1);
M2=10*rand(3,3); C =M2*diag([3 4 5])*inv(M2) ; Q= C*C';
n=size(A,1); I=eye(n,n); Ts=0.001;
EA=I + Ts*A; QT=Ts*Q; X=zeros(n,n); k=0; e=1;

while e>1e-15
X=X - (EA^k)*(QT)*((EA')^k);
e=norm(A*X+X*A'-Q);
k=k+1;
end

X
Zero = A*X+X*A'-Q
XMATLAB = lyap(A,-Q) % verifications using MATLAB
Zero=A*XMATLAB + XMATLAB*A'-Q
Example: write a MATLAB code to solve the Discrete Lyapunov equation by the
numerical integration algorithm.

clear all, clc,


M1=10*rand(3,3); A =M1*diag([0.1 0.2 0.3])*inv(M1);
M2=10*rand(3,3); C =M2*diag([3 4 5])*inv(M2) ; Q= C*C';
n=size(A,1); X=zeros(n,n); k=0; e=1;

while e>1e-10
X=X - (A^k)*(Q)*((A')^k);
e=norm(A*X*A'-X-Q);
k=k+1;
end

X
Zero = A*X*A'-X-Q
XMATLAB = dlyap(A,-Q) % verifications using MATLAB
Zero=A*XMATLAB*A' - XMATLAB -Q

Remark: the numerical integration algorithm is not recommended because it is costly, numerically unstable, and applicable only to stable matrices.

All the algorithms introduced previously are applicable to the Lyapunov equation.

An algebraic Riccati equation is a type of nonlinear matrix equation that arises in the context of infinite-horizon optimal control problems in continuous time or discrete time.

A typical algebraic Riccati equation is similar to one of the following:

The continuous time algebraic Riccati equation ∶ 𝑨⊤ 𝑿 + 𝑿𝑨 − 𝑿𝑩𝑹−1 𝑩⊤ 𝑿 + 𝑸 = 𝟎


The discrete time algebraic Riccati equation ∶ 𝑿 = 𝑨⊤ 𝑿𝑨 − (𝑨⊤ 𝑿𝑩)(𝑹 + 𝑩⊤ 𝑿𝑩)−1 (𝑩⊤ 𝑿𝑨) + 𝑸

𝑿 is the unknown 𝑛 by 𝑛 symmetric matrix and 𝑨, 𝑩, 𝑸, 𝑹 are known real coefficient


matrices. Though generally this equation can have many solutions, it is usually specified
that we want to obtain the unique stabilizing solution, if such a solution exists.

The name Riccati is given to these equations because of their relation to the Riccati
differential equation.

Remark: Riccati's equation is a nonlinear matrix equation that appears in the optimal control of continuous- and discrete-time linear systems. In practice one often needs a parametric (analytical) solution rather than a purely numerical one, and some limitations in optimal control impose finding an analytical parametrization; nevertheless, we suggest below some numerical algorithms through which this equation can be solved under certain conditions.
7.3.1 Riccati Equation by Newton’s method: Newton’s method can also be used for solving equations of the kind 𝑭(𝑿) = 𝟎, where 𝑭: 𝓥 → 𝓥 is a differentiable operator on a Banach space (we are interested only in the case where 𝓥 is ℂ^{m×n}). The sequence is defined by

𝑿_{k+1} = 𝑿_k − (𝑭′_{𝑿_k})^{-1} 𝑭(𝑿_k),   𝑿_k ∈ 𝓥

where 𝑭′_{𝑿_k} is the Fréchet derivative of 𝑭 at the point 𝑿_k.

The Fréchet derivative is a derivative defined on Banach spaces. Named after Maurice
Fréchet, it is commonly used to generalize the derivative of a real-valued function of a
single real variable to the case of a vector-valued function of multiple real variables, and
to define the functional derivative used widely in the calculus of variations.

The Fréchet derivative of 𝑭 at a point 𝑿 ∈ ℂ^{m×n} is a linear mapping

𝑳(𝑿): ℂ^{m×n} → ℂ^{m×n},   𝑬 ⟼ 𝑳(𝑿, 𝑬)

such that for all 𝑬 ∈ ℂ^{m×n}: 𝑭(𝑿 + 𝑬) − 𝑭(𝑿) − 𝑳(𝑿, 𝑬) = o(‖𝑬‖), and it therefore describes
the first order effect on 𝑭 of perturbations in 𝑿. In the practical computation it is
preferable to avoid constructing and inverting explicitly 𝑭′𝑿𝑘 . Thus, a better way to
compute one step of Newton’s method is to define the Newton increment 𝑯𝑘 ∶= 𝑿𝑘+1 − 𝑿𝑘
and to solve the linear matrix equation in the unknown 𝑯𝑘 in order to get 𝑿𝑘+1 :

𝑭′𝑿𝑘 𝑯𝑘 = −𝑭(𝑿𝑘 ) and 𝑿𝑘+1 = 𝑿𝑘 + 𝑯𝑘

The convergence of the method in Banach spaces is less straightforward than in the
scalar case, and it is described by the Newton–Kantorovich theorem.

Consider the Riccati operator 𝓡(𝑿) = 𝑿𝑨 + 𝑨⊤𝑿 − 𝑿𝑩𝑿 + 𝑪. Since

𝓡(𝑿 + 𝑯) − 𝓡(𝑿) = (𝑨⊤ − 𝑿𝑩)𝑯 + 𝑯(𝑨 − 𝑩𝑿) − 𝑯𝑩𝑯,

the Fréchet derivative of 𝓡 at a point 𝑿 is the linear part

𝓡′(𝑿)𝑯 = (𝑨⊤ − 𝑿𝑩)𝑯 + 𝑯(𝑨 − 𝑩𝑿)

Thus, the k-th step of Newton’s method for the continuous time algebraic Riccati equation consists in solving the Sylvester equation

𝓡′(𝑿_k)𝑯_k = −𝓡(𝑿_k)  ⇔  (𝑨⊤ − 𝑿_k𝑩)𝑯_k + 𝑯_k(𝑨 − 𝑩𝑿_k) = −𝓡(𝑿_k)

in the unknown 𝑯𝑘 and setting 𝑿𝑘+1 = 𝑿𝑘 + 𝑯𝑘 . Observe that if 𝑿𝑘 is a Hermitian matrix,


then the last equation is a Lyapunov equation. Thus, if 𝑿0 is Hermitian and the sequence
{𝑿𝑘 }𝑘 is well defined, then 𝑿𝑘 is Hermitian for each 𝑘.

The corresponding code is reported in the listing below; this code makes use of the MATLAB function sylvester for the solution of the Sylvester equation described before.

The standard results on the convergence of Newton’s method in Banach spaces yield
locally quadratic convergence in a neighborhood of the stabilizing solution 𝑿+ . This
property guarantees that the method is self-correcting, that is, a small perturbation
introduced at some step 𝑘 of the iteration does not affect the convergence.
clear all, clc,
% This algorithm solves: C+XA+A'X-XBX=0 by means of Newton’s method.
% A,B & C: matrix coefficients.
% X0: initial approximation & X: the solution
A=10*rand(4,4); B=10*rand(4,4); C=10*rand(4,4); X0= rand(4,4);
tol = 1e-13; kmax = 80; X = X0; err = 1; k=0;
while err>tol && k<kmax
RX = C + X*A + A'*X - X*B*X;
H =sylvester(A'-X*B,A-B*X,-RX) ;
X=X+H;
err =norm(H,1)/norm(X,1); k=k+1;
end
if k == kmax
disp('Warning: reached the maximum number of iterations')
end
Zero=C + X*A + A'*X - X*B*X

7.3.2 Riccati Equation by Matrix Sign Function Method: The function sign(𝑧) is
defined for a nonimaginary complex number 𝑧 as the nearest square root of unity. Its
matrix counterpart can be defined for any matrix 𝑾 with no purely imaginary
eigenvalues relying on the Jordan canonical form of 𝑾,

𝑱 = 𝑽^{-1}𝑾𝑽 = (𝑱₊ 𝟎; 𝟎 𝑱₋)

where we have grouped the Jordan blocks so that the eigenvalues of 𝑱₊ have positive real part, while the eigenvalues of 𝑱₋ have negative real part. We define the matrix sign of 𝑾 as

sign(𝑾) = 𝑽 (𝑰_p 𝟎; 𝟎 −𝑰_q) 𝑽^{-1}

where 𝑝 is the size of 𝑱+ and 𝑞 is the size of 𝑱− . Observe that according to this definition,
sign(𝑾) is a matrix function. From the last equation it follows that sign(𝑾) − 𝑰 has rank 𝑞, while sign(𝑾) + 𝑰 has rank 𝑝.

Theorem: Let the continuous time algebraic Riccati equation 𝓡(𝑿) = 𝑪 + 𝑿𝑨 + 𝑫𝑿 − 𝑿𝑩𝑿
have a stabilizing solution 𝑿+ , namely 𝜎(𝑨 − 𝑩𝑿+ ) ⊂ ℂ− or sign(𝑨 − 𝑩𝑿+ ) = −𝑰, and let
𝑯 = (𝑨 −𝑩; −𝑪 −𝑫) be the corresponding Hamiltonian matrix. Partition sign(𝑯) + 𝑰 = [𝑾₁ 𝑾₂], where 𝑾₁, 𝑾₂ ∈ ℂ^{2n×n}. Then 𝑿₊ such that 𝓡(𝑿₊) = 𝟎 is the unique solution of the overdetermined system 𝑾₂𝑿₊ = −𝑾₁.

Proof: from the fact that

{ 𝑨𝑽 = 𝑽𝚲 ⟺ f(𝑨)𝑽 = 𝑽 f(𝚲)
{ 𝑯 [𝑰; 𝑿₊] = [𝑰; 𝑿₊](𝑨 − 𝑩𝑿₊)

If we let f(𝑧) = sign(𝑧) + 1 one has

[𝑾₁ 𝑾₂][𝑰; 𝑿₊] = (sign(𝑯) + 𝑰)[𝑰; 𝑿₊] = [𝑰; 𝑿₊](sign(𝑨 − 𝑩𝑿₊) + 𝑰) = 𝟎  ⟹  𝑾₁ + 𝑾₂𝑿₊ = 𝟎  ⟹  𝑾₂𝑿₊ = −𝑾₁

On the other hand, (sign(𝑯) + 𝑰)[𝑰 𝟎; 𝑿₊ 𝑰] = [𝟎 𝑾₂], and since the left-hand side of the latter equality has rank 𝑛, 𝑾₂ has full rank and 𝑿₊ is the unique solution of the Riccati equation. ■

Once the sign of 𝑯 is computed, in order to get the required solution it is enough to solve
the overdetermined system. This task can be accomplished by using the standard
algorithms for solving an overdetermined system, such as the QR factorization of 𝑾2 .

Computing the matrix sign function: The simplest iteration is obtained by applying Newton’s method to 𝑿² − 𝑰 = 𝟎, which is appropriate since 𝑺 = sign(𝑾) satisfies 𝑺² − 𝑰 = 𝟎. The resulting iteration is 𝑿_{k+1} = ½(𝑿_k + 𝑿_k^{-1}), whose convergence properties are synthesized in the following result.

Theorem: If 𝑾 ∈ ℂ^{n×n} has no purely imaginary eigenvalues, then the sequence

𝑿_{k+1} = ½(𝑿_k + 𝑿_k^{-1}),   k = 0, 1, …

with 𝑿₀ = 𝑾 converges quadratically to 𝑺 = sign(𝑾). Moreover,

‖𝑿_{k+1} − 𝑺‖ ≤ ½ ‖𝑿_k^{-1}‖ ‖𝑿_k − 𝑺‖²

for any operator norm.

The iteration 𝑿_{k+1} = ½(𝑿_k + 𝑿_k^{-1}) together with the termination criterion ‖𝑿_{k+1} − 𝑿_k‖ ≤ 𝜀, for some norm and a tolerance 𝜀, provides a rough algorithm for the sign function.
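A minimal, unscaled version of this rough algorithm is sketched below on an assumed test matrix whose sign is known by construction; the accelerated, scaled variant used for the Riccati equation follows in the next listing.

% Sketch: unscaled Newton iteration for the matrix sign function
V = randn(6); W = V*diag([1 2 3 -1 -2 -3])/V;   % assumed test matrix, sign known by construction
X = W; tol = 1e-12;
for k = 1:100
    Xnew = 0.5*(X + inv(X));
    if norm(Xnew-X,1) <= tol*norm(Xnew,1), X = Xnew; break, end
    X = Xnew;
end
Strue = V*diag([1 1 1 -1 -1 -1])/V;
disp(norm(X - Strue))                           % X approximates sign(W)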

There is a scaling technique which dramatically accelerates the convergence. The idea is simple but effective: since sign(𝑾) = sign(𝑐𝑾) for any 𝑐 > 0, the limit of the sequence {𝑿_k} does not change if at each step one scales the matrix 𝑿_k as 𝑿_k ← 𝑐_k𝑿_k; the code below uses Byers’ determinantal scaling 𝑐_k = |det(𝑿_k)|^{-1/n}.

clear all, clc,


% This Algo solves: C+XA+DX-XBX=0 by means of the matrix sign function
% A, B, C, D: matrix coefficients & X is the solution of: C+XA+DX-XBX=0
A=10*rand(4,4); B=10*rand(4,4); C=10*rand(4,4); D=10*rand(4,4);
H=[A,-B;-C,-D]; n=size(H,1); tol=1e-13; kmax=1000; err=1; SH=H; k=0;
while err>tol && k<kmax
[L,U,P] =lu(SH);
c=abs(prod(diag(U)))^(-1/n); % Byers’ determinantal scaling method
SH = SH*c;
Z = L\P;
Z = (c*U)\Z;
err =norm(SH-Z,1)/norm(SH,1);
SH =(SH + Z)*0.5;
k=k+1;
end
[n,m] =size(B); W = SH + eye(m+n);
X = -W(1:m+n,n+1:n+m)\W(1:m+n,1:n)
Zero=C + X*A + D*X - X*B*X
Remark: the algorithm does not always converge (a stabilizing solution need not exist for arbitrary coefficient matrices, and the sign iteration fails when purely imaginary eigenvalues occur); accordingly, for randomly generated data the script may have to be executed several times.

Reliable software for solving matrix equations has been available for a long time, due to
its fundamental role in control applications; in particular, the SLICE Library was made
available already in 1986, the LAPACK ("Linear Algebra Package") was initially released
in 1992. Most recent versions of MATLAB also rely on calls to SLICOT routines within
the control-related Toolboxes. SLICOT includes a large variety of codes for model
reduction and nonlinear problems on sequential and parallel architectures;

CHAPTER III:
Inversion of Matrix Polynomials
Introduction: The study of a constant matrix 𝑨 ∈ ℂ^{m×n} has attracted the attention of many mathematical researchers, and various methods are used for computing the usual inverse. One such algorithm is the Leverrier-Faddeev scheme (also called the Souriau-Frame algorithm); many alternatives to the Leverrier-Faddeev algorithm are presented in J.S. Frame et.al 1949, D.K. Faddeev et.al 1963, S. Barnett et.al 1989 and Guorong Wang et.al 1993. A more general algorithm for computing the generalized (Moore-Penrose) inverse of a given rectangular or singular constant matrix 𝑨 ∈ ℂ^{m×n}, based on the Leverrier-Faddeev algorithm, originated in H.P. Decell et.al 1965. Also, in S. Barnett et.al 1989, a new derivation of the Leverrier-Faddeev algorithm is utilized to produce a computational scheme for the inverse of a 2nd degree matrix polynomial 𝑨(𝜆) = 𝜆²𝑰_n + 𝜆𝑨₁ + 𝑨₂. Additionally, an extension of Leverrier’s algorithm for computing the inverse of an arbitrary degree matrix polynomial is introduced in G. Wang et.al 1993, but it is complicated and not suitable for implementation. In N.P. Karampetakis et.al 1997 a representation and an algorithm are derived for the computation of the Moore-Penrose inverse of a non-regular polynomial matrix of arbitrary degree. In J. Jones et.al 1998 an algorithm is described for computing the Moore-Penrose inverse of a singular rational matrix, together with its implementation in the symbolic computational language MAPLE. In this chapter we introduce a new approach and a new proof of an efficient and powerful scheme for the computation of the inverse of an arbitrary degree matrix polynomial.

Before introducing the proposed algorithms and developments, it is recommended to give a brief introduction and a clear picture of the Leverrier-Faddeev algorithms. We start with constant matrices and thereafter give some extensions.

Leverrier-Faddeev Algorithm: It is well-known that the characteristic polynomial of an 𝑛 × 𝑛 matrix 𝑨 is given by 𝑝(𝑠) = det(𝑠𝑰 − 𝑨), and it is of great importance to deal with 𝑒^{𝑨𝑡}. It is very well known that

ℒ{𝑒^{𝑨𝑡}} = (𝑠𝑰 − 𝑨)^{-1} = 𝑵(𝑠)/𝑝(𝑠) = (1/𝑝(𝑠)) (𝑵₁𝑠^{n−1} + 𝑵₂𝑠^{n−2} + ⋯ + 𝑵_n)

where 𝑝(𝑠) = 𝑠^n + 𝑎₁𝑠^{n−1} + 𝑎₂𝑠^{n−2} + ⋯ + 𝑎_n and the adjugate matrix 𝑵(𝑠) is a matrix polynomial in 𝑠 of degree 𝑛 − 1 with 𝑛 × 𝑛 constant coefficient matrices 𝑵₁, …, 𝑵_n.

The above equation can be re-written as (𝑠𝑰 − 𝑨)𝑵(𝑠) = 𝑝(𝑠)𝑰; expanding this multiplication we get:

𝑵₁𝑠^n + (𝑵₂ − 𝑨𝑵₁)𝑠^{n−1} + (𝑵₃ − 𝑨𝑵₂)𝑠^{n−2} + ⋯ + (𝟎 − 𝑨𝑵_n) = (𝑠^n + 𝑎₁𝑠^{n−1} + ⋯ + 𝑎_n)𝑰.

By equating identical powers we obtain:

𝑵₁ = 𝑰
𝑵₂ = 𝑨𝑵₁ + 𝑎₁𝑰
𝑵₃ = 𝑨𝑵₂ + 𝑎₂𝑰
⋮
𝑵_n = 𝑨𝑵_{n−1} + 𝑎_{n−1}𝑰
𝟎 = 𝑨𝑵_n + 𝑎_n𝑰
If the coefficients 𝑎1 , … , 𝑎𝑛 of the characteristic polynomial 𝑝(𝑠) were known, last equation
would then constitute an algorithm for computing the matrices 𝑵1 , … , 𝑵𝑛 . Leverrier-
Faddeev proposed a recursive algorithm which will compute 𝑵𝑖 and 𝑎𝑖 in parallel, even if
the coefficients 𝑎𝑖 are not known in advance.

Important Note: Although the Leverrier-Faddeev method has been extensively covered in most books on linear system theory, unfortunately the majority of these books do not give a proof of the coefficient formulas.

To complete the proof let us consider

𝑒^{𝑨𝑡} = 𝑽 diag(𝑒^{𝜆₁𝑡}, …, 𝑒^{𝜆_n 𝑡}) 𝑽^{-1}  ⟹  trace(𝑒^{𝑨𝑡}) = ∑_{i=1}^{n} 𝑒^{𝜆_i 𝑡}

Using the Laplace transform we get

trace(ℒ{𝑒^{𝑨𝑡}}) = ∑_{i=1}^{n} 1/(𝑠 − 𝜆_i) = 𝑝′(𝑠)/𝑝(𝑠)

On the other hand we know that

d𝑒^{𝑨𝑡}/dt = 𝑨𝑒^{𝑨𝑡}   and   ℒ{d𝑓(𝑡)/dt} = 𝑠𝐹(𝑠) − 𝑓(0)   ⟹   𝑠 ℒ{𝑒^{𝑨𝑡}} − 𝑰 = 𝑨 ℒ{𝑒^{𝑨𝑡}}

Taking the trace of the last equation we get:

𝑠 𝑝′(𝑠)/𝑝(𝑠) − 𝑛 = trace(𝑨 ℒ{𝑒^{𝑨𝑡}}) = trace(𝑨 𝑵(𝑠)/𝑝(𝑠)) = (1/𝑝(𝑠)) trace(𝑨𝑵(𝑠))

which can be written as 𝑠𝑝′(𝑠) − 𝑛𝑝(𝑠) = trace(𝑨𝑵(𝑠)). This is equivalent to

−𝑎₁𝑠^{n−1} − 2𝑎₂𝑠^{n−2} − ⋯ − (𝑛−1)𝑎_{n−1}𝑠 − 𝑛𝑎_n = tr(𝑨𝑵₁)𝑠^{n−1} + tr(𝑨𝑵₂)𝑠^{n−2} + ⋯ + tr(𝑨𝑵_n)

Comparing coefficients in the equation yields the relations

𝑎_k = −(1/k) tr(𝑨𝑵_k),   k = 1, 2, …, n

Finally we obtain:

(𝑠𝑰 − 𝑨)^{-1} = (𝑵₁𝑠^{n−1} + 𝑵₂𝑠^{n−2} + ⋯ + 𝑵_n) / (𝑠^n + 𝑎₁𝑠^{n−1} + 𝑎₂𝑠^{n−2} + ⋯ + 𝑎_n)

where

𝑵₁ = 𝑰                        𝑎₁ = −tr(𝑨𝑵₁)
𝑵₂ = 𝑨𝑵₁ + 𝑎₁𝑰                𝑎₂ = −(1/2) tr(𝑨𝑵₂)
𝑵₃ = 𝑨𝑵₂ + 𝑎₂𝑰                𝑎₃ = −(1/3) tr(𝑨𝑵₃)
⋮                             ⋮
𝑵_n = 𝑨𝑵_{n−1} + 𝑎_{n−1}𝑰     𝑎_n = −(1/n) tr(𝑨𝑵_n)
𝟎 = 𝑨𝑵_n + 𝑎_n𝑰
Remark: Back-substitution and recursive evaluation of the Leverrier-Faddeev algorithm
will lead to the Cayley-Hamilton theorem.
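A short MATLAB sketch of this recursion (assumed random data), checked against (𝑠𝑰 − 𝑨)^{-1} at an arbitrary test point:

% Sketch: Leverrier-Faddeev recursion for (sI-A)^{-1} = N(s)/p(s)
A = rand(5); n = length(A); I = eye(n);
N = cell(1,n); N{1} = I; a = zeros(1,n);
for k = 1:n
    a(k) = -trace(A*N{k})/k;                 % a_k = -(1/k) tr(A*N_k)
    if k < n, N{k+1} = A*N{k} + a(k)*I; end  % N_{k+1} = A*N_k + a_k*I
end
s  = 2.7;                                    % arbitrary test point
Ns = zeros(n); for k = 1:n, Ns = Ns + N{k}*s^(n-k); end
ps = polyval([1 a], s);                      % p(s) = s^n + a_1*s^{n-1} + ... + a_n
disp(norm(inv(s*I - A) - Ns/ps))             % ~ machine precision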

Inverse of Constant Matrix: We now present a variation of the method proposed by D.


K. Faddeev which, besides simplifying the computation of the coefficients of the
characteristic polynomial, permits us to determine the inverse matrix and the
eigenvectors of the matrix.

Assuming that the characteristic polynomial of 𝑨 is

∆(𝜆) = 𝛼₀𝜆^n + 𝛼₁𝜆^{n−1} + ⋯ + 𝛼_{n−1}𝜆 + 𝛼_n   with 𝛼₀ = 1

By the Cayley-Hamilton theorem 𝛼₀𝑨^n + 𝛼₁𝑨^{n−1} + ⋯ + 𝛼_{n−1}𝑨 + 𝛼_n𝑰 = 𝟎, which implies that

𝑨^{-1} = −(1/𝛼_n)(𝑨^{n−1} + 𝛼₁𝑨^{n−2} + ⋯ + 𝛼_{n−1}𝑰)

To avoid more power computations we follow the Faddeev algorithm

𝛼₀ = 1                          𝑩₀ = 𝑰
𝛼₁ = −trace(𝑨𝑩₀)                𝑩₁ = 𝑨𝑩₀ + 𝛼₁𝑰
𝛼₂ = −(1/2) trace(𝑨𝑩₁)          𝑩₂ = 𝑨𝑩₁ + 𝛼₂𝑰
𝛼₃ = −(1/3) trace(𝑨𝑩₂)          𝑩₃ = 𝑨𝑩₂ + 𝛼₃𝑰
⋮
𝛼_n = −(1/n) trace(𝑨𝑩_{n−1})    𝑩_n = 𝑨𝑩_{n−1} + 𝛼_n𝑰 = 𝟎

If 𝑨 is a non-singular matrix, then 𝑨^{-1} = −𝑩_{n−1}/𝛼_n.

Remark: If we do a back-substitution of the 𝑩_i we obtain 𝑩₁ = 𝑨 + 𝛼₁𝑰 ⟹ 𝑩₂ = 𝑨² + 𝛼₁𝑨 + 𝛼₂𝑰, etc., 𝑩_n = ∑_{i=0}^{n} 𝛼_i𝑨^{n−i}, and from the Cayley-Hamilton theorem 𝑩_n = 𝟎 ⟹ 𝑩_{n−1} = −𝛼_n𝑨^{-1}.

Eigenvectors of Constant Matrix: We now pass on to determining the eigenvectors of


matrix 𝑨. Let the eigenvalues be already computed and distinct. We construct the
matrix
𝑸_k = 𝜆_k^{n−1}𝑰 + 𝜆_k^{n−2}𝑩₁ + ⋯ + 𝜆_k𝑩_{n−2} + 𝑩_{n−1}

where 𝑩𝑖 are matrices computed while finding the inverse matrix and 𝜆𝑘 is the 𝑘 𝑡ℎ
eigenvalue of the matrix 𝑨.

Assuming that all 𝜆1 , … , 𝜆𝑛 are distinct, one may prove that the matrix 𝑸𝑘 is non-zero. We
shall show that every column of 𝑸𝑘 consists of components of the eigenvector belonging
to the eigenvalue 𝜆𝑘 .

In fact,
(𝜆_k𝑰 − 𝑨)𝑸_k = (𝜆_k𝑰 − 𝑨)(𝜆_k^{n−1}𝑰 + 𝜆_k^{n−2}𝑩₁ + ⋯ + 𝜆_k𝑩_{n−2} + 𝑩_{n−1})
             = 𝜆_k^n𝑰 + (𝑩₁ − 𝑨)𝜆_k^{n−1} + (𝑩₂ − 𝑨𝑩₁)𝜆_k^{n−2} + ⋯ + (𝑩_n − 𝑨𝑩_{n−1})
             = 𝜆_k^n𝑰 + 𝛼₁𝜆_k^{n−1}𝑰 + 𝛼₂𝜆_k^{n−2}𝑰 + ⋯ + 𝛼_n𝑰
             = ∆(𝜆_k)𝑰 = 𝟎
From here it follows that (𝜆𝑘 𝑰 − 𝑨)𝒖 = 𝟎, where 𝒖 is any column of the matrix 𝑸𝑘 , i.e. it
follows that 𝑨𝒖 = 𝜆𝑘 𝒖. This equality shows that 𝒖 is an eigenvector.

Note 1: When computing the eigenvectors in the manner described it is not necessary, of
course, to find all the columns of matrix 𝑸𝑘 . It should be limited to the computation of
one column; its elements are obtained as a linear combination of the analogous columns
of the matrices 𝑩𝑖 with the previous coefficients.

Note 2: To compute a column 𝒖 of the matrix 𝑸_k it is convenient to use the recurrence formula 𝒖₀ = 𝒆_c;  𝒖_i = 𝜆_k𝒖_{i−1} + 𝒃_i,  i = 1, 2, …, n−1, where 𝒃_i = 𝑩_i(:, c) is the c-th column of the matrix 𝑩_i and 𝒆_c is the c-th column of the identity matrix. A short sketch of this recurrence is given after the listing below.

%-----------------------------------------------------%
% Inverse of matrix (D.K. Faddeev):
%-----------------------------------------------------%
clear all, clc, M=rand(4,4); A=M*diag([-1 -2 -3 -4])*inv(M);
n=length(A); A1=A; I=eye(n,n);

for k=1:n
a(k)=-(1/(k))*trace(A1);
B(:,:,k)=A1+a(k)*I;
A1=A*B(:,:,k);
end
AI=-(B(:,:,n-1)/a(n));
%-----------------------------------------------------%
% Eigenvectors of matrix (D.K. Faddeev):
%-----------------------------------------------------%
B1=B(:,:,1); B2=B(:,:,2); B3=B(:,:,3);
s=[-1 -2 -3 -4]; % The set of distinct Eigenvalues

for k=1:n
Q(:,:,k)=s(k)^(n-1)*I+B1*s(k)^(n-2)+B2*s(k)^(n-3)+B3; % n=4
end

m=1; % Select the mth column of the matrix Q


for k=1:n
V(:,k)=Q(:,m,k);
end
V % The Eigenvector matrix
D=inv(V)*A*V % Diagonal form of D
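A short sketch of the recurrence of Note 2, appended to the script above (it reuses A, B, s, n and I from that listing); the chosen eigenvalue index and column index are arbitrary.

% Sketch of Note 2 (run after the script above): one eigenvector by the recurrence
kk = 2;                          % pick the eigenvalue s(kk) = -2 (arbitrary choice)
cc = 1;                          % use the 1st columns of the B_i (any column works generically)
u  = I(:,cc);                    % u_0 = e_c
for i = 1:n-1
    u = s(kk)*u + B(:,cc,i);     % u_i = lambda_k*u_{i-1} + b_i,  b_i = B_i(:,cc)
end
disp(norm(A*u - s(kk)*u))        % u is an eigenvector associated with s(kk)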

Reflexive Generalized Inverse: Suppose 𝑨 is a square matrix of order 𝑛, with


characteristic polynomial ∆(𝜆) = 𝛼0 𝜆𝑛 + 𝛼1 𝜆𝑛−1 + ⋯ + 𝛼𝑛−1 𝜆 + 𝛼𝑛 with 𝛼0 = 1 where
∆(𝜆) = 0 has roots 𝜆𝑖 (𝑖 = 1,2 , . . . , 𝑛). The modified Faddeev-Leverrier’s method

𝛼_k = −(1/k) trace(𝑩_{k−1}),   𝑩_k = 𝑨𝑩_{k−1} + 𝛼_k𝑨   with 𝑩₀ = 𝑨 and k = 1, 2, …, n

𝛼₀ = 1                          𝑩₀ = 𝑨
𝛼₁ = −trace(𝑩₀)                 𝑩₁ = 𝑨𝑩₀ + 𝛼₁𝑨
𝛼₂ = −(1/2) trace(𝑩₁)           𝑩₂ = 𝑨𝑩₁ + 𝛼₂𝑨
𝛼₃ = −(1/3) trace(𝑩₂)           𝑩₃ = 𝑨𝑩₂ + 𝛼₃𝑨
⋮
𝛼_n = −(1/n) trace(𝑩_{n−1})     𝑩_n = 𝑨𝑩_{n−1} + 𝛼_n𝑨 = 𝟎

𝑩_{n−1} = −𝛼_n𝑰 and 𝑩_{n−1} = 𝑨𝑩_{n−2} + 𝛼_{n−1}𝑨  ⟹  𝑨^{-1} = −(𝑩_{n−2} + 𝛼_{n−1}𝑰)/𝛼_n

Faddeev and Faddeeva state that when the roots 𝜆_i are distinct, then for any root 𝜆, if we define 𝑽 = ∑_{i=1}^{n} 𝜆^{n−i}𝑩_{i−1}, then 𝑽 is a non-null matrix all of whose columns satisfy the eigenvector relationship 𝑨𝑽 = 𝜆𝑽. It follows that, since the eigenvalues of 𝑨 are distinct, rank(𝑽) = 1 and every non-null column of 𝑽 is a multiple of the unique right eigenvector corresponding to 𝜆. Because 𝑽 is a polynomial in 𝑨 we have 𝑨𝑽 = 𝑽𝑨, so that every row of 𝑽 is a multiple of the unique left eigenvector corresponding to 𝜆.

Theorem: If 𝑨 is of rank 𝑟 and 𝛼_r ≠ 0, then

𝑨^♯ = −(1/𝛼_r)(𝑩_{r−2} − (𝛼_{r−1}/𝛼_r)𝑩_{r−1})

is a reflexive generalized inverse of 𝑨, that is 𝑨^♯𝑨𝑨^♯ = 𝑨^♯ and 𝑨𝑨^♯𝑨 = 𝑨.

Proof: if 𝑟𝑎𝑛𝑘(𝑨) = 𝑟 Then the characteristic equation becomes

∆(𝜆) = 𝜆𝑛−𝑟 (𝜆𝑟 + 𝛼1 𝜆𝑟−1 + ⋯ + 𝛼𝑟−1 𝜆 + 𝛼𝑟 )

where each 𝛼𝑖 can be zero, and the special form taken by the Cayley-Hamilton theorem is

𝑨𝑟+1 + 𝛼1 𝑨𝑟 + 𝛼2 𝑨𝑟−1 + ⋯ + 𝛼𝑟−1 𝑨2 + 𝛼𝑟 𝑨 = 𝟎

Thus 𝑩_r = 𝟎, so that the Faddeev-Leverrier sequence now stops at the r-th step, with the final two relationships

{ 𝑩_{r−1} = 𝑨𝑩_{r−2} + 𝛼_{r−1}𝑨          { 𝑩_{r−1} = 𝑨𝑩_{r−2} + 𝛼_{r−1}𝑨
{ 𝑩_r = 𝑨𝑩_{r−1} + 𝛼_r𝑨 = 𝟎       ⟺      { 𝑨𝑩_{r−1} = −𝛼_r𝑨

It follows that when 𝛼_r ≠ 0, then 𝑨^♯𝑨𝑨^♯ = 𝑨^♯:

𝑨^♯𝑨𝑨^♯ = (1/𝛼_r²)(𝑩_{r−2} − (𝛼_{r−1}/𝛼_r)𝑩_{r−1}) 𝑨 (𝑩_{r−2} − (𝛼_{r−1}/𝛼_r)𝑩_{r−1})
        = (1/𝛼_r²)(𝑩_{r−2} − (𝛼_{r−1}/𝛼_r)𝑩_{r−1}) 𝑩_{r−1} = −(1/𝛼_r)(𝑩_{r−2} − (𝛼_{r−1}/𝛼_r)𝑩_{r−1}) = 𝑨^♯

Now let us prove the second formula 𝑨𝑨^♯𝑨 = 𝑨:

𝑨𝑨^♯𝑨 = −(1/𝛼_r) 𝑨 (𝑩_{r−2} − (𝛼_{r−1}/𝛼_r)𝑩_{r−1}) 𝑨 = −(1/𝛼_r)(𝑨𝑩_{r−2} + 𝛼_{r−1}𝑨)𝑨 = −(1/𝛼_r)𝑩_{r−1}𝑨 = 𝑨
%-----------------------------------------------------%
% Generalized Inverse of Matrix (J. C. Gower):
%-----------------------------------------------------%
clear all, clc, M=rand(4,4); A=M*diag([0 -2 0 -4])*inv(M);
n=length(A); A1=A; I=eye(n,n); r=rank(A); B(:,:,1)=A;
for k=1:r
a(k)=-(1/(k))*trace(B(:,:,k)); B(:,:,k+1)=A*B(:,:,k)+a(k)*A;
end
AA=-(1/a(r))*(B(:,:,r-1)-(a(r-1)/a(r))*B(:,:,r))
Zero1=AA-AA*A*AA, Zero2=A-A*AA*A

Moore-Penrose inverse: Faddeev’s modification of Leverrier’s method can be further modified to describe a computing algorithm for the generalized inverse of a rectangular 𝑨. We construct the sequences 𝑨₀, 𝑨₁, …, 𝑨_k and 𝑩₀, 𝑩₁, …, 𝑩_k in the following way:

𝑨₀ = 𝟎                      𝛼₀ = 1                               𝑩₀ = 𝑰
𝑨₁ = 𝑨𝑨^H                   𝛼₁ = −trace(𝑨₁)                      𝑩₁ = 𝑨₁ + 𝛼₁𝑰
𝑨₂ = 𝑨𝑨^H𝑩₁                 𝛼₂ = −(1/2) trace(𝑨₂)                𝑩₂ = 𝑨₂ + 𝛼₂𝑰
⋮                           ⋮                                    ⋮
𝑨_{k−1} = 𝑨𝑨^H𝑩_{k−2}       𝛼_{k−1} = −(1/(k−1)) trace(𝑨_{k−1})  𝑩_{k−1} = 𝑨_{k−1} + 𝛼_{k−1}𝑰
𝑨_k = 𝑨𝑨^H𝑩_{k−1}           𝛼_k = −(1/k) trace(𝑨_k)              𝑩_k = 𝑨_k + 𝛼_k𝑰

Theorem: If 𝑨 is any 𝑛 × 𝑚 complex matrix, let ∆(𝜆) = (−1)^n ∑_{i=0}^{n} 𝛼_i𝜆^{n−i} with 𝛼₀ = 1 be the characteristic polynomial of 𝑨𝑨^H. If 𝑘 ≠ 0 is the largest integer such that 𝛼_k ≠ 0, then the generalized inverse of 𝑨 is given by 𝑨⁺ = −(𝑨^H𝑩_{k−1})/𝛼_k. If 𝑘 = 0 is the largest integer such that 𝛼_k ≠ 0, then 𝑨⁺ = 𝟎.

%--------------------------------------------------------%
% Moore-Penrose Inverse of Matrix (HENRY P. DECELL, JR):
%--------------------------------------------------------%
clear all, clc, A=10*rand(5,2); n=length(A*A'); a(1)=-trace(A*A');
B(:,:,1)= A*A'+ a(1)*eye(n,n); AA(:,:,1)= A*A'; k=1; r=1;
while k<n && abs(a(k))>0.1
X=B(:,:,k);
a(k+1)=-(1/(k+1))*trace(A*A'*X);
B(:,:,k+1)= A*A'*X+a(k+1)*eye(n,n);
if abs(a(k+1))>0.1, r=k+1; end % r = largest index with a significantly nonzero coefficient
k=k+1;                         % advance the index
end
Ap=-(A')*(B(:,:,r-1))/a(r) % Penrose Inverse of Matrix
Zero1=Ap-Ap*A*Ap, Zero2=A-A*Ap*A

Remark: This algorithm is inefficient and inaccurate, and is clearly unsuitable for serious numerical work, since it is subject to unacceptable rounding errors and works only for small matrices.
Roots of Matrix Polynomial in Complex Plane: In the theory of complex analysis it is
well-known that, if a complex function f(𝑧) is analytic in a region 𝒟 and does not vanish
identically, then the function f ′ (𝜆)/f(𝜆) is called the logarithmic derivative of f(𝜆). The
isolated singularities of the logarithmic derivative occur at the isolated singularities of
f(𝜆) and, in addition, at the zeros of f(𝜆). The principle of the argument results from an
application of the residue theorem to the logarithmic derivative. The contour integral of
this logarithmic derivative (i.e. f′(𝜆)/f(𝜆)) is equal to the difference between the number of zeros and poles of a complex rational function f(𝜆), and this is known as Cauchy's argument principle (see Peter Henrici 1974). Specifically, if a complex rational function f(𝑧) is a meromorphic function inside and on some closed contour 𝒞, and it has no zeros or poles on 𝒞, then

𝑍 − 𝑃 = (1/2𝜋𝑖) ∮_𝒞 (f′(𝜆)/f(𝜆)) 𝑑𝜆 = ∮_𝒞 Π(𝜆) 𝑑𝜆   with   Π(𝜆) = (1/2𝜋𝑖) (f′(𝜆)/f(𝜆))

where 𝑍 and 𝑃 denote respectively the number of zeros and poles of f(𝑧) inside the contour 𝒞, with each zero and pole counted according to its multiplicity. The argument principle requires the contour 𝒞 to be simple (without self-intersections) and traversed counter-clockwise.

If the complex function f(𝑧) has no poles inside 𝒞, then the number of zeros inside the contour 𝒞 is given by 𝑍 = ∮_𝒞 Π(𝜆) 𝑑𝜆. Now, by means of matrix theory we can extend this result to the matrix polynomial case.

Theorem: The number of latent roots of the regular matrix polynomial 𝑨(𝜆) in the domain 𝒟 enclosed by a contour 𝒞 is given by

𝑍 = (1/2𝜋𝑖) ∮_𝒞 trace(𝑨^{-1}(𝜆)𝑨′(𝜆)) 𝑑𝜆   with   𝑨′(𝜆) = d𝑨(𝜆)/d𝜆

Proof: Let us put ∆(𝜆) = det(𝑨(𝜆)) and let 𝑐_{ij} be the cofactor of the element 𝑑_{ij} of 𝑨(𝜆), so that

𝒄_i^T = 𝒆_i^T Adj(𝑨(𝜆)) = [𝑐_{i1}  𝑐_{i2}  ⋯  𝑐_{im}],   i = 1, 2, …, m

det(𝑨(𝜆))𝑰 = Adj(𝑨(𝜆))𝑨(𝜆)  ⇔  𝒆_i^T Adj(𝑨(𝜆))𝑨(𝜆) = 𝒆_i^T det(𝑨(𝜆))  ⇔  𝒄_i^T 𝑨(𝜆) = ∆(𝜆)𝒆_i^T

where 𝒆_i has a one for its i-th element and zeros elsewhere. We also have

∆′(𝜆) = d∆(𝜆)/d𝜆 = ∑_{i=1}^{m} ∆_i(𝜆)

where ∆_i(𝜆) is the determinant whose i-th column is 𝑨′_{⋆i}(𝜆) and whose remaining columns are those of 𝑨(𝜆). Expanding ∆_i(𝜆) by the i-th column we have ∆_i(𝜆) = 𝒄_i^T 𝑨′_{⋆i}(𝜆).

Now 𝑨(𝜆)𝑨^{-1}(𝜆) = 𝑰 implies that, provided ∆(𝜆) ≠ 0, 𝑨′(𝜆) = −𝑨(𝜆)(𝑨^{-1}(𝜆))′𝑨(𝜆), and hence 𝑨′_{⋆i}(𝜆) = −𝑨(𝜆){(𝑨^{-1}(𝜆))′𝑨(𝜆)}_{⋆i}. This leads to

∆_i(𝜆) = −𝒄_i^T 𝑨(𝜆){(𝑨^{-1}(𝜆))′𝑨(𝜆)}_{⋆i} = −∆(𝜆)𝒆_i^T{(𝑨^{-1}(𝜆))′𝑨(𝜆)}_{⋆i}

We then find that

∆′(𝜆) = ∑_{i=1}^{m} ∆_i(𝜆) = −∆(𝜆) ∑_{i=1}^{m} 𝒆_i^T{(𝑨^{-1}(𝜆))′𝑨(𝜆)}_{⋆i} = −∆(𝜆) trace((𝑨^{-1}(𝜆))′𝑨(𝜆))

and from the matrix derivative property 𝑨^{-1}(𝜆)𝑨′(𝜆) = −(𝑨^{-1}(𝜆))′𝑨(𝜆) we have

∆′(𝜆)/∆(𝜆) = trace(𝑨^{-1}(𝜆)𝑨′(𝜆))  ⟺  d det(𝑨(𝜆))/d𝜆 = trace(Adj(𝑨(𝜆)) d𝑨(𝜆)/d𝜆)

Finally, since ∆(𝜆) = det(𝑨(𝜆)) is analytic in any domain in the complex plane, the number of its roots inside a closed contour is

𝑍 = (1/2𝜋𝑖) ∮_𝒞 (∆′(𝜆)/∆(𝜆)) 𝑑𝜆 = (1/2𝜋𝑖) ∮_𝒞 trace(𝑨^{-1}(𝜆)𝑨′(𝜆)) 𝑑𝜆  ■
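A hedged numerical illustration of this theorem: for an assumed block-diagonal quadratic with known latent roots (0.5 double, 2 and 3), a simple periodic quadrature of the contour integral over a circle of radius 1.5 recovers the count of the roots inside.

% Sketch: counting latent roots inside a circle by quadrature of the theorem above
A1 = [-1 0; 0 -5]; A2 = [0.25 0; 0 6];        % A(L) = L^2*I + A1*L + A2, roots 0.5,0.5,2,3
Apoly  = @(z) z^2*eye(2) + z*A1 + A2;
Adpoly = @(z) 2*z*eye(2) + A1;                % A'(lambda)
c = 0; r = 1.5; M = 2000; th = 2*pi*(0:M-1)/M;
Z = 0;
for j = 1:M
    lam = c + r*exp(1i*th(j));
    Z = Z + trace(Apoly(lam)\Adpoly(lam))*1i*r*exp(1i*th(j));   % integrand * dlambda/dtheta
end
Z = real(Z*(2*pi/M)/(2i*pi))                  % = 2, the two roots at 0.5 inside the circle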

Problem Statement & Proposition: Given an ℓ-th degree regular matrix polynomial 𝑨(𝜆) = 𝜆^ℓ𝑰_m + ∑_{i=1}^{ℓ} 𝑨_i𝜆^{ℓ−i} with 𝑨_i ∈ ℝ^{m×m}, we are going to find another matrix polynomial 𝑵(𝜆) = ∑_{i=ℓ}^{n} 𝑵_i𝜆^{n−i} and a scalar polynomial ∆(𝜆) = 𝜆^n + ∑_{i=1}^{n} 𝛼_i𝜆^{n−i} such that

𝑨^{-1}(𝜆) = 𝑵(𝜆)/∆(𝜆)  ⟺  𝑨(𝜆)𝑵(𝜆) = ∆(𝜆)𝑰_m

where 𝑵_i ∈ ℝ^{m×m}, 𝛼_i ∈ ℝ and 𝑛 = 𝑚ℓ. Expanding the last equation we obtain:

𝑵_k = 𝟎                                     for k = 0, 1, 2, …, ℓ − 1
𝑵_k = 𝛼_{k−ℓ}𝑰_m − ∑_{i=1}^{ℓ} 𝑨_i𝑵_{k−i}   for k = ℓ, ℓ + 1, …, n
𝑵_k = 𝟎                                     for k = n + 1, …, n + ℓ

If the coefficients 𝛼1 … 𝛼𝑛 of the characteristic polynomial ∆(𝜆) were known, last equation
would then constitute an algorithm for computing the matrices 𝑵𝑖 . Here in this chapter
we proposed a recursive algorithm which will compute 𝑵𝑖 and 𝛼𝑖 in parallel, even if the
coefficients 𝛼𝑖 are not known in advance.

According to D. F. Davidenko 1960 and Peter Lancaster 1964 (Jacobi's formula), we write

∆′(𝜆)/∆(𝜆) = trace(𝑨^{-1}(𝜆)𝑨′(𝜆)) = trace(𝑨′(𝜆)𝑨^{-1}(𝜆))  ⟺  𝜆 ∆′(𝜆)/∆(𝜆) = trace(𝜆𝑨^{-1}(𝜆)𝑨′(𝜆))

It is also evident that ℓ · trace(𝑰_m) = 𝑛 = trace(ℓ 𝑨^{-1}(𝜆)𝑨(𝜆)); after combining the obtained equations we get

𝜆∆′(𝜆) − 𝑛∆(𝜆) = trace(𝑵(𝜆){𝜆𝑨′(𝜆) − ℓ𝑨(𝜆)}) = trace(𝑵(𝜆)𝑩(𝜆)) = trace(𝑩(𝜆)𝑵(𝜆))

where 𝑩(𝜆) = 𝜆𝑨′(𝜆) − ℓ𝑨(𝜆) = −∑_{i=0}^{ℓ} i𝑨_i𝜆^{ℓ−i} and 𝜆∆′(𝜆) − 𝑛∆(𝜆) = −∑_{i=0}^{n} i𝛼_i𝜆^{n−i}. Expanding the equation and equating identical powers of 𝜆 we obtain:

𝛼_{k−ℓ} = (1/(k−ℓ)) trace(∑_{i=1}^{ℓ} i𝑨_i𝑵_{k−i})   with k = ℓ + 1, …, n + ℓ and 𝛼₀ = 1
The Generalized Leverrier-Faddeev Algorithm: In this section, a new efficient algorithm for computing the inverse of a regular matrix polynomial is developed. The results have applications in linear control systems theory, since they are useful in various analysis and synthesis problems for state-space systems. The above developments are summarized in the following algorithm (a MATLAB sketch is given after the algorithm); terms 𝑵_j with j < 1 or j > n − ℓ + 1 are taken to be 𝟎.

Algorithm:1 (1st Generalized Leverrier-Faddeev Algorithm)

Initialization: Give the matrix coefficients 𝑨₀, 𝑨₁, …, 𝑨_ℓ, 𝑵₁ = 𝑰, and 𝛼₀ = 1

Result: 𝑵(𝜆) = 𝜆^{n−ℓ} ∑_{i=0}^{n−ℓ} 𝑵_{i+1}𝜆^{−i} and ∆(𝜆) = 𝜆^n + ∑_{i=1}^{n} 𝛼_i𝜆^{n−i}

Begin:
for k = 1, 2, …, n do
    𝛼_k = trace(∑_{i=1}^{ℓ} i𝑨_i𝑵_{k−i+1})/k
    if 1 ≤ k ≤ n − ℓ
        𝑵_{k+1} = 𝛼_k𝑰_m − ∑_{i=1}^{ℓ} 𝑨_i𝑵_{k−i+1}
    else
        𝑵_{k+1} = 𝟎
    end
end
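A minimal MATLAB sketch of Algorithm 1 (assumed random coefficients, 𝑨₀ = 𝑰), with the result checked against 𝑨(𝜆)𝑵(𝜆) = ∆(𝜆)𝑰 at a test point; out-of-range terms 𝑵_j are treated as zero, as noted above.

% Sketch of Algorithm 1 (1st Generalized Leverrier-Faddeev), verified at a test point
m = 3; L = 3; n = m*L; Im = eye(m);
Ac = cell(1,L); for i = 1:L, Ac{i} = rand(m); end      % A_1,...,A_L  (A_0 = I)
N  = cell(1,n-L+1); N{1} = Im; alpha = zeros(1,n);
for k = 1:n
    S = zeros(m);
    for i = 1:L
        j = k-i+1;
        if j >= 1 && j <= n-L+1, S = S + i*Ac{i}*N{j}; end
    end
    alpha(k) = trace(S)/k;
    if k <= n-L
        T = zeros(m);
        for i = 1:L
            j = k-i+1;
            if j >= 1, T = T + Ac{i}*N{j}; end
        end
        N{k+1} = alpha(k)*Im - T;
    end
end
lam = 1.7;                                             % arbitrary test point
Ap = lam^L*Im; for i = 1:L, Ap = Ap + Ac{i}*lam^(L-i); end
Np = zeros(m); for i = 0:n-L, Np = Np + N{i+1}*lam^(n-L-i); end
Delta = lam^n + sum(alpha.*lam.^(n-1:-1:0));
disp(norm(Ap*Np - Delta*Im))                           % ~ 0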

Example:1 Given a monic matrix polynomial 𝑨(𝜆) = 𝑰₃𝜆³ + 𝑨₁𝜆² + 𝑨₂𝜆 + 𝑨₃ with 𝑨_i ∈ ℝ^{3×3}, by using the above algorithm we can get

(𝑨(𝜆))^{-1} = 𝑵(𝜆)/∆(𝜆) = (𝑵₁𝜆⁶ + 𝑵₂𝜆⁵ + 𝑵₃𝜆⁴ + 𝑵₄𝜆³ + 𝑵₅𝜆² + 𝑵₆𝜆 + 𝑵₇) / (𝛼₀𝜆⁹ + 𝛼₁𝜆⁸ + 𝛼₂𝜆⁷ + 𝛼₃𝜆⁶ + 𝛼₄𝜆⁵ + 𝛼₅𝜆⁴ + 𝛼₆𝜆³ + 𝛼₇𝜆² + 𝛼₈𝜆 + 𝛼₉)

where ℓ = m = 3, n = 9.

𝑵₁ = 𝑰                                   𝛼₁ = trace(𝑨₁𝑵₁)
𝑵₂ = 𝛼₁𝑰 − 𝑨₁𝑵₁                          𝛼₂ = (1/2) trace(𝑨₁𝑵₂ + 2𝑨₂𝑵₁)
𝑵₃ = 𝛼₂𝑰 − (𝑨₁𝑵₂ + 𝑨₂𝑵₁)                 𝛼₃ = (1/3) trace(𝑨₁𝑵₃ + 2𝑨₂𝑵₂ + 3𝑨₃𝑵₁)
𝑵₄ = 𝛼₃𝑰 − (𝑨₁𝑵₃ + 𝑨₂𝑵₂ + 𝑨₃𝑵₁)          𝛼₄ = (1/4) trace(𝑨₁𝑵₄ + 2𝑨₂𝑵₃ + 3𝑨₃𝑵₂)
𝑵₅ = 𝛼₄𝑰 − (𝑨₁𝑵₄ + 𝑨₂𝑵₃ + 𝑨₃𝑵₂)          𝛼₅ = (1/5) trace(𝑨₁𝑵₅ + 2𝑨₂𝑵₄ + 3𝑨₃𝑵₃)
𝑵₆ = 𝛼₅𝑰 − (𝑨₁𝑵₅ + 𝑨₂𝑵₄ + 𝑨₃𝑵₃)          𝛼₆ = (1/6) trace(𝑨₁𝑵₆ + 2𝑨₂𝑵₅ + 3𝑨₃𝑵₄)
𝑵₇ = 𝛼₆𝑰 − (𝑨₁𝑵₆ + 𝑨₂𝑵₅ + 𝑨₃𝑵₄)          𝛼₇ = (1/7) trace(𝑨₁𝑵₇ + 2𝑨₂𝑵₆ + 3𝑨₃𝑵₅)
                                         𝛼₈ = (1/8) trace(2𝑨₂𝑵₇ + 3𝑨₃𝑵₆)
                                         𝛼₉ = (1/9) trace(3𝑨₃𝑵₇)
Numerical applications:

𝑨1 =[10.3834 7.9702 -7.3731; 𝑨2 =[31.1427 25.0780 -28.6260;


0.3884 14.2775 -4.0121; 0.5948 43.9776 -14.7398;
1.5983 7.0882 2.3391]; 7.6854 23.1468 -1.6814];

𝑨3 =[34.8866 -1.9819 -25.9976;


3.2351 26.7417 -12.1195;
14.4444 0.2493 -3.6046];
The result will be

𝑵2 =[16.6166 -7.9702 7.3731; 𝑵3 =[104.1316 -95.9827 101.9174;


-0.3884 12.7225 4.0121; -7.9160 65.5337 53.5358;
-1.5983 -7.0882 24.6609]; -27.7524 -84.0073 220.2736];
𝑵4 =[299.346 -416.846 540.859; 𝑵5 =[365.23 -773.18 1368.96;
-58.367 189.088 274.614; -195.77 359.87 673.87;
-181.258 -360.000 948.424]; -550.16 -666.03 2105.41];
𝑵6 =[80.717 -521.835 1634.103; 𝑵7 =[-93.372 -13.625 719.241;
-298.468 442.373 783.585; -163.397 249.768 338.704;
-765.723 -468.273 2287.089]; -385.461 -37.324 939.339];

a0=1; a1=27; a2=316.5; a3=2110.5; a4=8805; a5=23786; a6=41496; a7=44951; a8=27343;


a9=7087.5;

Connection to the Block Companion Forms: In multivariable control systems it is well-known that the transfer matrix 𝑯(𝜆) can be reached either by state space or by the matrix fraction description. In order to obtain the inverse of 𝑨(𝜆), assume that we are dealing with a MIMO system whose numerator is the identity, that is

𝑯(𝜆) = 𝑩(𝜆)(𝑨(𝜆))^{-1} = (𝑨(𝜆))^{-1}   where 𝑩(𝜆) = 𝑰

On the other hand the rational matrix function 𝑯(𝜆) can be written as

𝑯(𝜆) = (𝑨(𝜆))^{-1} = 𝑪_c(𝜆𝑰_n − 𝑨_c)^{-1}𝑩_c

where

𝑪_c = [𝑰_m 𝟎 ⋯ 𝟎],   𝑨_c = [𝟎 𝑰_m ⋯ 𝟎; ⋮ ⋱ ⋱ ⋮; 𝟎 ⋯ 𝟎 𝑰_m; −𝑨_ℓ −𝑨_{ℓ−1} ⋯ −𝑨₁],   𝑩_c = [𝟎 ⋯ 𝟎 𝑰_m]^T

Now, let us define

(𝑨(𝜆))^{-1} = 𝑵(𝜆)/∆(𝜆)   and   (𝜆𝑰 − 𝑨_c)^{-1} = Adj(𝜆𝑰 − 𝑨_c)/det(𝜆𝑰 − 𝑨_c) = 𝑹(𝜆)/∆(𝜆)

with 𝑹(𝜆) = ∑_{i=1}^{n} 𝑹_i𝜆^{n−i}, 𝑵(𝜆) = ∑_{i=1}^{n} 𝑵_i𝜆^{n−i}, ∆(𝜆) = ∑_{i=0}^{n} 𝛼_i𝜆^{n−i}, 𝛼₀ = 1 and 𝑵_i = 𝟎 for i < ℓ.

Then the following results are obtained

(𝑨(𝜆))^{-1} = 𝑪_c(𝜆𝑰 − 𝑨_c)^{-1}𝑩_c = 𝑪_c𝑹(𝜆)𝑩_c/∆(𝜆) = ∑_{i=1}^{n}(𝑪_c𝑹_i𝑩_c)𝜆^{n−i}/∆(𝜆)  ⟹  𝑵_i = 𝑪_c𝑹_i𝑩_c

From the usual Leverrier-Faddeev algorithm we have

𝑹₁ = 𝑰,  𝑹_{i+1} = 𝛼_i𝑰 + 𝑨_c𝑹_i  for 1 ≤ i ≤ n − 1,  𝟎 = 𝛼_n𝑰 + 𝑨_c𝑹_n   and   𝛼₀ = 1,  𝛼_i = −(1/i) trace(𝑨_c𝑹_i)

Back-substitution and recursive evaluation of these formulas give

𝛼_k = −(1/k) trace(∑_{i=0}^{k−1} 𝛼_i𝑨_c^{k−i})          𝑵_k = 𝑪_c(∑_{i=0}^{k−1} 𝛼_i𝑨_c^{k−i−1})𝑩_c

The above developments are summarized in the following algorithm (a MATLAB sketch is given after the algorithm).

Algorithm:2 (2nd Generalized Leverrier-Faddeev Algorithm)

Initialization: Give the matrix coefficients 𝑨₀ = 𝑰, 𝑨₁, …, 𝑨_ℓ, and 𝛼₀ = 1

Result: 𝑵(𝜆) = ∑_{i=1}^{n} 𝑵_i𝜆^{n−i} and ∆(𝜆) = ∑_{i=0}^{n} 𝛼_i𝜆^{n−i}

Begin:
Construct the companion matrices 𝑨_c, 𝑩_c, and 𝑪_c
for k = 1, 2, …, n do
    𝛼_k = −trace(∑_{i=0}^{k−1} 𝛼_i𝑨_c^{k−i})/k
    𝑵_k = 𝑪_c(∑_{i=0}^{k−1} 𝛼_i𝑨_c^{k−i−1})𝑩_c
end
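A minimal MATLAB sketch of Algorithm 2 for a small assumed quadratic case; the partial sum ∑ 𝛼_i𝑨_c^{k−i−1} is accumulated in a single matrix P to avoid recomputing powers of 𝑨_c.

% Sketch of Algorithm 2 (2nd Generalized Leverrier-Faddeev) via the companion matrix
m = 2; L = 2; n = m*L; Im = eye(m);
A1 = rand(m); A2 = rand(m);                       % assumed coefficients, A_0 = I
Acomp = [zeros(m) Im; -A2 -A1];                   % block companion matrix A_c
Bc = [zeros(m); Im];  Cc = [Im zeros(m)];
alpha = zeros(1,n); N = cell(1,n); P = eye(n);    % P = sum_{i=0}^{k-1} alpha_i*A_c^{k-1-i}
for k = 1:n
    alpha(k) = -trace(Acomp*P)/k;                 % alpha_k = -tr(sum alpha_i*A_c^{k-i})/k
    N{k} = Cc*P*Bc;                               % N_k = C_c*(sum alpha_i*A_c^{k-i-1})*B_c
    P = Acomp*P + alpha(k)*eye(n);                % advance the partial sum to step k+1
end
lam = 0.9;                                        % arbitrary test point
Ap = lam^2*Im + A1*lam + A2;
Np = zeros(m); for k = 1:n, Np = Np + N{k}*lam^(n-k); end
Delta = lam^n + sum(alpha.*lam.^(n-1:-1:0));
disp(norm(Ap*Np - Delta*Im))                      % ~ 0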

Example:2 Given a matrix polynomial 𝑨(𝜆) = 𝑨₀𝜆³ + 𝑨₁𝜆² + 𝑨₂𝜆 + 𝑨₃ with 𝑨_i ∈ ℝ^{3×3}, by using the above algorithm we get

(𝑨(𝜆))^{-1} = 𝑵(𝜆)/∆(𝜆) = (𝑵₁𝜆⁸ + 𝑵₂𝜆⁷ + 𝑵₃𝜆⁶ + 𝑵₄𝜆⁵ + 𝑵₅𝜆⁴ + 𝑵₆𝜆³ + 𝑵₇𝜆² + 𝑵₈𝜆 + 𝑵₉) / (𝛼₀𝜆⁹ + 𝛼₁𝜆⁸ + 𝛼₂𝜆⁷ + 𝛼₃𝜆⁶ + 𝛼₄𝜆⁵ + 𝛼₅𝜆⁴ + 𝛼₆𝜆³ + 𝛼₇𝜆² + 𝛼₈𝜆 + 𝛼₉)

where ℓ = m = 3 and n = 9, with ℓ the degree of 𝑨(𝜆), m the size of the 𝑨_i and n = mℓ.

𝑨1 =[26.6527 3.3590 -14.0226; 𝑨2 =[140.295 32.414 -98.168;


15.2183 11.1736 -12.2758; 100.753 49.619 -86.468;
25.3291 3.9304 -10.8264]; 166.468 41.315 -114.010];

𝑨3 =[170.645 69.366 -153.679;


131.606 77.383 -135.327;
215.960 92.055 -195.753];

The result of this will be 𝑵1 = 𝑵2 = 𝟎, 𝑵3 = 𝑰 and

𝑵4 =[0.3473 -3.3590 14.0226; 𝑵5 =[-137.111 -51.163 213.617;


-15.2183 15.8264 12.2758; -246.930 92.912 200.253;
-25.3291 -3.9304 37.8264]; -389.671 -60.991 436.604];
𝑵6 =[-1082.44 -300.66 1258.01; 𝑵7 =[-3447.80 -846.59 3564.41;
1539.45 -238.14 -1255.40; -4582.17 202.80 3757.73;
2308.07 364.44 -2306.40]; -6552.32 -1042.16 6167.00];
𝑵8 =[4984.78 1132.44 -4837.53; 𝑵9 =[-2690.46 -568.28 2505.05;
6474.08 135.50 -5337.94; -3463.08 -215.80 2867.93;
8885.48 1417.62 -8069.01]; -4596.73 -728.42 4076.10];

a0=1; a1=27; a2=316.5; a3=2110.5; a4=8805; a5=23786; a6=41496; a7=44951; a8=27343;


a9=7087.5;
Matrix Polynomials in Descriptor Form: Differential-algebraic systems are dynamic
ones that can only be described by a mixture of algebraic and differential equations
together. In another regard, it can be said that algebraic equations are those constraints
that control the differential solution. These dynamics are also known as descriptor and
singular systems and arise naturally as a linear approximation of system models, in
many applications such as electrical networks, dynamics of aeronautical systems,
neutral delay systems, chemical and thermal processes and diffusion, range systems,
interconnected systems, economics, optimization problems, Feedback systems, robotics,
biology, etc. (Dragutin Lj. Debeljković et.al 2004)

The matrix transfer function of a MIMO system can be described by generalized state-
space or polynomial fraction description as
−1
𝑯(𝜆) = 𝑩(𝜆)(𝑨(𝜆)) = 𝑪(𝜆𝑬 − 𝑨)−1 𝑩, and 𝑟𝑎𝑛𝑘(𝑬) < 𝑛

Where: 𝑨, 𝑬 ∈ ℝ𝑛×𝑛 , 𝑪 ∈ ℝ𝑝×𝑛 and 𝑩 ∈ ℝ𝑛×𝑚 . The index 𝑚 stands for the number of inputs,
𝑝 for the number of outputs and 𝑛 for the number of states.

To get to the transfer function we need to calculate the inverse of the matrix pencil
(𝜆𝑬 − 𝑨) or the inverse of 𝑨(𝜆). As it is known, calculating inversions of polynomial
matrices in general is not an easy task and requires a lot of complex calculations.
Especially, it goes more difficult if we are dealing with generalized systems. That is why
we propose an algorithm that makes this process easier for computation.

Now, assume that we are dealing with the problem of inverting the matrix polynomial 𝑨(𝜆) = ∑_{i=0}^{ℓ} 𝑨_i𝜆^{ℓ−i} with rank(𝑨₀) < 𝑚. As before, we let

𝑯(𝜆) = (𝑨(𝜆))^{-1} = 𝑪_c(𝜆𝑬_c − 𝑨_c)^{-1}𝑩_c = 𝑪_c𝑹(𝜆)𝑩_c/det(𝜆𝑬_c − 𝑨_c) = 𝑵(𝜆)/∆(𝜆)

where 𝑵(𝜆) = ∑_{i=1}^{n} 𝑵_i(𝜆 + 𝜇)^{n−i}, ∆(𝜆) = det(𝜆𝑬_c − 𝑨_c) = ∑_{i=0}^{n} 𝛼_i(𝜆 + 𝜇)^{n−i} and

𝑬_c = blkdiag(𝑰_m, …, 𝑰_m, 𝑨₀),   𝑨_c = [𝟎 𝑰_m ⋯ 𝟎; ⋮ ⋱ ⋱ ⋮; 𝟎 ⋯ 𝟎 𝑰_m; −𝑨_ℓ −𝑨_{ℓ−1} ⋯ −𝑨₁],   𝑪_c = [𝑰_m 𝟎 ⋯ 𝟎],   𝑩_c = [𝟎 ⋯ 𝟎 𝑰_m]^T

Note: The variable 𝜇 is called a regularization parameter, and is introduced to make


simplifications in calculations.

The Adjugate matrix 𝑹(𝜆) can be calculated using the method proposed by P.N.
Paraskevopoulos et.al 1983. Once the Adjugate is obtained we do a back-substitutions to
get a recursive formula for 𝑵𝑖 and 𝛼𝑖 .

To compute the inverse of the pencil (𝜆𝑬_c − 𝑨_c) the following technique will be used. Find a 𝜇 so that the matrix pencil (𝜇𝑬_c + 𝑨_c) is regular. It should be noted that det(𝜇𝑬_c + 𝑨_c) is a polynomial in 𝜇 of degree at most 𝑛.

(𝜆𝑬_c − 𝑨_c)^{-1} = (𝜆𝑬_c + 𝜇𝑬_c − 𝜇𝑬_c − 𝑨_c)^{-1} = ((𝜆 + 𝜇)𝑬_c − (𝜇𝑬_c + 𝑨_c))^{-1} = ((𝜆 + 𝜇)𝑴 − 𝑰)^{-1}𝑸

where 𝑸 = (𝜇𝑬_c + 𝑨_c)^{-1} and 𝑴 = (𝜇𝑬_c + 𝑨_c)^{-1}𝑬_c, which can easily be evaluated, since for constant 𝜇 the matrix (𝜇𝑬_c + 𝑨_c) is a known constant matrix of appropriate dimension. If we introduce the change of variable (𝜆 + 𝜇) = 1/𝑠 we obtain

(𝜆𝑬_c − 𝑨_c)^{-1} = −𝑠(𝑠𝑰 − 𝑴)^{-1}𝑸
                  = −𝑠 {(𝑹_n𝑠^{n−1} + ⋯ + 𝑹₂𝑠 + 𝑹₁)/(𝛼_n𝑠^n + 𝛼_{n−1}𝑠^{n−1} + ⋯ + 𝛼₁𝑠 + 𝛼₀)} 𝑸
                  = −{(𝑹₁(𝜆 + 𝜇)^{n−1} + ⋯ + 𝑹_{n−1}(𝜆 + 𝜇) + 𝑹_n)/(𝛼₀(𝜆 + 𝜇)^n + 𝛼₁(𝜆 + 𝜇)^{n−1} + ⋯ + 𝛼_{n−1}(𝜆 + 𝜇) + 𝛼_n)} 𝑸

Next the Souriau-Frame-Faddeev algorithm will be used to compute the term (𝑠𝑰 − 𝑴)^{-1}:

𝛼_n = 1                            𝑹_n = 𝑰
𝛼_{n−1} = −trace(𝑴𝑹_n)             𝑹_{n−1} = 𝛼_{n−1}𝑰 + 𝑴𝑹_n
𝛼_{n−2} = −(1/2) trace(𝑴𝑹_{n−1})   𝑹_{n−2} = 𝛼_{n−2}𝑰 + 𝑴𝑹_{n−1}
⋮
𝛼₁ = −(1/(n−1)) trace(𝑴𝑹₂)         𝑹₁ = 𝛼₁𝑰 + 𝑴𝑹₂
𝛼₀ = −(1/n) trace(𝑴𝑹₁)             𝑹₀ = 𝟎 = 𝛼₀𝑰 + 𝑴𝑹₁

In compact form we write 𝛼_n = 1, 𝑹₀ = 𝟎, 𝑹_n = 𝑰 and

𝛼_{n−k} = −(1/k) trace(∑_{i=0}^{k−1} 𝛼_{n−i}𝑴^{k−i})       𝑹_{n−k} = ∑_{i=0}^{k} 𝛼_{n−i}𝑴^{k−i}   for k = 1, …, n

The above developments are summarized in the following algorithm.

Algorithm:3 (3rd Generalized Leverrier-Faddeev Algorithm)

Initialization: Give the matrix coefficients 𝑨₀, 𝑨₁, …, 𝑨_ℓ, and 𝛼_n = 1.

Result: 𝑵(𝜆) = ∑_{i=1}^{n} 𝑵_i(𝜆 + 𝜇)^{n−i} and ∆(𝜆) = ∑_{i=0}^{n} 𝛼_i(𝜆 + 𝜇)^{n−i}

Begin:
▪ Construct the companion matrices 𝑨_c, 𝑩_c, and 𝑪_c
▪ Give a scalar 𝜇 such that (𝜇𝑬_c + 𝑨_c) is nonsingular.
▪ Construct the matrices 𝑴 = (𝜇𝑬_c + 𝑨_c)^{-1}𝑬_c and 𝑸 = (𝜇𝑬_c + 𝑨_c)^{-1}
for k = 1, 2, …, n do
    𝛼_{n−k} = −trace(∑_{i=0}^{k−1} 𝛼_{n−i}𝑴^{k−i})/k
    𝑵_{n−k} = −𝑪_c(∑_{i=0}^{k} 𝛼_{n−i}𝑴^{k−i}𝑸)𝑩_c
end
Conclusion: In this chapter we have presented a new look at the Leverrier-Faddeev algorithm for the computation of the inverse of regular and non-regular matrix polynomials. The power of the proposed algorithms lies in their simplicity of implementation and their efficiency, especially when the degree and the order of the matrix polynomial get larger. These algorithms are helpful and functional for the computation of multivariable transfer functions of large-scale systems.

References

[1] P. Lancaster, Algorithms for Lambda-Matrices, Numerische Mathematik 6, 388-394 (1964).
[2] D. F. Davidenko, Inversion of matrices by the method of variation of parameters, Dokl. Soviet Math, 1960, Volume 131, Number 3, 500-502.
[3] P. Lancaster, Some applications of the Newton-Raphson method to non-linear matrix problems, Proc. Roy. Soc. A 271, 324-331 (1963).
[4] P. Lancaster, A generalised Rayleigh-quotient iteration for lambda-matrices, Arch. Rat. Mech. Anal. 8, 309-322 (1961).
[5] Shui-Hung Hou, A Simple Proof of the Leverrier-Faddeev Characteristic Polynomial Algorithm, SIAM Rev. Vol. 40, No. 3, pp. 706-709, September 1998.
[6] A. S. Householder, The Theory of Matrices in Numerical Analysis, Dover, New York, 1975.
[7] D. K. Faddeev and V. N. Faddeeva, Computational Methods of Linear Algebra, Freeman, San Francisco, 1963.
[8] F. R. Gantmacher, The Theory of Matrices, Vol. I, Chelsea Publishing Co., New York, 1960.
[9] Peter Henrici, Applied and Computational Complex Analysis, John Wiley & Sons, Inc., 1974.
[10] S. Barnett, Leverrier's algorithm: a new proof and extensions, SIAM J. Matrix Anal. Appl. 10 (1989), 551-556.
[11] H.P. Decell, An application of the Cayley-Hamilton theorem to generalized matrix inversion, SIAM Review 7, No. 4 (1965), 526-528.
[12] D.K. Faddeev and V.N. Faddeeva, Computational Methods of Linear Algebra, Freeman, San Francisco, 1963.
[13] J.S. Frame, A simple recursion formula for inverting a matrix, Bull. Amer. Math. Soc. 55 (1949), 19-45.
[14] J. Jones, N.P. Karampetakis and A.C. Pugh, The computation and application of the generalized inverse via Maple, J. Symbolic Computation 25 (1998), 99-124.
[15] N.P. Karampetakis, Computation of the generalized inverse of a polynomial matrix and applications, Linear Algebra Appl. 252 (1997), 35-60.
[16] G. Wang and Y. Lin, A new extension of Leverrier's algorithm, Linear Algebra Appl. 180 (1993), 227-238.
CHAPTER IV:
Latent Structures, Standard Triples and Solvents in Matrix Polynomials
Introduction: Matrix polynomials (𝜆-matrices) are an important part of linear algebra, and have important applications in differential equations, boundary value problems, numerical analysis and other areas. Problems concerning matrix polynomials were initially introduced in early 1933; later the matrix polynomial appeared as a topic of research in 1976 with Lancaster, Gohberg, and Rodman. Since 1976 the matrix polynomial has received the attention of mathematicians, and the theory of matrix polynomials has been extensively developed over the last 40 years. In fact a few specialized books have appeared on the subject, such as Gohberg, Lancaster and Rodman (1982). An important algebraic structure in matrix polynomials is the standard triple; the notion of standard triples was introduced and developed by Gohberg, Lancaster and Rodman, and plays a central role in the theory of matrix polynomials. It is based on the eigen-structure and matrices in companion forms (i.e. in state space).

Origin of Eigen-systems and State space models: Eigenvectors (but not the word for them!) gradually appeared in the 18th century in solving differential equations which we now write as y′ = 𝑎y, describing all sorts of oscillatory phenomena in nature (mechanical vibrations, light, sound, etc.). Of course this was long before the words “matrix” and “vector” appeared. The prefix Eigen- is adopted from the German word eigen for "proper", "inherent", "own", "individual", "special", "specific", "peculiar", or "characteristic". It was David Hilbert who introduced the terms Eigenwert and Eigenfunktion in 1904. In fact, over the past two centuries the words proper, latent, characteristic, secular, and singular have all been used as alternatives to our perplexing prefix.

The term “state space” originated in 1960s in the area of control engineering (R Kalman,
1960). State space model (SSM) provides a general framework for analyzing deterministic
and stochastic dynamical systems that are measured or observed through a stochastic
process. The SSM framework has been successfully applied in engineering, statistics,
computer science and economics to solve a broad range of dynamical systems problems.

In this section we introduce the definition of the matrix polynomial and some other basic
definitions.

Definition: By a matrix polynomial we mean a matrix-valued function of a complex


variable of the form: 𝑨(𝜆) = ∑ℓ𝑖=0 𝑨𝑖 𝜆ℓ−𝑖 , 𝐴𝑖 ∈ ℝ𝑚×𝑚 Where 𝑨0 , 𝑨1 , … , 𝑨ℓ are 𝑚 × 𝑚 matrices
of complex numbers, 𝑚 called the order of 𝑨(𝜆), and 𝐴0 is called the leading coefficient.
Also, we can write a matrix polynomial as a matrix with polynomial entries.

Definition: A matrix polynomial 𝑨(𝜆) is said to be invertible if there exists a matrix


polynomial 𝑩(𝜆) such that 𝑨(𝜆)𝑩(𝜆) = 𝑩(𝜆)𝑨(𝜆) = 𝑰 and we call 𝑩(𝜆) the inverse of
𝑨(𝜆) denoted by 𝑨−1 (𝜆).

Definition: The degree of a matrix polynomial 𝑨(𝜆) is defined to be the greatest power of
the polynomials appearing as entries of 𝑨(𝜆) and is denoted by deg 𝑨(𝜆), and the number
𝑚 is called the order of the matrix polynomial.
Definition: The determinant of a matrix polynomial is defined in usual way for matrices
and is denoted by det 𝑨(𝜆) . A matrix polynomial 𝑨(𝜆) is said to be unimodular, if the
determinant of 𝑨(𝜆) is a nonzero constant.

The basic operations of addition, subtraction and multiplication of two or more matrix
polynomials are defined in exactly the same way for scalar matrices. Also, there are
many theorems in usual matrices that still true in matrix polynomials but with new
notations. For example in usual matrices we know the condition on a matrix to have an
inverse is the non-singularity of the matrix, and in matrix polynomials we have
approximately the same condition as shown in the following theorem.

Theorem: An 𝑚 × 𝑚 matrix polynomial 𝑨(𝜆) is invertible if and only if 𝑨(𝜆) is unimodular.

Proof: Let 𝑨(𝜆) be 𝑚 × 𝑚 unimodular matrix polynomial then det(𝑨(𝜆)) = 𝑐 ≠ 0, for some
constant 𝑐, then by Adjugate theorem 𝑨−1 (𝜆) exists so that: det(𝑨(𝜆)) 𝑨−1 (𝜆) = Adj(𝑨(𝜆))

Conversely, if 𝑨(𝜆)is invertible, then there exists another matrix polynomial 𝑩(𝜆) such
that 𝑨(𝜆)𝑩(𝜆) = 𝑰, so that det(𝑨(𝜆))det(𝑩(𝜆)) = 1. Thus the product of the determinants of
the matrix polynomials 𝑨(𝜆) and 𝑩(𝜆) is a nonzero constant, and this occurs only if each of them is a nonzero constant, so that 𝑨(𝜆) has a nonzero constant determinant; hence it is a unimodular matrix polynomial, and this completes the proof ■

Definition: A nonzero matrix polynomial 𝑨(𝜆) of dimension 𝑛 × 𝑚 is said to be of rank 𝑟,


if 𝑟 is the largest positive integer such that not all minors of 𝑨(𝜆) of order 𝑟 are identically
zero. A zero matrix polynomial is said to be of rank zero.

Matrix polynomials can be classified into many types, where the classifications depend
on determinant, leading coefficient and other properties.

Definition: Matrix polynomial is monic if 𝑨0 = 𝑰𝑚 , where 𝑰𝑚 is the 𝑚 × 𝑚 identity matrix.


Otherwise 𝑨(𝜆) is said to be nonmonic. A matrix polynomial 𝑨(𝜆) is said to be a regular matrix polynomial if det(𝑨(𝜆)) ≠ 0, except for finitely many 𝜆 ∈ ℂ. And a matrix polynomial is said to be self-adjoint if 𝑨(𝜆) = 𝑨^H(𝜆), where 𝑨^H(𝜆) = ∑_{i=0}^{ℓ} 𝑨_i^H𝜆^{ℓ−i}.

Definition: Two matrix polynomials 𝑨(𝜆) and 𝑩(𝜆) are said to be equivalent if there exists
unimodular matrices 𝑷(𝜆) and 𝑸(𝜆) such that 𝑩(𝜆) = 𝑷(𝜆)𝑨(𝜆)𝑸(𝜆) and we write 𝑨(𝜆)~𝑩(𝜆)

Theorem: Any unimodular matrix polynomial is equivalent to the identity matrix.

Proof: If 𝑨(𝜆) is unimodular, then 𝑨(𝜆)𝑨−1 (𝜆) = 𝑰 and 𝑨−1 (𝜆)𝑨(𝜆) = 𝑰 , where 𝑷(𝜆) = 𝑰 and
𝑸(𝜆) = 𝑨−1 (𝜆) which are unimodular. So that 𝑨(𝜆)~𝑰 ■

Definition: Let 𝑨₀(𝜆) = diag(𝑎₁(𝜆), 𝑎₂(𝜆), …, 𝑎_m(𝜆)), where 𝑎_i(𝜆) is a zero or monic polynomial, i = 1, …, m, and 𝑎_i(𝜆) is divisible by 𝑎_{i−1}(𝜆), i = 2, 3, …, m; then 𝑨₀(𝜆) is called a diagonal matrix polynomial, and a matrix polynomial with these properties is called a canonical matrix polynomial (also Smith canonical form or Smith normal form).

Theorem: Any matrix polynomial over 𝔽 of order 𝑚 and degree ℓ is equivalent to a


canonical matrix polynomial. (see Matrix Poly by BEKHITI 2020)
There are also definitions (e.g. the Hermite normal forms, elementary divisors, invariant
polynomials) which are very useful in the study of 𝜆-matrices. However, because we will
not use those concepts in the rest of the presentation, we will not present them here. The
interested reader should consult the appropriate literature on matrix theory.

Latent Structure of Matrix Polynomials: The complex number 𝜆_i is called a latent root if it is a solution (also zero or root) of the scalar polynomial equation det 𝑨(𝜆) = 0. The nontrivial vector 𝐯, solution of 𝑨(𝜆_i)𝐯 = 𝟎, is called a primary right latent vector associated with 𝜆_i.

Definition: Let 𝑨(𝜆) ∈ ℝ𝑚×𝑚 be matrix polynomial, then we define the zeroes of det(𝑨(𝜆))
to be the latent roots (eigenvalues) of 𝑨(𝜆), and the set of latent roots of 𝑨(𝜆) is called the
spectrum of 𝑨(𝜆) denoted by 𝜎(𝑨(𝜆)). And if a nonzero 𝐯 ∈ ℝ𝑚 is such that 𝑨(𝜆𝑖 )𝐯 = 𝟎,
then we say that 𝐯 is a right latent (or eigen) vector of 𝑨(𝜆), a nonzero 𝐰 𝑇 ∈ ℝ𝑚 is such
that 𝐰 𝑇 𝑨(𝜆𝑖 ) = 𝟎, then we say that 𝐰 is a left latent (or eigen) vector of 𝑨(𝜆).

From the definition we can see that a latent problem of a matrix polynomial is a
generalization of the concept of eigen-problem for square matrices. Indeed, we can
consider the classical eigenvalue/vector problem as finding the latent root/vector of a
linear matrix polynomial 𝜆𝑰 − 𝑨. An interesting problem is the number of latent roots in a
given region of the complex plane. This is answered by the following theorem.

Theorem: The number of latent roots of the regular matrix polynomial 𝑨(𝜆) in the domain 𝒟 enclosed by a contour 𝛤 is given by 𝑁 = (1/2𝜋𝑖) ∮_𝛤 trace(𝑨^{-1}(𝜆)𝑨′(𝜆)) 𝑑𝜆, each latent root being counted according to its multiplicity. 𝑨′(𝜆) is the derivative of 𝑨(𝜆).

Proof: see the previous chapter ■

At this point, we can also define the spectrum of a matrix polynomial 𝑨(𝜆) as being the
set of all its latent roots [notation 𝜎(𝑨(𝜆))]. It is essentially the same definition as the one
of the spectrum of a square matrix.

Definition: Let 𝑨(𝜆) be an 𝑚 × 𝑚 matrix polynomial. A matrix 𝑹 ∈ ℂ^{m×m} is called a (right) solvent of 𝑨(𝜆) if it satisfies 𝑨(𝑹) = ∑_{i=0}^{ℓ} 𝑨_i𝑹^{ℓ−i} = 𝟎. A matrix 𝑳 ∈ ℂ^{m×m} is called a (left) solvent of 𝑨(𝜆) if it satisfies 𝑨(𝑳) = ∑_{i=0}^{ℓ} 𝑳^{ℓ−i}𝑨_i = 𝟎.

An equivalent representation for 𝑨(𝑹) = 𝟎 (or 𝑨(𝑳) = 𝟎) that uses the contour integral is as follows: 𝑨(𝑹) = (1/2𝜋𝑖) ∮_𝛤 𝑨(𝜆)(𝜆𝑰 − 𝑹)^{-1} 𝑑𝜆 = 𝟎 or 𝑨(𝑳) = (1/2𝜋𝑖) ∮_𝛤 (𝜆𝑰 − 𝑳)^{-1}𝑨(𝜆) 𝑑𝜆 = 𝟎, for any closed contour 𝛤 ⊆ ℂ with the spectrum of 𝑹 (or 𝑳) in its interior.

The relation between eigenvalues of 𝑨(𝜆) and solvents is highlighted in [Matrix Poly by
BEKHITI 2020], a corollary of the generalized Bezout theorem states that: if the matrix 𝑹
is a right solvent of 𝑨(𝜆), then 𝑨(𝜆) = 𝑷(𝜆)(𝜆𝑰 − 𝑹) and if 𝑳 is a left solvent of 𝑨(𝜆), then
𝑨(𝜆) = (𝜆𝑰 − 𝑳)𝑸(𝜆), where 𝑷(𝜆) and 𝑸(𝜆) is a matrix polynomial of degree ℓ − 1. As a result
of this corollary we can say that any eigenpair of the solvent 𝑹 (or 𝑳) is an eigenpair of
𝑨(𝜆).
det(𝑨(𝜆)) = det(𝑷(𝜆)) det(𝜆𝑰 − 𝑹) ⟹ 𝜎(𝜆𝑰 − 𝑹) ⊂ 𝜎(𝑨(𝜆))
{
det(𝑨(𝜆)) = det(𝜆𝑰 − 𝑳) det(𝑸(𝜆)) ⟹ 𝜎(𝜆𝑰 − 𝑳) ⊂ 𝜎(𝑨(𝜆))
Theorem: Suppose 𝑨(𝜆) has 𝑝 distinct eigenvalues {𝜆_i}_{i=1}^{p}, with 𝑚 ≤ 𝑝 ≤ 𝑚ℓ, and that the corresponding set of 𝑝 eigenvectors {𝐯_i}_{i=1}^{p} satisfies the Haar condition (every subset of 𝑚 of them is linearly independent). Then there are at least (𝑝 choose 𝑚) different solvents of 𝑨(𝜆), and exactly this many if 𝑝 = 𝑚ℓ, which are given by 𝑺 = 𝑾 diag(𝜇_i) 𝑾^{-1}, 𝑾 = [𝐰₁ 𝐰₂ … 𝐰_m], where the eigenpairs {𝜇_i, 𝐰_i}_{i=1}^{m} are chosen among the eigenpairs {𝜆_i, 𝐯_i}_{i=1}^{p} of 𝑨(𝜆).

Note that if we have that 𝑝 = 𝑚 in Theorem, the distinctness of the eigenvalues is not
needed, and then we have a sufficient condition for the existence of a solvent.

An example which illustrates this last result is the following. Consider the quadratic matrix solvent problem

𝑨(𝜆) = (1 0; 0 1)𝜆² + (−1 −6; 2 −9)𝜆 + (0 12; −2 14)   with m = 2 and ℓ = 2.

𝑨(𝜆) has eigenpairs: (𝜆₁, 𝐯₁) = {1, (1; 0)}, (𝜆₂, 𝐯₂) = {2, (0; 1)}, (𝜆₃, 𝐯₃) = {3, (1; 1)}, (𝜆₄, 𝐯₄) = {4, (1; 1)}.

Consider the subsets of eigenvectors: {𝐯₁, 𝐯₂}, {𝐯₁, 𝐯₃}, {𝐯₁, 𝐯₄}, {𝐯₂, 𝐯₃} and {𝐯₂, 𝐯₄}. Each subset consists of vectors that are linearly independent. Therefore, the set of solvents is:

(1 0; 0 2),  (1 2; 0 3),  (3 0; 1 2),  (1 3; 0 4)  and  (4 0; 2 2)

Note that we cannot construct a solvent whose eigenvalues are 3 and 4 because the associated eigenvectors are linearly dependent.
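A two-line MATLAB check (reusing the coefficients of the example above) confirms that each of the five matrices is indeed a solvent:

% Check of the five solvents of the quadratic example above
A1 = [-1 -6; 2 -9];  A2 = [0 12; -2 14];              % A(L) = L^2 + A1*L + A2
S  = {[1 0;0 2],[1 2;0 3],[3 0;1 2],[1 3;0 4],[4 0;2 2]};
for i = 1:5, disp(norm(S{i}^2 + A1*S{i} + A2)), end   % each residual is zero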

Remark: if the matrix 𝑹 = 𝑷 diag(𝜆_i) 𝑷^{-1} is a right solvent of a matrix polynomial 𝑨(𝜆) and 𝑳 = 𝑸^{-1} diag(𝜆_i) 𝑸 is a left solvent, then 𝑹 = 𝑴𝑳𝑴^{-1} with 𝑴 = 𝑷𝑸. In the book of algebra (by BEKHITI 2020) it is proved that 𝑴 = (1/2𝜋𝑖) ∮_𝛤 𝑨^{-1}(𝜆) 𝑑𝜆.

Now we are going to introduce some definitions, results related to the concept of
linearization, companion matrices of a matrix polynomial and the definition of standard
triples and pairs.

Let 𝑹_i be a right solvent of a matrix polynomial 𝑨(𝜆); then

𝑨₀𝑹_i^ℓ + 𝑨₁𝑹_i^{ℓ−1} + ⋯ + 𝑨_{ℓ−1}𝑹_i + 𝑨_ℓ = 𝟎  ⟺  [𝑨_ℓ, …, 𝑨₁, 𝑨₀] col(𝑹_i^k)_{k=0}^{ℓ} = 𝟎,   i = 1, 2, …, ℓ

In matrix form we can write 𝑨_c𝑿_i = 𝑿_i𝑹_i, i.e.

[𝟎 𝑰 ⋯ 𝟎; 𝟎 𝟎 ⋱ ⋮; ⋮ ⋮ ⋱ 𝑰; −𝑨_ℓ −𝑨_{ℓ−1} ⋯ −𝑨₁][𝑰; 𝑹_i; ⋮; 𝑹_i^{ℓ−1}] = [𝑰; 𝑹_i; ⋮; 𝑹_i^{ℓ−1}]𝑹_i   with 𝑿_i = col(𝑹_i^k)_{k=0}^{ℓ−1}

If we define the right block Vandermonde matrix 𝑽_R = [𝑿₁, 𝑿₂, …, 𝑿_ℓ], then

𝑨_c𝑽_R = [𝑹₁ ⋯ 𝑹_ℓ; 𝑹₁² ⋯ 𝑹_ℓ²; ⋮; 𝑹₁^ℓ ⋯ 𝑹_ℓ^ℓ] = [𝑰 ⋯ 𝑰; 𝑹₁ ⋯ 𝑹_ℓ; ⋮; 𝑹₁^{ℓ−1} ⋯ 𝑹_ℓ^{ℓ−1}] blkdiag(𝑹₁, …, 𝑹_ℓ) = 𝑽_R𝚲_R,   𝚲_R = blkdiag(𝑹_i)
Notice that 𝑨𝑐 ~𝚲𝑅 because

𝑨𝑐 𝑽𝑅 = 𝑽𝑅 𝚲𝑅 ⟺ 𝑨𝑐 = 𝑽𝑅 𝚲𝑅 𝑽−1
𝑅 ⟺ 𝑨𝑐 ~𝚲𝑅 ⟺ 𝜎(𝑨𝑐 ) = 𝜎(𝚲𝑅 ) ⟺ det(𝜆𝑰 − 𝑨𝑐 ) = det(𝜆𝑰 − 𝚲𝑅 ) = 0

det(𝜆𝑰 − 𝑨𝑐 ) = det(𝜆𝑰 − 𝚲𝑅 ) = 0 ⟺ det(𝜆𝑰 − 𝑹1 ) det(𝜆𝑰 − 𝑹2 ) … det(𝜆𝑰 − 𝑹ℓ ) = 0

⦁ If {𝜆𝑖 , 𝐯𝑖 } is an eigenpair of the solvent 𝑹 then (𝜆𝑖 𝑰 − 𝑹)𝐯𝑖 = 𝟎 and so

𝑨(𝜆𝑖 ) = 𝑸(𝜆𝑖 )(𝜆𝑖 𝑰 − 𝑹) ⇔ 𝑨(𝜆𝑖 )𝐯𝑖 = 𝑸(𝜆𝑖 )(𝜆𝑖 𝑰 − 𝑹)𝐯𝑖 = 𝟎 ⇔ 𝑨(𝜆𝑖 )𝐯𝑖 = 𝟎

⦁ All eigenvalues of 𝑹𝑖 ∈ ℝ𝑚×𝑚 are latent roots of 𝑨(𝜆) ∈ ℂ𝑚×𝑚 or equivalently eigenvalues
of 𝑨𝑐 ∈ ℝ𝑛×𝑛 . All eigenvectors of 𝑹𝑖 are latent vectors of 𝑨(𝜆), but not eigenvectors of 𝑨𝑐 .

⦁ The set {𝑹1 , 𝑹2 , … , 𝑹ℓ } is a complete set of right solvents if and only if 𝑽𝑅 is nonsingular,
or equivalently {⋃𝜎(𝑹𝑖 ) = 𝜎(𝑨𝑐 ) and 𝜎(𝑹𝑖 )⋂𝜎(𝑹𝑗 ) = ∅}.

⦁ The spectrum of the matrix polynomial 𝑨(𝜆) and the companion matrix 𝑨𝑐 are the same.
In other word the pencil (𝜆𝑰 − 𝑨𝑐 ) and 𝑨(𝜆) are equivalent. The process of transforming
ℓ𝑡ℎ degree matrix polynomial to 1st degree matrix polynomial (𝜆𝑰 − 𝑨𝑐 ) is called
linearization.

The word "Linearization" to a matrix polynomial, in fact, comes from the linearization of
differential equations. Consider the following system of differential equation with
constant coefficients

𝑨(𝜆)𝐱(𝜆) = 𝐟(𝜆)  ⟺  d^ℓ𝐱/dt^ℓ + ∑_{i=1}^{ℓ} 𝑨_i d^{ℓ−i}𝐱/dt^{ℓ−i} = 𝐟(𝑡)   with −∞ ≤ 𝑡 ≤ ∞

Where: 𝐟(𝑡) is a given 𝑚-dimensional vector function and 𝐱(𝑡) is the unknown 𝑚-
dimensional vector function. Then we can reduce this equation to a first order
differential equation by using the substitution 𝐱 0 = 𝐱, and 𝐱 𝑖+1 = 𝑑𝐱 𝑖 /𝑑𝑡 we get the
equivalent first order differential equation

d𝐱_{ℓ−1}/dt + 𝑨₁𝐱_{ℓ−1} + ⋯ + 𝑨_{ℓ−1}𝐱₁ + 𝑨_ℓ𝐱₀ = 𝐟(𝑡)  ⟺  d𝐱_{ℓ−1}/dt = 𝐟(𝑡) − ∑_{i=1}^{ℓ} 𝑨_i𝐱_{ℓ−i}

If we let 𝑿 = [𝐱₀^T  𝐱₁^T  …  𝐱_{ℓ−1}^T]^T and 𝐅(𝑡) = [𝟎  𝟎  …  𝐟^T(𝑡)]^T, then

d𝑿(𝑡)/dt = 𝑨_c𝑿(𝑡) + 𝐅(𝑡)  ⟺  (𝜆𝑰 − 𝑨_c)𝑿(𝜆) = 𝐅(𝜆)

This operation of reducing the ℓ𝑡ℎ degree differential equation to a first order equation, is
called a linearization. (i.e. we increased the dimension of the unknown function, which
becomes 𝑛 = ℓ𝑚). 𝑨(𝜆)𝐱(𝜆) = 𝐟(𝜆) ⟺ (𝜆𝑰 − 𝑨𝑐 )𝑿(𝜆) = 𝐅(𝜆) or

𝐱(𝜆) = [𝑰 𝟎 … 𝟎]𝑿(𝜆) = 𝑨−1 (𝜆)𝐟(𝜆) ⟺ 𝐱(𝜆) = [𝑰 𝟎 … 𝟎](𝜆𝑰 − 𝑨𝑐 )−1 𝐅(𝜆)

We know that 𝐟(𝜆) = [𝟎 𝟎 … 𝑰]𝐅(𝜆), so 𝑨^{-1}(𝜆)[𝟎 𝟎 … 𝑰] = [𝑰 𝟎 … 𝟎](𝜆𝑰 − 𝑨_c)^{-1}; if we define new matrices 𝑩_c = [𝟎 𝟎 … 𝑰]^T and 𝑪_c = [𝑰 𝟎 … 𝟎] we obtain 𝑨^{-1}(𝜆) = 𝑪_c(𝜆𝑰 − 𝑨_c)^{-1}𝑩_c. In other words we can say

𝑨^{-1}(𝜆) = 𝑪_c(𝜆𝑰 − 𝑨_c)^{-1}𝑩_c  ⟺  Adj 𝑨(𝜆)/det 𝑨(𝜆) = 𝑪_c (Adj(𝜆𝑰 − 𝑨_c)/det(𝜆𝑰 − 𝑨_c)) 𝑩_c

Since 𝑨(𝜆) and (𝜆𝑰 − 𝑨_c) have the same spectrum, det 𝑨(𝜆) = det(𝜆𝑰 − 𝑨_c) and (𝜆𝑰 − 𝑨_c) and 𝑨(𝜆) are equivalent.

Definition: Let 𝑨(𝜆) ∈ ℂ^{m×m} be an ℓ-th degree monic matrix polynomial (i.e. with nonsingular leading coefficient). A linear matrix polynomial (sometimes called a matrix pencil) (𝜆𝑰_n − 𝑨_c) ∈ ℂ^{n×n} is called a linearization of the monic matrix polynomial 𝑨(𝜆) if there exist two unimodular matrix polynomials 𝑷(𝜆) and 𝑸(𝜆) such that

diag(𝑨(𝜆), 𝑰_r) = 𝑷(𝜆)(𝜆𝑰_n − 𝑨_c)𝑸(𝜆)  ⟺  diag(𝑨(𝜆), 𝑰_r) ~ (𝜆𝑰_n − 𝑨_c)  ⟺  det 𝑨(𝜆) = det(𝜆𝑰_n − 𝑨_c)

where 𝑟 = 𝑛 − 𝑚.

An 𝑚 × 𝑚 matrix polynomial 𝑨(𝜆) is said to be similar to a second matrix polynomial 𝑩(𝜆)


of the same order if there exists a unimodular matrix polynomial 𝑻(𝜆) such that:

𝑨(𝜆) = 𝑻(𝜆)𝑩(𝜆)𝑻−1 (𝜆)

Theorem: Two matrix polynomials 𝑩(𝜆) and 𝑨(𝜆), are similar if and only if the matrix
polynomials 𝜆𝑰𝑛 − 𝑩𝑐 and 𝜆𝑰𝑛 − 𝑨𝑐 are equivalent.

Any matrix 𝑨 is a linearization of 𝑨(𝜆) = ∑ℓ𝑖=0 𝑨𝑖 𝜆ℓ−𝑖 if and only if 𝑨 is similar to the first
companion matrix 𝑨𝑐 of 𝑨(𝜆), that is 𝑨 = 𝑻𝑐−1 𝑨𝑐 𝑻𝑐 .

 What role do the solvents play in terms of their contribution to the solution of the differential equation?

(𝜆𝑰𝑛 − 𝑨𝑐 )^{−1} = 𝑽𝑅 (𝜆𝑰𝑛 − 𝚲𝑅 )^{−1} 𝑽𝑅^{−1}

Let us partition 𝑽𝑅 = [𝑿𝑐1 , 𝑿𝑐2 , … , 𝑿𝑐ℓ ] into block columns and 𝑽𝑅^{−1} = [𝒀𝑐1^𝑇 , 𝒀𝑐2^𝑇 , … , 𝒀𝑐ℓ^𝑇 ]^𝑇 into block rows; then

(𝜆𝑰𝑛 − 𝑨𝑐 )^{−1} = [𝑿𝑐1 , 𝑿𝑐2 , … , 𝑿𝑐ℓ ] blkdiag((𝜆𝑰𝑚 − 𝑹𝑖 )^{−1}) col(𝒀𝑐𝑖 )_{𝑖=1}^{ℓ} = ∑_{𝑖=1}^{ℓ} 𝑿𝑐𝑖 (𝜆𝑰𝑚 − 𝑹𝑖 )^{−1} 𝒀𝑐𝑖

From the above similarity transformation it is well known that 𝑨𝑐 = 𝑻𝑐 𝑨𝑻𝑐^{−1} and

(𝜆𝑰𝑛 − 𝑨𝑐 )^{−1} = (𝜆𝑰𝑛 − 𝑻𝑐 𝑨𝑻𝑐^{−1})^{−1} = 𝑻𝑐 (𝜆𝑰𝑛 − 𝑨)^{−1} 𝑻𝑐^{−1} ⟺ (𝜆𝑰𝑛 − 𝑨)^{−1} = 𝑻𝑐^{−1} (𝜆𝑰𝑛 − 𝑨𝑐 )^{−1} 𝑻𝑐

This means that (𝜆𝑰𝑛 − 𝑨)^{−1} = ∑_{𝑖=1}^{ℓ} 𝑻𝑐^{−1} 𝑿𝑐𝑖 (𝜆𝑰𝑚 − 𝑹𝑖 )^{−1} 𝒀𝑐𝑖 𝑻𝑐 = ∑_{𝑖=1}^{ℓ} 𝑿𝑖 (𝜆𝑰𝑚 − 𝑹𝑖 )^{−1} 𝒀𝑖 . Using the inverse Laplace transform we get 𝒆^{𝑨𝑡} = ∑_{𝑖=1}^{ℓ} 𝑿𝑖 𝒆^{𝑹𝑖 𝑡} 𝒀𝑖 . Also, we know that the homogeneous solution of the DE 𝑿′(𝑡) = 𝑨𝑿(𝑡) is 𝑿(𝑡) = 𝒆^{𝑨𝑡} 𝑿(𝑡0 ) = 𝒆^{𝑨𝑡} 𝑪 = (∑_{𝑖=1}^{ℓ} 𝑿𝑖 𝒆^{𝑹𝑖 𝑡} 𝒀𝑖 )𝑪.
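The decomposition of the matrix exponential can be checked numerically. The following MATLAB sketch is a minimal illustration; it reuses the three solvents that appear in the Bernoulli example later in this chapter and verifies e^{𝑨𝑐 𝑡} = ∑ 𝑿𝑐𝑖 e^{𝑹𝑖 𝑡} 𝒀𝑐𝑖 for the companion matrix itself (the choice 𝑡 = 0.3 is arbitrary):

% Minimal sketch: expm(Ac*t) = sum_i Xci*expm(Ri*t)*Yci for a complete set of solvents
I = eye(2);  Z = zeros(2);
R1 = [0 1; -3.25 -2];  R2 = [-11 -30; 1 0];  R3 = [0 1; -56 -15];   % solvents (assumed complete set)
VR = [I I I; R1 R2 R3; R1^2 R2^2 R3^2];                             % block Vandermonde matrix
Di = -[R1^3 R2^3 R3^3]/VR;  A1 = Di(:,5:6); A2 = Di(:,3:4); A3 = Di(:,1:2);
Ac = [Z I Z; Z Z I; -A3 -A2 -A1];  Wi = inv(VR);  t = 0.3;
E  = VR(:,1:2)*expm(R1*t)*Wi(1:2,:) + VR(:,3:4)*expm(R2*t)*Wi(3:4,:) ...
   + VR(:,5:6)*expm(R3*t)*Wi(5:6,:);
disp(norm(expm(Ac*t) - E));          % should be at round-off level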

 What forms can the block Vandermonde matrix take when we have some repeated
solvents (block roots)?

In the case of repeated block roots, assume that 𝑹𝑖 is repeated 𝑟 = ℓ𝑖 times; then we have to write

𝑨(𝜆) = 𝑸(𝜆)(𝜆𝑰 − 𝑹𝑖 )^𝑟 ⟹ (𝑑^𝑘/𝑑𝜆^𝑘) 𝑨(𝜆)|_{𝑹𝑖 } = 𝟎 ,   𝑘 = 0,1, … , 𝑟 − 1

Following the same procedure as in the constant-matrix case we obtain 𝑨𝑐 𝑽𝑅 = 𝑽𝑅 𝑱𝑅 , where 𝑱𝑅 = blkdiag(𝑱1 , … , 𝑱𝑠 ), each block 𝑱𝑖 being block upper bidiagonal with 𝑹𝑖 on the diagonal and 𝑰𝑚 on the superdiagonal, and where the generalized (confluent) block Vandermonde matrix is

𝑽𝑅 := row{ row[ col( binom(𝑘−1, 𝛾) 𝑹𝑖^{𝑘−𝛾−1} )_{𝑘=1}^{ℓ} ]_{𝛾=0}^{ℓ𝑖−1} }_{𝑖=1}^{𝑠}

in which binom(𝑘−1, 𝛾) denotes the binomial coefficient.

Remarks: Most of the linearizations used in practice are of the first companion form

𝑪1 (𝜆) = 𝜆𝑰𝑛 − 𝑨𝑐1 = 𝜆 blkdiag(𝑰, … , 𝑰, 𝑨0 ) − [𝑶 𝑰 ⋯ 𝑶 𝑶 ; 𝑶 𝑶 ⋱ ⋮ ⋮ ; ⋮ ⋮ ⋱ 𝑰 ; −𝑨ℓ −𝑨ℓ−1 ⋯ −𝑨2 −𝑨1 ]

𝑪2 (𝜆) = 𝜆𝑰𝑛 − 𝑨𝑐2 = 𝜆 blkdiag(𝑰, … , 𝑰, 𝑨0 ) − [𝑶 𝑶 ⋯ 𝑶 −𝑨ℓ ; 𝑰 𝑶 ⋱ ⋮ −𝑨ℓ−1 ; ⋮ 𝑰 ⋱ 𝑶 −𝑨2 ; 𝑶 𝑶 … 𝑰 −𝑨1 ]

It is well known that using this approach to solve the polynomial eigenvalue problems
(PEP) via linearization can have some drawbacks. For instance:

• The linearization approach transforms the original 𝑚 × 𝑚 matrix polynomial of degree


ℓ into a larger ℓ𝑚 × ℓ𝑚 linear eigenvalue problem.

• The conditioning of the linearized problem can depend on the type of linearization used
and may be significantly worse than the conditioning of the original problem.

• If there is special structure in the matrix coefficients 𝑨𝑖 , such as symmetry, a sparsity pattern, or palindromicity, the linearization may destroy it. In that case, special structured linearizations can be chosen to exploit the structure.

We pointed out that the linearization approach not only increases the dimension of the problem, but may also be very ill-conditioned. We now recall some approaches which offer the possibility of handling PEPs directly, without linearization.

• Jacobi-Davidson method: This method computes selected eigenvalues and associated


eigenvectors of a matrix polynomial. It iteratively constructs approximations of certain
eigenvectors by solving a projected problem. The method finds the approximate
eigenvector as “best” approximation in some search subspace.

This approach has been used for the efficient solution of quadratic eigen-problems
associated with acoustic problems with damping.

• Arnoldi and Lanczos-type methods: These processes are developed to construct


projections of the QEP. The convergence of these methods is usually slower than a Krylov
subspace method applied to the equivalent linear eigenvalue problem.

• Contour-integral-based methods: these methods use contour integral formulations to find all the eigenvalues of a PEP that lie inside a closed contour in the complex plane.
Jordan Chain and Solution of Differential Equations: The spectral theory we are to
develop must include as a special case the classical theory for polynomials of first degree
(when we may write 𝑨(𝜆) = 𝜆𝑰 − 𝑨). Now, what we understand by spectral theory must
contain a complete and explicit description of the polynomial itself in terms of the
spectral data. When 𝑨(𝜆) is first degree matrix polynomial, this is obtained when a
Jordan form 𝑱 for 𝑨 is known together with a transforming matrix 𝑿 for which 𝑨 = 𝑿𝑱𝑿−1 ,
for we then have 𝑨(𝜆) = 𝑿(𝜆𝑰 − 𝑱)𝑿−1 . Furthermore, 𝑿 can be interpreted explicitly in
terms of the eigenvector structure of 𝑨 (or of 𝜆𝑰 − 𝑨 in our terminology). The full
generalization of this to matrix polynomials 𝑨(𝜆) of degree ℓ is presented here and is,
surprisingly, of very recent origin.

Consider the monic matrix polynomial 𝑨(𝜆) = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝜆^{ℓ−𝑖} ; its associated differential equation with constant coefficients is:

𝑨(𝑑/𝑑𝑡)𝐱(𝑡) = 𝟎 ⟺ 𝑑^ℓ𝐱/𝑑𝑡^ℓ + ∑_{𝑖=1}^{ℓ} 𝑨𝑖 𝑑^{ℓ−𝑖}𝐱/𝑑𝑡^{ℓ−𝑖} = 𝟎   with −∞ ≤ 𝑡 ≤ ∞

Here 𝐱(𝑡) is an 𝑚-dimensional vector-valued function to be found. We already have a


formula 𝐱(𝑡) = [𝑰 𝟎 … 𝟎]𝒆𝑨𝑐𝑡 𝐱 0 for the general solution of 𝑨(𝑑/𝑑𝑡)𝐱(𝑡) = 𝟎, but now we shall
express it through elementary vector-valued functions of the real variable 𝑡. It turns out
that such an expression is closely related to the spectral properties of 𝑨(𝜆).

It is well known that in the scalar case (𝑚 = 1) the solutions of 𝑨(𝑑/𝑑𝑡)𝐱(𝑡) = 𝟎 are linear
combinations of the simple solutions of the form 𝐱(𝑡) = 𝑡𝑗 𝑒 𝜆0 𝑡 , 𝑗 = 0,1,2, … , 𝑟 − 1

where 𝜆0 is a complex number and 𝑟 is a positive integer. It turns out that 𝜆0 must be a
root of the scalar polynomial 𝑨(𝜆), and 𝑟 is just the multiplicity of 𝜆0 as a root of 𝑨(𝜆). We
now wish to generalize this remark for monic matrix polynomials. So let us seek for a
solution of 𝑨(𝑑/𝑑𝑡)𝐱(𝑡) = 𝟎 in the form

𝐱(𝑡) = 𝒑(𝑡)𝑒^{𝜆0 𝑡} ,   𝒑(𝑡) = (𝑡^𝑘/𝑘!) 𝐱0 + (𝑡^{𝑘−1}/(𝑘 − 1)!) 𝐱1 + ⋯ + 𝐱𝑘 ,   with 𝐱𝑗 ∈ ℂ^𝑚 and 𝐱0 ≠ 0

Proposition: The vector function 𝐱(𝑡) = 𝒑(𝑡)𝑒^{𝜆0 𝑡} is a solution of the equation 𝑨(𝑑/𝑑𝑡)𝐱(𝑡) = 𝟎 if and only if the following equalities hold: ∑_{𝑗=0}^{𝑖} (1/𝑗!) 𝑨^{(𝑗)}(𝜆0 )𝐱𝑖−𝑗 = 0 , 𝑖 = 0,1, … , 𝑘. Here and elsewhere in this work 𝑨^{(𝑗)}(𝜆) denotes the 𝑗th derivative of 𝑨(𝜆) with respect to 𝜆.

Proof: Let 𝐱(𝑡) = 𝒑(𝑡)𝑒^{𝜆0 𝑡} be given by the above expression. Computation shows that

𝑑𝐱(𝑡)/𝑑𝑡 = (𝑑𝒑(𝑡)/𝑑𝑡)𝑒^{𝜆0 𝑡} + 𝜆0 𝒑(𝑡)𝑒^{𝜆0 𝑡} ⟹ 𝑑𝐱(𝑡)/𝑑𝑡 − 𝜆0 𝐱(𝑡) = (𝑑𝒑(𝑡)/𝑑𝑡)𝑒^{𝜆0 𝑡}

so that

(𝑑/𝑑𝑡 − 𝜆0 𝑰)𝐱(𝑡) = ( (𝑡^{𝑘−1}/(𝑘−1)!) 𝐱0 + (𝑡^{𝑘−2}/(𝑘−2)!) 𝐱1 + ⋯ + 𝐱𝑘−1 ) 𝑒^{𝜆0 𝑡}

More generally,

(𝑑/𝑑𝑡 − 𝜆0 𝑰)^𝑗 𝐱(𝑡) = ( (𝑡^{𝑘−𝑗}/(𝑘−𝑗)!) 𝐱0 + (𝑡^{𝑘−𝑗−1}/(𝑘−𝑗−1)!) 𝐱1 + ⋯ + 𝐱𝑘−𝑗 ) 𝑒^{𝜆0 𝑡} ,   𝑗 = 0,1,2, … , 𝑘
(𝑑/𝑑𝑡 − 𝜆0 𝑰)^𝑗 𝐱(𝑡) = 0 ,   𝑗 = 𝑘 + 1, 𝑘 + 2, …

Write also the Taylor series for 𝑨(𝜆):

𝑨(𝜆) = 𝑨(𝜆0 ) + (1/1!)𝑨^{(1)}(𝜆0 )(𝜆 − 𝜆0 ) + ⋯ + (1/ℓ!)𝑨^{(ℓ)}(𝜆0 )(𝜆 − 𝜆0 )^ℓ

Then, replacing here 𝜆 by 𝑑/𝑑𝑡, we obtain

𝑨(𝑑/𝑑𝑡)𝐱(𝑡) = 𝑨(𝜆0 )𝐱(𝑡) + (1/1!)𝑨′(𝜆0 )(𝑑/𝑑𝑡 − 𝜆0 𝑰)𝐱(𝑡) + ⋯ + (1/ℓ!)𝑨^{(ℓ)}(𝜆0 )(𝑑/𝑑𝑡 − 𝜆0 𝑰)^ℓ 𝐱(𝑡)

Replacing (𝑑/𝑑𝑡 − 𝜆0 𝑰)^𝑗 𝐱(𝑡) by its explicit formula above and collecting coefficients, we get

∑_{𝑗=0}^{𝑖} (1/𝑗!) 𝑨^{(𝑗)}(𝜆0 )𝐱𝑖−𝑗 = 0 ,   𝑖 = 0,1, … , 𝑘   ∎

The sequence of 𝑚-dimensional vectors 𝐱0 , 𝐱1 , … , 𝐱𝑘 (𝐱0 ≠ 0) for which the equalities ∑_{𝑗=0}^{𝑖} (1/𝑗!) 𝑨^{(𝑗)}(𝜆0 )𝐱𝑖−𝑗 = 0 hold is called a Jordan chain of length 𝑘 + 1 for 𝑨(𝜆) corresponding to the complex number 𝜆0 . Its leading vector 𝐱0 (≠ 0) is a latent vector, and the subsequent vectors 𝐱1 , … , 𝐱𝑘 are sometimes known as generalized latent vectors. A number 𝜆0 for which a Jordan chain exists is called an eigenvalue of 𝑨(𝜆), and the set

𝜎(𝑨) = {𝜆0 ∈ ℂ | 𝜆0 is an eigenvalue of 𝑨(𝜆)}

is the spectrum of 𝑨(𝜆). This definition of a Jordan chain is a generalization of the well-
known notion of a Jordan chain for a square matrix 𝑨. Indeed, let 𝐱 0 , 𝐱1 … , 𝐱 𝑘 be the
Jordan chain of 𝑨, i.e., 𝑨𝐱 0 = 𝜆0 𝐱 0 , 𝑨𝐱1 = 𝜆0 𝐱1 + 𝐱 0 , … , 𝑨𝐱 𝑘 = 𝜆0 𝐱 𝑘 + 𝐱 𝑘−1 .

Then these equalities mean exactly that 𝐱 0 , 𝐱1 … , 𝐱 𝑘 is a Jordan chain of the matrix
polynomial 𝜆𝑰 − 𝑨 in the above sense. We stress that for a matrix polynomial of degree
greater than one, the vectors in its Jordan chains need not be linearly independent, in
contrast to the linear matrix polynomials of type 𝜆𝑰 − 𝑨 (with square matrix 𝑨). Indeed,
the zero vector is admissible as a generalized eigenvector.

What is the relationship between the 𝐱𝑖 and the solvents of 𝑨(𝜆)? To answer this question let us expand ∑_{𝑗=0}^{𝑖} (1/𝑗!) 𝑨^{(𝑗)}(𝜆0 )𝐱𝑖−𝑗 = 0 to get

𝑨(𝜆0 )𝐱0 = 0
𝑨(𝜆0 )𝐱1 + 𝑨′(𝜆0 )𝐱0 = 0
𝑨(𝜆0 )𝐱2 + 𝑨′(𝜆0 )𝐱1 + (1/2!)𝑨′′(𝜆0 )𝐱0 = 0
⋮

Also we have 𝑨(𝜆0 ) = 𝑸(𝜆0 )(𝜆0 𝑰 − 𝑹) so 𝑨′ (𝜆0 ) = 𝑸′ (𝜆0 )(𝜆0 𝑰 − 𝑹) + 𝑸(𝜆0 ) and therefore

𝑨(𝜆0 )𝐱1 + 𝑨′ (𝜆0 )𝐱 0 = 𝑸(𝜆0 )(𝜆0 𝑰 − 𝑹)𝐱1 + 𝑸′ (𝜆0 )(𝜆0 𝑰 − 𝑹)𝐱 0 + 𝑸(𝜆0 )𝐱0
Since 𝐱 0 is an eigenvector of 𝑹 implies that (𝜆0 𝑰 − 𝑹)𝐱 0 = 0, so

𝑨(𝜆0 )𝐱1 + 𝑨′ (𝜆0 )𝐱 0 = 𝑸(𝜆0 )(𝜆0 𝑰 − 𝑹)𝐱1 + 𝑸(𝜆0 )𝐱 0 = 𝑸(𝜆0 )((𝜆0 𝑰 − 𝑹)𝐱1 + 𝐱 0 ) = 0
⟹ 𝑹𝐱1 = 𝜆0 𝐱1 + 𝐱 0

Continuing in the same manner we obtain 𝑹𝐱𝑖 = 𝜆0 𝐱𝑖 + 𝐱𝑖−1 ,   𝑖 = 1,2, … , 𝑘 ≤ 𝑚

Remark: We know that 𝑨(𝜆0 )𝐱0 = 𝟎 ⟹ 𝐱0 ∈ Ker(𝑨(𝜆0 )). Hence 𝜆0 is an eigenvalue of 𝑨(𝜆) if and only if Ker(𝑨(𝜆0 )) ≠ {0}. Note that from the definition of a Jordan chain 𝐱0 , 𝐱1 , … , 𝐱𝑘 it follows that 𝚷(𝜆0 )𝐗 = 𝟎, or

(        𝑨(𝜆0 )                                          ) ( 𝐱0 )   ( 0 )
(        𝑨′(𝜆0 )               𝑨(𝜆0 )                     ) ( 𝐱1 )   ( 0 )
(          ⋮                     ⋮          ⋱            ) ( ⋮  ) = ( ⋮ )
( (1/𝛼!)𝑨^{(𝛼)}(𝜆0 )  (1/(𝛼−1)!)𝑨^{(𝛼−1)}(𝜆0 )  …  𝑨(𝜆0 ) ) ( 𝐱𝑘 )   ( 0 )

To find the Jordan chain 𝐱 0 , 𝐱1 … , 𝐱 𝑘 corresponding to 𝜆0 we need only to find the null
space of 𝚷(𝜆0 ). And as a result we can say that a generalization of the latent root/vector
is the Jordan chain which is defined before.
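A small MATLAB sketch can make this concrete. It is a minimal, illustrative construction: the factors 𝑱 and 𝑩 below are assumptions chosen so that 𝜆0 = 2 is a double latent root of the quadratic 𝑨(𝜆) = (𝜆𝑰 − 𝑱)(𝜆𝑰 − 𝑩), and the Jordan chain is read off from the null space of the block Toeplitz matrix 𝚷(𝜆0 ):

% Minimal sketch: Jordan chain of A(l) = (l*I - J)*(l*I - B) at l0 = 2 via null(Pi)
J  = [2 1; 0 2];  B = [5 0; 0 7];          % illustrative factors: l0 = 2 is a double latent root
A  = @(l) (l*eye(2)-J)*(l*eye(2)-B);       % A(l) = I*l^2 - (J+B)*l + J*B
dA = @(l) 2*l*eye(2) - (J+B);              % A'(l)
l0 = 2;
Pi = [A(l0) zeros(2); dA(l0) A(l0)];       % block Toeplitz of derivatives (two block rows)
N  = null(Pi);                             % each column is [x0; x1]; a column with x0 ~= 0
disp(N);                                   % supplies a Jordan chain x0, x1 of length 2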

Proposition: The vectors 𝐱0 , 𝐱1 , … , 𝐱𝑘−1 form a Jordan chain of the matrix polynomial 𝑨(𝜆) = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝜆^{ℓ−𝑖} corresponding to 𝜆0 if and only if 𝐱0 ≠ 0 and

𝑨0 𝑿0 𝑱0^ℓ + 𝑨1 𝑿0 𝑱0^{ℓ−1} + ⋯ + 𝑨ℓ−1 𝑿0 𝑱0 + 𝑨ℓ 𝑿0 = 0 ,   where 𝑿0 = [𝐱0 , 𝐱1 , … , 𝐱𝑘−1 ]

With 𝑿0 is an 𝑚 × 𝑘 matrix, and 𝑱0 is the Jordan block of size 𝑘 × 𝑘 with 𝜆0 on the main
diagonal.

Proof: By the previous proposition 𝐱0 , 𝐱1 , … , 𝐱𝑘−1 is a Jordan chain of 𝑨(𝜆) if and only if the vector function

𝒖0 (𝑡) = ( ∑_{𝑗=0}^{𝑘−1} (𝑡^{𝑘−𝑗−1}/(𝑘 − 𝑗 − 1)!) 𝐱𝑗 ) 𝑒^{𝜆0 𝑡}

satisfies the equation 𝑨(𝑑/𝑑𝑡)𝒖0 (𝑡) = 0. But then also

0 = (𝑑/𝑑𝑡 − 𝜆0 𝑰)^𝑗 𝑨(𝑑/𝑑𝑡)𝒖0 (𝑡) = 𝑨(𝑑/𝑑𝑡)𝒖𝑗 (𝑡) ,   𝑗 = 1,2, … , 𝑘 − 1

where

𝒖𝑗 (𝑡) = (𝑑/𝑑𝑡 − 𝜆0 𝑰)^𝑗 𝒖0 (𝑡) = ( ∑_{𝑝=0}^{𝑘−𝑗−1} (𝑡^{𝑘−𝑗−𝑝−1}/(𝑘 − 𝑗 − 𝑝 − 1)!) 𝐱𝑝 ) 𝑒^{𝜆0 𝑡}

Consider the 𝑚 × 𝑘 matrix 𝑼(𝑡) = [𝒖𝑘−1 (𝑡), … , 𝒖0 (𝑡)]. From the definition of 𝑼(𝑡) it follows that

           ( 1   𝑡   ⋯   𝑡^{𝑘−1}/(𝑘 − 1)! )
𝑼(𝑡) = 𝑿0  (     1   ⋱          ⋮         ) 𝑒^{𝜆0 𝑡}
           (         ⋱          𝑡         )
           ( 0                  1         )
Using the definition of a function of a matrix, it is easy to see that 𝑼(𝑡) = 𝑿0 𝑒^{𝑱0 𝑡}. Now write

𝑨(𝑑/𝑑𝑡)𝑼(𝑡) = 0 ⟺ ∑_{𝑖=0}^{ℓ} 𝑨𝑖 (𝑑^{ℓ−𝑖}/𝑑𝑡^{ℓ−𝑖}) 𝑿0 𝑒^{𝑱0 𝑡} = 0 ⟺ ( ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝑿0 𝑱0^{ℓ−𝑖} ) 𝑒^{𝑱0 𝑡} = 0

Since 𝑒^{𝑱0 𝑡} is nonsingular, ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝑿0 𝑱0^{ℓ−𝑖} = 0. ■

Let 𝑨𝑐 ∈ ℝ𝑛×𝑛 be the companion form of the matrix polynomial 𝑨(𝜆) ∈ ℂ𝑚×𝑚 , let 𝜆0 be an eigenvalue of 𝑨𝑐 and let 𝑱0 be its corresponding Jordan block of size 𝑘 × 𝑘; then

𝑨𝑐 {col(𝑿0 𝑱0^{ℓ−𝑖})_{𝑖=1}^{ℓ}} = {col(𝑿0 𝑱0^{ℓ−𝑖})_{𝑖=1}^{ℓ}} 𝑱0

 We conclude this section with a remark concerning the notion of a left Jordan chain of a matrix polynomial 𝑨(𝜆). The 𝑚-dimensional row vectors 𝐲0 , 𝐲1 , … , 𝐲𝑘 form a left Jordan chain of 𝑨(𝜆) corresponding to the eigenvalue 𝜆0 if the equalities ∑_{𝑗=0}^{𝑖} (1/𝑗!) 𝐲𝑖−𝑗 𝑨^{(𝑗)}(𝜆0 ) = 0 hold
for 𝑖 = 0,1,2, … , 𝑘. The analysis of left Jordan chains is completely similar to that of usual
Jordan chains, since 𝐲0 , 𝐲1 … , 𝐲𝑘 is a left Jordan chain of 𝑨(𝜆) if and only if the
transposed vectors 𝐲0𝑇 , 𝐲1𝑇 … , 𝐲𝑘𝑇 form a usual Jordan chain for the transposed polynomial
𝑨𝑇 (𝜆), corresponding to the same eigenvalue. Thus, we shall deal mostly with the usual
Jordan chains, while the left Jordan chains will appear only occasionally.

Jordan and Standard Triples of Matrix Polynomials: In sections before, a language


and formalism have been developed for the full description of eigenvalues, eigenvectors,
and Jordan chains of matrix polynomials. In this section, triples of matrices will be
introduced which determine completely all the spectral information about a matrix
polynomial. It will then be shown how these triples can be used to solve the inverse
problem, namely, given the spectral data to determine the coefficient matrices of the
polynomial. The Jordan normal form for complex matrices is extended to admit
“canonical triples” or “standard triples” of matrices for monic matrix polynomials on finite
dimensional linear spaces. These ideas lead to the formulation of canonical, or standard,
forms for such polynomials.

In the previous development we have seen that 𝑨^{−1}(𝜆) = 𝑪𝑐 (𝜆𝑰 − 𝑨𝑐 )^{−1} 𝑩𝑐 and 𝑨𝑐 = 𝑽𝑅 𝚲𝑅 𝑽𝑅^{−1}, therefore

𝑨^{−1}(𝜆) = 𝑪𝑐 (𝜆𝑰 − 𝑽𝑅 𝚲𝑅 𝑽𝑅^{−1})^{−1} 𝑩𝑐 = 𝑪𝑐 𝑽𝑅 (𝜆𝑰 − 𝚲𝑅 )^{−1} 𝑽𝑅^{−1} 𝑩𝑐 = 𝑿𝑅 (𝜆𝑰 − 𝚲𝑅 )^{−1} 𝒀𝑅

With 𝑿𝑅 = [𝑰 𝟎 … 𝟎]𝑽𝑅 = [𝑰 𝑰 … 𝑰] and 𝑽𝑅 𝒀𝑅 = [𝟎 𝟎 … 𝑰]𝑇 .

Now if we let 𝑺𝑖 = [𝐱 𝑖1 𝐱 𝑖2 … 𝐱 𝑖𝑚 ] where 𝐱′𝑠 are latent vectors corresponding to the solvent
𝑹𝑖 such that 𝑹𝑖 𝑺𝑖 = [𝑹𝑖 𝐱 𝑖1 𝑹𝑖 𝐱 𝑖2 … 𝑹𝑖 𝐱 𝑖𝑚 ] = [𝜆𝑖1 𝐱 𝑖1 𝜆𝑖2 𝐱 𝑖2 … 𝜆𝑖𝑚 𝐱 𝑖𝑚 ] = 𝑺𝑖 𝑱𝑖 ⟹ 𝑺−1
𝑖 𝑹𝑖 𝑺𝑖 = 𝑱𝑖 then
−1
this leads to 𝚲𝑅 = 𝑺𝑱𝑺 where 𝑺 = blkdiag(𝑺1 , 𝑺2 , … , 𝑺ℓ )

𝑺 = blkdiag(𝑺1 , 𝑺2 , … , 𝑺ℓ ) ⟹ 𝑺^{−1} = blkdiag(𝑺1^{−1}, 𝑺2^{−1}, … , 𝑺ℓ^{−1}) ⟹ 𝚲𝑅 = blkdiag(𝑹1 , … , 𝑹ℓ ) = 𝑺𝑱𝑺^{−1}.

Based on this information we can define the Jordan triple by taking the similarity transformation 𝚲𝑅 = 𝑺𝑱𝑺^{−1}:

𝑨−1 (𝜆) = 𝑿𝑅 (𝜆𝑰 − 𝚲𝑅 )−1 𝒀𝑅 = 𝑿𝑅 𝑺(𝜆𝑰 − 𝑱)−1 𝑺−1 𝒀𝑅 = 𝑿(𝜆𝑰 − 𝑱)−1 𝒀

𝑿 = 𝑿𝑅 𝑺 = [𝑰 𝑰 … 𝑰]𝑺 = [𝑺1 , 𝑺2 , … , 𝑺ℓ ]   and   𝒀 = 𝑺^{−1} 𝒀𝑅 = 𝑺^{−1} 𝑽𝑅^{−1} [𝟎 𝟎 … 𝑰]^𝑇 ⟹ (𝑽𝑅 𝑺)𝒀 = [𝟎 𝟎 … 𝑰]^𝑇

Note: The reader is asked to check that 𝑽𝑅 𝑺 = col(𝑿𝑱^{ℓ−𝑖})_{𝑖=1}^{ℓ}.

The Jordan Triple: the triple of matrices (𝑿, 𝑱, 𝒀) where 𝑿 ∈ ℝ𝑚×𝑚ℓ , 𝑱 ∈ ℝ𝑚ℓ×𝑚ℓ and
𝒀 ∈ ℝ𝑚ℓ×𝑚 , such that 𝑨−1 (𝜆) = 𝑿(𝜆𝑰 − 𝑱)−1 𝒀 are called a Jordan triple of the monic matrix
polynomial 𝑨(𝜆) of degree ℓ and order 𝑚. With: 𝑱 is a block diagonal matrix composed of
Jordan blocks each corresponding to a particular latent root. Each column of 𝑿 is an
element of a Jordan chain associated with the appropriate Jordan block in 𝑱 and 𝒀 is a
matrix of left latent vectors which can be computed by:

( 𝑿        )       ( 𝟎 )
( 𝑿𝑱       )       ( 𝟎 )
(  ⋮       ) 𝒀  =  ( ⋮ )
( 𝑿𝑱^{ℓ−1} )       ( 𝑰 )

Remark: The set of all Jordan chains of a particular monic matrix polynomial can be
grouped in a triple (𝑿, 𝑱, 𝒀), which is called “The Jordan Triple”.

Definition: (The Standard Triple) A set of three matrices (𝒁, 𝑻, 𝑾) is called a standard triple of the monic matrix polynomial 𝑨(𝜆) if it is related to the Jordan triple (𝑿, 𝑱, 𝒀) by the similarity transformation 𝒁 = 𝑿𝑴^{−1}, 𝑻 = 𝑴𝑱𝑴^{−1}, 𝑾 = 𝑴𝒀 for some nonsingular matrix 𝑴; the matrix 𝑻 is then said to be in standard form.

Now if we let 𝑻 be any linearization of the operator polynomial 𝑨(𝜆) with invertible
leading coefficient, then there exists an invertible matrix 𝑸 such that 𝑸−1 𝑻𝑸 = 𝑨𝑐 .

We then deduce from the structure of 𝑨𝑐 and the relation 𝑻𝑸 = 𝑸𝑨𝑐 that 𝑸 must have the representation

𝑸 = ( 𝑸1 ; 𝑸1 𝑻 ; ⋮ ; 𝑸1 𝑻^{ℓ−1} ) = col(𝑸1 𝑻^{ℓ−𝑖})_{𝑖=1}^{ℓ}

for some linear operator (matrix) 𝑸1 and that: 𝑨0 𝑸1 𝑻ℓ + 𝑨1 𝑸1 𝑻ℓ−1 + ⋯ + 𝑨ℓ−1 𝑸1 𝑻 + 𝑨ℓ 𝑸1 = 0


Theorem: Let {𝑹𝑖 }_{𝑖=1}^{ℓ} be a complete set of solvents for the monic matrix polynomial 𝑨(𝜆); then 𝑨(𝜆) admits (𝒁, 𝑻, 𝑾) as a standard triple, with 𝒁 = [𝑰 𝑰 … 𝑰], 𝑻 = 𝚲𝑅 and 𝑾 = 𝑽𝑅^{−1}[𝟎 𝟎 … 𝑰]^𝑇.

Proof: Notice that 𝒁𝑻^𝑗 = [𝑹1^𝑗 𝑹2^𝑗 … 𝑹ℓ^𝑗 ], 𝑗 = 1,2, … , ℓ − 1, which implies that col(𝒁𝑻^{ℓ−𝑖})_{𝑖=1}^{ℓ} = 𝑽𝑅 ; therefore, one can verify that col(𝒁𝑻^{ℓ−𝑖})_{𝑖=1}^{ℓ} 𝑾 = [𝟎 𝟎 … 𝑰]^𝑇 ■
Remark: The standard triple allows a representation of a matrix polynomial using its
spectral information, and this is shown in the next theorem.

Theorem: Let 𝑨(𝜆) be a monic matrix polynomial of degree ℓ and order 𝑚 with standard triple (𝑿, 𝑻, 𝒀); then 𝑨(𝜆) has the following representations:

⦁ 𝐫𝐢𝐠𝐡𝐭 𝐜𝐚𝐧𝐨𝐧𝐢𝐜𝐚𝐥 𝐟𝐨𝐫𝐦: 𝑨(𝜆) = 𝑰𝜆^ℓ − 𝑿𝑻^ℓ (𝑽1 + 𝑽2 𝜆 + ⋯ + 𝑽ℓ 𝜆^{ℓ−1})

where the 𝑽𝑖 are 𝑚ℓ × 𝑚 matrices such that [𝑽1 𝑽2 … 𝑽ℓ ] = (col(𝑿𝑻^{ℓ−𝑖})_{𝑖=1}^{ℓ})^{−1}.

⦁ 𝐥𝐞𝐟𝐭 𝐜𝐚𝐧𝐨𝐧𝐢𝐜𝐚𝐥 𝐟𝐨𝐫𝐦: 𝑨(𝜆) = 𝑰𝜆^ℓ − (𝑾1 + 𝑾2 𝜆 + ⋯ + 𝑾ℓ 𝜆^{ℓ−1}) 𝑻^ℓ 𝒀

where the 𝑾𝑖 are 𝑚 × 𝑚ℓ matrices such that col(𝑾𝑖 )_{𝑖=1}^{ℓ} = [𝒀, 𝑻𝒀, … , 𝑻^{ℓ−1}𝒀 ]^{−1}.

And [𝑨(𝜆)]^{−1} = 𝑿(𝜆𝑰 − 𝑻)^{−1} 𝒀.



Proof: Notice that 𝑨(𝜆) = 𝑰𝜆^ℓ + [𝑨ℓ , 𝑨ℓ−1 , … , 𝑨1 ] col(𝑰𝜆^{ℓ−𝑖})_{𝑖=1}^{ℓ} , and previously we have seen that if 𝑻 is any linearization of the monic operator polynomial 𝑨(𝜆) then there exists some linear operator 𝑿 such that 𝑨0 𝑿𝑻^ℓ + 𝑨1 𝑿𝑻^{ℓ−1} + ⋯ + 𝑨ℓ−1 𝑿𝑻 + 𝑨ℓ 𝑿 = 0, which can be written as [𝑨ℓ , 𝑨ℓ−1 , … , 𝑨1 ] col(𝑿𝑻^{ℓ−𝑖})_{𝑖=1}^{ℓ} = −𝑿𝑻^ℓ . Defining [𝑽1 𝑽2 … 𝑽ℓ ] = (col(𝑿𝑻^{ℓ−𝑖})_{𝑖=1}^{ℓ})^{−1} we obtain [𝑨ℓ , 𝑨ℓ−1 , … , 𝑨1 ] = −𝑿𝑻^ℓ [𝑽1 𝑽2 … 𝑽ℓ ], which leads to

𝑨(𝜆) = 𝑰𝜆^ℓ + [𝑨ℓ , 𝑨ℓ−1 , … , 𝑨1 ] col(𝑰𝜆^{ℓ−𝑖})_{𝑖=1}^{ℓ} = 𝑰𝜆^ℓ − 𝑿𝑻^ℓ [𝑽1 𝑽2 … 𝑽ℓ ] col(𝑰𝜆^{ℓ−𝑖})_{𝑖=1}^{ℓ} = 𝑰𝜆^ℓ − 𝑿𝑻^ℓ (𝑽1 + 𝑽2 𝜆 + ⋯ + 𝑽ℓ 𝜆^{ℓ−1})

Following the same procedure we can prove the rest. ■

The following standard triples associated with 𝑨(𝜆) will be used quite extensively in the rest of the presentation:

𝑷1 = [𝑰 𝟎 … 𝟎],   𝑨𝑐1 = [𝑶 𝑰 ⋯ 𝑶 𝑶 ; 𝑶 𝑶 ⋱ ⋮ ⋮ ; ⋮ ⋮ ⋱ 𝑰 ; −𝑨ℓ −𝑨ℓ−1 ⋯ −𝑨2 −𝑨1 ]   and   𝑸1 = [𝟎 ; 𝟎 ; ⋮ ; 𝑰]

𝑷2 = [𝟎 𝟎 … 𝑰],   𝑨𝑐2 = [𝑶 𝑶 ⋯ 𝑶 −𝑨ℓ ; 𝑰 𝑶 ⋱ ⋮ −𝑨ℓ−1 ; ⋮ 𝑰 ⋱ 𝑶 −𝑨2 ; 𝑶 𝑶 … 𝑰 −𝑨1 ]   and   𝑸2 = [𝑰 ; 𝟎 ; ⋮ ; 𝟎]
The following equality is verified by direct multiplication: 𝑨𝑐2 = 𝑩𝑨𝑐1 𝑩^{−1}, where

      ( 𝑨ℓ−1  ⋯   𝑨2   𝑨1   𝑰 )
      (  ⋮    ⋰   𝑨1   𝑰      )
𝑩 =   ( 𝑨2    ⋰   ⋰           )
      ( 𝑨1    𝑰               )
      (  𝑰                    )
From the previous developments, it is clear that the Jordan structure of a matrix
polynomial 𝑨(𝜆) is directly related to the Jordan structure of its block companion
matrices. The relation between the eigenvectors of 𝑨𝑐1 and the latent vectors of 𝑨(𝜆) is
shown in the book of matrix polynomials by BEKHITI 2020.

Theorem: If 𝑨𝑘 (𝜆) are monic matrix polynomials with standard triple (𝑿𝑘 , 𝑻𝑘 , 𝒀𝑘 ) for
𝑘 = 1,2, then 𝑨(𝜆) = 𝑨2 (𝜆)𝑨1 (𝜆) has the following standard triple.

𝑿 = [𝑿1   𝟎] ,   𝑻 = ( 𝑻1   𝒀1 𝑿2 ; 𝟎   𝑻2 ) ,   𝒀 = ( 𝟎 ; 𝒀2 )

Proof: From basic matrix theory it is very well known that

( 𝑨  𝟎 ; 𝑪  𝑩 )^{−1} = ( 𝑨^{−1}  𝟎 ; −𝑩^{−1}𝑪𝑨^{−1}  𝑩^{−1} )   and   ( 𝑨  𝑫 ; 𝟎  𝑩 )^{−1} = ( 𝑨^{−1}  −𝑨^{−1}𝑫𝑩^{−1} ; 𝟎  𝑩^{−1} )

And from the theory of standard triples

𝑨^{−1}(𝜆) = 𝑿(𝜆𝑰 − 𝑻)^{−1} 𝒀 = [𝑿1  𝟎] ( 𝜆𝑰 − 𝑻1   −𝒀1 𝑿2 ; 𝟎   𝜆𝑰 − 𝑻2 )^{−1} ( 𝟎 ; 𝒀2 )
          = [𝑿1  𝟎] ( (𝜆𝑰 − 𝑻1 )^{−1}   (𝜆𝑰 − 𝑻1 )^{−1} 𝒀1 𝑿2 (𝜆𝑰 − 𝑻2 )^{−1} ; 𝟎   (𝜆𝑰 − 𝑻2 )^{−1} ) ( 𝟎 ; 𝒀2 )
          = (𝑿1 (𝜆𝑰 − 𝑻1 )^{−1} 𝒀1 )(𝑿2 (𝜆𝑰 − 𝑻2 )^{−1} 𝒀2 )
          = 𝑨1^{−1}(𝜆) 𝑨2^{−1}(𝜆)

If 𝑨(𝜆) = 𝑨1 (𝜆) 𝑨2 (𝜆) is a particular factorization of the monic matrix polynomial 𝑨(𝜆),
with 𝜎(𝑨1 (𝜆)) ⋂𝜎(𝑨2 (𝜆)) = ∅, then the monic matrix polynomials 𝑨1 (𝜆) and 𝑨2 (𝜆) are
called spectral divisors of 𝑨(𝜆). Clearly, if a matrix polynomial possesses spectral
divisors, then there exists a similarity transformation that can transform the block
companion matrix 𝑨𝑐 , associated with 𝑨(𝜆) to a block diagonal one.

Remark: If the set of matrices (𝑿, 𝑻, 𝒀) is a standard (or Jordan) triple then we call the set
(𝑿, 𝑻) standard (or Jordan) pair.

Invariant Pair Problem: Jordan chains are conceptually elegant but fragile under perturbations and are therefore not well suited for numerical purposes (see the recent literature for a discussion). In a computational setting it is therefore recommended to replace Jordan chains by the more robust concept of invariant pairs.

Definition: Let 𝑨(𝜆) ∈ ℂ𝑚×𝑚 be an ℓth-degree monic matrix polynomial. A pair of matrices (𝑿, 𝑻) ∈ ℂ𝑚×𝑘 × ℂ𝑘×𝑘 with 𝑿 ≠ 0 is called an invariant pair if it satisfies the relation 𝑨(𝑿, 𝑻) = 𝑨0 𝑿𝑻^ℓ + 𝑨1 𝑿𝑻^{ℓ−1} + ⋯ + 𝑨ℓ−1 𝑿𝑻 + 𝑨ℓ 𝑿 = 𝟎𝑚×𝑘 .

Where 𝑨𝑖 ∈ ℝ𝑚×𝑚 , 𝑖 = 1, … , ℓ, and 𝑘 is an integer between 1 and 𝑚.

Remark: Infinite eigenvalues can still be covered by defining invariant pairs for the
reversal polynomial rev(𝑨(𝜆)) = 𝜆ℓ 𝑨(𝜆−1 ). If a polynomial has zero and infinite
eigenvalues, they have to be handled by separate invariant pairs, one for the original and
one for the reverse polynomial. (see the book of matrix poly by BEKHITI 2020)
Definition: A pair (𝑿, 𝑻) ∈ ℂ𝑚×𝑘 × ℂ𝑘×𝑘 is called minimal if there is 𝛼 ∈ ℕ⋆ such that

𝑽𝛼 (𝑿, 𝑻) = col(𝑿𝑻^{𝑖−1})_{𝑖=1}^{𝛼} = ( 𝑿 ; 𝑿𝑻 ; ⋮ ; 𝑿𝑻^{𝛼−1} )

has full rank. The smallest such 𝛼 is called the minimality index of (𝑿, 𝑻). Equivalently, the invariant pair (𝑿, 𝑻) is minimal if

rank ( 𝜆𝑰𝑘 − 𝑻 ; 𝑿 ) = 𝑘   for all 𝜆 ∈ ℂ

Definition: An invariant pair (𝑿, 𝑻) for a regular matrix polynomial 𝑨(𝜆) of degree ℓ is
called simple if (𝑿, 𝑻) is minimal and the algebraic multiplicities of the eigenvalues of 𝑻
are identical to the algebraic multiplicities of the corresponding eigenvalues of 𝑨(𝜆).

Remark: Invariant pairs are closely related to the theory of standard pairs presented in
the very well-known book (I. Gohberg, P. Lancaster, and L. Rodman 2009), and, in
particular, to Jordan pairs. If (𝑿, 𝑻) is a simple invariant pair and 𝑻 is in Jordan form, then (𝑿, 𝑻) is a Jordan pair. Polynomial eigenpairs and invariant pairs can also be
defined in terms of a contour integral.

Proposition: A pair (𝑿, 𝑻) ∈ ℂ𝑚×𝑘 × ℂ𝑘×𝑘 is an invariant pair if and only if it satisfies the relation 𝑨(𝑿, 𝑻) ≔ (1/2𝜋𝑖) ∮Γ 𝑨(𝜆)𝑿(𝜆𝑰 − 𝑻)^{−1} 𝑑𝜆 = 𝟎, where Γ ⊂ ℂ is a contour with the spectrum of 𝑻 in its interior. This formulation allows us to choose the contour Γ so as to compute a 𝑻 with eigenvalues lying in a particular region of the complex plane.

Proof: From the theory of matrix functions (see the book of Algebra by BEKHITI 2020) it is well known that if 𝐟(𝜆) is an analytic function in and on the contour Γ then

𝐟(𝑻) = (1/2𝜋𝑖) ∮Γ 𝐟(𝜆)(𝜆𝑰 − 𝑻)^{−1} 𝑑𝜆

Now let 𝐟(𝜆) = 𝑨(𝜆, 𝑿) = 𝑨(𝜆)𝑿 = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝑿𝜆^{ℓ−𝑖} ⟹ 𝐟(𝑻) = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝑿𝑻^{ℓ−𝑖} = 𝟎; hence

𝐟(𝑻) = 𝑨(𝑿, 𝑻) = (1/2𝜋𝑖) ∮Γ 𝑨(𝜆)𝑿(𝜆𝑰 − 𝑻)^{−1} 𝑑𝜆 = 𝟎   ∎
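The contour formulation is easy to test numerically. The following MATLAB sketch is a minimal illustration under assumed data: the factor matrices 𝑹 and 𝑩 are arbitrary choices, (𝑿, 𝑻) = (𝑰, 𝑹) is the invariant pair, and the integral is approximated by the trapezoid rule on a circle enclosing 𝜎(𝑹):

% Minimal sketch: (1/2*pi*i) * contour integral of A(l)*X*(l*I-T)^-1 dl  ~= 0
R  = [0 1; -3.25 -2];  B = [5 0; 0 7];        % assumed factors; R is a right solvent of A(l)
A1 = -(R + B);  A2 = B*R;                     % A(l) = I*l^2 + A1*l + A2 = (l*I - B)*(l*I - R)
A  = @(l) eye(2)*l^2 + A1*l + A2;
c  = 0;  r = 3;  N = 200;  S = zeros(2);      % circle of radius 3 encloses eig(R) = -1 +/- 1.5i
for j = 0:N-1
    th = 2*pi*j/N;  z = c + r*exp(1i*th);
    S  = S + A(z)*eye(2)/(z*eye(2) - R) * (1i*r*exp(1i*th));   % integrand times z'(theta)
end
S = S*(2*pi/N)/(2*pi*1i);                     % trapezoid rule on [0, 2*pi)
disp(norm(S));                                % close to zero because A_R(R) = 0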

Characterization of Solvents by Invariant Pairs: In this section we study the matrix


solvent problem as a particular case of the invariant pair problem, and we apply to
solvents some results we have obtained for invariant pairs.

Consider the following monic matrix polynomial 𝑨(𝜆) = ∑ℓ𝑖=0 𝑨𝑖 𝜆ℓ−𝑖 . The associate right
matrix difference equation is 𝑼𝑘 + 𝑨1 𝑼𝑘−1 + ⋯ + 𝑨ℓ 𝑼𝑘−ℓ = 𝟎 and the associate left matrix
difference equation is 𝑽𝑘 + 𝑽𝑘−1 𝑨1 + ⋯ + 𝑽𝑘−ℓ 𝑨ℓ = 𝟎 with 𝑼𝑗 ∈ ℂ𝑚×𝑚 , 𝑽𝑗 ∈ ℂ𝑚×𝑚 𝑗 = 0,1,2, ….

Theorem: Given a matrix polynomial 𝑨(𝜆) having (𝑿, 𝑻, 𝒀) as a standard triple, the
general solution of ∑ℓ𝑖=0 𝑨𝑖 𝑼𝑘−𝑖 = 𝟎 is: 𝑼𝑘 = 𝑿𝑻𝑘 𝑪 where 𝑪 ∈ ℂℓ𝑚×𝑚 and the general solution
of ∑ℓ𝑖=0 𝑽𝑘−𝑖 𝑨𝑖 = 𝟎 is: 𝑽𝑘 = 𝑫𝑻𝑘 𝒀 where 𝑫 ∈ ℂ𝑚×ℓ𝑚 .
Proof: Using the definition of a standard pair, the following identity is satisfied:

𝑿𝑻ℓ + 𝑨1 𝑿𝑻ℓ−1 + ⋯ + 𝑨ℓ−1 𝑿𝑻 + 𝑨ℓ 𝑿 = 𝟎𝑚×𝑘 ⟺ [𝑨ℓ , 𝑨ℓ−1 , … , 𝑨1 ] (col(𝑿𝑻ℓ−𝑖 )𝑖=1 ) = −𝑿𝑻ℓ

If we multiply on the right by 𝑻𝑘−ℓ 𝑪 we get 𝑿𝑻𝑘 𝑪 + 𝑨1 𝑿𝑻𝑘−1 𝑪 + ⋯ + 𝑨ℓ 𝑿𝑻𝑘−ℓ 𝑪 = 𝟎𝑚×𝑘 and
thus 𝑼𝑘 = 𝑿𝑻𝑘 𝑪 verifies the equation ∑ℓ𝑖=0 𝑨𝑖 𝑼𝑘−𝑖 = 𝟎.

The proof of ∑ℓ𝑖=0 𝑽𝑘−𝑖 𝑨𝑖 = 𝟎 can be derived by using the fact that the standard triple of
[𝑨(𝜆)]𝑇 is (𝒀𝑇 , 𝑻𝑇 , 𝑿𝑇 ). ■

Corollary: The solution of ∑ℓ𝑖=0 𝑨𝑖 𝑼𝑘−𝑖 = 𝟎 corresponding to the initial conditions:

𝑼0 = 𝑼1 = ⋯ = 𝑼ℓ−2 = 𝟎 𝑼ℓ−1 = 𝑰𝑚
𝑘
is 𝑼𝑘 = 𝑿𝑻 𝒀

Proof: Using 𝑼𝑘 = 𝑿𝑻^𝑘 𝑪, we can write the following set of equations:

𝑼0 = 𝑿𝑪 = 𝟎, 𝑼1 = 𝑿𝑻𝑪 = 𝟎, … , 𝑼ℓ−2 = 𝑿𝑻^{ℓ−2}𝑪 = 𝟎, 𝑼ℓ−1 = 𝑿𝑻^{ℓ−1}𝑪 = 𝑰𝑚 ⟺ col(𝑿𝑻^{𝑖−1})_{𝑖=1}^{ℓ} 𝑪 = [𝟎, … , 𝟎, 𝑰𝑚 ]^𝑇

which is exactly the system defining 𝒀; hence 𝑪 = 𝒀 and 𝑼𝑘 = 𝑿𝑻^𝑘 𝒀. ∎

Corollary: The solution of ∑ℓ𝑖=0 𝑽𝑘−𝑖 𝑨𝑖 = 𝟎 corresponding to the initial conditions:

𝑽0 = 𝑽1 = ⋯ = 𝑽ℓ−2 = 𝟎 𝑽ℓ−1 = 𝑰𝑚
𝑘
is 𝑽𝑘 = 𝑿𝑻 𝒀

Proof: The proof of this corollary is the same as before.

Remark: We remark that for this particular set of initial conditions, the right and left
difference equations produce the same result.

Corollary: If the matrix polynomial 𝑨(𝜆) has a complete set of solvents then the solution
of ∑ℓ𝑖=0 𝑨𝑖 𝑼𝑘−𝑖 = 𝟎 subject to the initial conditions 𝑼ℓ−1 = 𝑰 and 𝑼𝑖 = 𝟎 𝑖 = 0,1, … , ℓ − 2 is:

𝑼𝑘 = ∑_{𝑗=1}^{ℓ} 𝑹𝑗^𝑘 𝒀𝑗 ,   where 𝒀 = col(𝒀𝑗 )_{𝑗=1}^{ℓ} = ( 𝒀1 ; 𝒀2 ; ⋮ ; 𝒀ℓ )

Proof: We know that 𝑨(𝜆) admits the standard triple (𝑿, 𝑻, 𝒀) where 𝑿 = [𝑰 𝑰 … 𝑰], 𝑻 = 𝚲𝑅 = blkdiag(𝑹1 , … , 𝑹ℓ ) and 𝒀 = 𝑽𝑅^{−1}[𝟎 … 𝟎 𝑰]^𝑇. Replacing in 𝑼𝑘 = 𝑿𝑻^𝑘 𝒀 we get

𝑼𝑘 = 𝑿𝑻^𝑘 𝒀 = [𝑰 𝑰 … 𝑰] blkdiag(𝑹1^𝑘 , … , 𝑹ℓ^𝑘 ) col(𝒀𝑗 )_{𝑗=1}^{ℓ} = [𝑹1^𝑘 𝑹2^𝑘 … 𝑹ℓ^𝑘 ] col(𝒀𝑗 )_{𝑗=1}^{ℓ} = ∑_{𝑗=1}^{ℓ} 𝑹𝑗^𝑘 𝒀𝑗   ∎
Example: Consider the matrix difference equation 𝑼𝑘+2 + 𝑨1 𝑼𝑘+1 + 𝑨2 𝑼𝑘 = 𝟎𝑚 , which can be written as 𝑼𝑘 + 𝑨1 𝑼𝑘−1 + 𝑨2 𝑼𝑘−2 = 𝟎𝑚 ; by using the shift operator 𝑧^{−1} we get (𝑰 + 𝑨1 𝑧^{−1} + 𝑨2 𝑧^{−2})𝑼𝑘 = 𝟎𝑚 ⟹ (𝑧^2 𝑰 + 𝑨1 𝑧 + 𝑨2 )𝑼𝑘 = 𝟎𝑚 . The characteristic polynomial of the matrix DE is 𝑷(𝑧) = 𝑧^2 𝑰 + 𝑨1 𝑧 + 𝑨2 , which is a monic matrix polynomial. If 𝑹1 and 𝑹2 are right solvents of 𝑷(𝑧) then the homogeneous solution is 𝑼𝑘 = 𝑹1^𝑘 𝒀1 + 𝑹2^𝑘 𝒀2 . To verify the solution we substitute this result into the difference equation:

𝑹1^{𝑘+2} 𝒀1 + 𝑹2^{𝑘+2} 𝒀2 + 𝑨1 (𝑹1^{𝑘+1} 𝒀1 + 𝑹2^{𝑘+1} 𝒀2 ) + 𝑨2 (𝑹1^𝑘 𝒀1 + 𝑹2^𝑘 𝒀2 )
    = (𝑹1^2 + 𝑨1 𝑹1 + 𝑨2 )𝑹1^𝑘 𝒀1 + (𝑹2^2 + 𝑨1 𝑹2 + 𝑨2 )𝑹2^𝑘 𝒀2 = 0

since each bracketed factor vanishes (𝑹1 and 𝑹2 are solvents).

Theorem: Let 𝑨(𝜆) ∈ ℂ𝑚×𝑚 be an ℓ𝑡ℎ degree monic matrix polynomial and consider an
invariant pair (𝑿, 𝑻) ∈ ℂ𝑚×𝑘 × ℂ𝑘×𝑘 of 𝑨(𝜆) (sometimes called admissible pairs). If the
matrix 𝑿 has size 𝑚 × 𝑚, i.e. 𝑘 = 𝑚, and is invertible, then 𝑹 = 𝑿𝑻𝑿−𝟏 satisfies 𝑨(𝑹) = 𝟎 i.e.
𝑹 is a matrix solvent of 𝑨(𝜆).

Proof: As (𝑿, 𝑻) is an invariant pair of 𝑨(𝜆), we have 𝑨(𝑿, 𝑻) := ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝑿𝑻^{ℓ−𝑖} = 0. Since 𝑿 is invertible, we can post-multiply by 𝑿^{−1}. Then we get:

𝑨0 𝑿𝑻^ℓ 𝑿^{−1} + ⋯ + 𝑨ℓ−1 𝑿𝑻𝑿^{−1} + 𝑨ℓ = 𝑨0 𝑹^ℓ + ⋯ + 𝑨ℓ−1 𝑹 + 𝑨ℓ = 0

Therefore, 𝑹 is a matrix solvent of 𝑨(𝜆), with (𝑿, 𝑻) ∈ ℂ𝑚×𝑚 × ℂ𝑚×𝑚.
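The theorem can be illustrated with a short MATLAB sketch. It is a minimal construction under assumed data: the solvents 𝑹1 , 𝑹2 below are arbitrary (𝑹1 dominant), an 𝑚 × 𝑚 invariant pair (𝑿, 𝑻) is assembled from the 𝑚 dominant eigenpairs of the companion matrix, and 𝑹 = 𝑿𝑻𝑿^{−1} is then checked to be a right solvent:

% Minimal sketch: a solvent R = X*T*inv(X) extracted from an invariant pair (X,T)
m  = 2;  I = eye(m);  Z = zeros(m);
R1 = [4 1; 0 5];  R2 = [1 0; 0 2];              % illustrative solvents; R1 dominates R2
VR = [I I; R1 R2];  Dc = -[R1^2 R2^2]/VR;       % rebuild A(l) = I*l^2 + A1*l + A2
A2 = Dc(:,1:2);  A1 = Dc(:,3:4);
Ac = [Z I; -A2 -A1];
[V,D] = eig(Ac);  d = diag(D);
[~,idx] = sort(abs(d),'descend');               % keep the m dominant eigenpairs
X = V(1:m,idx(1:m));  T = diag(d(idx(1:m)));    % invariant pair (X,T); X is invertible here
R = X*T/X;
disp(norm(R^2 + A1*R + A2));                    % R is a right solvent (it equals R1 here)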

Remark: If the matrix 𝑻 of the invariant pair (𝑿, 𝑻) is nonsingular (no zero eigenvalues), then the matrix 𝑺 = 𝑹^{−1} = 𝑿𝑻^{−1}𝑿^{−1} is a solvent of the reversal matrix polynomial rev(𝑨(𝜆)).

If 𝑨(𝜆) ∈ ℂ𝑚×𝑚 is an ℓth-degree monic matrix polynomial then it admits the standard triple (𝑿, 𝑻, 𝒀) where 𝑿 = [𝑰 𝑰 … 𝑰], 𝑻 = 𝚲𝑅 and 𝒀 = 𝑽𝑅^{−1}[𝟎 𝟎 … 𝑰]^𝑇 = col(𝒀𝑖 )_{𝑖=1}^{ℓ} , which satisfies

[𝑨(𝜆)]^{−1} = 𝑿(𝜆𝑰 − 𝑻)^{−1} 𝒀 = [𝑰 𝑰 … 𝑰] blkdiag((𝜆𝑰𝑚 − 𝑹𝑖 )^{−1}) col(𝒀𝑖 )_{𝑖=1}^{ℓ} = ∑_{𝑖=1}^{ℓ} (𝜆𝑰𝑚 − 𝑹𝑖 )^{−1} 𝒀𝑖
The above result is a partial fraction expansion of the inverse [𝑨(𝜆)]^{−1}. Furthermore, since 𝒀 is the last block column of 𝑽𝑅^{−1}, its block elements 𝒀𝑖 can be computed whenever the block Vandermonde matrix is nonsingular.
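A minimal MATLAB sketch of this partial fraction expansion, under assumed data (two illustrative solvents 𝑹1 , 𝑹2 with disjoint spectra and an arbitrary test point 𝜆), is the following:

% Minimal sketch: inv(A(l)) = sum_i inv(l*I - Ri)*Yi, Yi from the last block column of inv(VR)
I  = eye(2);
R1 = [1 2; 0 3];  R2 = [-4 1; 0 -5];            % illustrative solvents (assumption)
VR = [I I; R1 R2];  Dc = -[R1^2 R2^2]/VR;       % A(l) = I*l^2 + A1*l + A2 with R1, R2 as solvents
A2 = Dc(:,1:2);  A1 = Dc(:,3:4);
Y  = VR\[zeros(2); I];  Y1 = Y(1:2,:);  Y2 = Y(3:4,:);
lam = 0.7 + 0.2i;                               % a test point away from the latent roots
lhs = inv(lam^2*I + A1*lam + A2);
rhs = (lam*I - R1)\Y1 + (lam*I - R2)\Y2;
disp(norm(lhs - rhs));                          % agreement up to round-off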

Theorem: Let 𝑨(𝜆) ∈ ℂ𝑚×𝑚 be an ℓth-degree monic matrix polynomial. If 𝑨(𝜆) has a complete spectral factorization 𝑨(𝜆) = (𝜆𝑰𝑚 − 𝑸ℓ )(𝜆𝑰𝑚 − 𝑸ℓ−1 ) … (𝜆𝑰𝑚 − 𝑸1 ), then 𝜆𝑰 − 𝑻 is a linearization of 𝑨(𝜆), i.e. 𝜆𝑰 − 𝑻 ~ diag(𝑨(𝜆), 𝑰), where

     ( 𝑸1   𝑰𝑚   ⋯    𝟎𝑚    𝟎𝑚 )
     ( 𝟎𝑚   𝑸2   ⋱    ⋮     ⋮  )
𝑻 =  ( ⋮    ⋮          ⋱    𝑰𝑚 )
     ( 𝟎𝑚   𝟎𝑚       𝑸ℓ−1   𝑰𝑚 )
     ( 𝟎𝑚   𝟎𝑚   ⋯    𝟎𝑚    𝑸ℓ )
Proof: We define the following two matrices

𝑭(𝜆) = [ 𝑰𝑚  𝟎𝑚  ⋯  𝟎𝑚  𝟎𝑚 ; −(𝜆𝑰𝑚 − 𝑸1 )  𝑰𝑚  ⋯  𝟎𝑚  𝟎𝑚 ; ⋮  ⋮  ⋱  ⋮  ⋮ ; 𝟎𝑚  𝟎𝑚  …  −(𝜆𝑰𝑚 − 𝑸ℓ−1 )  𝑰𝑚 ]

𝑬(𝜆) = [ 𝑩ℓ−1 (𝜆)  𝑩ℓ−2 (𝜆)  …  𝑩1 (𝜆)  𝑩0 (𝜆) ; −𝑰𝑚  𝟎𝑚  …  𝟎𝑚  𝟎𝑚 ; 𝟎𝑚  −𝑰𝑚  ⋱  ⋮  ⋮ ; ⋮  ⋮  ⋱  ⋮  ⋮ ; 𝟎𝑚  𝟎𝑚  …  −𝑰𝑚  𝟎𝑚 ]

with 𝑩0 (𝜆) = 𝑰 and 𝑩𝑘 (𝜆) = 𝑩𝑘−1 (𝜆)(𝜆𝑰𝑚 − 𝑸ℓ−𝑘+1 ),   𝑘 = 1,2, … , ℓ − 1.

We see that det 𝑬(𝜆) = ±1 and det 𝑭(𝜆) = 1. So 𝑬 and 𝑭 are unimodular matrix
polynomials. Furthermore, by simply computing the product, we have:

𝑬(𝜆)(𝜆𝑰 − 𝑻) = ( (𝜆𝑰𝑚 − 𝑸ℓ ) … (𝜆𝑰𝑚 − 𝑸1 )   𝟎 ; 𝟎   𝑰 ) 𝑭(𝜆)   ∎ (q.e.d.)
Remark: A very interesting standard triple is obtained when 𝑨(𝜆) has a complete spectral factorization 𝑨(𝜆) = (𝜆𝑰𝑚 − 𝑸ℓ )(𝜆𝑰𝑚 − 𝑸ℓ−1 ) … (𝜆𝑰𝑚 − 𝑸1 ); it is the following one:

𝑿 = [𝑰𝑚  𝟎𝑚  …  𝟎𝑚 ],   𝒀 = [𝟎𝑚 ; 𝟎𝑚 ; ⋮ ; 𝑰𝑚 ]   and   𝑻 = [ 𝑸1  𝑰𝑚  ⋯  𝟎𝑚  𝟎𝑚 ; 𝟎𝑚  𝑸2  ⋱  ⋮  ⋮ ; ⋮  ⋮  ⋱  𝑸ℓ−1  𝑰𝑚 ; 𝟎𝑚  𝟎𝑚  ⋯  𝟎𝑚  𝑸ℓ ]

Proof:

𝑿(𝜆𝑰 − 𝑻)^{−1} 𝒀 = 𝑿𝑭^{−1}(𝜆) ( 𝑨^{−1}(𝜆)   𝟎 ; 𝟎   𝑰 ) 𝑬(𝜆)𝒀 = 𝑿𝑭^{−1}(𝜆) ( 𝑨^{−1}(𝜆) ; 𝟎 ; ⋮ ; 𝟎 ) = 𝑨^{−1}(𝜆)   ∎
Computation of Spectral Factor by Numerical Methods: In this chapter, we are going
to present some existing algorithms that can factorize a linear term from a given matrix
polynomial. We will see later that the Q.D. algorithm can be viewed as a generalization of
these methods.

Bernoulli's Method: In this section, we are going to present a global algorithm (the
Bernoulli's iteration), that is based on the solution of the difference equation (i.e. the
exponential form). Just as in the scalar case, the matrix Bernoulli's method is based on
the "ratio" of two successive iterates of the difference equation.

In the numerical computations of solvents there are two basic approaches named: local
and global. Therefore, we have first to define what a global method is and what a local
method is.
Definition: A numerical method for solving a given problem is said to be local if it is
based on local (simpler) model of the problem around the solution.

Global methods are defined by opposition to local ones.

From the definition, we can see that in order to use a local method, one has to provide
an initial approximation of the solution. And in general, local methods are fast
converging while global ones are quite slow.

The convergence of the global methods that will be presented in this chapter is based on
the following relation of order (partial) between square matrices.

Definition: A square matrix 𝑨 is said to dominate a square matrix 𝑩 (not necessarily of


the same size) if all the eigenvalues of 𝑨 are greater, in modulus, than those of 𝑩. As a
notation, we will write 𝑨 > 𝑩. A dominant solvent is a solvent matrix whose eigenvalues
strictly dominate the eigenvalues of all other solvents.

This definition is important because of the following lemma.

Lemma: Let 𝑨 and 𝑩 be square matrices such that 𝑨 > 𝑩; then 𝑨 is nonsingular and

lim_{𝑛→∞} ‖𝑩^𝑛 ‖‖𝑨^{−𝑛}‖ = 0

Proof: The only specific property of the norm that we require is consistency: ‖𝑨𝑩‖ ≤ ‖𝑨‖‖𝑩‖.
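A minimal MATLAB sketch of the lemma, using arbitrary illustrative matrices (an assumption, not an example from the text), is the following; the displayed quantities decrease toward zero:

% Minimal sketch: if every eigenvalue of A exceeds (in modulus) every eigenvalue of B,
% then norm(B^n)*norm(A^-n) -> 0.
A = [3 1; 0 4];  B = [0.5 2; 0 -1];          % |eig(A)| = {3,4} > |eig(B)| = {0.5,1}
for n = [5 10 20 40]
    disp(norm(B^n) * norm(inv(A)^n));        % decreasing toward zero
end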

Remark: we can extend definition of dominance to matrix polynomials by saying that the
matrix polynomial 𝑨1 (𝜆) dominates 𝑨2 (𝜆) if 𝑻1 > 𝑻2 , 𝑻𝑘 being a linearization of 𝑨𝑘 (𝜆).

Theorem: Let 𝑨(𝜆) be a monic matrix polynomial of degree ℓ and order 𝑚. Assume that
𝑨(𝜆) has a dominant right solvent 𝑹 and a dominant left solvent 𝑳. Let 𝑼𝑘 , 𝑘 = 0,1, ... be
the solution of ∑ℓ𝑖=0 𝑨𝑖 𝑼𝑘−𝑖 = 𝟎 subject to the initial conditions 𝑼𝑖 = 𝟎 𝑖 = 0,1, … , ℓ − 2 and
𝑼ℓ−1 = 𝑰 . Then 𝑼𝑘 is not singular for 𝑘 large enough, and:

lim_{𝑘→∞} 𝑼𝑘+1 𝑼𝑘^{−1} = 𝑹   and   lim_{𝑘→∞} 𝑼𝑘^{−1} 𝑼𝑘+1 = 𝑳

Proof: Let us consider the similarity transformations 𝑹 = 𝑿1 𝑱1 𝑿1^{−1} and 𝑳 = 𝒀1 𝑱1 𝒀1^{−1}, where 𝑿1 is a matrix whose columns are right eigenvectors of 𝑹, 𝒀1 is a matrix whose rows are left eigenvectors of 𝑳, and 𝑱1 is the Jordan form corresponding to the right solvent 𝑹 (and also to the left solvent 𝑳).

We know that 𝑨(𝜆) = 𝑸1 (𝜆)(𝜆𝑰 − 𝑹) = (𝜆𝑰 − 𝑳)𝑸2 (𝜆) this implies that 𝑨(𝜆) can be factorized
as 𝑨(𝜆) = 𝑨1 (𝜆)𝑨2 (𝜆). The existence of a dominant right solvent and a dominant left
solvent implies that 𝑨(𝜆) has the following Jordan triple:

𝑱1 𝟎 𝒀
𝑿 = [𝑿1 𝑿2 ], 𝑱=( ) and 𝒀 = ( 1 )
𝟎 𝑱2 𝒀2

where 𝑿1 , 𝑱1 and 𝒀1 are 𝑚 × 𝑚 matrices. 𝑿1 and 𝒀1 are nonsingular with 𝑹 = 𝑿1 𝑱1 𝑿1−1 and
𝑳 = 𝒀1 𝑱1 𝒀1−1. Knowing that 𝑼𝑘 = 𝑿 𝑱𝑘 𝒀 = 𝑿1 𝑱1𝑘 𝒀1 + 𝑿2 𝑱𝑘2 𝒀2 = (𝑿1 𝑱1𝑘 𝑿1−1 + (𝑿2 𝑱𝑘2 𝒀2 )𝒀1−1 𝑿1−1 )𝑿1 𝒀1

let 𝑴 = 𝑿1 𝒀1 and 𝑬𝑘 = 𝑿2 𝑱𝑘2 𝒀2 , then 𝑼𝑘 = ( 𝑹𝑘 + 𝑬𝑘 𝑴−1 )𝑴 = (𝑰 + 𝑬𝑘 𝑴−1 𝑹−𝑘 ) 𝑹𝑘 𝑴


The same factorization can be done from the left, in which case we obtain:

𝑼𝑘 = 𝑴(𝑳𝑘 + 𝑴−1 𝑬𝑘 ) = 𝑴𝑳𝑘 (𝑰 + 𝑳−𝑘 𝑴−1 𝑬𝑘 )

Assuming that 𝑹 and 𝑳 are nonsingular. For 𝑘 large enough, we have:

‖𝑬𝑘 𝑴−1 𝑹−𝑘 ‖ ≤ ‖𝑬𝑘 ‖‖ 𝑴−1 ‖‖ 𝑹−𝑘 ‖

and the right hand-side of the above inequality converges to zero because 𝑹𝑘 = 𝑿1 𝑱1𝑘 𝑿1−1
dominates 𝑬𝑘 = 𝑿2 𝑱𝑘2 𝒀2 , this implies that

lim_{𝑘→∞} 𝑼𝑘 = lim_{𝑘→∞} (𝑰 + 𝑬𝑘 𝑴^{−1} 𝑹^{−𝑘})𝑹^𝑘 𝑴 = lim_{𝑘→∞} 𝑹^𝑘 𝑴

so we obtain lim_{𝑘→∞} 𝑼𝑘+1 𝑼𝑘^{−1} = lim_{𝑘→∞} 𝑹^{𝑘+1} 𝑴𝑴^{−1} 𝑹^{−𝑘} = 𝑹.

As for the left solvent we have lim_{𝑘→∞} 𝑼𝑘 = lim_{𝑘→∞} 𝑴𝑳^𝑘 (𝑰 + 𝑳^{−𝑘} 𝑴^{−1} 𝑬𝑘 ) = lim_{𝑘→∞} 𝑴𝑳^𝑘 , so we obtain lim_{𝑘→∞} 𝑼𝑘^{−1} 𝑼𝑘+1 = lim_{𝑘→∞} 𝑳^{−𝑘} 𝑴^{−1} 𝑴𝑳^{𝑘+1} = 𝑳. ■

Alternative proof: We have seen that 𝑼𝑘 = ∑_{𝑗=1}^{ℓ} 𝑹𝑗^𝑘 𝒀𝑗 = 𝑹1^𝑘 𝒀1 + ∑_{𝑗=2}^{ℓ} 𝑹𝑗^𝑘 𝒀𝑗 , where 𝑹1 is the dominant right solvent, so 𝑼𝑘 = (𝑰 + ∑_{𝑗=2}^{ℓ} 𝑹𝑗^𝑘 𝒀𝑗 𝒀1^{−1} 𝑹1^{−𝑘})𝑹1^𝑘 𝒀1 = (𝑰 + 𝑯𝑘 )𝑹1^𝑘 𝒀1 , and ‖𝑯𝑘 ‖ converges to zero. Thus, for large enough 𝑘, 𝑼𝑘 is nonsingular and we can write:

lim_{𝑘→∞} 𝑼𝑘+1 𝑼𝑘^{−1} = lim_{𝑘→∞} (𝑰 + 𝑯𝑘+1 )𝑹1^{𝑘+1} 𝒀1 𝒀1^{−1} 𝑹1^{−𝑘} (𝑰 + 𝑯𝑘 )^{−1} = lim_{𝑘→∞} 𝑹1^{𝑘+1} 𝒀1 𝒀1^{−1} 𝑹1^{−𝑘} = 𝑹1   ■

Remark: There are some general remarks that we can make about the conditions for convergence of the Bernoulli method. If some of the iterates 𝑼𝑘 are singular, the Bernoulli method will break down.

Bernoulli with Deflation Technique: The Bernoulli iteration as written here will only converge to a single solvent of the matrix polynomial 𝑨1 (𝜆) = 𝑨(𝜆). There are several approaches to modifying the Bernoulli method to find the 𝑘 dominant solvents of 𝑨(𝜆). One technique, deflation, is reasonably straightforward: once the solvent 𝑹1 is computed, the factorization 𝑨(𝜆) = 𝑨2 (𝜆)(𝜆𝑰 − 𝑹1 ) is applied to extract 𝑹1 from the spectrum, so that the next-largest solvent 𝑹2 becomes the dominant solvent of the matrix polynomial 𝑨2 (𝜆). This process is repeated until the 𝑘 dominant solvents have been found.

Lemma: If 𝑹 is an operator root of the matrix polynomial 𝑨(𝜆), then 𝑨(𝜆) = 𝑴(𝜆)(𝜆𝑰 − 𝑹), where 𝑴(𝜆) = ∑_{𝑘=0}^{ℓ−1} 𝑴𝑘 𝜆^{ℓ−𝑘−1} and 𝑴𝑗 = ∑_{𝑖=0}^{𝑗} 𝑨𝑖 𝑹^{𝑗−𝑖} , (𝑗 = 0,1, … , ℓ − 1).

Proof: Let us consider 𝑨(𝜆) = 𝑴(𝜆)(𝜆𝑰 − 𝑹) = (𝑴0 𝜆^{ℓ−1} + 𝑴1 𝜆^{ℓ−2} + ⋯ + 𝑴ℓ−2 𝜆 + 𝑴ℓ−1 )(𝜆𝑰 − 𝑹). Direct multiplication and comparison of coefficients gives

𝑴0 = 𝑨0 = 𝑰 (monic case)
𝑴1 = 𝑨1 + 𝑴0 𝑹
𝑴2 = 𝑨2 + 𝑴1 𝑹
⋮
𝑴𝑘 = 𝑨𝑘 + 𝑴𝑘−1 𝑹 ,   𝑘 = 1,2, … , ℓ − 1
𝟎 = 𝑨ℓ + 𝑴ℓ−1 𝑹

Back-substitution then gives 𝑴𝑘 = ∑_{𝑖=0}^{𝑘} 𝑨𝑖 𝑹^{𝑘−𝑖} , (𝑘 = 0,1, … , ℓ − 1). ■
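The deflation step itself amounts to the Horner-like recursion 𝑴𝑘 = 𝑨𝑘 + 𝑴𝑘−1 𝑹. The following MATLAB sketch is a minimal illustration under assumed data (a quadratic built from arbitrary factors 𝑹 and 𝑩 so that 𝑹 is a right solvent):

% Minimal sketch of the deflation A(l) = M(l)*(l*I - R), monic case with M0 = I
I  = eye(2);
R  = [0 1; -3.25 -2];  B = [5 0; 0 7];        % assumed: R is a right solvent of
A1 = -(R + B);  A2 = B*R;                     % A(l) = I*l^2 + A1*l + A2 = (l*I - B)*(l*I - R)
M0 = I;
M1 = A1 + M0*R;                               % quotient M(l) = M0*l + M1
disp(norm(A2 + M1*R));                        % remainder condition 0 = A2 + M1*R, ~ 0
disp(norm(M1 + B));                           % here M(l) = l*I - B, so M1 = -B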

Example: Let us now consider the generalized Bernoulli’s method. The considered
matrix polynomial is: 𝑨(𝜆) = 𝑨0 𝜆2 + 𝑨1 𝜆 + 𝑨2

Algorithm: (Matrix Bernoulli’s Method)

1 𝐴0 = eye(2,2);
2 𝐴1 = (1/3) [−13 − 4; 4 − 23];
3 𝐴2 = [4 4; −4 14];
4 % initialization
5 𝑈1 = zeros(2,2);
6 𝑈2 = eye(2,2);
7 𝐅𝐨𝐫 𝑘 = 1: 40
8 𝑈0 = −(𝐴1 ⋆ 𝑈1 + 𝐴2 ⋆ 𝑈2 ); % 𝑈(𝑘) = −(𝐴1 ⋆ 𝑈(𝑘 − 1) + 𝐴2 ⋆ 𝑈(𝑘 − 2))
9 % let 𝑈0 = 𝑈(𝑘), 𝑈1 = 𝑈(𝑘 − 1), 𝑈2 = 𝑈(𝑘 − 2)
10 𝑈2 = 𝑈1 ; % after one iteration(𝑘 = 𝑘 + 1)we have 𝑈2 = 𝑈1 ; 𝑈1 = 𝑈0 ;
11 𝑈1 = 𝑈0 ;
12 𝑞1 = 𝑈1 ⋆ inv(𝑈2 );
13 𝐄𝐧𝐝

clear all, clc, Z=zeros(2,2); I=eye(2,2);

R1=[0 1;-3.25 -2]; R2=[-11 -30;1 0]; R3=[0 1;-56 -15];


VR=[I I I ;R1 R2 R3;R1^2 R2^2 R3^2]; % Vandermonde matrix
Di=-[R1^3 R2^3 R3^3]*inv(VR);
%---------------------------------------%
% Bernoulli's matrix method
A1=Di(:,5:6); A2=Di(:,3:4); A3=Di(:,1:2);
U1=zeros(2,2); U2=zeros(2,2); U3=eye(2,2);

for k=1:100
U0=-(A1*U1+A2*U2+A3*U3); % U(k)=-(A1*U(k-1)+A2*U(k-2)+A3*U(k-3))
U3=U2; % let U0=U(k), U1=U(k-1), U2=U(k-2) and U3=U(k-3)
U2=U1; % after one iteration(k=k+1) we have U3=U2; U2=U1; U1=U0
U1=U0;
end
Q1=U1*inv(U2) % 1st spectral factor and also Q1 is a solvent
%---------------------------------------%
M0=I; M1=A1+Q1; M2=A2+M1*Q1;
U1=zeros(2,2); U2=eye(2,2);
for k=1:100
U0=-(M1*U1 + M2*U2);
U2=U1; U1=U0;
end
Q2=U1*inv(U2) % 2nd spectral factor
Block Power Method: In this section, we will give a method for finding the solvents and
spectral factors of 𝜆-matrix by extending the well-known (scalar) power method to the
block power method for non-symmetric matrices. The block power method will be used
to compute the block eigenvectors and associated block eigenvalues of a matrix. The
block eigenvalue is also the solvent of the characteristic matrix polynomial of the matrix.

The Bernoulli iteration can be written as:

[𝑼𝑘+1 ; 𝑼𝑘+2 ; ⋮ ; 𝑼𝑘+ℓ ] = [𝑶 𝑰 ⋯ 𝑶 𝑶 ; 𝑶 𝑶 ⋱ ⋮ ⋮ ; ⋮ ⋮ ⋱ 𝑰 ; −𝑨ℓ −𝑨ℓ−1 ⋯ −𝑨2 −𝑨1 ] [𝑼𝑘 ; 𝑼𝑘+1 ; ⋮ ; 𝑼𝑘+ℓ−1 ]  ⟺  𝑿𝑅 (𝑘 + 1) = 𝑨𝑐 𝑿𝑅 (𝑘) ,   𝑿𝑅0 = [𝟎 ; 𝟎 ; ⋮ ; 𝑰]

Another block state equation can be obtained by taking the block transpose of this equation; its representation is

[𝑽𝑘+1  𝑽𝑘+2  ⋯  𝑽𝑘+ℓ ]^{𝐵𝑇} = [𝑽𝑘  𝑽𝑘+1  ⋯  𝑽𝑘+ℓ−1 ]^{𝐵𝑇} 𝑨𝑐  ⟺  𝑿𝐿 (𝑘 + 1) = 𝑿𝐿 (𝑘)𝑨𝑐 ,   𝑿𝐿0 = [𝟎  𝟎  ⋯  𝑰]

where 𝑨𝑐 here denotes the column companion form [𝑶 𝑶 ⋯ 𝑶 −𝑨ℓ ; 𝑰 𝑶 ⋱ ⋮ −𝑨ℓ−1 ; ⋮ 𝑰 ⋱ 𝑶 −𝑨2 ; 𝑶 𝑶 … 𝑰 −𝑨1 ].

The solution of these homogeneous equations is given by 𝑿𝑅 (𝑘) = 𝑨𝑐^𝑘 𝑿𝑅0 and 𝑿𝐿 (𝑘) = 𝑿𝐿0 𝑨𝑐^𝑘 . Let us define 𝑿𝑅1 (𝑘) = 𝑼𝑘 , 𝑿𝑅2 (𝑘) = 𝑼𝑘+1 and let 𝑹1 ∈ ℂ𝑚×𝑚 be the largest right solvent of the complete set of distinct right solvents, and assume that 𝑹𝑖 > 𝑹𝑖+1 , 𝑖 = 1,2, … , ℓ − 1; then the largest right block eigenvector becomes

lim_{𝑘→∞} 𝑿𝑅 (𝑘) 𝑿𝑅1^{−1}(𝑘) = [𝑰, 𝑹1^𝑇 , (𝑹1^2 )^𝑇 , … , (𝑹1^{ℓ−1})^𝑇 ]^𝑇

Also, the desired largest right solvent is 𝑹1 = lim_{𝑘→∞} 𝑿𝑅2 (𝑘) 𝑿𝑅1^{−1}(𝑘).

If we define 𝑿𝐿1 (𝑘) = 𝑽𝑘 , 𝑿𝐿2 (𝑘) = 𝑽𝑘+1 and let 𝑳1 ∈ ℂ𝑚×𝑚 be the largest left solvent of the complete set of distinct left solvents, and assume that 𝑳𝑖 > 𝑳𝑖+1 , 𝑖 = 1,2, … , ℓ − 1; then the largest left block eigenvector becomes

lim_{𝑘→∞} 𝑿𝐿1^{−1}(𝑘) 𝑿𝐿 (𝑘) = [𝑰, 𝑳1 , 𝑳1^2 , … , 𝑳1^{ℓ−1}]

Also, the desired largest left solvent is 𝑳1 = lim_{𝑘→∞} 𝑿𝐿1^{−1}(𝑘)𝑿𝐿2 (𝑘).

clear all, clc, Z=zeros(2,2); I=eye(2,2); m=2;


A0=I; A1=[6 1.4142;1.4142 6];
A2=[12 4.65685;6.65685 12]; A3=[8 3.65685;7.65685 8];
A=[Z I Z;Z Z I;-A3 -A2 -A1]; N=100;
XR(:,:,1)=[Z;Z;I]; % initial conditions to start the program
for k=1:N
XR(:,:,k+1)= A*XR(:,:,k); % The solution of the DE: XR(k+1)=A*XR(k)
end
R1=XR(m+1:2*m,:, N)*inv(XR(1:m,:,N)) % R1 is a right solvent
ZERO1=A0*(R1)^3 + A1*(R1)^2 + A2*(R1) + A3
clear all, clc, Z=zeros(2,2); I=eye(2,2); m=2;
A0=I; A1=[6 1.4142;1.4142 6];
A2=[12 4.65685;6.65685 12]; A3=[8 3.65685;7.65685 8];
A=[Z Z -A3;I Z -A2;Z I -A1]; N=100;
XL(:,:,1)=[Z Z I]; % initial conditions to start the program
for k=1:N
XL(:,:,k+1)=XL(:,:,k)*A; % The solution of the DE: XL(k+1)=XL(k)*A
end
L1= inv(XL(:,1:m,N))*XL(:,m+1:2*m,N) % L1 is a left solvent
ZERO1=(L1)^3*A0 + (L1)^2 *A1+ (L1) *A2 + A3

Matrix Newton's Method: The matrix 𝑨𝑅 (𝑿) is the right evaluation of 𝑨(𝜆) at 𝑿, and
𝑨(𝜆) is a nonlinear operator that maps the space of square 𝑚 × 𝑚 matrices onto itself.
Since the space of complex 𝑚 × 𝑚 square matrices is a Banach space under any matrix
norm, we can use powerful results from functional analysis. This space is also a finite
dimensional space and as such the equation 𝑨𝑅 (𝑿) = 𝟎 is a set of 𝑚2 nonlinear equations
with 𝑚^2 unknowns. In 1983 Dennis J. E. provided the general theory for solving this type of problem using Newton and secant methods. Here in this section we present an
algorithm, which corresponds to the general Newton-Kantorovich method. In the case of
simple solvents, i.e. where the derivative is regular, this method converges quadratically
in a neighbourhood of any solvent.

Let 𝑨𝑅 (𝑿) = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝑿^{ℓ−𝑖} with 𝑨𝑖 , 𝑿 ∈ ℂ𝑚×𝑚 (𝑖 = 0, … , ℓ) be a matrix polynomial. We present a Newton method to solve the equation 𝑨𝑅 (𝑿) = 𝑩, and we prove that the algorithm converges quadratically near simple solvents. We need the inverse of the Fréchet derivative 𝑨′𝑅 (𝑿) of 𝑨𝑅 (𝑿). This leads to linear equations for the corrections 𝑯.

Before introducing the algorithm of matrix Newton's method we review some basic
elements of algebra and functional analysis such as Kronecker product and gradient of
matrix polynomial (Fréchet-derivative).

Definition: Given two matrices 𝑨 ∈ 𝔽𝑚×𝑚 and 𝑩 ∈ 𝔽𝑛×𝑛 , the Kronecker product of 𝑨 and 𝑩 is denoted 𝑨⨂𝑩 and defined to be the matrix

𝑾 = 𝑨⨂𝑩 = [ 𝑎11 𝑩  𝑎12 𝑩  ⋯  𝑎1𝑚 𝑩 ; 𝑎21 𝑩  𝑎22 𝑩  ⋯  𝑎2𝑚 𝑩 ; ⋮  ⋮  ⋱  ⋮ ; 𝑎𝑚1 𝑩  𝑎𝑚2 𝑩  …  𝑎𝑚𝑚 𝑩 ] ∈ 𝔽𝑚𝑛×𝑚𝑛

Definition: Given a matrix 𝑨 ∈ 𝔽𝑚×𝑛 whose columns are denoted by 𝑨 = [𝑨1 𝑨2 … 𝑨𝑛 ], the vector-valued function vec(·) is defined by vec(𝑨) = col(𝑨𝑖 )_{𝑖=1}^{𝑛} = [𝑨1^𝑇 𝑨2^𝑇 … 𝑨𝑛^𝑇 ]^𝑇

Observe that vec(. ) is linear operator: for any two scalars and two matrices 𝑨, 𝑩 ∈ 𝔽𝑚×𝑛
and 𝛼, 𝛽 ∈ 𝔽 then vec(𝑨 + 𝑩) = vec(𝑨) + vec(𝑩) and vec(𝛼𝑨) = 𝛼 vec(𝑨) or they can be
summarized as vec(𝛼𝑨 + 𝛽𝑩) = 𝛼 vec(𝑨) + 𝛽vec(𝑩).

The following proposition shows the close relationship between the vec function and
Kronecker product.
Proposition: If 𝑨 ∈ 𝔽𝑚×𝑚 , 𝑩 ∈ 𝔽𝑛×𝑛 and 𝑿 ∈ 𝔽𝑚×𝑛 then vec(𝑨𝑿𝑩) = (𝑩𝑇 ⨂𝑨)vec(𝑿)

Theorem: Consider the general matrix equation 𝑾 = 𝑨1 𝑿𝑩1 + 𝑨2 𝑿𝑩2 + ⋯ + 𝑨𝑝 𝑿𝑩𝑝 , where 𝑨𝑖 ∈ ℂ𝑚×𝑚 , 𝑩𝑖 ∈ ℂ𝑛×𝑛 , 𝑿 ∈ ℂ𝑚×𝑛 . This equation has a unique solution if and only if the matrix 𝑮 = ∑_{𝑖=1}^{𝑝} 𝑩𝑖^𝑇 ⨂𝑨𝑖 is nonsingular, and the solution is then given by

vec(𝑿) = ( ∑_{𝑖=1}^{𝑝} 𝑩𝑖^𝑇 ⨂𝑨𝑖 )^{−1} vec(𝑾)
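A minimal MATLAB sketch of this theorem, using randomly generated illustrative data (an assumption; any sizes could be used), solves such an equation through vec(·) and kron and checks the result:

% Minimal sketch: solving A1*X*B1 + A2*X*B2 = W through vec() and the Kronecker product
n  = 3;
A1 = rand(n); A2 = rand(n); B1 = rand(n); B2 = rand(n);
Xtrue = rand(n);
W  = A1*Xtrue*B1 + A2*Xtrue*B2;
G  = kron(B1.', A1) + kron(B2.', A2);         % vec(Ai*X*Bi) = (Bi.' kron Ai)*vec(X)
x  = G \ W(:);                                % assumes G is nonsingular (true generically)
X  = reshape(x, n, n);
disp(norm(X - Xtrue));                        % ~ 0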

Now we can apply this theory to matrix polynomials, let 𝑹 = 𝑴𝑱𝑴−1 be a solvent of the
matrix polynomial 𝑨(𝜆) and 𝑱 is the Jordan form of 𝑹.

𝑨𝑅 (𝑹) = 𝟎 ⟺ 𝑨0 𝑴𝑱ℓ 𝑴−1 + 𝑨1 𝑴𝑱ℓ−1 𝑴−1 … + 𝑨ℓ−1 𝑴𝑱𝑴−1 + 𝑨ℓ = 𝟎

Multiply both sides by 𝑴 we get: 𝑨0 𝑴𝑱ℓ + 𝑨1 𝑴𝑱ℓ−1 … + 𝑨ℓ−1 𝑴𝑱 + 𝑨ℓ 𝑴 = 𝟎 and by using


Kronecker product we get (∑ℓ𝑖=0(𝑱𝑇 )𝑖 ⨂𝑨ℓ−𝑖 )vec(𝑴) = 𝟎. Under this notation 𝑹 is right
solvent if and only if 𝑮 = (∑ℓ𝑖=0(𝑱𝑇 )𝑖 ⨂𝑨ℓ−𝑖 ) is rank deficient or equivalently
𝑁𝑢𝑙𝑙(∑ℓ𝑖=0(𝑱𝑇 )𝑖 ⨂𝑨ℓ−𝑖 ) is nonempty.

A similar result can be obtained for a left solvent 𝑳 = 𝑷^{−1}𝑱𝑷:

𝑨𝐿 (𝑳) = 𝟎 ⟺ 𝑷^{−1}𝑱^ℓ 𝑷𝑨0 + 𝑷^{−1}𝑱^{ℓ−1}𝑷𝑨1 + ⋯ + 𝑷^{−1}𝑱𝑷𝑨ℓ−1 + 𝑨ℓ = 𝟎

Multiplying both sides on the left by 𝑷 we get 𝑱^ℓ 𝑷𝑨0 + 𝑱^{ℓ−1}𝑷𝑨1 + ⋯ + 𝑱𝑷𝑨ℓ−1 + 𝑷𝑨ℓ = 𝟎, and by using the Kronecker product we get (∑_{𝑖=0}^{ℓ} 𝑨ℓ−𝑖^𝑇 ⨂𝑱^𝑖 )vec(𝑷) = 𝟎. Under this notation 𝑳 is a left solvent if and only if 𝑯 = ∑_{𝑖=0}^{ℓ} 𝑨ℓ−𝑖^𝑇 ⨂𝑱^𝑖 is rank deficient, or equivalently Null(∑_{𝑖=0}^{ℓ} 𝑨ℓ−𝑖^𝑇 ⨂𝑱^𝑖 ) is nontrivial.

The development of Newton's Method

 Newton's method can be used to solve equations of the kind 𝑭(𝑿) = 𝟎, where 𝑭: 𝓥 → 𝓥 is a differentiable operator on a Banach space (we are interested only in the case in which 𝓥 is ℂ𝑚×𝑚 ). A Taylor expansion gives 𝑭(𝑿 + ∆𝑿) = 𝑭(𝑿) + 𝑭′(𝑿)∆𝑿 + 𝒪(‖∆𝑿‖^2 ), where 𝑭′(𝑿) is the Fréchet derivative of 𝑭 at the point 𝑿, so that to first order

∆𝑿 = (𝑭′(𝑿))^{−1} (𝑭(𝑿 + ∆𝑿) − 𝑭(𝑿))

The Fréchet derivative 𝑭′ (𝑿) is a derivative defined on Banach spaces. Named after
Maurice Fréchet, it is commonly used to generalize the derivative of a real-valued
function of a single real variable to the case of a vector-valued function of multiple real
variables, and to define the functional derivative used widely in the calculus of
variations. The Fréchet derivative at a point 𝑿 ∈ ℂ𝑚×𝑚 is a linear mapping 𝑳(𝑿): ℂ𝑚×𝑚 → ℂ𝑚×𝑚 , ∆𝑿 ⟼ 𝑳(𝑿, ∆𝑿) = 𝑭′(𝑿)∆𝑿

such that for all ∆𝑿 ∈ ℂ𝑚×𝑚 : 𝑭(𝑿 + ∆𝑿) − 𝑭(𝑿) − 𝑳(𝑿, ∆𝑿) = 𝒪(‖∆𝑿‖), and it therefore
describes the first order effect on 𝑭 of perturbations in 𝑿.
Remark: (see W. Kratz 1987) The Fréchet derivative 𝑭′(𝑿) can be expressed in terms of Cauchy's integral formula as follows:

𝑭′(𝑿)∆𝑿 = (1/2𝜋𝑖) ∮Γ 𝑭(𝑧)(𝑧𝑰 − 𝑿)^{−1} ∆𝑿(𝑧𝑰 − 𝑿)^{−1} 𝑑𝑧

In the practical computation it is preferable to avoid constructing and inverting


explicitly 𝑭′ (𝑿). Thus, a better way to compute 𝑭′ (𝑿) is to use numerical approximations.
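One simple numerical approximation is a finite-difference directional derivative. The MATLAB sketch below is a minimal illustration under assumed data (an arbitrary monic quadratic and random 𝑿, 𝒀); it compares the central difference with the contracted-gradient formula derived later in this section, which for ℓ = 2 reduces to 𝒀𝑿 + 𝑿𝒀 + 𝑨1 𝒀:

% Minimal sketch: finite-difference Frechet derivative of A_R(X) = X^2 + A1*X + A2
A1 = [1 2; 0 3];  A2 = [-4 1; 2 -5];          % illustrative coefficients (assumption)
AR = @(X) X^2 + A1*X + A2;
X  = rand(2);  Y = rand(2);  h = 1e-6;
Lfd = (AR(X + h*Y) - AR(X - h*Y)) / (2*h);    % central difference in the direction Y
Lan = Y*X + X*Y + A1*Y;                       % analytic formula: sum_k B_kR(X)*Y*X^(l-k), l = 2
disp(norm(Lfd - Lan));                        % ~ O(h^2)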

Now assuming that 𝑭(𝑿) = 𝑨𝑹 (𝑿) is a right matrix polynomial:

𝑨𝑹 (𝑿) = 𝑨0 𝑿ℓ + 𝑨1 𝑿ℓ−1 … + 𝑨ℓ−1 𝑿 + 𝑨ℓ

it can be expanded around 𝑿0 ∈ ℂ𝑚×𝑚 as: 𝑨𝑹 (𝑿) = 𝑨𝑹 (𝑿0 ) + ∇𝑨𝑹 (𝑿)(𝑿 − 𝑿0 ) + 𝒪(𝑿 − 𝑿0 )
where the term 𝒪(𝑿 − 𝑿0 ) is a matrix polynomial with high degree terms of (𝑿 − 𝑿0 ), and
∇𝑨𝑹 (𝑿)(𝑿 − 𝑿0 ) is a contracted gradient (Shieh, L.S. et.al 1981) of dimension 𝑚 × 𝑚. The
first degree approximation of 𝑨𝑹 (𝑿) with ‖∆𝑿‖ < 1 becomes 𝑨𝑹 (𝑿) = 𝑨𝑹 (𝑿0 ) + ∇𝑨𝑹 (𝑿)∆𝑿
where ∆𝑿 = 𝑿 − 𝑿0 ; We define a recursive formula 𝑿𝑖+1 = 𝑿𝑖 + ∆𝑿𝑖+1 , so we get

𝑨𝑹 (𝑿𝑖+1 ) = 𝑨𝑹 (𝑿𝑖 ) + ∇𝑨𝑹 (𝑿𝑖 )∆𝑿𝑖+1

If 𝑿𝑖+1 is a right solvent of 𝑨(𝜆), or 𝑨𝑹 (𝑿𝑖+1 ) = 𝟎𝑚 then 𝑨𝑹 (𝑿𝑖 ) + ∇𝑨𝑹 (𝑿𝑖 )∆𝑿𝑖+1 = 𝟎𝑚

How to determine ∇𝑨𝑹 (𝑿𝑖 )? To answer such question let we introduce the notion of
gradient of a matrix polynomial.

Definition: A gradient of a matrix polynomial is defined by

𝜕 𝑨(𝑿)
(∇𝑨(𝑿))𝑖,𝑗,𝑘,𝑙 = { } 𝑖, 𝑗, 𝑘, 𝑙 = 1,2, … , 𝑚
𝜕𝑿𝑘,𝑙 𝑖,𝑗

𝑿𝑘,𝑙 : denotes the (𝑘, 𝑙) elements of 𝑿, and {𝑨(𝑿)}𝑖,𝑗 designates the elements of 𝑨(𝑿), and
(∇𝑨(𝑿))𝑖,𝑗,𝑘,𝑙 denotes the (𝑖, 𝑗, 𝑘, 𝑙) element of ∇𝑨(𝑿).

Direct use of the gradient for the purposes of this work involves the inversion of a fourth-order tensor, which causes computational difficulty. To overcome this difficulty, a contraction operation on ∇𝑨(𝑿) with respect to an arbitrary 𝑚 × 𝑚 square matrix 𝒀 was introduced by (Jeffrey T. Fong 1971). To start our development let us see the effect of the scalar derivative on a matrix function (using the chain rule 𝑑𝑭(𝑮(𝑡)) = ∇𝑭 · 𝑑𝑮(𝑡)):

(𝑑/𝑑𝑡)𝑭(𝑿 + 𝑡𝒀) = ∇𝑭(𝑿 + 𝑡𝒀)(𝑑/𝑑𝑡)(𝑿 + 𝑡𝒀) = ∇𝑭(𝑿 + 𝑡𝒀)𝒀 ⟹ ∇𝑭(𝑿)𝒀 = lim_{𝑡→0} (𝑑/𝑑𝑡)𝑭(𝑿 + 𝑡𝒀)

Now let the matrix function 𝑭(𝑿) be the right evaluation of 𝑨(𝜆) at 𝑿 that is 𝑭(𝑿) = 𝑨𝑹 (𝑿)
so we can write

∇𝑨𝑹 (𝑿)𝒀 = lim_{𝜂→0} (𝑑/𝑑𝜂) 𝑨𝑹 (𝑿 + 𝜂𝒀) = lim_{𝜂→0} (𝑑/𝑑𝜂) ∑_{𝑖=0}^{ℓ} 𝑨𝑖 (𝑿 + 𝜂𝒀)^{ℓ−𝑖}

and each term of the summation in this equation can be computed as

lim_{𝜂→0} (𝑑/𝑑𝜂) 𝑨𝑖 (𝑿 + 𝜂𝒀)^𝑟 = 𝑨𝑖 ∑_{𝑞=0}^{𝑟−1} 𝑿^𝑞 𝒀𝑿^{𝑟−𝑞−1}
Substituting this last formula into ∇𝑨𝑹 (𝑿)𝒀 and rearranging indexes gives

∇𝑨𝑹 (𝑿)𝒀 = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 ∑_{𝑞=0}^{ℓ−𝑖−1} 𝑿^𝑞 𝒀𝑿^{ℓ−𝑖−𝑞−1}

Performing the index transformation 𝑘 = 𝑖 + 𝑞 + 1 and 𝑗 = 𝑖 leads to:

∇𝑨𝑹 (𝑿)𝒀 = ∑_{𝑘=1}^{ℓ} 𝑩𝑘𝑹 (𝑿)𝒀𝑿^{ℓ−𝑘}

where 𝑩𝑘𝑹 (𝑿) is the right evaluation at 𝑿 of the 𝜆-matrix 𝑩𝑘 (𝜆):

𝑩𝑘 (𝜆) = ∑_{𝑗=0}^{𝑘−1} 𝑨𝑗 𝜆^{𝑘−𝑗−1} ⟹ 𝑩𝑘𝑹 (𝑿) = ∑_{𝑗=0}^{𝑘−1} 𝑨𝑗 𝑿^{𝑘−𝑗−1}

Remark: In similar manner, for left matrix polynomial 𝑨𝑳 (𝑿): we have the following
∇𝑨𝑳 (𝑿)𝒀 = ∑ℓ𝑘=1 𝑿ℓ−𝑘 𝒀 𝑩𝑘𝑳 (𝑿) where 𝑩𝑘𝑳 (𝑿) is the left matrix polynomial of the 𝜆-matrix as
defined before.

To solve 𝑨𝑹 (𝑿𝑖 ) + ∇𝑨𝑹 (𝑿𝑖 )∆𝑿𝑖+1 = 𝟎𝑚 , we use the contracted gradient developed previously as follows:

∇𝑨𝑹 (𝑿𝑖 )∆𝑿𝑖+1 = ∑_{𝑘=1}^{ℓ} 𝑩𝑘𝑹 (𝑿𝑖 )∆𝑿𝑖+1 𝑿𝑖^{ℓ−𝑘}

Therefore

𝑨𝑹 (𝑿𝑖 ) + ∇𝑨𝑹 (𝑿𝑖 )∆𝑿𝑖+1 = 𝟎𝑚 ⟺ ∑_{𝑘=1}^{ℓ} 𝑩𝑘𝑹 (𝑿𝑖 )∆𝑿𝑖+1 𝑿𝑖^{ℓ−𝑘} = −𝑨𝑹 (𝑿𝑖 )
𝑘=1

Using the Kronecker product theorem we have vec(∆𝑿𝑖+1 ) = −(𝑮(𝑿𝑖 ))^{−1} vec(𝑨𝑹 (𝑿𝑖 )), where

𝑮(𝑿𝑖 ) = ∑_{𝑘=1}^{ℓ} (𝑿𝑖^{ℓ−𝑘})^𝑇 ⨂𝑩𝑘𝑹 (𝑿𝑖 )

In a similar fashion, the recursive formula for the left solvents of 𝑨(𝜆) is

vec(∆𝑿𝑖+1 ) = −(𝑯(𝑿𝑖 ))^{−1} vec(𝑨𝑳 (𝑿𝑖 ))   with   𝑯(𝑿𝑖 ) = ∑_{𝑘=1}^{ℓ} (𝑩𝑘𝑳 (𝑿𝑖 ))^𝑇 ⨂𝑿𝑖^{ℓ−𝑘}

The convergence criterion is ‖∆𝑿𝑖+1 ‖ < 𝜀, where 𝜀 is an assigned small positive number. It should also be noted that the above procedure gives only one solvent at a time; hence this solvent must be removed from the matrix polynomial 𝑨(𝜆) through long division, and the above process is then repeated for the computation of the next solvent. As can be seen, this method depends largely on the initial guess, which is the reason most researchers use it as a local method. Global methods are available, among which we mention Bernoulli's method and the QD algorithm.
Theorem: Let 𝑨(𝜆) = ∑_{𝑠=0}^{ℓ} 𝑨𝑠 𝜆^{ℓ−𝑠} ∈ ℂ𝑚×𝑚 [𝜆] be an arbitrary monic 𝜆-matrix. Suppose that 𝑹 ∈ ℂ𝑚×𝑚 is a simple solvent of 𝑨(𝜆). Then, if the starting matrix 𝑿0 is sufficiently near 𝑹, the algorithm

vec(𝑿𝑖+1 − 𝑿𝑖 ) = −( ∑_{𝑘=1}^{ℓ} (𝑿𝑖^{ℓ−𝑘})^𝑇 ⨂ ∑_{𝑠=0}^{𝑘−1} 𝑨𝑠 𝑿𝑖^{𝑘−𝑠−1} )^{−1} vec( ∑_{𝑠=0}^{ℓ} 𝑨𝑠 𝑿𝑖^{ℓ−𝑠} )   for 𝑖 = 1,2, …

Converges quadratically to 𝑹. More precisely: if ‖𝑿0 − 𝑹‖ = 𝜀0 < 𝜀 and for sufficiently


small 𝜀 and 𝛿, then we have:

(𝑖) lim𝑖→∞ 𝑿𝑖 = 𝑹
(𝑖𝑖) ‖𝑿𝑖 − 𝑹‖ = 𝜀0 𝑞 𝑖 With 𝑞 < 1 and for 𝑖 = 0,1,2, …
(𝑖𝑖𝑖) ‖𝑿𝑖+1 − 𝑹‖ = 𝛿‖𝑿𝑖 − 𝑹‖2 for 𝑖 = 0,1,2, …

Example: Let us now consider the generalized Newton’s algorithm. The considered
matrix polynomial is: 𝑨(𝜆) = 𝑨0 𝜆3 + 𝑨1 𝜆2 + 𝑨2 𝜆 + 𝑨3

Algorithm: (Generalized Newton’s Method)


1 Enter the number of iterations 𝑁
2 𝑿0 ∈ ℝ𝑚×𝑚 Initial guess;
3 For 𝑘 = 1: 𝑁
4 𝑩1 = 𝑨0 ; 𝑩2 = (𝑨0 𝑿𝑘 ) + 𝑨1 ; 𝑩3 = (𝑨0 𝑿𝑘 2 ) + (𝑨1 𝑿𝑘 ) + 𝑨2 ;
5 𝑨𝑅 = 𝑨0 𝑿𝑘 3 + 𝑨1 𝑿𝑘 2 + 𝑨2 𝑿𝑘 + 𝑨3 ;
6 𝐯𝐞𝐜𝑨 = [𝑨𝑅 (: , 1); 𝑨𝑅 (: , 2)];
7 𝑮 = 𝑘𝑟𝑜𝑛((𝑿2 )𝑇 , 𝑩1 ) + 𝑘𝑟𝑜𝑛(𝑿𝑇 , 𝑩2 ) + 𝑘𝑟𝑜𝑛(𝑰𝑇 , 𝑩3 );
8 𝐯𝐞𝐜𝑿 = −(𝑮−1 )𝐯𝐞𝐜𝑨;
9 𝒅𝑿 = [𝐯𝐞𝐜𝑿(1 ∶ 2, ∶), 𝐯𝐞𝐜𝑿(3 ∶ 4, ∶)];
10 𝑿𝑘+1 = 𝑿𝑘 + 𝒅𝑿;
11 End
12 𝑿𝑘

clear all, clc, Z=zeros(2,2); I=eye(2,2);


% R1=[0 1;-3.25 -2]; R2=[-11 -30;1 0]; R3=[0 1;-56 -15];
A0 =[1 0;0 1]; A1 =[11.00 -1.00;6.7196 17.00];
A2 =[30.00 -11.00; 70.911 82.530]; A3 =[0.00 -30.00;182.00 89.839];
% A(λ)=I*λ^3 + A1*λ^2 + A2*λ + A3;
X=0.01*ones(2,2);
for k=1:6
B1=A0; B2=(A0*X)+A1; B3=(A0*X^2)+(A1*X)+A2;
AR=I*(X)^3 + A1*(X)^2 + A2*(X) + A3;
vecA=[AR(:,1);AR(:,2)];
G=kron((X^(2))',B1)+kron((X^(1))',B2)+kron((I)',B3);
VecX=-inv(G)*vecA;
dX=[VecX(1:2,:),VecX(3:4,:)];
X=X+dX;
end
X1=X % first domenant spectral factor
To compute all spectral factors we use long division

% The synthetic long division give


Q0=A0; Q1=A1+X1; Q2=A2+Q1*X1; % quotient coefficients from synthetic division
Q2=-A3*inv(X1);               % equivalent expression for Q2 (since A3 = -Q2*X1)
X=-0.01*ones(2,2);
for k=1:6
B1=Q0; B2=(Q0*X)+Q1;
AR=I*(X)^2 + Q1*(X) + Q2;
vecA=[AR(:,1);AR(:,2)];
G=kron((X)',B1)+kron((I)',B2);
VecX=-inv(G)*vecA; dX=[VecX(1:2,:),VecX(3:4,:)];
X=X+dX;
end
X2=X % second dominant spectral factor
X3=-(Q1+X2) % X3 is third dominant spectral factor & is a left solvent

ZERO1=I*(X1)^3 + A1*(X1)^2 + A2*(X1) + A3


ZERO2=I*(X2)^2 + Q1*(X2) + Q2
ZERO3=(X3)^3*I + (X3)^2*A1 + (X3)*A2 + A3

Matrix Broyden’s Method: In this section we introduce the most used secant
approximation to the Jacobian, proposed by C. Broyden. The algorithm, analogous to
Newton's method, but that substitutes this approximation for the analytic Jacobian, is
called Broyden's method. We can see that the use of Newton's method means that we have to evaluate the Jacobian 𝑱(𝐱) and the function 𝐟(𝐱), and then solve a system of 𝑛^2 linear equations at each step. So Newton's method has a quite high computational
cost. Broyden's method is a generalization of the secant method to the multivariable
case. It has only a superlinear convergence rate (J. E. Dennis et al.(1983)). However, it is
much less expensive in computations for each step.

In Newton's method we know that 𝑱(𝐱𝑘 )𝛥𝐱𝑘 = 𝐟(𝐱𝑘+1 ) − 𝐟(𝐱𝑘 ), and on the other hand (𝛥𝐱𝑘 )^𝑇 𝛥𝐱𝑘 = ‖𝛥𝐱𝑘 ‖^2 , so

𝑱(𝐱𝑘 )𝛥𝐱𝑘 = 𝛥𝐟(𝐱𝑘 ) ⟺ (𝑱(𝐱𝑘 ) − 𝑱(𝐱𝑘−1 ))𝛥𝐱𝑘 = ( (𝛥𝐟(𝐱𝑘 ) − 𝑱(𝐱𝑘−1 )𝛥𝐱𝑘 ) / ‖𝛥𝐱𝑘 ‖^2 ) (𝛥𝐱𝑘 )^𝑇 𝛥𝐱𝑘
⟺ 𝑱(𝐱𝑘 ) = 𝑱(𝐱𝑘−1 ) + ( (𝛥𝐟(𝐱𝑘 ) − 𝑱(𝐱𝑘−1 )𝛥𝐱𝑘 ) / ‖𝛥𝐱𝑘 ‖^2 ) (𝛥𝐱𝑘 )^𝑇

The computation of the inverse needs a lot of memory space; this can be avoided if we update the inverse of the matrix 𝑱(𝐱𝑘 ) iteratively at each step. This can be accomplished by using the Sherman-Morrison-Woodbury formula

(𝑨𝑘 + 𝐮𝑘 𝐯𝑘^𝑇 )^{−1} = 𝑨𝑘^{−1} − (𝑨𝑘^{−1} 𝐮𝑘 𝐯𝑘^𝑇 𝑨𝑘^{−1}) / (1 + 𝐯𝑘^𝑇 𝑨𝑘^{−1} 𝐮𝑘 )   with   1 + 𝐯𝑘^𝑇 𝑨𝑘^{−1} 𝐮𝑘 ≠ 0
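The rank-one update formula is easy to verify numerically. The following MATLAB sketch is a minimal check with arbitrary illustrative data (an assumption; the only requirement is a nonzero denominator):

% Minimal sketch: Sherman-Morrison rank-one update versus direct inversion
n = 4;
A = eye(n) + 0.1*rand(n);  u = rand(n,1);  v = rand(n,1);
Ainv   = inv(A);
upd    = Ainv - (Ainv*u)*(v'*Ainv) / (1 + v'*Ainv*u);   % Sherman-Morrison update
direct = inv(A + u*v');
disp(norm(upd - direct));                               % ~ 0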
Let us define 𝑨𝑘 = 𝑱(𝐱𝑘−1 ), 𝐮𝑘 = (𝛥𝐟(𝐱𝑘 ) − 𝑱(𝐱𝑘−1 )𝛥𝐱𝑘 )/‖𝛥𝐱𝑘 ‖^2 , 𝐯𝑘 = 𝛥𝐱𝑘 and 𝑩𝑘 = (𝑱(𝐱𝑘 ))^{−1}; then it can be verified that:

𝑩𝑘 = ( 𝑰 + ( (𝛥𝐱𝑘 − 𝑩𝑘−1 𝛥𝐟(𝐱𝑘 )) / ((𝛥𝐱𝑘 )^𝑇 𝑩𝑘−1 𝛥𝐟(𝐱𝑘 )) ) (𝛥𝐱𝑘 )^𝑇 ) 𝑩𝑘−1
Algorithm: (Matrix Broyden's Method)
Data: 𝐟(𝐱), 𝐱0 , 𝐟(𝐱0 ) and 𝑩0
Result: 𝐱𝑘
begin:
   𝐱𝑘+1 = 𝐱𝑘 − 𝑩𝑘 𝐟(𝐱𝑘 )
   𝐲𝑘 = 𝐟(𝐱𝑘+1 ) − 𝐟(𝐱𝑘 )
   𝐬𝑘 = 𝐱𝑘+1 − 𝐱𝑘
   𝑩𝑘+1 = 𝑩𝑘 + ( (𝐬𝑘 − 𝑩𝑘 𝐲𝑘 ) / (𝐬𝑘^𝑇 𝑩𝑘 𝐲𝑘 ) ) 𝐬𝑘^𝑇 𝑩𝑘
end

Example: Determine the solution of 𝑭(𝑿) = 𝑿𝑨1 − 𝑨4 𝑿 − 𝑿𝑨2 𝑿 + 𝑨3 = 𝟎 Where: 𝑿 ∈ ℝ2×2


is the variable matrix to be determined and 𝑨𝑖 ∈ ℝ2×2 are constant matrices:

clear all,clc, A1=[1 2;3 2]; A2=[4 3;2 5];A3=[1 3;5 5]; A4=[8 8;6 1];
X0=eye(2,2)-0.1*rand(2,2); B=inv(100*eye(4,4)); % initialization
for k=1:1000
y0=X0*A1 - A4*X0 - X0*A2*X0 + A3; % 𝑭(𝑿𝑘 )
f0=[y0(:,1);y0(:,2)]; % 𝐟𝑘 = vec(𝑭(𝑿𝑘 ))
x0=[X0(:,1); X0(:,2)]; % 𝐱 𝑘 = vec(𝑿𝑘 )
x1=x0-B*f0; % vec(𝑿𝑘+1 ) = vec(𝑿𝑘 ) − (𝑱𝑘 )−1 vec(𝑭(𝑿𝑘 ))
X1=[x1(1:2,:) x1(3:4,:)]; % construction of 𝑿𝑘 from 𝐱 𝑘
y1=X1*A1 - A4*X1 - X1*A2*X1 + A3; % 𝑭(𝑿𝑘 + 1)
f1=[y1(:,1);y1(:,2)]; % 𝐟𝑘+1 = vec(𝑭(𝑿𝑘+1 ))
x1=[X1(:,1); X1(:,2)]; % 𝐱 𝑘+1 = vec(𝑿𝑘+1 )
y=f1-f0; s=x1-x0; % 𝐲𝑘 = 𝐟𝑘+1 − 𝐟𝑘 and 𝐬𝑘 = 𝐱 𝑘+1 − 𝐱 𝑘
B = B +((s-B*y)*(s'*B))/(s'*B*y); % 𝑱−1 (𝐱𝑘 )
X0=X1; % update
end
X1 % solution
ZERO1=X1*A1 - A4*X1 - X1*A2*X1 + A3 % verifications

Example: Determine the solution of 𝑨(𝑿) = 𝑨0 𝑿3 + 𝑨1 𝑿2 − 𝑨2 𝑿 + 𝑨3 = 𝟎

clear all, clc, Z=zeros(2,2);I=eye(2,2); X0=0.1*eye(2,2);M=100*eye(4,4);


R1=[0 1;-3.25 -2]; R2=[-11 -30;1 0]; R3=[0 1;-56 -15];
VR=[I I I;R1 R2 R3;R1^2 R2^2 R3^2]; Di=-[R1^3 R2^3 R3^3]*inv(VR);
A0=I, A1=Di(:,5:6), A2=Di(:,3:4), A3=Di(:,1:2),
A=inv(M);   % initial inverse-Jacobian approximation, kept outside the loop so the Broyden update is retained
for k=1:100
AR0=I*(X0)^3 + A1*(X0)^2 + A2*(X0) + A3;
f0=[AR0(:,1);AR0(:,2)]; x0=[X0(:,1); X0(:,2)];
x1=x0-A*f0; X1=[x1(1:2,:) x1(3:4,:)];
AR1=I*(X1)^3 + A1*(X1)^2 + A2*(X1) + A3;
f1=[AR1(:,1);AR1(:,2)]; x1=[X1(:,1); X1(:,2)];
y=f1-f0; s=x1-x0; A = A +((s-A*y)*(s'*A))/(s'*A*y); X0=X1;
end
X1, ZERO1=I*(X1)^3 + A1*(X1)^2 + A2*(X1) + A3
To compute all spectral factors we use long division

Q0=A0; Q1=A1+X1; Q2=A2+Q1*X1; % The synthetic long division


X0=0.1*eye(2,2); M=100*eye(4,4); A=inv(M);   % A is kept and updated across iterations
for k=1:350
AR0=I*(X0)^2 + Q1*(X0) + Q2;
f0=[AR0(:,1);AR0(:,2)];
x0=[X0(:,1); X0(:,2)];
x1=x0-A*f0;
X1=[x1(1:2,:) x1(3:4,:)];

AR1=I*(X1)^2 + Q1*(X1) + Q2;


f1=[AR1(:,1);AR1(:,2)];
x1=[X1(:,1); X1(:,2)];

y=f1-f0; s=x1-x0;
A = A +((s-A*y)*(s'*A))/(s'*A*y);
X0=X1;
end
X2=X1
ZERO2=I*(X2)^2 + Q1*(X2) + Q2
%------------------------------------------------%
X0=0.1*eye(2,2); M=100*eye(4,4); A=inv(M);   % A is kept and updated across iterations
for k=1:100
AL0=(X0)^3*I + (X0)^2*A1 + (X0)*A2 + A3;
f0=[AL0(:,1);AL0(:,2)];
x0=[X0(:,1); X0(:,2)];
x1=x0-A*f0;
X1=[x1(1:2,:) x1(3:4,:)];
AL1=(X1)^3*I + (X1)^2*A1 + (X1)*A2 + A3;
f1=[AL1(:,1);AL1(:,2)];
x1=[X1(:,1); X1(:,2)];

y=f1-f0; s=x1-x0;
A = A +((s-A*y)*(s'*A))/(s'*A*y);
X0=X1;
end
X3=X1
ZERO3=(X3)^3*I + (X3)^2*A1 + (X3)*A2 + A3

Matrix Quotient-Difference Method: The matrix quotient-difference Q.D. algorithm is a


generalization of the scalar one (Heinz Rutishauser 1954 and P. Henrici et.al 1958). The
use of the Q.D. algorithm for such purpose has been suggested firstly by my teacher
(Kamel Hariche 1987) and implemented by my teacher (Abdelhakim Dahimene 1992).
The scalar Q.D. algorithm is just one of the many global methods that are commonly
used for finding the roots of a scalar polynomial.
From the above study we have proved that both 𝜆𝑰 − 𝑻𝑞 and 𝜆𝑰 − 𝑨𝑐 are equivalent linearizations of the monic matrix polynomial 𝑨(𝜆) = (𝜆𝑰 − 𝑸ℓ ) ⋯ (𝜆𝑰 − 𝑸1 ), where

𝑨𝑐 = [𝟎𝑚 𝑰𝑚 ⋯ 𝟎𝑚 𝟎𝑚 ; 𝟎𝑚 𝟎𝑚 ⋱ ⋮ ⋮ ; ⋮ ⋮ ⋱ 𝑰𝑚 ; −𝑨ℓ −𝑨ℓ−1 ⋯ −𝑨2 −𝑨1 ]   and   𝑻𝑞 = [𝑸1 𝑰𝑚 ⋯ 𝟎𝑚 𝟎𝑚 ; 𝟎𝑚 𝑸2 ⋱ ⋮ ⋮ ; ⋮ ⋮ ⋱ 𝑸ℓ−1 𝑰𝑚 ; 𝟎𝑚 𝟎𝑚 ⋯ 𝟎𝑚 𝑸ℓ ]

This means that they are similar 𝑾−1 𝑨𝑐 𝑾 = 𝑻𝑞 . In 1986 Vicente G Hernandez introduced
an algorithm which provide a sequence of similarity transformations that can gradually
transform 𝑨𝑐 to 𝑻𝑞 , here is the algorithm

Algorithm: (Hernandez Transformation)

Data: 𝑨(𝜆) = (𝜆𝑰 − 𝑸ℓ ) ⋯ (𝜆𝑰 − 𝑸1 )
Result: 𝑻𝑞 = 𝑾^{−1} 𝑨𝑐 𝑾
begin:
■ Construct the matrices 𝑫(𝑸𝑘 ) such that

𝑫(𝑸𝑘 ) = [ 𝑰𝑚  𝟎𝑚  𝟎𝑚  ⋯  𝟎𝑚 ; 𝑸𝑘  𝑰𝑚  𝟎𝑚  ⋮  𝟎𝑚 ; 𝑸𝑘^2  𝑸𝑘  𝑰𝑚  ⋱  𝟎𝑚 ; ⋮  ⋮  ⋮  ⋱  ⋮ ; 𝑸𝑘^{ℓ−𝑘}  𝑸𝑘^{ℓ−𝑘−1}  𝑸𝑘^{ℓ−𝑘−2}  …  𝑰𝑚 ] ,   𝑘 = 1,2, … , ℓ − 1

■ Construct the similarity transformation 𝑾 = 𝑫1 𝑫2 … 𝑫ℓ−1 with

𝑫1 = 𝑫(𝑸1 ), 𝑫2 = blkdiag(𝑰, 𝑫(𝑸2 )), … , 𝑫ℓ−1 = blkdiag(𝑰, … , 𝑰, 𝑫(𝑸ℓ−1 ))

■ Compute 𝑻𝑞 = 𝑾^{−1} 𝑨𝑐 𝑾
end

clear all, clc, I=eye(2,2); Z=zeros(2,2);


Q1=10*rand(2,2), Q2=10*rand(2,2), Q3=10*rand(2,2), Q4=10*rand(2,2),
%------------------------------------------------%

A0=I;
A1=-(Q4 + Q3 + Q2 + Q1);
A2=Q4*Q3 + Q4*Q2 + Q4*Q1 + Q3*Q2 + Q3*Q1 + Q2*Q1;
A3=-(Q4*Q3*Q2 + Q4*Q3*Q1 + Q4*Q2*Q1 + Q3*Q2*Q1);
A4=Q4*Q3*Q2*Q1;
Ac=[Z I Z Z;Z Z I Z;Z Z Z I;-A4 -A3 -A2 -A1];
%------------------------------------------------%

D1=[I Z Z Z;Q1 I Z Z;Q1^2 Q1 I Z;Q1^3 Q1^2 Q1 I];


D2=blkdiag(I,[I Z Z;Q2 I Z;Q2^2 Q2 I]);
D3=blkdiag(I,I,[I Z;Q3 I]);

W=D1*D2*D3;
T=inv(W)*Ac*W
Remark: Hernandez's algorithm is merely a proof of equivalence between the two matrices 𝑨𝑐 and 𝑻𝑞 ; it is not a practical way to go from 𝑨𝑐 to 𝑻𝑞 .

Before we proceed to develop an algorithm that transforms directly from 𝑨𝑐 to 𝑻𝑞 , we present an intermediate phase that paves the way to what we want to reach. Besides the block bidiagonal form 𝑻𝑞 , there are many other canonical forms which are equivalent to 𝑨𝑐 and are quite interesting in the proof of the Q.D. algorithm. Among such canonical forms we look for the block tridiagonal form 𝑴1 (Jacobi's form); in 1978 Dr. L.S. Shieh demonstrated the existence of a similarity transformation between 𝑨𝑐 and 𝑴1 .

𝑴1 = [ 𝑴1,1  𝑰𝑚  𝟎𝑚  ⋯  𝟎𝑚  𝟎𝑚 ; 𝑴2,1  𝑴2,2  𝑰𝑚  ⋮  𝟎𝑚  𝟎𝑚 ; 𝟎𝑚  𝑴3,2  𝑴3,3  ⋮  ⋮  ⋮ ; ⋮  ⋮  ⋮  ⋱  𝑴ℓ−1,ℓ−1  𝑰𝑚 ; 𝟎𝑚  𝟎𝑚  𝟎𝑚  …  𝑴ℓ,ℓ−1  𝑴ℓ,ℓ ]

The block tridiagonal, 𝑴1 can be decomposed (under certain conditions) into a product of
two block bidiagonal matrices 𝑳1 and 𝑹1 (this idea was proposed by Heinz Rutishauser
1954 but for scalar QD). The matrix 𝑳1 is a lower block triangular matrix with identity
matrices on the main diagonal and 𝑹1 is an upper block triangular matrix. By using a
"block 𝐿𝑅 algorithm", we obtain a sequence of similar matrices: 𝑴𝑘 = 𝑳𝑘 𝑹𝑘 , 𝑴𝑘+1 = 𝑹𝑘 𝑳𝑘 .

𝑳𝑘 = [ 𝑰𝑚  𝟎𝑚  ⋯  𝟎𝑚  𝟎𝑚 ; 𝑬1^(𝑘)  𝑰𝑚  ⋱  ⋮  ⋮ ; ⋮  𝑬2^(𝑘)  ⋱  ⋮  ⋮ ; 𝟎𝑚  𝟎𝑚  ⋱  𝑰𝑚  𝟎𝑚 ; 𝟎𝑚  𝟎𝑚  ⋯  𝑬ℓ−1^(𝑘)  𝑰𝑚 ] ,   𝑹𝑘 = [ 𝑸1^(𝑘)  𝑰𝑚  ⋯  𝟎𝑚  𝟎𝑚 ; 𝟎𝑚  𝑸2^(𝑘)  ⋱  ⋮  ⋮ ; ⋮  ⋮  ⋱  𝑸ℓ−1^(𝑘)  𝑰𝑚 ; 𝟎𝑚  𝟎𝑚  ⋯  𝟎𝑚  𝑸ℓ^(𝑘) ]

By identifying 𝑴𝑘+1 = 𝑹𝑘 𝑳𝑘 and 𝑴𝑘+1 = 𝑳𝑘+1 𝑹𝑘+1 we obtain the following "rhombus rules":

𝑸𝑖^(𝑘+1) + 𝑬𝑖−1^(𝑘+1) = 𝑸𝑖^(𝑘) + 𝑬𝑖^(𝑘)
𝑬𝑖^(𝑘+1) 𝑸𝑖^(𝑘+1) = 𝑸𝑖+1^(𝑘) 𝑬𝑖^(𝑘)
𝑬0^(𝑘) = 𝑬ℓ^(𝑘) = 0
𝑖 = 1,2, … , ℓ − 1 ;   𝑘 = 1,2, …
(𝑘)
It is clear from the expression of 𝑳𝑘 that the matrices 𝑬𝑖 converge to zero, so that 𝑳𝑘
converge to the identity matrix and 𝑹𝑘 converge to 𝑻𝑞 then the block companion matrix
𝑨𝑐 , will be similar to 𝑻𝑞 .

Remark: The starting point for the matrix QD algorithm is explained well in the book Fundamentals of Scientific Computations and Numerical Linear Algebra by BEKHITI 2020:

𝑸1^(0) = −𝑨1 𝑨0^{−1} ,   𝑸𝑖^(0) = 0 ,   𝑖 = 2,3, …
𝑬1^(0) = 𝑨2 𝑨1^{−1} ,   𝑬2^(0) = 𝑨3 𝑨2^{−1} , … ,   𝑬ℓ−1^(0) = 𝑨ℓ 𝑨ℓ−1^{−1}

Consider the initial elements thus generated as the first two block rows of a QD scheme,
and generate further rows by means of progressive generation, using the side conditions
(𝑘) (𝑘)
𝑬0 = 𝑬ℓ = 𝟎
Comment: For the Q.D. algorithm, we have made the implicit assumption that an LR
factorization exists at each step. If such factorization cannot be made, it will lead to a
breakdown of the algorithm.

Remark: The results obtained during this research work generated many questions and
problems whose solutions are to be explored: Think about Block Lanczos
tridiagonalization process based on block Krylov space. Also one can try to extend the
explicit LR iteration to its block form.

Example: Consider a matrix polynomial of 2nd order and 3rd degree with the following
matrix coefficients: 𝑨(𝜆) = 𝑨0 𝜆3 + 𝑨1 𝜆2 + 𝑨2 𝜆 + 𝑨3 . We apply now the generalized row
generation Q.D. algorithm to find spectral factors.

Algorithm (Generalized Quotient-Difference Method)

Data: 𝑨(𝜆) = 𝑨0 𝜆3 + 𝑨1 𝜆2 + 𝑨2 𝜆 + 𝑨3
Result: 𝑺1 , 𝑺2 and 𝑺3 spectral factors

1 % initialization
2 Enter the degree and the order 𝑚 = 2, ℓ = 3
3 Enter the number of iterations 𝑁 = 35
4 Enter the matrix polynomial coefficients 𝑨𝑖
5
6 𝑸1 = [−𝑨1 𝑨0^{−1} , 𝑶2 , 𝑶2 ]; 𝑬1 = [𝑶2 , 𝑨2 𝑨1^{−1} , 𝑨3 𝑨2^{−1} , 𝑶2 ];
7 For 𝑛 = 1: 𝑁
8 𝑬2 = [ ]; 𝑸2 = [ ];
9 For 𝑘 = 1: 2: 𝑚 ⋆ ℓ
10 𝒒2 = (𝑬1 (: , k + 2: k + 3) − 𝑬1 (: , k: k + 1)) + 𝑸1 (: , k: k + 1);
11 𝑸2 = [𝑸2 , 𝒒2 ];
12 End
13 𝑸2 ;
14 For 𝑘 = 1: 2: 𝑚 ⋆ ℓ − 2
15 𝒆2 = (𝑸2 (: , k + 2: k + 3)) ⋆ (𝑬1 (: , k: k + 1)) ⋆ (𝑸2 (: , k: k + 1))^{−1} ;
16 𝑬2 = [𝑬2 , 𝒆2 ];
17 End
18 𝑬2 = [𝑶2 , 𝑬2 , 𝑶2 ];
19 𝑸1 = 𝑸2 ;
20 𝑬1 = 𝑬2 ;
21 End
22 𝑸1 ;
23 𝑺1 = 𝑸1 (: ,1: 2) 𝑺2 = 𝑸1 (: ,3: 4) 𝑺3 = 𝑸1 (: ,5: 6)
clear all, clc, format('short','e'), Z=zeros(2,2); I=eye(2,2);
% A0=[1 0;0 1];
% A1=[-11.79104, 0.82090;1.91045, -9.20896];
% A2 =[42.343, -10.164;-13.433,25.642];
% A3 =[-50.358, 21.881;19.582, -22.806];
% A(λ)= A0*λ^3 + A1*λ^2 + A2λ + A3;
A0=eye(2,2); A1 =[-27.152538 .8166050;-179.782629 38.152538];
A2 =[116.387033 84.978971;1043.444653 836.739866];
A3 =[126.928789 335.502350;1038.682417 2947.561338];
%-----------------------------------------%
Q1=[-A1*inv(A0) , Z , Z]; E1=[Z , A2*inv(A1) , A3*inv(A2) , Z];
%-----------------------------------------%
for n=1:35
E2=[]; Q2=[];
for k=1:2:6
q2=(E1(:,k+2:k+3)-E1(:,k:k+1))+(Q1(:,k:k+1));
Q2=[Q2,q2];
end
Q2;
for k=1:2:4
e2=(Q2(:,k+2:k+3))*E1(:,k+2:k+3)*inv(Q2(:,(k:k+1)));
E2=[E2,e2];
end
E2=[Z,E2,Z];
Q1=Q2;
E1=E2;
end
E1;
q1=Q1(:,1:2), q2=Q1(:,3:4), q3=Q1(:,5:6)
ZERO1=I*(q1)^3 + A1*(q1)^2 + A2*(q1) + A3 % Most right solvent
ZERO3=(q3)^3*I + (q3)^2*A1 + (q3)*A2 + A3 % Most left solvent

Remark: It is very interesting to observe that, because a matrix polynomial possesses both
left and right evaluations, there exist two Q.D. algorithms: one that factorizes the matrix
polynomial from the right and one that factorizes it from the left. We have therefore provided
two different subroutines, QDRF and QDLF, for the right and left factorizations
respectively. These two subroutines are straightforward applications of the formulas

𝐑𝐢𝐠𝐡𝐭 𝐐.𝐃. 𝐚𝐥𝐠𝐨𝐫𝐢𝐭𝐡𝐦
𝑸𝑖^(𝑘+1) + 𝑬𝑖−1^(𝑘+1) = 𝑸𝑖^(𝑘) + 𝑬𝑖^(𝑘)
𝑬𝑖^(𝑘+1) = 𝑸𝑖+1^(𝑘) 𝑬𝑖^(𝑘) (𝑸𝑖^(𝑘+1))^(−1)
𝑬0^(𝑘) = 𝑬ℓ^(𝑘) = 𝟎 ,   𝑖 = 1,2, … , ℓ − 1 ;  𝑘 = 1,2, …

𝐋𝐞𝐟𝐭 𝐐.𝐃. 𝐚𝐥𝐠𝐨𝐫𝐢𝐭𝐡𝐦
𝑸𝑖^(𝑘+1) + 𝑬𝑖−1^(𝑘+1) = 𝑸𝑖^(𝑘) + 𝑬𝑖^(𝑘)
𝑬𝑖^(𝑘+1) = (𝑸𝑖^(𝑘+1))^(−1) 𝑬𝑖^(𝑘) 𝑸𝑖+1^(𝑘)
𝑬0^(𝑘) = 𝑬ℓ^(𝑘) = 𝟎 ,   𝑖 = 1,2, … , ℓ − 1 ;  𝑘 = 1,2, …
CHAPTER V:
Transformations of Solvents and Spectral Factors of 𝜆-Matrices
Introduction: The relationships between solvents and spectral factors of a high-degree
matrix polynomial are introduced. Various transformations which convert right (left)
solvents into spectral factors, and vice versa, are also given in this chapter. The
transformation of a right (left) solvent into a left (right) solvent is also established.

Since the eigenvalues of a complete set of right (left) solvents, as well as those of the spectral
factors, are identical to the latent roots of a 𝜆-matrix, a solvent (right or left) is similar to a
spectral factor whenever both have the same set of eigenvalues. In this chapter, various
similarity transformations are derived to convert the right (left) solvents into the spectral
factors of a 𝜆-matrix without using the latent roots and latent vectors of the 𝜆-matrix. Also, if
a complete set of spectral factors of a 𝜆-matrix is given or can be determined (Dennis
et al. 1978, Shieh and Chahin 1981) without using latent vectors of the 𝜆-matrix, these
available spectral factors can be transformed into a complete set of right (left) solvents.

Preliminaries: We are interested in algorithms for calculating solvents. Since the
reader may know rather little about the mathematical properties of matrix polynomials and
solvents, we offer some preliminaries here.

Let 𝑨(𝜆) = 𝑨0 𝜆ℓ + 𝑨1 𝜆ℓ−1 +. . . +𝑨ℓ−1 𝜆 + 𝑨ℓ = ∑ℓ𝑖=0 𝑨𝑖 𝜆ℓ−𝑖 be an ℓ𝑡ℎ degree 𝑚𝑡ℎ order matrix
polynomial and let 𝑨𝑅 (𝑿) be the right evaluation of 𝑨(𝜆) at 𝑿 ∈ ℂ𝑚×𝑚 , 𝑨𝐿 (𝑿) be the left
evaluation of 𝑨(𝜆) at 𝑿 ∈ ℂ𝑚×𝑚
𝑨𝑅 (𝑿) = 𝑨0 𝑿ℓ + 𝑨1 𝑿ℓ−1 +. . . +𝑨ℓ−1 𝑿 + 𝑨ℓ
𝑨𝐿 (𝑿) = 𝑿ℓ 𝑨0 + 𝑿ℓ−1 𝑨1 +. . . +𝑿𝑨ℓ−1 + 𝑨ℓ

A matrix 𝑺 is a right solvent of 𝑨(𝜆) if 𝑨𝑅 (𝑺) = 𝟎. The terminology right solvent is
explained below. For simplicity we study the right evaluation and sometimes refer to
right solvents simply as solvents. We say a matrix 𝑾 is a weak solvent of 𝑨𝑅 (𝑿) if 𝑨𝑅 (𝑾) is
singular. We will deal primarily with the case where 𝑨(𝜆) is monic (𝑨0 = 𝑰).
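As a small numerical illustration of the two evaluations, the following minimal MATLAB sketch can be used; the coefficient matrices and the two test matrices are borrowed from the cubic worked example appearing later in this chapter, so the residuals are expected to vanish (up to rounding).

% Sketch: right and left evaluation of a monic cubic matrix polynomial
% A(lambda) = I*lambda^3 + A1*lambda^2 + A2*lambda + A3 at a square matrix.
A1 = [-3 1;-4 -3];  A2 = [3 0;4 3];  A3 = [-5 -1;-4 -5];
X  = [ 2 -1; 3  2];                    % most-right spectral factor, a right solvent
Y  = [-1  1;-2  1];                    % most-left  spectral factor, a left  solvent
AR = X^3 + A1*X^2 + A2*X + A3;         % right evaluation A_R(X), expected ~ 0
AL = Y^3 + Y^2*A1 + Y*A2 + A3;         % left  evaluation A_L(Y), expected ~ 0
norm(AR), norm(AL)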

If the 𝑨𝑖 are scalar matrices, 𝑨𝑖 = 𝛼𝑖 𝑰, then 𝑨𝑅 (𝑿) reduces to



𝑨𝑅 (𝑿) = 𝛼0 𝑿ℓ + 𝛼1 𝑿ℓ−1 +. . . +𝛼ℓ−1 𝑿 + 𝛼ℓ = ∑_{𝑖=0}^{ℓ} 𝛼𝑖 𝑿ℓ−𝑖

This problem has been thoroughly studied (Gantmacher Chapter 5) and we have such
classical results as the Cayley-Hamilton Theorem and the Lagrange-Sylvester
interpolation theorem.

If 𝑿 is a scalar matrix, 𝑿 = 𝜆𝑰, then 𝑨𝑅 (𝑿) reduces to



𝑨𝑅 (𝑿) = 𝑨𝑅 (𝜆𝑰) = 𝑨0 𝜆ℓ + 𝑨1 𝜆ℓ−1 +. . . +𝑨ℓ−1 𝜆 + 𝑨ℓ = ∑_{𝑖=0}^{ℓ} 𝑨𝑖 𝜆ℓ−𝑖
Remark: This is called a lambda-matrix and has also been thoroughly studied
(Lancaster). Unfortunately, both ∑ℓ𝑖=0 𝛼𝑖 𝑿ℓ−𝑖 and ∑ℓ𝑖=0 𝑨𝑖 𝜆ℓ−𝑖 are sometimes called matrix
polynomials but we shall reserve this name for ∑ℓ𝑖=0 𝑨𝑖 𝜆ℓ−𝑖 .

A problem closely related to that of finding solvents of a matrix polynomial is finding a
scalar 𝜆 = 𝑝 such that the lambda-matrix 𝑨(𝑝) is singular. Such a scalar is called a latent
root of 𝑨(𝜆) and vectors 𝐯 and 𝐰 are right and left latent vectors, respectively, if for a
latent root p, 𝑨(𝑝)𝐯 = 𝟎 and 𝐰 𝑇 𝑨(𝑝) = 𝟎𝑇 . See Lancaster, Gantmacher, MacDuffee, Peters
and Wilkinson for discussions of latent roots.

A corollary of the generalized Bezout Theorem states that if 𝑺 is a solvent of 𝑨(𝜆), then

𝑨(𝜆) = 𝑸(𝜆)(𝜆𝑰 − 𝑺)

where 𝑸(𝜆) is a lambda-matrix of degree ℓ − 1. It is because 𝑨𝑅(𝑺) = 𝟎 that 𝑺 is called a
right solvent. Here the lambda-matrix 𝑨(𝜆) has 𝑚ℓ latent roots. From 𝑨𝑅(𝑺) = 𝟎, the 𝑚
eigenvalues of a solvent 𝑺 are all latent roots of 𝑨(𝜆). The 𝑚(ℓ − 1) latent roots of 𝑸(𝜆)
are also latent roots of 𝑨(𝜆). Thus if one is interested in the solution of a lambda-matrix
problem, then a solvent provides 𝑚 latent roots and can be used for matrix deflation,
which yields a new problem 𝑸(𝜆).
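A minimal MATLAB sketch of this deflation by a right solvent (Horner-type synthetic division; the data are again those of the cubic example used later in this chapter, and the names Q0, Q1, Q2 for the quotient coefficients are illustrative only):

% Sketch: A(lambda) = Q(lambda)*(lambda*I - S) + A_R(S), with
% Q0 = A0,  Qj = Aj + Q_{j-1}*S,  and remainder A_R(S) = A3 + Q2*S.
A0 = eye(2); A1 = [-3 1;-4 -3]; A2 = [3 0;4 3]; A3 = [-5 -1;-4 -5];
S  = [2 -1; 3 2];                 % a right solvent of A(lambda)
Q0 = A0;
Q1 = A1 + Q0*S;
Q2 = A2 + Q1*S;
Rem = A3 + Q2*S                   % ~ 0, since S is a right solvent
% The deflated quadratic Q(lambda) = Q0*lambda^2 + Q1*lambda + Q2 carries the
% remaining m*(l-1) = 4 latent roots of A(lambda).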

Theorem: Let 𝑨(𝑿) = 𝑨0 𝑿ℓ + 𝑨1 𝑿ℓ−1 +. . . +𝑨ℓ−1 𝑿 + 𝑨ℓ & 𝑩(𝑿) = 𝑿𝑝 + 𝑩1 𝑿𝑝−1 +. . . +𝑩𝑝−1 𝑿 + 𝑩𝑝


with ℓ ≥ 𝑝, then there exists a unique monic matrix polynomial 𝑭(𝑿) of degree ℓ − 𝑝 and a
unique matrix polynomial 𝑳(𝑿) of degree 𝑝 − 1 such that

𝑨(𝑿) = 𝑭(𝑿)𝑿𝑝 + 𝑩1 𝑭(𝑿)𝑿𝑝−1 +. . . +𝑩𝑝−1 𝑭(𝑿)𝑿 + 𝑩𝑝 𝑭(𝑿) + 𝑳(𝑿)

Proof: this is a matrix polynomial division and can be verified by direct evaluation,
where 𝑭(𝑿) is called the quotient and 𝑳(𝑿) the remainder.

Corollary: if 𝑿 and 𝒀 are any complex matrices, 𝑨(𝜆) is an ℓ𝑡ℎ degree matrix
polynomial, and 𝑭(𝜆), 𝑯(𝜆), 𝑷(𝜆) and 𝑾(𝜆) are matrix polynomials of degree ℓ − 1, then

𝑨(𝜆) = 𝑭(𝜆)(𝜆𝑰 − 𝑿) + 𝑷(𝜆) = (𝜆𝑰 − 𝒀)𝑯(𝜆) + 𝑾(𝜆)

The matrices 𝑿 = 𝑹 and 𝒀 = 𝑳 are a right and a left solvent of 𝑨(𝜆), respectively, if and only if 𝑷(𝑹) = 𝑾(𝑳) = 𝟎.

Remark: Earlier we have seen that any monic matrix polynomial can be factorized into
a product of linear factors (𝜆𝑰 − 𝑸𝑖 ) so that 𝑨(𝜆) = (𝜆𝑰 − 𝑸ℓ )(𝜆𝑰 − 𝑸ℓ−1 ) … (𝜆𝑰 − 𝑸1 ).

The most-right spectral factor 𝑸1 is a right solvent and the most-left spectral factor 𝑸ℓ is a
left solvent, that is 𝑳 = 𝑸ℓ and 𝑹 = 𝑸1 . The intermediate spectral factors, however, are not
necessarily solvents of 𝑨(𝜆).

Corollary: If 𝑨(𝜆) has ℓ left solvents 𝑳1 , … , 𝑳ℓ and if

                   ( 𝑰   𝑳1   𝑳1^2   ⋯   𝑳1^(ℓ−1) )
                   ( 𝑰   𝑳2   𝑳2^2   ⋯   𝑳2^(ℓ−1) )
𝑽(𝑳1 , … , 𝑳ℓ ) =  ( ⋮   ⋮    ⋮          ⋮        )
                   ( 𝑰   𝑳ℓ   𝑳ℓ^2   ⋯   𝑳ℓ^(ℓ−1) )

is non-singular, then the remainder 𝑾(𝑳) = 𝟎.


Proof: If 𝑨(𝜆) has ℓ left solvents, then 𝑨(𝑳𝑖 ) = [(𝜆𝑰 − 𝑳𝑖 )𝑪(𝜆) + 𝑾(𝜆)]|_{𝜆=𝑳𝑖} = 𝟎 ⟹ 𝑾(𝑳𝑖 ) = 𝟎, that is

( 𝑰   𝑳1   𝑳1^2   ⋯   𝑳1^(ℓ−1) ) ( 𝑾ℓ−1 )
( 𝑰   𝑳2   𝑳2^2   ⋯   𝑳2^(ℓ−1) ) (  ⋮   )   =   𝟎
( ⋮   ⋮    ⋮          ⋮        ) (  𝑾1  )
( 𝑰   𝑳ℓ   𝑳ℓ^2   ⋯   𝑳ℓ^(ℓ−1) ) (  𝑾0  )

The coefficients of 𝑾(𝜆) are all zero only if 𝑽(𝑳1 , … , 𝑳ℓ ) is non-singular. The same argument
can be made for right solvents ■

We know that 𝑨(𝜆) = (𝜆𝑰 − 𝑳)𝑯(𝜆) = 𝜆𝑯(𝜆) − 𝑳𝑯(𝜆); if we take the right evaluation at 𝑹 in
this expression we obtain 𝑨𝑅(𝑹) = 𝑯(𝑹)𝑹 − 𝑳𝑯(𝑹) = 𝟎, which implies that if 𝑯(𝑹) is
nonsingular then 𝑹 = (𝑯(𝑹))^(−1) 𝑳𝑯(𝑹). We can summarize this result in the following
corollary.

Corollary: let 𝑹 and 𝑳 be a right and a left solvent of the monic matrix polynomial 𝑨(𝜆),
respectively, and let 𝑨(𝜆) = (𝜆𝑰 − 𝑳)𝑯(𝜆), or equivalently 𝑯(𝑹)𝑹 − 𝑳𝑯(𝑹) = 𝟎. Now if 𝑹 and 𝑳
have no common eigenvalues then 𝑯(𝑹) = 𝟎.

Proof: This follows since the Sylvester equation 𝑨𝑿 = 𝑿𝑩 has the unique
solution 𝑿 = 𝟎 if and only if 𝑨 and 𝑩 have no common eigenvalues. See the Algebra book
by BEKHITI 2020, pp. 199-200.

Remark: If 𝑹 and 𝑳 have common eigenvalues then 𝑯(𝑹) can be non-singular, and in that case
𝑹 = (𝑯(𝑹))^(−1)𝑳𝑯(𝑹). This gives an association between left and right solvents.
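The uniqueness argument can be checked numerically by vectorizing: 𝑯𝑹 − 𝑳𝑯 = 𝟎 is equivalent to (𝑹^𝑇 ⊗ 𝑰 − 𝑰 ⊗ 𝑳) vec(𝑯) = 𝟎, and the eigenvalues of that Kronecker matrix are the differences 𝜆𝑖(𝑹) − 𝜇𝑗(𝑳). A minimal MATLAB sketch, with two arbitrarily chosen test matrices:

% Sketch: H*R - L*H = 0 forces H = 0 when sigma(R) and sigma(L) are disjoint.
R = [1 2; 0 3];   L = [4 0; 1 5];          % spectra {1,3} and {4,5}, disjoint
K = kron(R.', eye(2)) - kron(eye(2), L);   % vec(H*R - L*H) = K*vec(H)
eig(K)                                     % all nonzero, so vec(H) = 0 is the only solution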

Theorem: let 𝑨(𝜆) be an ℓ𝑡ℎ degree 𝑚𝑡ℎ order monic matrix polynomial and let 𝜆𝑰 − 𝑨𝑐 be
its linearization, where 𝑨𝑐 is the companion (controller) form; then

det(𝑨𝑐 − 𝜆𝑰) = (−1)𝑚ℓ det(𝑰𝜆ℓ + 𝑨1 𝜆ℓ−1 +. . . +𝑨ℓ−1 𝜆 + 𝑨ℓ )

Proof: the proof of this theorem was given in the previous chapter.

Since 𝑨𝑐 is an 𝑚ℓ by 𝑚ℓ matrix, we immediately obtain the following well known result.

Corollary 𝑨(𝜆) has exactly 𝑛 = 𝑚ℓ finite latent roots.

Now we are in stage where we can answer the following question: What is the
relationship between latent-vectors of 𝑨(𝜆) and eigenvectors of 𝑨𝑐 ?

Theorem: If 𝜆𝑖 is a latent root of 𝑨(𝜆) and 𝐯𝑖 and 𝐰𝑖 are corresponding right and left latent
vectors, respectively, then 𝜆𝑖 is an eigenvalue of 𝑨𝑐 and of 𝑨𝑐^𝐵𝑇, and

𝑽𝑐 = col(𝐯𝑖 , 𝜆𝑖𝐯𝑖 , 𝜆𝑖^2𝐯𝑖 , … , 𝜆𝑖^(ℓ−1)𝐯𝑖 )  is a right eigenvector of 𝑨𝑐 ,
𝑾𝑐 = col(𝐰𝑖 , 𝜆𝑖𝐰𝑖 , 𝜆𝑖^2𝐰𝑖 , … , 𝜆𝑖^(ℓ−1)𝐰𝑖 )  is a left eigenvector of 𝑨𝑐^𝐵𝑇.

The superscript 𝐵𝑇 stands for block transpose.
Proof: The proof of this theorem is straightforward.

Remark: To determine the relationship between solvents and latent vectors/latent roots,
suppose 𝜆𝑖1 , … , 𝜆𝑖𝑚 are latent roots of 𝑨(𝜆) with associated right latent vectors 𝐯𝑖1 , … , 𝐯𝑖𝑚 ,
and look for a matrix 𝑹𝑖 having these latent pairs as eigenpairs:

[𝑹𝑖 𝐯𝑖1 ⋮ 𝑹𝑖 𝐯𝑖2 ⋮ ⋯ ⋮ 𝑹𝑖 𝐯𝑖𝑚 ] = [𝜆𝑖1 𝐯𝑖1 ⋮ 𝜆𝑖2 𝐯𝑖2 ⋮ ⋯ ⋮ 𝜆𝑖𝑚 𝐯𝑖𝑚 ] = [𝐯𝑖1 𝐯𝑖2 … 𝐯𝑖𝑚 ] diag(𝜆𝑖1 , … , 𝜆𝑖𝑚 )

Multiplying on the right by [𝐯𝑖1 𝐯𝑖2 … 𝐯𝑖𝑚 ]^(−1) gives

𝑹𝑖 = [𝐯𝑖1 𝐯𝑖2 … 𝐯𝑖𝑚 ] diag(𝜆𝑖1 , … , 𝜆𝑖𝑚 ) [𝐯𝑖1 𝐯𝑖2 … 𝐯𝑖𝑚 ]^(−1) = 𝑽𝑅𝑖 𝜦𝑖 𝑽𝑅𝑖^(−1)

On the other hand,

𝟎 = 𝑨𝑅(𝑹𝑖 ) = 𝑨0 𝑹𝑖^ℓ + 𝑨1 𝑹𝑖^(ℓ−1) +. . . +𝑨ℓ−1 𝑹𝑖 + 𝑨ℓ
            = 𝑨0 (𝑽𝑅𝑖 𝜦𝑖^ℓ 𝑽𝑅𝑖^(−1)) + 𝑨1 (𝑽𝑅𝑖 𝜦𝑖^(ℓ−1) 𝑽𝑅𝑖^(−1)) +. . . +𝑨ℓ−1 (𝑽𝑅𝑖 𝜦𝑖 𝑽𝑅𝑖^(−1)) + 𝑨ℓ

which implies 𝑨0 𝑽𝑅𝑖 𝜦𝑖^ℓ + 𝑨1 𝑽𝑅𝑖 𝜦𝑖^(ℓ−1) +. . . +𝑨ℓ−1 𝑽𝑅𝑖 𝜦𝑖 + 𝑨ℓ 𝑽𝑅𝑖 = 𝟎, i.e. column by column
𝑨(𝜆𝑖𝑗 )𝐯𝑖𝑗 = 𝟎. Hence 𝑹𝑖 = 𝑽𝑅𝑖 𝜦𝑖 𝑽𝑅𝑖^(−1) is a right solvent exactly when its eigenvector
columns are right latent vectors of 𝑨(𝜆). ■

Now let 𝑽𝑅𝑖 be the matrices whose columns are latent vectors and 𝜦𝑖 the Jordan blocks
with latent roots on the main diagonal. If we define a new set of matrices such that
𝑿 = [𝑽𝑅1 𝑽𝑅2 … 𝑽𝑅ℓ ] and 𝑻 = blkdiag(𝜦1 , 𝜦2 , … , 𝜦ℓ ), then the triple (𝑿, 𝑻, 𝒀) is a Jordan triple,
where 𝒀 = 𝑿^(−1). In such a case, if 𝑻 is diagonal, i.e., if all eigenvalues 𝜆𝑖 are distinct, then

𝑨^(−1)(𝜆) = 𝑿(𝜆𝑰 − 𝑻)^(−1) 𝒀 = ∑_{𝑖=1}^{𝑛} 𝐱𝑖 𝐲𝑖^𝑇 / (𝜆 − 𝜆𝑖 ) = ∑_{𝑖=1}^{𝑛} 𝐏𝑖 / (𝜆 − 𝜆𝑖 )

where 𝐱𝑖 are the columns of 𝑿, 𝐲𝑖^𝑇 are the rows of 𝒀, and 𝐏𝑖 are projector matrices. In the case
of repeated latent roots we refer the reader to Hariche et al. 1987.

 In the last chapter we proved that the Jordan chain for each latent root 𝜆0 satisfies

∑_{𝑗=0}^{𝑖−1} (1/𝑗!) (𝑑^𝑗 𝑨(𝜆)/𝑑𝜆^𝑗)|_{𝜆=𝜆0} 𝐯𝑖−𝑗 = 𝟎 ,   𝑖 = 1,2, … , 𝑘

Sometimes we call the sequence of vectors {𝐯1 , 𝐯2 , … , 𝐯𝑘 } the principal vectors.

Theorem: The principal vectors of a solvent are principal latent vectors of 𝑨(𝜆).

The Fundamental Theorem of Algebra does not hold for matrix polynomials. This is
known from the extensive studies of the square root problem 𝑿^2 = 𝑨; see the book of
Gantmacher, p. 231. The next theorem demonstrates this claim.

Theorem: There exists a matrix polynomial with no solvents.

Proof: This can be checked by a counterexample; consider

𝑨(𝜆) = ( 𝜆^2 − 2𝜆 + 2        1      ) = ( 1  0 ) 𝜆^2 − 2 ( 1  0 ) 𝜆 + (  2  1 )
        (     −1         𝜆^2 − 2𝜆   )   ( 0  1 )         ( 0  1 )     ( −1  0 )

Indeed, det 𝑨(𝜆) = (𝜆^2 − 2𝜆 + 2)(𝜆^2 − 2𝜆) + 1 = (𝜆 − 1)^4, so every latent root equals 1 and any
solvent 𝑺 would have to be of the form 𝑺 = 𝑰 + 𝑵 with 𝑵^2 = 𝟎; but then
𝑨𝑅(𝑺) = 𝑺^2 − 2𝑺 + ( 2 1; −1 0 ) = 𝑵^2 − 𝑰 + ( 2 1; −1 0 ) = ( 1 1; −1 −1 ) ≠ 𝟎, so no solvent exists. ■
Definition: A sequence of matrices 𝑸1 , … . , 𝑸ℓ form a chain of spectral factors of 𝑨(𝜆) if 𝑸𝑖
is a left solvent of 𝑵𝑖 (𝜆), where 𝑵0 (𝜆) = 𝑰; 𝑵ℓ (𝜆) = 𝑨(𝜆) and

𝑵𝑖 (𝜆) = (𝜆𝑰 − 𝑸𝑖 )𝑵𝑖−1 (𝜆) 𝑖 = 1,2, … , ℓ

It should be noted that, in general, only 𝑸1 is a right solvent of 𝑨(𝜆). Furthermore, 𝑸ℓ is
a left solvent of 𝑨(𝜆). An equivalent definition of a chain of spectral factors could be
given with 𝑸𝑖 a right solvent of 𝑵𝑖 (𝜆), where 𝑵0 (𝜆) = 𝑰; 𝑵ℓ (𝜆) = 𝑨(𝜆) and

𝑵𝑖 (𝜆) = 𝑵𝑖−1 (𝜆)(𝜆𝑰 − 𝑸ℓ−𝑖+1 ) 𝑖 = 1,2, … , ℓ

If the latent roots of 𝑨(𝜆) are distinct then 𝑨(𝜆) has a chain of spectral factors, where
𝑨(𝜆) = 𝑨0 𝜆ℓ + 𝑨1 𝜆ℓ−1 +. . . +𝑨ℓ−1 𝜆 + 𝑨ℓ = (𝜆𝑰 − 𝑸ℓ ) … (𝜆𝑰 − 𝑸2 )(𝜆𝑰 − 𝑸1 )

Lemma: If 𝑸1 , … , 𝑸ℓ form a chain of spectral factors for 𝑨(𝜆), so that
𝑨(𝜆) = (𝜆𝑰 − 𝑸ℓ ) … (𝜆𝑰 − 𝑸2 )(𝜆𝑰 − 𝑸1 ), then

𝑨1 = −(𝑸1 + 𝑸2 + ⋯ + 𝑸ℓ )
𝑨2 = +(𝑸2 𝑸1 + 𝑸3 𝑸1 + ⋯ + 𝑸ℓ 𝑸ℓ−1 )
 ⋮
𝑨ℓ = (−1)^ℓ (𝑸ℓ 𝑸ℓ−1 … 𝑸2 𝑸1 )

Theorem: Given ℓ pairs of matrices (𝑿𝑖 , 𝒀𝑖 ), 𝑖 = 1, … , ℓ, there exists a unique matrix
polynomial 𝑷(𝜆) = 𝑨1 𝜆^(ℓ−1) + 𝑨2 𝜆^(ℓ−2) +. . . +𝑨ℓ−1 𝜆 + 𝑨ℓ such that 𝑷(𝑿𝑖 ) = 𝒀𝑖 for 𝑖 = 1, … , ℓ, if
and only if the block Vandermonde matrix 𝑽(𝑿1 , … , 𝑿ℓ ) is nonsingular.

Proof: 𝑷(𝑿𝑖 ) = 𝒀𝑖 for 𝑖 = 1, … , ℓ is equivalent to

                         ( 𝑰          𝑰          ⋯   𝑰          )
[𝑨ℓ  𝑨ℓ−1 …  𝑨2  𝑨1 ]    ( 𝑿1         𝑿2         ⋯   𝑿ℓ         )  =  [𝒀1  𝒀2  …  𝒀ℓ−1  𝒀ℓ ]   
                         ( ⋮          ⋮              ⋮          )
                         ( 𝑿1^(ℓ−1)   𝑿2^(ℓ−1)   ⋯   𝑿ℓ^(ℓ−1)   )

Corollary: Given ℓ pairs of matrices (𝑿𝑖 , 𝒀𝑖 ), 𝑖 = 1, … , ℓ, they uniquely determine a monic
matrix polynomial 𝑨(𝜆) = 𝑰𝜆^ℓ + 𝑨1 𝜆^(ℓ−1) + 𝑨2 𝜆^(ℓ−2) +. . . +𝑨ℓ−1 𝜆 + 𝑨ℓ such that 𝑨(𝑿𝑖 ) = 𝒀𝑖 for
𝑖 = 1, … , ℓ, if and only if the block Vandermonde matrix 𝑽(𝑿1 , … , 𝑿ℓ ) is nonsingular.

Proof: Let 𝑫𝑖 = 𝒀𝑖 − 𝑿𝑖^ℓ; then 𝑷(𝑿𝑖 ) = 𝑫𝑖 for 𝑖 = 1, … , ℓ and the previous theorem applies. 
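In particular, taking 𝒀𝑖 = 𝟎 shows how a complete set of right solvents determines the monic 𝜆-matrix: [𝑨ℓ … 𝑨1 ] = −[𝑿1^ℓ … 𝑿ℓ^ℓ ] 𝑽(𝑿1 , … , 𝑿ℓ )^(−1). A minimal MATLAB sketch of this construction (the three right solvents are those printed by the worked example later in this chapter, and the block Vandermonde is assumed nonsingular):

% Sketch: recover the coefficients of the monic cubic from its right solvents.
I  = eye(2);
R1 = [1 1;-2 -1];  R2 = [1 -1;2 1];  R3 = [2 -1;3 2];    % complete set of right solvents
V  = [I I I; R1 R2 R3; R1^2 R2^2 R3^2];                  % block Vandermonde V(R1,R2,R3)
C  = -[R1^3, R2^3, R3^3] / V;                            % = [A3 A2 A1]
A3 = C(:,1:2), A2 = C(:,3:4), A1 = C(:,5:6)              % should match the example data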

Let 𝑨(𝜆) have a complete set of solvents 𝑹1 , … , 𝑹ℓ such that 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular.
According to the last corollary there exists a unique matrix polynomial

𝑴𝑖 (𝜆) = 𝑨1^(𝑖) 𝜆^(ℓ−1) + 𝑨2^(𝑖) 𝜆^(ℓ−2) +. . . +𝑨ℓ−1^(𝑖) 𝜆 + 𝑨ℓ^(𝑖)   for 𝑖 = 1, … , ℓ

such that 𝑴𝑖 (𝑹𝑗 ) = 𝛿𝑖𝑗 𝑰, where 𝛿𝑖𝑗 is the Kronecker delta:

𝑴𝑖 (𝑹𝑗 ) = 𝑰 if 𝑖 = 𝑗 ,   𝑴𝑖 (𝑹𝑗 ) = 𝟎 if 𝑖 ≠ 𝑗

Note that 𝑴𝑖 (𝜆) has the same solvents as 𝑨(𝜆), except that 𝑹𝑖 has been deflated out. The
matrices 𝑴𝑖 (𝜆) are called the fundamental matrix polynomials.

Denote by 𝑽(𝑹1 , … , 𝑹𝑖−1 , 𝑹𝑖+1 , … , 𝑹ℓ ) the block Vandermonde matrix at the ℓ − 1 solvents,
that is, the set 𝑹1 , … , 𝑹ℓ with 𝑹𝑖 deleted.
Theorem: If the matrices 𝑹1 , … , 𝑹ℓ are such that 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular, then there
exists a unique set of matrix polynomials 𝑴𝑖 (𝜆) = 𝑨1^(𝑖) 𝜆^(ℓ−1) + 𝑨2^(𝑖) 𝜆^(ℓ−2) +. . . +𝑨ℓ−1^(𝑖) 𝜆 + 𝑨ℓ^(𝑖),
𝑖 = 1, … , ℓ, such that 𝑴1 (𝜆), … , 𝑴ℓ (𝜆) are fundamental matrix polynomials. If, furthermore,
the block Vandermonde 𝑽(𝑹1 , … , 𝑹𝑖−1 , 𝑹𝑖+1 , … , 𝑹ℓ ) is nonsingular, then 𝑨1^(𝑖) is
nonsingular.

Proof: 𝑽(𝑹1 , … , 𝑹ℓ ) nonsingular implies that there exists a unique set of fundamental
matrix polynomials 𝑴1 (𝜆), … , 𝑴ℓ (𝜆).

Now let the block Vandermonde matrix 𝑽(𝑹1 , … , 𝑹𝑘−1 , 𝑹𝑘+1 , … , 𝑹ℓ ) be nonsingular;
according to the previous corollary there then exists a unique monic matrix polynomial
𝑵𝑘 (𝜆) = 𝑰𝜆^(ℓ−1) + 𝑵1^(𝑘) 𝜆^(ℓ−2) +. . . +𝑵ℓ−2^(𝑘) 𝜆 + 𝑵ℓ−1^(𝑘), such that 𝑵𝑘 (𝑹𝑗 ) = 𝟎 for 𝑗 ≠ 𝑘.

Consider 𝑸(𝜆) = 𝑵𝑘 (𝑹𝑘 )𝑴𝑘 (𝜆); then 𝑸(𝑹𝑗 ) = 𝑵𝑘 (𝑹𝑘 ) for 𝑗 = 𝑘 and 𝑸(𝑹𝑗 ) = 𝟎 for 𝑗 ≠ 𝑘.
Since 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular and both 𝑸(𝜆) and 𝑵𝑘 (𝜆) are of degree ℓ − 1, it follows
that 𝑸(𝜆) ≡ 𝑵𝑘 (𝜆). Thus

𝑵𝑘 (𝜆) = 𝑵𝑘 (𝑹𝑘 )𝑴𝑘 (𝜆)

Equating leading coefficients, we get 𝑰 = 𝑵𝑘 (𝑹𝑘 )𝑨1^(𝑘) and thus 𝑨1^(𝑘) is nonsingular. 

The fundamental matrix polynomials 𝑴1 (𝜆), … , 𝑴ℓ (𝜆) can be used in a generalized
Lagrange interpolation formula. Paralleling the scalar case we get the following
representation theorems.

Theorem: If the matrices 𝑹1 , … , 𝑹ℓ are such that 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular, and
𝑴1 (𝜆), … , 𝑴ℓ (𝜆) are a set of fundamental matrix polynomials, then, for an arbitrary

𝑩(𝜆) = 𝑩1 𝜆^(ℓ−1) + 𝑩2 𝜆^(ℓ−2) +. . . +𝑩ℓ−1 𝜆 + 𝑩ℓ

it follows that 𝑩(𝜆) = ∑_{𝑖=1}^{ℓ} 𝑬𝑖 𝑴𝑖 (𝜆) with 𝑬𝑖 = 𝑩(𝑹𝑖 )   (the generalized interpolation formula).

Proof: Let 𝑮(𝜆) = ∑_{𝑖=1}^{ℓ} 𝑩(𝑹𝑖 ) 𝑴𝑖 (𝜆); then 𝑮(𝑹𝑖 ) = 𝑩(𝑹𝑖 ) for 𝑖 = 1, … , ℓ. Since the block
Vandermonde is nonsingular, it follows that 𝑮(𝜆) is unique and, hence, 𝑮(𝜆) = 𝑩(𝜆). 

Theorem: If 𝑨(𝜆) has a set of right solvents, 𝑹1 , … . , 𝑹ℓ , such that 𝑽(𝑹1 , … . , 𝑹ℓ ) and
𝑽(𝑹1 , … , 𝑹𝑖−1 , 𝑹𝑖+1 , … , 𝑹ℓ ) for each 𝑖 = 1, … , ℓ are nonsingular and 𝑴1 (𝜆) … 𝑴ℓ (𝜆) are the set
of fundamental matrix polynomials, then
𝑴𝑖 (𝜆)𝜆 − 𝑹𝑖 𝑴𝑖 (𝜆) = 𝑨1^(𝑖) 𝑨(𝜆)   for 𝑖 = 1, … , ℓ

where 𝑨1^(𝑖) is the leading matrix coefficient of 𝑴𝑖 (𝜆). If 𝑿 ∈ ℂ^(𝑚×𝑚) then the right
evaluation gives 𝑴𝑖 (𝑿)𝑿 − 𝑹𝑖 𝑴𝑖 (𝑿) = 𝑨1^(𝑖) 𝑨(𝑿).

Proof: Let 𝑸𝑖 (𝑿) = 𝑴𝑖 (𝑿)𝑿 − 𝑹𝑖 𝑴𝑖 (𝑿). Note that 𝑸𝑖 (𝑹𝑗 ) = 𝟎 for all 𝑗. 𝑨(𝑿) is the unique
monic matrix polynomial with right solvents 𝑹1 , … , 𝑹ℓ since 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular.
The leading matrix coefficient of 𝑸𝑖 (𝑿) is 𝑨1^(𝑖), which is nonsingular since
𝑽(𝑹1 , … , 𝑹𝑖−1 , 𝑹𝑖+1 , … , 𝑹ℓ ) is nonsingular. Thus, 𝑨(𝑿) = (𝑨1^(𝑖))^(−1) 𝑸𝑖 (𝑿). 
A previous result stated that if 𝑳𝑖 is a left solvent of 𝑨(𝜆), then there exists a unique
monic polynomial 𝑯𝑖 (𝑿) of degree ℓ − 1 such that

𝑨(𝜆) = 𝑯𝑖 (𝜆)𝜆 − 𝑳𝑖 𝑯𝑖 (𝜆)   ⟺   (right evaluation)   𝑨(𝑿) = 𝑯𝑖 (𝑿)𝑿 − 𝑳𝑖 𝑯𝑖 (𝑿)

Corollary: Under the conditions of the previous theorem 𝑯𝑖 (𝑿) = (𝑨1^(𝑖))^(−1) 𝑴𝑖 (𝑿), and
therefore 𝑨(𝑹𝑖 ) = 𝑯𝑖 (𝑹𝑖 )𝑹𝑖 − 𝑳𝑖 𝑯𝑖 (𝑹𝑖 ) = 𝟎 ⟹ 𝑹𝑖 = (𝑯𝑖 (𝑹𝑖 ))^(−1) 𝑳𝑖 𝑯𝑖 (𝑹𝑖 ) = 𝑨1^(𝑖) 𝑳𝑖 (𝑨1^(𝑖))^(−1), which
is the similarity transformation between left and right solvents: 𝑳𝑖 = (𝑨1^(𝑖))^(−1) 𝑹𝑖 𝑨1^(𝑖).

If 𝑨(𝜆) has a set of right solvents, 𝑹1 , … . , 𝑹ℓ such that the block Vandermonde 𝑽(𝑹1 , … . , 𝑹ℓ )
and 𝑽(𝑹1 , … , 𝑹𝑖−1 , 𝑹𝑖+1 , … , 𝑹ℓ ) are all nonsingular, then, by similarity transformation
between left and right solvents there exists a set of left solvents of 𝑨(𝜆), 𝑳1 , … , 𝑳ℓ such that
𝑳𝑖 is similar to 𝑹𝑖 for all 𝑖.

Remark: From the above result it can be concluded that 𝑨(𝜆) = (𝜆𝑰 − 𝑳𝑖 )(𝑨1^(𝑖))^(−1) 𝑴𝑖 (𝜆).

The block Vandermonde matrix is of fundamental importance to the theory of matrix
polynomials, so we are going to consider some of its properties.

It is well known that in the scalar case (𝑚 = 1), det(𝑽(𝜆1 , … , 𝜆ℓ )) = ∏_{𝑖>𝑗}(𝜆𝑖 − 𝜆𝑗 ) and, thus,
the Vandermonde matrix is nonsingular if the 𝜆𝑖 's are distinct. One might expect that if
the eigenvalues of 𝑿1 and 𝑿2 are disjoint and distinct, then 𝑽(𝑿1 , 𝑿2 ) is nonsingular.
That this is not the case is shown by the following example.

Example: The determinant of the block Vandermonde at two points is

                         ( 𝑰    𝑰  )
det(𝑽(𝑿1 , 𝑿2 )) = det   ( 𝑿1   𝑿2 )  =  det(𝑿2 − 𝑿1 )

Even if 𝑿1 and 𝑿2 have no eigenvalues in common, 𝑿2 − 𝑿1 may still be singular. But if
{𝑿1 , 𝑿2 } is a complete set of solvents with disjoint spectra, then 𝑿2 − 𝑿1 is nonsingular.
So the block matrix case is completely different from the scalar case.
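A minimal numerical illustration of this point (the two matrices below are chosen ad hoc and are not assumed to be solvents of any particular 𝜆-matrix):

% Two matrices with disjoint spectra whose difference is nevertheless singular,
% so the two-point block Vandermonde V(X1,X2) is singular as well.
X1 = [1 0; 0 2];                 % eigenvalues {1, 2}
X2 = [2 1; -1 1];                % eigenvalues 1.5 +/- 0.866i
eig(X1), eig(X2)                 % disjoint spectra
det(X2 - X1)                     % = 0  =>  det(V(X1,X2)) = 0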

Recall that if 𝑨(𝜆) has distinct latent roots, then there exists a complete set of right
solvents of 𝑨(𝜆), 𝑹1 , … , 𝑹ℓ , and for any such set of solvents 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular.

Theorem: If 𝑽(𝑹1 , … , 𝑹𝑘 ) is nonsingular for 𝑘 = 2, … , 𝑟 − 1, and we define the monic matrix
polynomial 𝑭𝑘^(𝑑)(𝜆) of degree 𝑑 ≥ 𝑘 with right solvents 𝑹1 , … , 𝑹𝑘 , then

det(𝑽(𝑹1 , … , 𝑹𝑟 )) = det(𝑽(𝑹1 , … , 𝑹𝑟−1 )) det(𝑭𝑟−1^(𝑟−1)(𝑹𝑟 ))

Proof: The nonsingularity of 𝑽(𝑹1 , … , 𝑹𝑟−1 ) guarantees that 𝑭𝑘^(𝑑)(𝜆) exists uniquely, and
the determinant det(𝑽(𝑹1 , … , 𝑹𝑟 )) is evaluated by block Gaussian elimination using
the fact that for an arbitrary matrix 𝑬 of the proper dimension

det ( 𝑨  𝑩 ; 𝑪  𝑫 ) = det ( 𝑨 + 𝑬𝑪   𝑩 + 𝑬𝑫 ; 𝑪   𝑫 )

It is shown in (J. E. Dennis 1973) that after the first steps of the block Gaussian elimination

                         ( 𝑰    𝑰          𝑰              ⋯   𝑰              )
                         ( 𝟎    𝑹2 − 𝑹1    𝑹3 − 𝑹1        ⋯   𝑹𝑟 − 𝑹1        )
det(𝑽(𝑹1 , … , 𝑹𝑟 )) = det( 𝟎    𝟎          𝑭2^(2)(𝑹3 )    ⋯   𝑭2^(2)(𝑹𝑟 )    ) = ⋯
                         ( ⋮    ⋮          ⋮                  ⋮              )

The general term for the (𝑖, 𝑗 > 𝑘) block is 𝑭𝑘−1^(𝑖−1)(𝑹𝑗 ) − 𝑭𝑘−1^(𝑖−1)(𝑹𝑘 )(𝑭𝑘−1^(𝑘−1)(𝑹𝑘 ))^(−1) 𝑭𝑘−1^(𝑘−1)(𝑹𝑗 ).
Using the fact that the determinant of a block triangular matrix is the product of the
determinants of the diagonal blocks, the result follows. 

Corollary: Given a set of matrices 𝑹1 , … , 𝑹ℓ such that 𝑽(𝑹1 , … , 𝑹𝑘 ) is nonsingular for
𝑘 = 2, … , ℓ, the iteration

𝑵0 (𝑿) = 𝑰 ,   𝑵𝑖 (𝑿) = 𝑵𝑖−1 (𝑿)𝑿 − (𝑵𝑖−1 (𝑹𝑖 )𝑹𝑖 (𝑵𝑖−1 (𝑹𝑖 ))^(−1)) 𝑵𝑖−1 (𝑿)

is well defined and yields an ℓ-degree monic matrix polynomial 𝑵ℓ (𝑿) such that 𝑵ℓ (𝑹𝑖 ) = 𝟎 for
𝑖 = 1,2, … , ℓ.

Proof: 𝑵1 (𝑿) = 𝑿 − 𝑹1 = 𝑭1^(1)(𝑿). Assume 𝑵𝑘 (𝑿) = 𝑭𝑘^(𝑘)(𝑿). Then, from the given iteration,
𝑵𝑘+1 (𝑹𝑖 ) = 𝟎 for 𝑖 = 1, … , 𝑘 + 1 and, hence, 𝑵𝑘+1 (𝑿) = 𝑭𝑘+1^(𝑘+1)(𝑿). The sequence of block
Vandermonde matrices being nonsingular guarantees the nonsingularity of 𝑵𝑖−1 (𝑹𝑖 ). 

Transformation of Solvents to Spectral Factors: Since the block diagonal form of a
complete set of solvents 𝜦𝑅 = 𝑽𝑅^(−1) 𝑨𝑐 𝑽𝑅 = blkdiag(𝑹1 , … , 𝑹ℓ ) and that of a complete set of
spectral factors 𝜦𝑄 = blkdiag(𝑸1 , … , 𝑸ℓ ) have identical spectra, they are related by a
similarity transformation 𝜦𝑅 = 𝑷𝜦𝑄 𝑷^(−1).

Theorem: Consider a complete set of right solvents {𝑹1 , … , 𝑹ℓ } of a monic 𝜆-matrix 𝑨(𝜆);
then 𝑨(𝜆) can be factored into a product 𝑨(𝜆) = 𝑵ℓ (𝜆) = (𝜆𝑰 − 𝑸ℓ )(𝜆𝑰 − 𝑸ℓ−1 ) … (𝜆𝑰 − 𝑸1 )
using the following recursive scheme:

𝑸𝑘 = 𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 (𝑵𝑘−1 (𝑹𝑘 ))^(−1)   for 𝑘 = 1,2, … , ℓ

where 𝑵𝑘 (𝜆) = (𝜆𝑰 − 𝑸𝑘 )𝑵𝑘−1 (𝜆) for 𝑘 = 1,2, … , ℓ, and for any 𝑗 we write

𝑵𝑘 (𝑹𝑗 ) = 𝑵𝑘−1 (𝑹𝑗 )𝑹𝑗 − 𝑸𝑘 𝑵𝑘−1 (𝑹𝑗 )   for 𝑘 = 1,2, … , ℓ

with 𝑵0 (𝜆) = 𝑰, 𝑵0 (𝑹𝑗 ) = 𝑰 for any 𝑗, and rank(𝑵𝑘−1 (𝑹𝑘 )) = 𝑚 for 𝑘 = 1,2, … , ℓ.

Proof: 𝑽(𝑹1 , … , 𝑹𝑘 ) is a block Vandermonde matrix; therefore, the following identity holds:

det(𝑽(𝑹1 , … , 𝑹𝑘 )) = det(𝑽(𝑹1 , … , 𝑹𝑘−1 )) det(𝑵𝑘−1 (𝑹𝑘 ))

If 𝑽(𝑹1 , … , 𝑹𝑘 ) is non-singular, then 𝑵𝑘−1 (𝑹𝑘 ) is non-singular for 𝑘 = 1,2, … , ℓ. Thus,
𝑸𝑘 = 𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 (𝑵𝑘−1 (𝑹𝑘 ))^(−1) exists.

Expanding 𝑵𝑘 (𝜆) = (𝜆𝑰 − 𝑸𝑘 )𝑵𝑘−1 (𝜆) we have 𝑵𝑘 (𝜆) = (𝜆𝑰 − 𝑸𝑘 )(𝜆𝑰 − 𝑸𝑘−1 ) … (𝜆𝑰 − 𝑸1 ). The
right evaluation of 𝑹𝑘 in the matrix polynomial 𝑵𝑘 (𝜆) gives:

𝑵𝑘 (𝜆) = (𝜆𝑰 − 𝑸𝑘 )𝑵𝑘−1 (𝜆) ⟹ 𝑵𝑘 (𝑹𝑘 ) = 𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 − 𝑸𝑘 𝑵𝑘−1 (𝑹𝑘 )
−1
Substituting 𝑸𝑘 = 𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 (𝑵𝑘−1 (𝑹𝑘 )) into the last equation yield
−1
𝑵𝑘 (𝑹𝑘 ) = 𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 − (𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 (𝑵𝑘−1 (𝑹𝑘 )) ) 𝑵𝑘−1 (𝑹𝑘 ) = 𝟎

Also, by replacing 𝑘 in 𝑵𝑘 (𝑹𝑘 ) = 𝑵𝑘−1 (𝑹𝑘 )𝑹𝑘 − 𝑸𝑘 𝑵𝑘−1 (𝑹𝑘 ) by 𝑘 − 1 and using the
relationship 𝑸𝑘−1 = 𝑵𝑘−2 (𝑹𝑘−1 )𝑹𝑘−1 (𝑵𝑘−2 (𝑹𝑘−1 ))^(−1), we have

𝑵𝑘−1 (𝑹𝑘−1 ) = 𝑵𝑘−2 (𝑹𝑘−1 )𝑹𝑘−1 − 𝑸𝑘−1 𝑵𝑘−2 (𝑹𝑘−1 )
             = 𝑵𝑘−2 (𝑹𝑘−1 )𝑹𝑘−1 − (𝑵𝑘−2 (𝑹𝑘−1 )𝑹𝑘−1 (𝑵𝑘−2 (𝑹𝑘−1 ))^(−1)) 𝑵𝑘−2 (𝑹𝑘−1 ) = 𝟎

Also we have 𝑵𝑘 (𝜆) = (𝜆𝑰 − 𝑸𝑘 )𝑵𝑘−1 (𝜆) ⟹ 𝑵𝑘 (𝑹𝑘−1 ) = 𝑵𝑘−1 (𝑹𝑘−1 )𝑹𝑘−1 − 𝑸𝑘 𝑵𝑘−1 (𝑹𝑘−1 ) = 𝟎

Repeating the above process of right evaluations, we have 𝑵𝑘 (𝑹𝑗 ) = 𝟎 for 𝑗 ≤ 𝑘. This
implies that the right solvents 𝑹𝑗 , 𝑗 = 1, . . . , 𝑘, of 𝑨(𝜆) are right solvents of 𝑵𝑘 (𝜆). Since
a monic 𝜆-matrix is uniquely determined by a complete set of solvents when 𝑽(𝑹1 , … , 𝑹ℓ )
is non-singular, we have 𝑨(𝜆) = 𝑵ℓ (𝜆) = (𝜆𝑰 − 𝑸ℓ )(𝜆𝑰 − 𝑸ℓ−1 ) … (𝜆𝑰 − 𝑸1 ). Thus, the given
scheme is an algorithm to determine the spectral factors 𝑸𝑖 from the known right
solvents 𝑹𝑗 of 𝑨(𝜆). ■ Q.E.D
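A minimal MATLAB sketch of this recursion (the three right solvents are those of the cubic example treated in this chapter; the cell array N holding the coefficients of 𝑵𝑘−1(𝜆), highest power first, is only an implementation device):

% Sketch: spectral factors from right solvents via
%   Qk = N_{k-1}(Rk)*Rk*inv(N_{k-1}(Rk)),  N_k(lambda) = (lambda*I - Qk)*N_{k-1}(lambda).
I = eye(2);
R = {[2 -1;3 2], [1 -1;2 1], [1 1;-2 -1]};     % complete set of right solvents
N = {I};                                       % coefficients of N_0(lambda) = I
Q = cell(1,3);
for k = 1:3
    Nev = N{1};                                % right (Horner) evaluation of N_{k-1} at R{k}
    for j = 2:numel(N), Nev = Nev*R{k} + N{j}; end
    Q{k} = Nev*R{k}/Nev;                       % spectral factor Q_k
    Nnew = [N, {zeros(2)}];                    % multiply N_{k-1}(lambda) by lambda
    for j = 1:numel(N), Nnew{j+1} = Nnew{j+1} - Q{k}*N{j}; end
    N = Nnew;                                  % coefficients of N_k(lambda)
end
Q{1}, Q{2}, Q{3}    % A(lambda) = (lambda*I-Q{3})*(lambda*I-Q{2})*(lambda*I-Q{1})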

In a similar manner, the spectral factors can be obtained from the known left solvents 𝑳𝑖 of
𝑨(𝜆) as follows. We first reverse the order, 𝑸1 ⟵ 𝑸ℓ , 𝑸2 ⟵ 𝑸ℓ−1 , … , 𝑸ℓ ⟵ 𝑸1 , that is
𝑸𝑘 ≡ 𝑸ℓ+1−𝑘 , and use the recursive scheme

𝑸𝑘 = (𝑴𝑘−1 (𝑳𝑘 ))^(−1) 𝑳𝑘 𝑴𝑘−1 (𝑳𝑘 )   for 𝑘 = 1,2, … , ℓ

where 𝑴𝑘 (𝜆) = 𝑴𝑘−1 (𝜆)(𝜆𝑰 − 𝑸𝑘 ) for 𝑘 = 1,2, … , ℓ, and for any 𝑗 we write

𝑴𝑘 (𝑳𝑗 ) = 𝑳𝑗 𝑴𝑘−1 (𝑳𝑗 ) − 𝑴𝑘−1 (𝑳𝑗 )𝑸𝑘   for 𝑘 = 1,2, … , ℓ

with 𝑴0 (𝜆) = 𝑰, 𝑴0 (𝑳𝑗 ) = 𝑰 for any 𝑗, and rank(𝑴𝑘−1 (𝑳𝑘 )) = 𝑚 for 𝑘 = 1,2, … , ℓ. Here
𝑴𝑘−1 (𝑳𝑗 ) denotes the left evaluation of 𝑴𝑘−1 (𝜆) at the left solvent 𝑳𝑗 . The spectral
factorization of 𝑨(𝜆) then becomes 𝑨(𝜆) = 𝑴ℓ (𝜆) = (𝜆𝑰 − 𝑸1 )(𝜆𝑰 − 𝑸2 ) … (𝜆𝑰 − 𝑸ℓ ).

Transformation of Spectral Factors to Right (Left) Solvents: If a complete set of
spectral factors of a 𝜆-matrix is given or can be determined (Dennis et al. 1978, Shieh and
Chahin 1981) without using latent vectors of the 𝜆-matrix, these available spectral
factors can be transformed into a complete set of right (left) solvents. The transformation of
spectral factors to right (left) solvents of a 𝜆-matrix can be derived as follows.

Theorem: Given a monic 𝜆-matrix with all elementary divisors linear,

𝑨(𝜆) = ∏_{𝑖=0}^{ℓ−1} (𝜆𝑰 − 𝑸ℓ−𝑖 ) = (𝜆𝑰 − 𝑸ℓ )(𝜆𝑰 − 𝑸ℓ−1 ) … (𝜆𝑰 − 𝑸1 )

where 𝑸𝑘 (≜ 𝑸ℓ+1−𝑘 ), 𝑘 = 1, . . . , ℓ, is a complete set of spectral factors of the 𝜆-matrix 𝑨(𝜆)
and 𝜎(𝑸𝑖 )⋂𝜎(𝑸𝑗 ) = ∅ for 𝑖 ≠ 𝑗, define 𝜆-matrices 𝑵𝑘 (𝜆), 𝑘 = 1, … , ℓ, as follows:

𝑵𝑘 (𝜆) = 𝑰𝑚 𝜆^(ℓ−𝑘) + 𝑨1𝑘 𝜆^(ℓ−𝑘−1) +. . . +𝑨(ℓ−𝑘−1)𝑘 𝜆 + 𝑨(ℓ−𝑘)𝑘 = (𝜆𝑰 − 𝑸𝑘 )^(−1) 𝑵𝑘−1 (𝜆)   for 𝑘 = 1,2, … , ℓ

with 𝑵0 (𝜆) = 𝑨(𝜆). Then the transformation matrix 𝑷𝑘 (rank(𝑷𝑘 ) = 𝑚) which transforms the
spectral factor 𝑸𝑘 (≜ 𝑸ℓ+1−𝑘 ) into the right solvent 𝑹𝑘 (≜ 𝑹ℓ+1−𝑘 ) of 𝑨(𝜆) can be
constructed from the following algorithm:

𝑹𝑘 ≜ 𝑹ℓ+1−𝑘 = 𝑷𝑘 𝑸𝑘 𝑷𝑘^(−1) ,   𝑘 = 1, … , ℓ

where the 𝑚 × 𝑚 matrix 𝑷𝑘 is obtained from the matrix equation

vec(𝑷𝑘 ) = (𝑮𝑘 (𝑸𝑘 ))^(−1) vec(𝑰𝑚 )   with rank(𝑮𝑘 (𝑸𝑘 )) = 𝑚^2

and 𝑮𝑘 (𝑸𝑘 ) is defined by 𝑮𝑘 (𝑸𝑘 ) = ∑_{𝑖=𝑘}^{ℓ} (𝑸𝑘^(ℓ−𝑖))^𝑇 ⊗ 𝑨(𝑖−𝑘)𝑘 .

Proof: We have 𝑵𝑘−1 (𝜆) = (𝜆𝑰 − 𝑸𝑘 )𝑵𝑘 (𝜆); the corresponding right evaluation gives

𝑵𝑘−1 (𝑹𝑘 ) = 𝑵𝑘 (𝑹𝑘 )𝑹𝑘 − 𝑸𝑘 𝑵𝑘 (𝑹𝑘 )

We want 𝑹𝑘 = 𝑷𝑘 𝑸𝑘 𝑷𝑘^(−1) ⟹ 𝑷𝑘^(−1)𝑹𝑘 − 𝑸𝑘 𝑷𝑘^(−1) = 𝟎, so if we can force 𝑵𝑘 (𝑹𝑘 ) = 𝑷𝑘^(−1) we
obtain 𝑵𝑘−1 (𝑹𝑘 ) = 𝟎, where

𝑵𝑘 (𝑹𝑘 ) = 𝑰𝑚 𝑹𝑘^(ℓ−𝑘) + 𝑨1𝑘 𝑹𝑘^(ℓ−𝑘−1) +. . . +𝑨(ℓ−𝑘−1)𝑘 𝑹𝑘 + 𝑨(ℓ−𝑘)𝑘 = 𝑷𝑘^(−1)

Substituting 𝑹𝑘 = 𝑷𝑘 𝑸𝑘 𝑷𝑘^(−1) into this formula yields

𝑵𝑘 (𝑷𝑘 𝑸𝑘 𝑷𝑘^(−1)) = (𝑷𝑘 𝑸𝑘^(ℓ−𝑘) + 𝑨1𝑘 𝑷𝑘 𝑸𝑘^(ℓ−𝑘−1) +. . . +𝑨(ℓ−𝑘−1)𝑘 𝑷𝑘 𝑸𝑘 + 𝑨(ℓ−𝑘)𝑘 𝑷𝑘 )𝑷𝑘^(−1) = 𝑷𝑘^(−1)

This implies that 𝑷𝑘 𝑸𝑘^(ℓ−𝑘) + 𝑨1𝑘 𝑷𝑘 𝑸𝑘^(ℓ−𝑘−1) +. . . +𝑨(ℓ−𝑘−1)𝑘 𝑷𝑘 𝑸𝑘 + 𝑨(ℓ−𝑘)𝑘 𝑷𝑘 = 𝑰. The
solution of this last matrix equation (Lancaster 1970) is

vec(𝑷𝑘 ) = (∑_{𝑖=𝑘}^{ℓ} (𝑸𝑘^(ℓ−𝑖))^𝑇 ⊗ 𝑨(𝑖−𝑘)𝑘 )^(−1) vec(𝑰𝑚 )

Because in general we have 𝑵𝑘−1 (𝑹𝑘 ) = 𝟎 and 𝑵0 (𝜆) = 𝑨(𝜆), we conclude that 𝑹𝑘 ,
𝑘 = 1,2, … , ℓ, are right solvents of 𝑨(𝜆). Thus, the known spectral factors 𝑸𝑘 can be
transformed into the right solvents 𝑹𝑘 of 𝑨(𝜆) using the prescribed algorithm. ■ Q.E.D

In the same fashion, the complete set of spectral factors 𝑸𝑘 , 𝑘 = 1,2, … , ℓ, can be converted
into left solvents 𝑳𝑘 , 𝑘 = 1,2, … , ℓ, using the following algorithm:

𝑴𝑘 (𝜆) = 𝑰𝑚 𝜆^(ℓ−𝑘) + 𝑨1𝑘 𝜆^(ℓ−𝑘−1) + ⋯ + 𝑨(ℓ−𝑘−1)𝑘 𝜆 + 𝑨(ℓ−𝑘)𝑘 = 𝑴𝑘−1 (𝜆)(𝜆𝑰 − 𝑸𝑘 )^(−1)   for 𝑘 = 1,2, … , ℓ

𝑯𝑘 (𝑸𝑘 ) = ∑_{𝑖=𝑘}^{ℓ} (𝑨(𝑖−𝑘)𝑘 )^𝑇 ⊗ 𝑸𝑘^(ℓ−𝑖)   with rank(𝑯𝑘 (𝑸𝑘 )) = 𝑚^2

vec(𝑺𝑘 ) = (𝑯𝑘 (𝑸𝑘 ))^(−1) vec(𝑰𝑚 )

𝑳𝑘 = (𝑺𝑘 )^(−1) 𝑸𝑘 𝑺𝑘   for 𝑘 = 1,2, … , ℓ


Algorithm (𝑸𝑖 to right solvents 𝑹𝑖 ):
Given 𝑨1 , 𝑨2 , … , 𝑨ℓ and 𝑸1 , 𝑸2 , … , 𝑸ℓ
For 𝑖 = 1: ℓ
    𝑵𝑖0 = 𝑰𝑚 ;  𝑿𝑖 = 𝑸ℓ−𝑖+1    % flipping the order
    For 𝑗 = 1: ℓ − 𝑖
        𝑵𝑖𝑗 = 𝑨𝑗 + 𝑿𝑖 𝑵𝑖(𝑗−1)
        𝑨𝑗 = 𝑵𝑖𝑗
    End
    𝑮𝑖 = ∑_{𝑗=𝑖}^{ℓ} (𝑿𝑖^(ℓ−𝑗))^𝑇 ⨂ 𝑵𝑖(𝑗−𝑖)
    vec(𝑷𝑖 ) = (𝑮𝑖 )^(−1) vec(𝑰𝑚 )
    𝑹𝑖 = 𝑷𝑖 𝑿𝑖 (𝑷𝑖 )^(−1)
End

Algorithm (𝑸𝑖 to left solvents 𝑳𝑖 ):
Given 𝑨1 , 𝑨2 , … , 𝑨ℓ and 𝑸1 , 𝑸2 , … , 𝑸ℓ
For 𝑖 = 1: ℓ
    𝑴𝑖0 = 𝑰𝑚 ;  𝑿𝑖 = 𝑸𝑖    % don't flip the order
    For 𝑗 = 1: ℓ − 𝑖
        𝑴𝑖𝑗 = 𝑨𝑗 + 𝑴𝑖(𝑗−1) 𝑿𝑖
        𝑨𝑗 = 𝑴𝑖𝑗
    End
    𝑯𝑖 = ∑_{𝑗=𝑖}^{ℓ} (𝑴𝑖(𝑗−𝑖))^𝑇 ⊗ 𝑿𝑖^(ℓ−𝑗)
    vec(𝑺𝑖 ) = (𝑯𝑖 )^(−1) vec(𝑰𝑚 )
    𝑳𝑖 = (𝑺𝑖 )^(−1) 𝑿𝑖 𝑺𝑖
End

Example The following numerical results will clarify the 𝑸𝑖 to right solvents 𝑹𝑖 algorithm

clear all, clc, format('short','e'), Z=zeros(2,2); I=eye(2,2);


A0 =[1 0;0 1]; A1 =[-3.00 1.00;-4 -3]; A2 =[3.00 0.00;4 3.00];
A3 =[-5.00 -1.00;-4.00 -5.00];
q1 =[2.00 -1.00; 3.00 2.00];
q2 =[2.00 -1.00; 3.0 0.00];
q3 =[-1.00 1.00;-2.00 1.00];

% A(λ)=(λI- q3)*(λI- q2)*(λI- q1)


%-------------------------------------

Q1=q3; Q2=q2; Q3=q1; N11=A1+Q1; N12=A2+Q1*N11;


G1=kron((Q1^2)',I) + kron(Q1',N11) + kron(I,N12);
Vec_P1=inv(G1)*[1;0;0;1];
P1=[Vec_P1(1:2),Vec_P1(3:4)];
R1=P1*Q1*inv(P1);

%-------------------------------------
N21=N11+Q2;
G2=kron(Q2',I) + kron(I,N21);
Vec_P2=inv(G2)*[1;0;0;1];
P2=[Vec_P2(1:2),Vec_P2(3:4)];
R2=P2*Q2*inv(P2);

%-------------------------------------
G3=kron(I,I);
Vec_P3=inv(G3)*[1;0;0;1];
P3=[Vec_P3(1:2),Vec_P3(3:4)];
R3=P3*Q3*inv(P3);
% result
>> R1=[1.00 1.00;-2.00 -1.00];
>> R2=[1.00 -1.00;2.00 1.00];
>> R3=[2.00 -1.00;3.00 2.00];
Example The following numerical results will clarify the 𝑸𝑖 to left solvents 𝑳𝑖 algorithm

clear all, clc, format('short','e'), Z=zeros(2,2); I=eye(2,2);


A0 =[1 0;0 1]; A1 =[-3.00 1.00;-4 -3]; A2 =[3.00 0.00;4 3.00];
A3 =[-5.00 -1.00;-4.00 -5.00];
q1 =[2.00 -1.00; 3.00 2.00];
q2 =[2.00 -1.00; 3.0 0.00];
q3 =[-1.00 1.00;-2.00 1.00];

% A(λ)=(λI- q3)*(λI- q2)*(λI- q1)


%-------------------------------------
Q1 =q1; Q2 =q2; Q3 =q3;
M11=A1+Q1; M12=A2+M11*Q1;
H1=kron(I,(Q1^2)) + kron(M11',(Q1)) + kron(M12',I);
Vec_S1=inv(H1)*[1;0;0;1]; S1=[Vec_S1(1:2),Vec_S1(3:4)];
L1=inv(S1)*Q1*S1;
%-------------------------------------
M21=M11+Q2; %M21=-M12*inv(Q2)
H2=kron(I,Q2)+ kron(M21',I);
Vec_S2=inv(H2)*[1;0;0;1]; S2=[Vec_S2(1:2),Vec_S2(3:4)];
L2=inv(S2)*Q2*S2;
%-------------------------------------
H3=kron((I),I);
Vec_S3=inv(H3)*[1;0;0;1]; S3=[Vec_S3(1:2),Vec_S3(3:4)];
L3=inv(S3)*Q3*S3;
% result
>> L1=[2.00 -1.00;3.00 2.00];
>> L2=[1.00 -1.00;2.00 1.00];
>> L3=[-1.00 1.00;-2.00 1.00];

Transformation Between Left and Right Solvents: For the design and analysis of large-
scale multivariable systems, it is useful to determine a complete set of solvents of the
matrix polynomial. Given the matrix polynomial 𝑨(𝜆), if a right solvent 𝑹 is obtained, the
left solvent 𝑳 of 𝑨(𝜆) associated with 𝑹 can be determined using an algorithmic
relationship (Tsai, Chen and Shieh 1992).

Let us look for 𝑨(𝜆) = (𝜆𝑰 − 𝑳)𝑸(𝜆) = 𝑷(𝜆)(𝜆𝑰 − 𝑹) ⟹ (𝜆𝑰 − 𝑳) = 𝑷(𝜆)(𝜆𝑰 − 𝑹)𝑸^(−1)(𝜆)

First of all we can conclude that there is a similarity transformation between right and
left solvents. If we let 𝜆 = 0 then 𝑳 = 𝑷(0)𝑹𝑸^(−1)(0), but 𝑸(𝜆) and 𝑷(𝜆) are not provided, so we
look for an algorithmic procedure to find such a similarity. Let 𝑳𝑘 = 𝑸𝑘^(−1) 𝑹𝑘 𝑸𝑘
where rank(𝑸𝑘 ) = 𝑚, and we want to find a recursive scheme that produces the
transformation 𝑸𝑘 relating 𝑹𝑘 and 𝑳𝑘 .

Remark: In this part, 𝑸𝑘 does not denote a linear (spectral) factor; rather it is the similarity
transformation between 𝑳𝑘 and 𝑹𝑘 , so there should be no confusion.
Consider as a special case:

𝑨(𝜆) = 𝑩ℓ (𝜆)(𝜆𝑰 − 𝑹ℓ ) = (𝜆𝑰 − 𝑳1 ) 𝑷(𝜆)(𝜆𝑰 − 𝑹ℓ ) ⟹ 𝑩ℓ (𝜆) = (𝜆𝑰 − 𝑳1 ) 𝑷(𝜆)

with 𝑩ℓ (𝜆) = 𝑩ℓ0 𝜆^(ℓ−1) + 𝑩ℓ1 𝜆^(ℓ−2) +. . . +𝑩ℓ(ℓ−2) 𝜆 + 𝑩ℓ(ℓ−1). The coefficients can be determined by
long division: 𝑩ℓ𝑘 = ∑_{𝑖=0}^{𝑘} 𝑨𝑖 𝑹ℓ^(𝑘−𝑖) ,   𝑘 = 0,1, … , ℓ − 1.

The left evaluation of 𝑩ℓ (𝜆) at 𝑳1 gives 𝑩ℓ (𝑳1 ) = 𝟎, whereas 𝑩ℓ (𝑳ℓ ) is in general a nonzero
constant matrix. On the other hand we have

𝑨(𝜆) = 𝑩ℓ (𝜆)(𝜆𝑰 − 𝑹ℓ ) ⟹ 𝑨(𝑳ℓ ) = 𝑳ℓ 𝑩ℓ (𝑳ℓ ) − 𝑩ℓ (𝑳ℓ )𝑹ℓ = 𝟎 ⟹ 𝑳ℓ = 𝑩ℓ (𝑳ℓ )𝑹ℓ (𝑩ℓ (𝑳ℓ ))^(−1)

In general, if we define 𝑨(𝜆) = 𝑩𝑘 (𝜆)(𝜆𝑰 − 𝑹𝑘 ), then the left evaluation of 𝑩𝑘 (𝜆) at 𝑳𝑘 gives
𝑩𝑘 (𝑳𝑘 ) ≠ 𝟎 and

𝑩𝑘 (𝑳𝑘 ) = 𝑳𝑘^(ℓ−1) 𝑩𝑘0 + 𝑳𝑘^(ℓ−2) 𝑩𝑘1 +. . . +𝑳𝑘 𝑩𝑘(ℓ−2) + 𝑩𝑘(ℓ−1)
        = 𝑸𝑘^(−1)𝑹𝑘^(ℓ−1)𝑸𝑘 𝑩𝑘0 + 𝑸𝑘^(−1)𝑹𝑘^(ℓ−2)𝑸𝑘 𝑩𝑘1 +. . . +𝑸𝑘^(−1)𝑹𝑘 𝑸𝑘 𝑩𝑘(ℓ−2) + 𝑩𝑘(ℓ−1)

so that

𝑸𝑘 𝑩𝑘 (𝑳𝑘 ) = 𝑹𝑘^(ℓ−1)𝑸𝑘 𝑩𝑘0 + 𝑹𝑘^(ℓ−2)𝑸𝑘 𝑩𝑘1 +. . . +𝑹𝑘 𝑸𝑘 𝑩𝑘(ℓ−2) + 𝑸𝑘 𝑩𝑘(ℓ−1)

Notice that 𝑨(𝜆) = 𝑩𝑘 (𝜆)(𝜆𝑰 − 𝑹𝑘 ) = 𝜆𝑩𝑘 (𝜆) − 𝑩𝑘 (𝜆)𝑹𝑘 ⟹ 𝑳𝑘 = 𝑩𝑘 (𝑳𝑘 )𝑹𝑘 (𝑩𝑘 (𝑳𝑘 ))^(−1) = 𝑸𝑘^(−1)𝑹𝑘 𝑸𝑘

From this last observation we conclude that 𝑸𝑘 𝑩𝑘 (𝑳𝑘 ) = 𝑰, which implies that

𝑹𝑘^(ℓ−1)𝑸𝑘 𝑩𝑘0 + 𝑹𝑘^(ℓ−2)𝑸𝑘 𝑩𝑘1 +. . . +𝑹𝑘 𝑸𝑘 𝑩𝑘(ℓ−2) + 𝑸𝑘 𝑩𝑘(ℓ−1) = 𝑰  ⟺  ∑_{𝑖=0}^{ℓ−1} 𝑹𝑘^(ℓ−1−𝑖)𝑸𝑘 𝑩𝑘𝑖 = 𝑰
𝑖=0

The algorithm is summarized with the help of the Kronecker operator:

𝑩𝑘𝑗 = ∑_{𝑖=0}^{𝑗} 𝑨𝑖 𝑹𝑘^(𝑗−𝑖) ,   𝑗 = 0,1, … , ℓ − 1

vec(𝑸𝑘 ) = (𝑮(𝑹𝑘 ))^(−1) vec(𝑰)   with   𝑮(𝑹𝑘 ) = ∑_{𝑖=0}^{ℓ−1} 𝑩𝑘𝑖^𝑇 ⨂ 𝑹𝑘^(ℓ−1−𝑖)

𝑳𝑘 = 𝑸𝑘^(−1) 𝑹𝑘 𝑸𝑘

Algorithm (right solvents 𝑹𝑖 to left solvents 𝑳𝑖 ):

Given 𝑨1 , 𝑨2 , … , 𝑨ℓ and 𝑹1 , 𝑹2 , … , 𝑹ℓ and 𝑩0 = 𝑰𝑚
For 𝑘 = 1: ℓ
    For 𝑖 = 1: ℓ − 1
        𝑩𝑖 = 𝑨𝑖 + 𝑩𝑖−1 𝑹𝑘
    End
    vec(𝑸𝑘 ) = (∑_{𝑖=0}^{ℓ−1} 𝑩𝑖^𝑇 ⨂ 𝑹𝑘^(ℓ−1−𝑖))^(−1) vec(𝑰𝑚 )      % i.e. ∑_{𝑖=0}^{ℓ−1} 𝑹𝑘^(ℓ−1−𝑖)𝑸𝑘 𝑩𝑖 = 𝑰𝑚
    𝑳𝑘 = 𝑸𝑘^(−1) 𝑹𝑘 𝑸𝑘
End

In the same fashion, the algorithm which transforms a left solvent 𝑳𝑘 to the right solvent 𝑹𝑘
is as follows: write 𝑨(𝜆) = (𝜆𝑰 − 𝑳𝑘 )𝑪𝑘 (𝜆) where 𝑪𝑘 (𝜆) = 𝑪𝑘0 𝜆^(ℓ−1) +. . . +𝑪𝑘(ℓ−2) 𝜆 + 𝑪𝑘(ℓ−1)
and 𝑪𝑘0 = 𝑰; then

𝑯(𝑳𝑘 ) = ∑_{𝑖=0}^{ℓ−1} (𝑳𝑘^(ℓ−1−𝑖))^𝑇 ⨂ 𝑪𝑘𝑖 ,   vec(𝑷𝑘 ) = (𝑯(𝑳𝑘 ))^(−1) vec(𝑰) ,   𝑹𝑘 = 𝑷𝑘 𝑳𝑘 𝑷𝑘^(−1)
Example The following numerical results will clarify the: 𝑹𝑖 to left solvents 𝑳𝑖 algorithm

clear all, clc, format('short','e'), Z=zeros(2,2); I=eye(2,2);


A0 =[1 0;0 1]; A1 =[-5.5200 -1.3600;-2.0267 -8.4800];
A2=[5.40 14.16;8.5333 22.5467];
A3=[-0.880 -22.880;-6.50667 -19.17333]; R1=[1 2;0 3];
R2=[-1 0;1 2]; R3=[5 -1;2 4];
%-------------------------------------
B01=I; B11=A1+R1; B21=A2+B11*R1; K1=[];
Vec_Q1=inv(kron(B01',R1^2) + kron(B11',R1) + kron(B21',I))*[1;0;0;1];
Q1=[Vec_Q1(1:2) , Vec_Q1(3:4)]; L1=inv(Q1)*R1*Q1;
%-------------------------------------
B02=I; B12=A1+R2; B22=A2+B12*R2;
Vec_Q2=inv(kron(B02',R2^2) + kron(B12',R2) + kron(B22',I))*[1;0;0;1];
Q2=[Vec_Q2(1:2) , Vec_Q2(3:4)]; L2=inv(Q2)*R2*Q2;
%-------------------------------------
B03=I; B13=A1+R3; B23=A2+B13*R3;
Vec_Q3=inv(kron(B03',R3^2) + kron(B13',R3) + kron(B23',I))*[1;0;0;1];
Q3=[Vec_Q3(1:2) , Vec_Q3(3:4)]; L3=inv(Q3)*R3*Q3;

% result
>> L1=[1.44296, 3.82206;0.18046, 2.55704]
>> L2=[-0.080062, 1.560043;1.226587, 1.080062]
>> L3=[3.5200, -2.6400;1.0267, 5.4800]

Matrix Polynomials—Evaluation, Interpolation and Inversion: In many problems in
systems theory (Barnett [1971], Rosenbrock [1970]) we encounter matrices (called
"polynomial matrices") whose elements are polynomials over the field of rationals.
The inversion of such matrices becomes an interesting subject due to the demands of
systems theory. In this section, we describe a procedure known as the evaluation-interpolation
method, which is suitable for exact inversion of polynomial matrices. This method
consists in evaluating the given polynomial matrix at certain specified points (elements of
a field or a ring).

Let 𝑨(𝜆) have a complete set of solvents 𝑹1 , … , 𝑹ℓ such that 𝑽(𝑹1 , … , 𝑹ℓ ) is nonsingular.
According to the previous results about fundamental matrix polynomials,

(𝜆𝑰 − 𝑹𝑘 )𝑴𝑘 (𝜆) = 𝑴1^(𝑘) 𝑨(𝜆) ⟹ 𝑴𝑘 (𝜆) = (𝜆𝑰 − 𝑹𝑘 )^(−1) 𝑴1^(𝑘) 𝑨(𝜆)
⟹ ∑_{𝑘=1}^{ℓ} 𝑴𝑘 (𝜆) = (∑_{𝑘=1}^{ℓ} (𝜆𝑰 − 𝑹𝑘 )^(−1) 𝑴1^(𝑘)) 𝑨(𝜆)

Since 𝑴𝑖 (𝑹𝑗 ) = 𝛿𝑖𝑗 𝑰, where 𝛿𝑖𝑗 is the Kronecker delta, the sum ∑_{𝑘=1}^{ℓ} 𝑴𝑘 (𝜆) is a matrix
polynomial of degree ℓ − 1 taking the value 𝑰 at each of the ℓ solvents, hence
∑_{𝑘=1}^{ℓ} 𝑴𝑘 (𝜆) = 𝑰 and therefore

𝑨^(−1)(𝜆) = ∑_{𝑘=1}^{ℓ} (𝜆𝑰 − 𝑹𝑘 )^(−1) 𝑴1^(𝑘)

Note: This form of 𝑨^(−1)(𝜆) provides a block partial-fraction expansion about the right
solvents of 𝑨(𝜆). Also, Dennis, Traub, and Weber have shown that the left solvent 𝑳𝑘 can be
obtained from the right one by 𝑳𝑘 = (𝑴1^(𝑘))^(−1) 𝑹𝑘 𝑴1^(𝑘), which leads to a block
partial-fraction expansion about the left solvents:

𝑨^(−1)(𝜆) = ∑_{𝑘=1}^{ℓ} (𝜆𝑰 − 𝑹𝑘 )^(−1) 𝑴1^(𝑘) = ∑_{𝑘=1}^{ℓ} 𝑴1^(𝑘) (𝑴1^(𝑘))^(−1) (𝜆𝑰 − 𝑹𝑘 )^(−1) 𝑴1^(𝑘)
          = ∑_{𝑘=1}^{ℓ} 𝑴1^(𝑘) (𝜆𝑰 − (𝑴1^(𝑘))^(−1) 𝑹𝑘 𝑴1^(𝑘))^(−1) = ∑_{𝑘=1}^{ℓ} 𝑴1^(𝑘) (𝜆𝑰 − 𝑳𝑘 )^(−1)

Note: The fundamental matrix polynomials are sometimes called the fundamental
interpolating polynomials for a 𝜆-matrix.
But how do we compute the coefficients 𝑴𝑖^(𝑘) of the interpolating polynomials? To answer
this question we consider 𝑴𝑖 (𝑹𝑗 ) = 𝛿𝑖𝑗 𝑰, which can be written in the more compact form

𝑴𝑖 (𝑹𝑗 ) = 𝑴1^(𝑖) 𝑹𝑗^(ℓ−1) + 𝑴2^(𝑖) 𝑹𝑗^(ℓ−2) + ⋯ + 𝑴ℓ−1^(𝑖) 𝑹𝑗 + 𝑴ℓ^(𝑖) = 𝛿𝑖𝑗 𝑰

or, collecting all 𝑖, 𝑗 = 1, … , ℓ, in block-matrix form,

( 𝑴ℓ^(1)  ⋯  𝑴2^(1)  𝑴1^(1) ) ( 𝑰          𝑰          ⋯   𝑰          )     ( 𝑰  𝟎  ⋯  𝟎 )
( 𝑴ℓ^(2)  ⋯  𝑴2^(2)  𝑴1^(2) ) ( 𝑹1         𝑹2         ⋯   𝑹ℓ         )  =  ( 𝟎  𝑰  ⋯  𝟎 )
(   ⋮         ⋮        ⋮    ) ( ⋮          ⋮              ⋮          )     ( ⋮  ⋮      ⋮ )
( 𝑴ℓ^(ℓ)  ⋯  𝑴2^(ℓ)  𝑴1^(ℓ) ) ( 𝑹1^(ℓ−1)   𝑹2^(ℓ−1)   ⋯   𝑹ℓ^(ℓ−1)   )     ( 𝟎  𝟎  ⋯  𝑰 )

so that

( 𝑴ℓ^(1)  ⋯  𝑴2^(1)  𝑴1^(1) )     ( 𝑰          𝑰          ⋯   𝑰          )^(−1)
( 𝑴ℓ^(2)  ⋯  𝑴2^(2)  𝑴1^(2) )  =  ( 𝑹1         𝑹2         ⋯   𝑹ℓ         )
(   ⋮         ⋮        ⋮    )     ( ⋮          ⋮              ⋮          )
( 𝑴ℓ^(ℓ)  ⋯  𝑴2^(ℓ)  𝑴1^(ℓ) )     ( 𝑹1^(ℓ−1)   𝑹2^(ℓ−1)   ⋯   𝑹ℓ^(ℓ−1)   )

The leading coefficients 𝑴1^(𝑘) of the interpolating polynomials can therefore be obtained from
the 𝑹𝑖 's as the last block column of 𝑽(𝑹1 , … , 𝑹ℓ )^(−1):

( 𝑴1^(1) )     ( 𝑰          𝑰          ⋯   𝑰          )^(−1)  ( 𝟎 )
( 𝑴1^(2) )  =  ( 𝑹1         𝑹2         ⋯   𝑹ℓ         )       ( 𝟎 )
(   ⋮    )     ( ⋮          ⋮              ⋮          )       ( ⋮ )
( 𝑴1^(ℓ) )     ( 𝑹1^(ℓ−1)   𝑹2^(ℓ−1)   ⋯   𝑹ℓ^(ℓ−1)   )       ( 𝑰 )
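A minimal MATLAB sketch of this computation and of the block partial-fraction form of 𝑨^(−1)(𝜆) (the solvents and coefficients are those of the cubic example of this chapter, and the test point 𝜆0 is arbitrary):

% Sketch: leading coefficients M1{k} from the block Vandermonde, then a check of
%   inv(A(lambda)) = sum_k inv(lambda*I - Rk)*M1{k}.
I  = eye(2);  Z = zeros(2);
R1 = [2 -1;3 2];  R2 = [1 -1;2 1];  R3 = [1 1;-2 -1];    % right solvents of A(lambda)
A1 = [-3 1;-4 -3]; A2 = [3 0;4 3]; A3 = [-5 -1;-4 -5];   % A(l) = I*l^3+A1*l^2+A2*l+A3
V  = [I I I; R1 R2 R3; R1^2 R2^2 R3^2];                  % block Vandermonde
Y  = V\[Z; Z; I];                                        % last block column of inv(V)
M1 = {Y(1:2,:), Y(3:4,:), Y(5:6,:)};                     % leading coefficients M1{k}
l0 = 0.7 + 0.3i;                                         % arbitrary test point
LHS = inv(l0^3*I + A1*l0^2 + A2*l0 + A3);
RHS = (l0*I-R1)\M1{1} + (l0*I-R2)\M1{2} + (l0*I-R3)\M1{3};
norm(LHS - RHS)                                          % small when V is nonsingular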

In the same fashion, from 𝑴𝑖 (𝑳𝑗 ) = 𝛿𝑖𝑗 𝑰 we can obtain these matrices in terms of the left
solvents 𝑳𝑖 :

𝑴𝑖 (𝑳𝑗 ) = 𝑳𝑗^(ℓ−1) 𝑴1^(𝑖) + 𝑳𝑗^(ℓ−2) 𝑴2^(𝑖) + ⋯ + 𝑳𝑗 𝑴ℓ−1^(𝑖) + 𝑴ℓ^(𝑖) = 𝛿𝑖𝑗 𝑰

( 𝑰  𝑳1  ⋯  𝑳1^(ℓ−1) ) ( 𝑴ℓ^(1)    𝑴ℓ^(2)    ⋯  𝑴ℓ^(ℓ) )     ( 𝑰  𝟎  ⋯  𝟎 )
( 𝑰  𝑳2  ⋯  𝑳2^(ℓ−1) ) (   ⋮         ⋮            ⋮    )  =  ( 𝟎  𝑰  ⋯  𝟎 )
( ⋮  ⋮        ⋮      ) ( 𝑴2^(1)    𝑴2^(2)    ⋯  𝑴2^(ℓ) )     ( ⋮  ⋮      ⋮ )
( 𝑰  𝑳ℓ  ⋯  𝑳ℓ^(ℓ−1) ) ( 𝑴1^(1)    𝑴1^(2)    ⋯  𝑴1^(ℓ) )     ( 𝟎  𝟎  ⋯  𝑰 )

The leading coefficients 𝑴1^(𝑘) of the interpolating polynomials can be obtained from the 𝑳𝑖 's by

                                                 ( 𝑰  𝑳1  ⋯  𝑳1^(ℓ−1) )^(−1)
[𝑴1^(1)  𝑴1^(2)  …  𝑴1^(ℓ) ] = [𝟎  𝟎  …  𝑰]      ( 𝑰  𝑳2  ⋯  𝑳2^(ℓ−1) )
                                                 ( ⋮  ⋮        ⋮      )
                                                 ( 𝑰  𝑳ℓ  ⋯  𝑳ℓ^(ℓ−1) )
( )
Remark: In the book Matrix Polynomials in Automatic Control we have demonstrated
that 𝜦𝐿 = 𝑻𝐿^(−1) 𝑨𝑐 𝑻𝐿 = blkdiag(𝑳1 , 𝑳2 , … , 𝑳ℓ ), where

𝑻𝐿 = col(𝑻𝐿1 , 𝑻𝐿1 𝜦𝐿 , 𝑻𝐿1 𝜦𝐿^2 , … , 𝑻𝐿1 𝜦𝐿^(ℓ−1))   with   𝑻𝐿1 = [𝟎  𝟎  …  𝑰] 𝑾𝑐^(−1) ,
𝑾𝑐 = [𝑩𝐿 ⋮ 𝑩𝐿 𝜦𝐿 ⋮ ⋯ ⋮ 𝑩𝐿 𝜦𝐿^(ℓ−1)] ,   𝑩𝐿 = [𝟎  𝟎  …  𝑰]^𝑇

It can be checked that 𝑾𝑐 = 𝑽(𝑳1 , 𝑳2 , … , 𝑳ℓ ), so 𝑻𝐿1 = [𝟎  𝟎  …  𝑰](𝑽(𝑳1 , 𝑳2 , … , 𝑳ℓ ))^(−1), i.e.

𝑻𝐿1 = [𝑴1^(1)  𝑴1^(2)  …  𝑴1^(ℓ) ]

Again the similarity transformation can be rewritten as

      ( 𝑴1^(1)             𝑴1^(2)             ⋯   𝑴1^(ℓ)            )
      ( 𝑴1^(1)𝑳1           𝑴1^(2)𝑳2           ⋯   𝑴1^(ℓ)𝑳ℓ          )
𝑻𝐿 =  ( 𝑴1^(1)𝑳1^2         𝑴1^(2)𝑳2^2         ⋯   𝑴1^(ℓ)𝑳ℓ^2        )  = [𝑿𝐿1 ⋮ ⋯ ⋮ 𝑿𝐿ℓ ]
      (     ⋮                  ⋮                      ⋮              )
      ( 𝑴1^(1)𝑳1^(ℓ−1)     𝑴1^(2)𝑳2^(ℓ−1)     ⋯   𝑴1^(ℓ)𝑳ℓ^(ℓ−1)     )

with 𝑿𝐿𝑘 = col(𝑴1^(𝑘) , 𝑴1^(𝑘)𝑳𝑘 , 𝑴1^(𝑘)𝑳𝑘^2 , … , 𝑴1^(𝑘)𝑳𝑘^(ℓ−1)).

So 𝑨𝑐 𝑻𝐿 = 𝑻𝐿 𝜦𝐿 ⟺ 𝑨𝑐 𝑿𝐿𝑘 = 𝑿𝐿𝑘 𝑳𝑘 . On the other hand, in terms of the right block
eigenvectors of 𝑨𝑐 one can write 𝑨𝑐 𝑿𝑅𝑘 = 𝑿𝑅𝑘 𝑹𝑘 with 𝑿𝑅𝑘 = col(𝑰, 𝑹𝑘 , 𝑹𝑘^2 , … , 𝑹𝑘^(ℓ−1)), and

𝑨𝑐 𝑿𝐿𝑘 = 𝑿𝐿𝑘 𝑳𝑘   and   𝑨𝑐 𝑿𝑅𝑘 = 𝑿𝑅𝑘 𝑹𝑘   ⟹   𝑳𝑘 = (𝑴1^(𝑘))^(−1) 𝑹𝑘 𝑴1^(𝑘)

Remark: The triple (𝑿, 𝑻, 𝒀), where 𝑿 = [𝑰 ⋮ 𝑰 ⋮ ⋯ ⋮ 𝑰], 𝒀 = col(𝑴1^(𝑘))_{𝑘=1}^{ℓ} and 𝑻 = 𝜦𝑅 , is a
standard triple for the matrix polynomial 𝑨(𝜆) because

𝑨^(−1)(𝜆) = ∑_{𝑘=1}^{ℓ} (𝜆𝑰 − 𝑹𝑘 )^(−1) 𝑴1^(𝑘)
          = [𝑰 ⋮ 𝑰 ⋮ ⋯ ⋮ 𝑰] blkdiag(𝜆𝑰𝑚 − 𝑹1 , 𝜆𝑰𝑚 − 𝑹2 , … , 𝜆𝑰𝑚 − 𝑹ℓ )^(−1) col(𝑴1^(1) , 𝑴1^(2) , … , 𝑴1^(ℓ))
          = 𝑿(𝜆𝑰 − 𝑻)^(−1) 𝒀

Since the 𝑴1^(𝑘) are obtained from the right solvents, the complete eigenstructure data of
𝑨(𝜆) is carried by the 𝑹𝑘 .

 In the case of repeated block roots we refer the reader to Hariche 1987.
Obtaining Latent Structure From Eigen-Structure: The standard approach to solving the
latent value problem is to reduce the 𝜆-matrix equation to a generalized
eigenproblem (GEP) 𝑮𝐯 = 𝜆𝑯𝐯 of higher dimension. This is the linearization process, and
the "linearized" problem can be further converted to a standard eigenvalue problem.

Let 𝑨(𝜆) = ∑_{𝑘=0}^{ℓ} 𝑨𝑘 𝜆^(ℓ−𝑘) be an 𝑚 × 𝑚 matrix polynomial and consider the equation 𝑨(𝜆)𝐱 = 𝟎.
This equation is nonlinear in 𝜆. However, by putting 𝐱1 = 𝐱, 𝐱2 = 𝜆𝐱1 , … , 𝐱ℓ = 𝜆𝐱ℓ−1 , the
equation reduces to 𝜆𝑨0 𝐱ℓ + ∑_{𝑘=0}^{ℓ−1} 𝑨ℓ−𝑘 𝐱𝑘+1 = 𝟎, or in expanded form

𝜆𝑨0 𝐱ℓ + 𝑨1 𝐱ℓ + 𝑨2 𝐱ℓ−1 + ⋯ + 𝑨ℓ−1 𝐱2 + 𝑨ℓ 𝐱1 = 𝟎

which is linear in 𝜆. (This is the standard reduction of an ℓ𝑡ℎ degree linear
differential equation to a system of first-order linear differential equations.) The system of
equations can also be written in the form

𝑪(𝜆)𝑿 = { 𝜆 blkdiag(𝑰, 𝑰, … , 𝑰, 𝑨0 ) − 𝑨𝑐 } col(𝐱1 , 𝐱2 , … , 𝐱ℓ ) = 𝟎 ,   with

      ( 𝑶      𝑰      𝑶    ⋯     𝑶   )
      ( 𝑶      𝑶      𝑰    ⋱     ⋮   )
𝑨𝑐 =  ( ⋮                  ⋱     𝑶   )
      ( 𝑶      𝑶      ⋯    𝑶     𝑰   )
      ( −𝑨ℓ   −𝑨ℓ−1   ⋯   −𝑨2   −𝑨1  )

where 𝑪(𝜆) = 𝜆𝑬𝑐 − 𝑨𝑐 , with 𝑬𝑐 = blkdiag(𝑰, … , 𝑰, 𝑨0 ), is a linear 𝑚ℓ × 𝑚ℓ matrix polynomial
called the companion polynomial of 𝑨(𝜆). This is a generalized eigenproblem which can be
solved by the QZ method, i.e., the generalized Schur method. If 𝑨(𝜆) is a monic matrix
polynomial we get the equivalent linearization 𝑪(𝜆) = 𝜆𝑰 − 𝑨𝑐 , and in such a case we arrive
at a standard eigenvalue problem.
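A minimal MATLAB sketch of this linearization (the coefficients are those of the monic cubic example of this chapter; MATLAB's polyeig, which expects the coefficients ordered from the constant term upward, is used only as a cross-check):

% Sketch: latent roots from the companion pencil C(lambda) = lambda*Ec - Ac,
% solved as a generalized eigenproblem (QZ) and cross-checked with polyeig.
m = 2;  I = eye(m);  Z = zeros(m);
A0 = I; A1 = [-3 1;-4 -3]; A2 = [3 0;4 3]; A3 = [-5 -1;-4 -5];
Ec = blkdiag(I, I, A0);                     % here A0 = I, so Ec is the identity
Ac = [Z I Z; Z Z I; -A3 -A2 -A1];           % block companion matrix
lat1 = eig(Ac, Ec)                          % generalized (QZ) eigenvalues = latent roots
lat2 = polyeig(A3, A2, A1, A0)              % the same m*l = 6 latent roots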

Now, the natural question we can ask is: what is the relationship between the latent
vectors of 𝑨(𝜆) and the eigenvectors of 𝑨𝑐 ?

Let (𝜆, 𝑽𝑐 ) be an eigenpair of 𝑨𝑐 , where 𝑽𝑐 = col(𝐯1 , … , 𝐯ℓ ), so 𝑨𝑐 𝑽𝑐 = 𝜆𝑽𝑐 . Expanding this
equation gives 𝜆𝐯ℓ + 𝑨1 𝐯ℓ + 𝑨2 𝐯ℓ−1 + ⋯ + 𝑨ℓ−1 𝐯2 + 𝑨ℓ 𝐯1 = 𝟎 together with the chain of
recurrences 𝐯𝑘 = 𝜆𝐯𝑘−1 (from the first ℓ − 1 block rows); therefore

(𝑰𝜆^ℓ + 𝑨1 𝜆^(ℓ−1) + 𝑨2 𝜆^(ℓ−2) + ⋯ + 𝑨ℓ−1 𝜆 + 𝑨ℓ )𝐯1 = 𝟎 ⟺ 𝑨(𝜆)𝐯1 = 𝟎

From this last equation we conclude that if (𝜆, 𝑽𝑐 ) is an eigenpair of 𝑨𝑐 then (𝜆, 𝐯1 ) is a
latent pair of 𝑨(𝜆), where 𝐯1 = [𝑰 ⋮ 𝟎 ⋮ ⋯ ⋮ 𝟎]𝑽𝑐 .

Remark: If 𝑽 is an eigenvector of any general form matrix 𝑨 = 𝑻𝑐^(−1) 𝑨𝑐 𝑻𝑐 , that is 𝑽𝑐 = 𝑻𝑐 𝑽,
then 𝐯1 = [𝑰 ⋮ 𝟎 ⋮ ⋯ ⋮ 𝟎]𝑻𝑐 𝑽 = 𝑻𝑐1 𝑽.
Exercises
Exercises

Ex: 01 Let 𝑹1 , 𝑹2 , 𝑳1 & 𝑳2 be real matrices such that (𝑹2 − 𝑹1 ) and (𝑳2 − 𝑳1 ) are
nonsingular and let
      ( 𝑰    𝑰  )          ( 𝑰   𝑳1 )
𝑽𝑅 =  ( 𝑹1   𝑹2 ) ,   𝑽𝐿 = ( 𝑰   𝑳2 )

Prove that

𝑽𝑅^(−1) = ( (𝑹2 − 𝑹1 )^(−1)𝑹2     −(𝑹2 − 𝑹1 )^(−1) )         𝑽𝐿^(−1) = ( 𝑳2 (𝑳2 − 𝑳1 )^(−1)    −𝑳1 (𝑳2 − 𝑳1 )^(−1) )
          ( −(𝑹2 − 𝑹1 )^(−1)𝑹1     (𝑹2 − 𝑹1 )^(−1) ) ,                 ( −(𝑳2 − 𝑳1 )^(−1)       (𝑳2 − 𝑳1 )^(−1)   )

Ex: 02 Given three matrices 𝑨 ∈ ℝ^((𝑚+𝑛)×(𝑚+𝑛)), 𝑳 ∈ ℝ^(𝑛×𝑚) and 𝑯 ∈ ℝ^(𝑚×𝑛), define the block
matrix
      ( 𝑰𝑚 − 𝑯𝑳   −𝑯 )
𝑴 =   (    𝑳       𝑰𝑛 )

■ Prove that 𝑯(𝑰𝑛 ± 𝑳𝑯)^(−1) = (𝑰𝑚 ± 𝑯𝑳)^(−1) 𝑯 and 𝑳(𝑰𝑚 ± 𝑯𝑳)^(−1) = (𝑰𝑛 ± 𝑳𝑯)^(−1) 𝑳
               ( 𝑰𝑚    𝑯       )
■ Prove that 𝑵 = ( −𝑳    𝑰𝑛 − 𝑳𝑯 )  is the inverse of 𝑴
■ Find a condition on 𝑳 and 𝑯 such that 𝜦 = 𝑴𝑨𝑵 is a block diagonal matrix

Ex: 03 Given a matrix polynomial of a 2nd degree 𝑫(𝜆) = 𝑫0 𝜆2 + 𝑫1 𝜆 + 𝑫2 Let 𝑸1 , 𝑸2 be a


spectral factors of 𝑫(𝜆) that is 𝑫(𝜆) = (𝜆𝑰𝑚 − 𝑸1 )(𝜆𝑰𝑚 − 𝑸2 ).
■ Prove that 𝑸2 (the most right spectral factor) is a right solvent of 𝑫(𝜆).
■ Prove that 𝑸1 (the most left spectral factor) is a left solvent of 𝑫(𝜆).
■ Let 𝑹 be a right solvent of 𝑫(𝜆), prove that 𝑸1 = 𝑫2 𝑹−1 & 𝑸2 = −𝑫1 − 𝑫2 𝑹−1 = 𝑹
■ Let 𝑳 be a left solvent of 𝑫(𝜆), prove that 𝑸2 = 𝑳−1 𝑫2 & 𝑸1 = −𝑫1 − 𝑳−1 𝑫2 = 𝑳

Ex: 04 As a numerical exemplification let 𝑹1 = ( −7  2 ; −2  −2 ), 𝑹2 = ( −3  2 ; −1  0 ).

■ Find 𝑫1 and 𝑫2 and prove that ( 𝑫1  𝑰 ; 𝑰  𝟎 ) = ( 𝟎  𝑰 ; 𝑰  −𝑫1 )^(−1)
■ Find 𝑳1 and 𝑳2
■ Check that
                 ( −6    4   −2    0 )
𝑽𝑅^(−1) = (1/8)  ( −1   −2    1   −4 )      and compute 𝜞1 = −[𝑹1^2   𝑹2^2 ] 𝑽𝑅^(−1)
                 ( 14   −4    2    0 )
                 (  1   10   −1    4 )
■ Check that
                 ( 20  −16  −12   16 )
𝑽𝐿^(−1) = (1/8)  (  4    4   −4    4 )      and compute 𝜞2 = −𝑽𝐿^(−1) ( 𝑳1^2 ; 𝑳2^2 )
                 (  2    0   −2    0 )
                 ( −1    4    1   −4 )
■ Prove that 𝜞1 = 𝜞2^𝐵𝑇 = ( 20  −16  11  −6 ; 17/2  −5  7/2  1 ), where 𝐵𝑇 stands for the block transpose.

■ Check the following: 𝑱 = 𝑺𝑅^(−1) 𝑨𝑅 𝑺𝑅 = 𝑺𝐿^(−1) 𝑨𝐿 𝑺𝐿 , where 𝑱 is the Jordan form and

          ( −1   2   0   0 )              ( −1/2    1     0     0   )
𝑺𝑅^(−1) = (  2  −1   0   0 )    𝑺𝐿^(−1) = (  3/2   −2     0     0   )
          (  0   0  −1   2 ) ,            (   0     0   −1/6   1/3  )
          (  0   0   1  −1 )              (   0     0    5/6  −2/3  )
References

[1] Householder A, The Theory of Matrices in Numerical Analysis, Blaisdell, New York, 1964.
[2] Lancaster P. Lambda-matrices and Vibrating Systems, Pergamon Press, New York, 1966
[3] MacDuffee C., The Theory of Matrices. Chelsea, New York, 1946 (Berlin 1933)
[4] Peters, G. and Wilkinson J, 𝑨𝑿 = 𝜆𝑩𝑿 and the Generalized Eigenproblem, SIAM J. Num. Anal,
7, 1970, pp. 479-492
[5] Gantmacher, The Theory of Matrices. I-II Chelsea Publishing Co, New York, 1960.
[6] N. Cohen, On spectral analysis of matrix polynomials, M.Sc. Thesis, Weizmann Institute of
Science, Tel-Aviv, Israel, 1979
[7] N. Cohen, Spectral analysis of regular matrix polynomials, Integral Equations and Operator
Theory, 6(1983), 161-183
[8] N. Cohen, 2×2 monic irreducible matrix polynomials, Linear Multilinear Algebra 23(4), 325-
331 (1988)
[9] I. Gohberg, P. Lancaster and L. Rodman, Spectral analysis of matrix polynomials, II. The
resolvent form and spectral divisors, Linear Algebra and Appl. 21 (1976), 65-88
[10] T. Kailath, Linear Systems Prentice-Hall, New Jersey, 1980.
[11] Kalman, R. E. (1962). Mathematical description of linear dynamical systems, Center for
Control Theory, Research Institute for Advanced Study (RIAS), Maryland, Technical Report 62-
18, 1-79; also (1963) SIAM Journal on Control 1(2), 152-192
[12] I. Gohberg and L. Rodman, On spectral structure of monic matrix polynomials and extension
problems, Linear Algebra and Appl. 24 (1979), 157-172.
[13] P. Lancaster, M. Tismenetsky, The Theory of Matrices, 2nd Edition, Academic Press, New
York, 1985.
[14] Turnbull, H. W. The Theory of Determinants, Matrices, and Invariants, London (1928)
[15] Muir, Sir Thomas. The Theory of Determinants, Vols. I-IV. London (1906-1923).
[16] Muir, Sir Thomas. Contributions to the History of Determinants, 1900-1920. London (1930).
[17] Frazer, R. A. And Duncan, W. J. "The Flutter of Aeroplane Wings" R. & M. 1155 (1928).
[18] Frazer, R. A., Jones, W. P. And Skan, S. W. "Approximations to Functions and to the
Solutions of Differential Equations" R. & M. 1799 (1938)
[19] Frazer, R. A. "Disturbed Unsteady Motion, and the Numerical Solution of Linear Ordinary
Differential Equations." Report T. 3179 (1931).
[20] Frazer, R. A. Elementary Matrices And Some Applications To Dynamics And Differential
Equations, Cambridge At The University Press 1963
[21] Sadri Hassani. Mathematical Physics A Modern Introduction to Its Foundations, Second
Edition-New York, Switzerland 1999, 2013.
