You can find part 1 of this series below.

This article focuses on Tony Hsieh, the founder of Zappos.

Hsieh noticed that entrepreneurs tend to do all sorts of unconventional things, and that they tend to try a lot of different things at a pretty young age.

He thought of failure as being part of the path to success: “I think entrepreneurs view the failures as getting one step closer to the success. As opposed to oh, like I’m a failure.”

A key part of Zappos' success lies in its culture and its obsession with making customers happy.

When hiring, they pay special attention to their 10 core values:

Deliver Wow Through Service

Embrace and Drive Change

Create Fun and a Little Weirdness

Be Adventurous, Creative and Open-Minded

Pursue Growth and Learning

Build Open and Honest Relationships with Communication

Build a Positive Team and Family Spirit

Do More with Less

Be Passionate and Determined

Be Humble

Zappos has a 4-week training program for everyone. No matter the role, all employees spend 2 weeks on the phone with customers.

They don’t (or didn’t) tolerate employees who strayed from their culture: any employees that didn’t fit in were paid to leave.

Zappos really struggled during the dot-com bust: it had trouble generating enough cash flow and almost went bankrupt, but persevered.

Wanted employees to be themselves and not be scared of sharing who they really were. The idea is that people who can be themselves at work aren't scared of sharing new ideas and being creative.

Research shows that the best companies over the long term are the ones that combine profits, passion, and purpose. Hsieh's focus was on delivering happiness, not just to Zappos employees but to investors and customers as well.

Brian Chesky (the founder of Airbnb) asked Peter Thiel in 2012 for his single most important piece of advice, and the answer was: “Don’t fuck up the culture.”

Culture is simply a shared way of doing something with passion.

The culture is the foundation of a company.

If you break the culture, you break the machine that makes your product.

**How do you build a culture?** By upholding your core values in everything you do. When you hire, when you write an email, when you're working on something: it must all be rooted in your culture.

Why this is important: when you have a strong culture, you don't have to rely on process. You can trust your team to do the right thing.

Problems come and go but culture is forever and something that you’ll be remembered for.

Stripe has a culture based on transparency and communication.

E-mails are sent to either the whole company or a whole team; this fosters inclusion and communication. Everyone knows what's going on, and everyone is engaged and encouraged to give input. Other companies preach fearless communication; Stripe practices it.

Once a week they have an all-hands meeting where each team explains what happened during the prior week. At the end of the meeting, they discuss FUD (fear, uncertainty, and doubt). This prevents problems from being hidden and encourages everyone to think about them and try to solve them.

Meetings are expensive, so they try to limit them (weekly meeting is the only mandatory meeting they have).

They all have lunch together every day. They also have social events together (like going to the theater) and company hackathons twice a year, where they all go away on a trip and work on new Stripe-related projects.

Every single engineer does support on a bi-weekly rotation. Even the founders take part in it. This is one of the best ways to learn about the company and customers.

They use chat often and communicate asynchronously to prevent interruptions. Employees are also prompted to post what they're working on in the chat every few hours (this keeps communication channels open).

Once a week, one team takes a technical paper and discusses it over lunch.

Employees are asked to describe their 'dream machine' and find it waiting for them on their first day of work (i.e., give your employees the right tools to do their jobs).

During the first few days, new employees are asked to work with different engineering groups (great way to get familiar with everyone in the company and what they do).

The organization is very flat: they don't have managers or hierarchy and focus on hiring great people. People are also encouraged to be generalists and not to be afraid of exploring different domains.

After raising money, the next biggest problem startups face is hiring. Hiring is really hard and really important; in fact, it's most likely the most important thing a founder does.

If you don’t hire very well, you will not be successful. Companies are a product of the team and founders who build them.

Some advice on hiring:

Spend more time doing it: most founders should spend between one third and one half of their time hiring. The greatest companies always have great people, and you can't outsource finding them.

Get your hands dirty: hiring for a sales role? Spend some time doing the job and understanding it, then hire.

Look for smart, effective people. Ask them what they’ve done during the last month or what their average day looks like – as well as check their references. You want smart people that get things done.

Have people do a ‘work sample test.’ Instead of hiring them right away, pay them to do the work you’re hiring them to do for 1-2 days to see how they perform.

Sourcing candidates: focus on using your personal networks more. Look for friends and friends of friends. Often to get great people, you have to poach. Also, use your investors and their network to find candidates.

You need a mission in order to hire good people. The candidates you hire need to believe in the mission.

Hire people that you like (would you be willing to spend a Sunday working with them all day if you had to?).

Have a set of cultural values you hire for. Make sure everyone in your company knows what they are and believes in them. “Values are a decision making framework that empower individuals to make the decision that you, the founder, would make, in situations where there are conflicting interests (e.g. growth vs. customer satisfaction)”. - Andrew Mason

Don’t compromise. Only hire great people that fit your culture. A bad hire can ruin a company. Great people attract other great people. Getting someone mediocre can kill this phenomenon.

Be generous with compensation, especially when it comes to giving equity. You should offer great people below-market or fair-market salaries paired with equity (ownership) in the company, and they should be happy to take it.

Trust your gut and always be on the lookout for new hires.

Fire fast: many founders don't fire fast enough. If a person doesn't fit, it's better to let them go sooner rather than later.

After you hire people, you need to keep them. Remember to check in with people, be a good manager, have regular all hands meetings, make sure people are happy and challenged, etc. Always keep a sense of momentum at your company. Give people new roles every six months or so.

Always be identifying and promoting new talent.

Get familiar with these 6 principles:

Reciprocity (the practice of exchanging things with others for mutual benefit).

Commitment and consistency (feeling the need to be consistent in our beliefs).

Social proof (when people are uncertain, they'll look to others for behavioral guidance).

Authority (people follow the lead of credible, knowledgeable experts).

Liking (people prefer to work with and support those that they like).

Scarcity (the rarer or more difficult the products are perceived to be, the more valuable they are).

This is a really good interview with Marc Andreessen that discusses a few topics primarily dealing with the adoption of technology / software and how it’s going to change the world.

Talks about how the world goes in cycles: the 2001 internet bust brought extreme pessimism about technology and online businesses, but many turned out to be wrong in their assessment. Businesses are never easy to run, yet investors seem attracted to finding 'easy' businesses; when they find out that running them isn't so easy, they flock to different industries and segments at different times. The only 'easy' business might be Google, and that's a very rare exception: most businesses are hard to establish and run.

Some predictions for the future?

A lot of industries prior to the internet boom had a lot of gate-keeping: unless you were a large company, you had limited resources and couldn't compete. Today this is different, and being small is almost an advantage over being large. He predicts that small companies will become more predominant.

CAPEX to OPEX: before, you had to buy equipment, servers, etc. to run a business, but these days everyone can just rent (i.e., use Amazon / cloud computing). This lowers barriers to entry and reduces how much companies need to budget (in the short term, anyway).

A lot more enterprise companies are going to use SaaS going forward.

Next companies he believes will bring a great return? Companies which offer outsourced work (like TaskRabbit and Zaarly). He’s also big on enterprise education (many companies can’t find qualified employees so he thinks online education is the solution).

Big technology markets actually tend to be winner take all. Winners in technology tend to have 90 percent market share. In normal markets though (like Pepsi and Coke) it works very differently (Coke had only ~10% of the total market at the time).

Since this is the case, many competitors in the tech space tend to be over-founded and go bust after a period of time. The same will happen in the VC space: it will have a few winners that take the whole market while many other competitors lose.

The market is cyclical and investors tend to act in waves: once capital starts rotating, it leaves one sector and flows into another (raising stock prices there), and everyone follows the rising prices in hopes of making money. The down cycle is similar. A smart investor ignores these waves and simply invests in good entrepreneurs.

He likes entrepreneurs with vision, knowledge, and persistence who say: “I have this really good idea and I know it’s a good idea for the following eight reasons, and I have thought about it and I have worked in the field, and I know what I am doing, and I have talked to the customers and I have figured it out, and I am going to do it. I am just going to flat-out do it. And I am going to do it whether you fund me or whether you don’t fund me or I don’t get funded. I am still going to do it.”

A lot of these were either generic advice or book pitches by Reid Hoffman (one being that employee/employer relationship dynamics are changing in the modern marketplace, and there is no loyalty unless a mutual relationship is established), along with some semi-useful advice about pitching to VCs and the importance of hiring a "professional" CEO.

There’s only one mistake that kills startups: not making something users want. If you make something users want, you'll most likely be fine.

This is a list of 18 things that cause startups not to make something users want:

**Single founder**: few companies are started by one person. There's a reason for this: most people can't handle running a company by themselves. Also, if you have co-founders and people that depend on you, you won't want to let them down.

**Bad location**: startups prosper in some areas but not in others. The main locations where startups thrive are Silicon Valley, Boston, Seattle, Austin, Denver, and New York, with a huge drop-off after those.

**Marginal Niche**: choosing a small, obscure niche market in the hope of avoiding competition.

**Imitating Other Companies**: most successful startups aren't started through imitation. Solve problems that affect you personally; don't imitate others. "Instead of starting from companies and working back to the problems they solved, look for problems and imagine the company that might solve them."

**Not Being Open to New Ideas**: most successful startups end up doing something different than what they originally intended. Many pivot into new areas, so you have to be adaptable and open to new ideas and avenues.

**Hiring Bad Programmers**: what killed most of the startups in the e-commerce space in the 90s was bad programmers hired by business guys. The most successful companies he funded were run by programmers. Programmers know how to hire good programmers; business guys don't.

**Choosing the Wrong Platform**: PayPal dodged a bullet when they merged with X.com. The new CEO (Musk) wanted to use the Windows platform, but the PayPal team showed that their software scaled only 1% as well on Windows as it did on Unix. Switching CEOs took them on a new path and was key to their success. How do you choose the right platform? Hire good programmers and let them choose (or talk to programmers in computer science departments).

**Slowness in Launching**: force yourself to launch things quickly. The best form of feedback you can get is directly from your users.

**Launching Too Early**: launching slowly kills more startups than launching early, but launching early can also be detrimental: if you scare away your early adopters, many may not come back. Think about the overall goal, then start by writing the smallest subset of it that does anything useful and launch it. Early adopters don't expect a new launch to do everything; it just has to do something.

**Having No Specific User in Mind**: you can't make something great unless you understand its users. This is why most successful startups start off as something the founders are trying to solve for themselves. If the problem you're solving doesn't involve you, you need empirical, measurable feedback from your users.

**Raising Too Little Money**: startup funding is measured in time. How much runway do you have to reach profitability? If you don't have enough, there's a high chance your startup will fail. That's why the general advice is to keep costs low initially and spend all your time building a solid prototype.

**Spending Too Much**: the number one issue is hiring too many people, which slows you down and increases your costs. Avoid hiring people unless it's necessary. Also, pay people with equity rather than salary, and only hire people who are either going to write code or go out and get users.

**Raising Too Much Money**: taking on money brings pressure and limits your ability to pivot if you need to. Raising capital also usually takes a long time. His advice to founders is to take the first reasonable deal they can get from VCs.

**Poor Investor Management**: you have to manage your investors. You, as the founder, have to dictate where the company goes, not them. If you raise too much money, they will be your boss. If things go well and your company grows rapidly, they'll usually leave you alone; if not, be ready to deal with trouble.

**Sacrificing Users to (Supposed) Profit**: "In version 1, solve the core problem. And the core problem in a startup is how to create wealth (= how much people want something x the number who want it), not how to convert that wealth into money." He advises companies to worry first about building something people want and to worry about the business model later (side note: this is something I largely disagree with).

**Not Wanting to Get Your Hands Dirty**: as a founder, you can't spend all your time programming. You have to get into the business and find users and make sales.

**Fights Between Founders**: fights between founders are very common (20% of the startups they funded had a founder leave). This doesn't have to kill a startup, though (Blogger went down to one founder and recovered to become successful). If you have a bad feeling about a founder, let it be known and don't hide it. People are the most important ingredient in a startup.

**A Half-Hearted Effort**: most startups fail because they don't make something people want, and the reason most don't is that they don't try hard enough. The most successful startups are led by founders who quit their day jobs to dedicate themselves fully to the startup; they are fully committed to the idea and willing to dedicate enough energy to making things happen.

“His mind for technical football was extraordinary, but beyond that was his ability to organize and manage his staff, players, everybody—to get the whole organization on exactly the same page. On that page he set the standard for how he wanted things done, and his standard was simple: **perfection**. That’s what he taught us individually and as a group—to believe it could be achieved and then achieve it (or come close). He had in his mind this ideal—an image of perfect football—coupled with the nuts-and-bolts details of how to accomplish it, which he then taught.”

His primary leadership asset: his ability to teach people how to think and play at a different, much higher, and at times perfect level.

He did this in three main ways: 1) he had tremendous knowledge of all aspects of football, 2) he brought in great staff and coaches who knew how to coach, and 3) he taught players to hate mistakes.

Got all players to strive to be perfect in games and in practice. If you aim for perfection and miss, you’re still pretty good. If you aim for mediocrity…

Bill didn’t jump on anyone for making mistakes. Instead he found out what was wrong and taught others how to correct it. He taught the smallest details in perfecting performance.

He didn’t view the organization as a group of separate individuals. He viewed people as being the heart of the organization.

Was an amazing problem solver: knew how to look at things differently and break down complex things into individual parts to make them simpler. He loved lists.

Was an educator and teacher. He studied other great coaches and also learned from them.

Almost always, your road to victory goes through a place called ‘failure.’ However, a good leader intelligently and relentlessly seeks solutions that will increase his chances of succeeding and prevailing in a competitive environment. When you do that, the score will take care of itself.

“Pursuing your ambitions, especially those of any magnitude, can be grueling and hazardous, and produce agonizing failure along the way, but achieving those goals is among life’s most gratifying and thrilling experiences. The ability to survive and overcome the former to attain the latter is a fundamental difference between winners and losers. I’ve observed that if individuals who prevail in a highly competitive environment have any one thing in common besides success, it is failure—and their ability to overcome it.”

Being the best has a price: “If you’re up at 3 A.M. every night talking into a tape recorder and writing notes on scraps of paper, have a knot in your stomach and a rash on your skin, are losing sleep and losing touch with your wife and kids, have no appetite or sense of humor, and feel that everything might turn out wrong, then you’re probably doing the job.”

Talks about his times as a coach where he was devastated by defeat and failure and was very close to resigning (but didn’t): “When the inevitable setback, loss, failure, or defeat comes crashing down on you—losing a big sale, being passed over for a career-making promotion, even getting fired—allow yourself the “grieving time,” but then recognize that the road to recovery and victory lies in having the strength to get up off the mat and start planning your next move. This is how you must think if you want to win. Otherwise you have lost.”

During the ensuing fourteen years after his ‘big’ failure, the 49ers won five championships. The worst had been over. This happened because at the moment of his deepest despair, he had the strength to stand up and keep fighting.

How did he turn 49ers around? He implemented a Standard of Performance: ‘regardless of your specific job, it is vital to our team that you do that job at the highest possible level in all its various aspects, both mental and physical.’

If you were lucky enough to receive a 49er paycheck, it meant you were part of an organization that had high expectations of itself and of you, whether you were a superstar or a secretary, manager or maintenance man, athlete, executive, or head coach. Those expectations, of course, went beyond ethics and attitude to specific performance standards and actions.

For linemen, the coaching staff identified 30 specific skills they needed to master in order to do the job at the highest level, and created multiple drills for each of those skills, which were practiced relentlessly until their execution was automatic.

Passing routes were designed down to the inch and then practiced until receivers learned how to be at that exact inch at the exact moment the ball arrived.

Treated all people the same: it didn't matter if you were the star quarterback or a secretary; everyone was part of the organization and needed to embrace his principles.

Instead of making winning the primary goal, he got people to focus on improving their actions and attitude, and on obsessing over the quality of their execution and the content of their thinking, by following his Standard of Performance.

Hired people based on the characteristics he valued most: talent, character, functional intelligence, and an eagerness to adapt to his way of doing things.

“The leader’s job is to facilitate a battlefield-like sense of camaraderie among his or her personnel, an environment for people to find a way to bond together, to care about one another and the work they do, to feel the connection and extension so necessary for great results. Ultimately, it’s the strongest bond of all, even stronger than money.”

Don’t seek ‘conventional’ approaches or wisdom (they produce average results). Coming out of school, Jerry Rice wasn’t considered a top-flight prospect by scouts due to his limited forty-yard time, but Walsh recognized that there was more to receiving than simple running speed (Rice knew how to get open, catch the ball, and gain yardage with ease), and Walsh ended up drafting him in the 1st round. Rice ended up being one of the greatest football players of all time.

Use mentors and learn from them. A good leader is always learning. The great leaders start learning young and continue until their last breath.

“Bill Walsh had the ability to change the way people thought—not just how we performed a task, but how we thought and felt about who we were. In the beginning, when we were as bad as we were, nobody was thinking about a Super Bowl. Our goal in life was just to be pretty good. Bill’s goal in life was to convince us that we could be great. And he did; and we were. That’s why he was such a great leader.”

The path to mastery: practice, continual improvement, and mastering the fundamentals. Joe Montana and Jerry Rice were constantly practicing and trying to improve. Even in the last stages of their careers, they were still working very hard on the fundamental things that high-school kids won't do because it's too damn dull.

Hard work is the key to success. You as a leader need to lead by example, but it’s not up to you alone. Bring in people that put the work in.

“I sought perfection, and 99 percent isn’t perfection. Why ‘almost perfect’ wasn’t enough for me is something I can’t explain.”

Superb, reliable results take time. The little improvements that lead to impressive achievements come from a series of months and years until your organization knows what you are teaching inside and out and everyone is able to execute their responsibilities in all ways at the highest level.

I didn’t have time to summarize this book / recommended chapters, but if you’re curious you can find a great summary and notes on it in the provided link (available in the heading title).

Being a CEO doesn’t come very naturally to most people. Developing the skill set to be a good CEO usually takes years of experience / practice.

Most people have a natural instinct to do things that make people like them (enhancing their chance of survival), but a good CEO must do many things that upset people in the short run in order to be liked in the long run, which is in many ways unnatural.

An important part of being a good CEO is giving feedback. Earlier in his career, he believed in the shit sandwich approach to feedback (wrap the bad feedback / news (shit) with a layer of good feedback (bread)) but it came off badly.

Here are the keys to being an effective CEO:

Be authentic: believe in the information / feedback which you hand out.

Come from the right place: give people feedback because you want them to succeed, not fail.

Don’t get personal: if you decide to fire someone, fire them.

Don’t embarrass / clown other people in front of others.

Tailor your feedback to the employee: not everyone is the same when it comes to feedback.

Be direct, but not mean.

Keep an open door: feedback isn’t one way. Listen to what your employees have to say.

Say what you think and always express yourself. Get used to giving your company constant feedback: it gets people to get used to it (so it’s not personal) and people get comfortable with discussing bad news.

Being a CEO isn't easy or natural, and there are many more unnatural things you must do in order to be a good one. The key to becoming a good one is mastering the unnatural.

Mostly a bullshit article. Talks about how one of his managers hadn't met with their subordinates for over 6 months, so he called in that person's manager and threatened to fire both of them if they didn't meet within 24 hours.

Apparently, meetings are very important, but he doesn't go into why in much detail and gives a few analogies and stories that don't really connect with the message.

The CEO at a company creates politics. What is politics? 'People advancing their careers or agendas by means other than merit and contribution.'

Apparently, giving a raise to an employee counts as political behavior (even if said employee has a higher offer to go work elsewhere). Why? Because it may get other employees to ask for a raise, which encourages 'political behavior.' (**Personal take:** bullshit. Apparently companies are allowed to operate according to the principles of the free market but employees aren't??)

His advice on minimizing politics:

Hire people with the right kind of ambition (should value the company’s success as first and foremost priority ahead of anything else).

Avoid activities which encourage political behavior, including: performance evaluation and compensation, organizational design and territory, promotions.

Be careful with "he said, she said." Apparently, even listening to one employee complain about another gets noticed and may encourage further bad behavior. If an employee has a problem with an executive, get them in the room together and have them talk it out. If you hear one executive complaining about another, try to ignore it and assess the situation yourself. He doesn't address what to do if non-executives complain (apparently, to him, they might not matter?).

**Personal take**: Sorry, but most of these write-ups by Mr. Horowitz come off as complete bullshit and in my opinion are extremely bad advice. Read at your own discretion.

Talks about the early days prior to their success: Silicon Valley was struggling at the time and mobile hadn't taken off, but their idea was to bring 'Siri' and a better way to interact with voice technology.

Pivoted from that idea to noise canceling, which they discovered by accident as a by-product of attempting to develop voice technology.

Convincing people to invest with them was really hard at the time.

Raising the first $500,000 was harder than raising the first $100 million.

Focused on little details and product: how do they deliver a great experience to customers?

The first headset they released wasn't a hit: in order to hit deadlines, they had compromised the end product.

They ended up running out of money and investors lost faith, but they still believed in their product. They paid off $600,000 in debt and took on another DARPA contract.

Failure is the best thing to learn from: they ruminated over all of the things they did wrong with their first effort and started again. Realized that the product had to be packaged correctly in order to make it a success.

Worked for the next 2 years with no salaries.

Their next launch: they had issues with the manufacturer releasing the product from customs. With only $2K in the bank and the product held up, they had to raise money to get it released. The product was a huge success (revenue was $70 million in the first year).

Key lesson: you have to focus on solving problems ruthlessly to make product work and to package it correctly.

Hit another huge bump in the road in 2008 (lots of orders were canceled). Hit another issue with product quality when launching a later headset. Focused on putting the customer first and solving the problem; learned to persevere from earlier experiences.

To solve issues, he focused on ‘doing what’s right’ and putting customers first.

Talks about the difference between hardware and software. In hardware, you have to sell something for a higher price than it costs to make, which forced discipline in choosing parts and making decisions. They had to infuse that same discipline into their software division.

Talks about the importance of great design in both hardware and software and how important it is to get it right.

Mostly quality advice on hiring a lawyer that I won’t include here.

This is one of the best books on selling that I've ever read and I highly recommend it. It mostly talks about Frank Bettger and what he did to overcome his failure in sales and become masterful at it. The key points from the book are summarized below:

**The importance of enthusiasm**: talks about how he went from being demoted for being 'too lazy' to being promoted, with his salary going up more than 700%, simply by forcing himself to be enthusiastic. His enthusiasm also carried over and greatly impacted his teammates.

Enthusiasm is the most important part of selling. It doesn't matter how much you know; showing enthusiasm is vital and carries over to your client. The best salesmen all have great enthusiasm.

**Force yourself to tell your story / see at least 4 or 5 people every day**: he forced himself to make at least 4-5 calls every single day. He kept records of each one and calculated that this made a huge difference: each call was worth $2.30 (which amounted to quite a bit over the year).

**Keep records of what you do (be data-driven)**: he saw that his success rate was 70 percent for his first interview with a prospect, 23 percent for his second, and 7 percent for his third. This made him realize where to focus his energy and improved his per-call value from $2.80 to $4.27. Keeping records is vital to improvement. You must make calls in order to be successful in selling.
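Bettger's record-keeping habit boils down to two small calculations. Here is a minimal sketch of them in Python; the figures are illustrative stand-ins, not numbers from the book:

```python
# A minimal sketch of the record-keeping Bettger describes.
# All figures below are illustrative, not taken from the book.

def per_call_value(total_commissions: float, num_calls: int) -> float:
    """Dollar value of an average call: commissions earned / calls made."""
    return total_commissions / num_calls

# Suppose a year of records looked like this (hypothetical numbers):
calls_made = 1849
commissions = 4251.00
print(f"each call was worth ${per_call_value(commissions, calls_made):.2f}")

# Close rate by interview number, the way Bettger tracked it:
closes_by_interview = {1: 70, 2: 23, 3: 7}  # percent of sales closed
# Most sales close on the first interview -> focus energy there.
first_share = closes_by_interview[1] / sum(closes_by_interview.values())
print(f"share of sales closed on the first interview: {first_share:.0%}")
```

The point of the exercise is the same as his: once the records show where the money and the closes actually come from, you know where to spend your time.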

**The importance of hard work**: sales is the easiest job in the world if you work hard, but the hardest job in the world if you aren't trying.

**The importance of public speaking**: talks about how he went from being terrified of speaking to crowds to taking a public-speaking course that got rid of the fear. 'Join only a course where you make a talk at every meeting, because that's what you want: experience in speaking.' When you lose your fear of speaking to an audience, you lose your fear of talking to individuals, no matter how big and important they are.

The best way he ever found to overcome fear and rapidly develop courage and self-confidence is public speaking in front of groups. Almost all leaders have courage and self-confidence, so this is important.

**The importance of planning:** he realized that he wasn’t meeting his call goals and started dedicating each Saturday to planning. He studied his records and made notes in preparation for each week, down to what he would say and do, and scheduled each call, which led to outstanding results.

**Become an early riser:** Benjamin Franklin advised many others to rise early, and he used the same approach. He gave the same advice to his peers and it worked great for them as well.

Take more time to think, and do things in the order of their importance (and plan them).

**Set aside one day as a ‘self-organization’ day and plan your hours:** many successful people are strict about planning and utilizing their time correctly each day.

He talks about making an outstanding sale to someone who didn’t seem to want it, but he was persistent enough to stick around, get them interested, and eventually get them to buy in.

‘**The most important secret of salesmanship is to find out what the other fellow wants, then help him find the best way to get it.**’ If you remember this one rule, selling will be easy.

He talks about a sale he made to a businessman: he convinced him to focus on the future of his business (highlighting the issue of what would happen to it once he passed away). The businessman accepted this, and he made more money that day than he had in the entire previous year.

Resolved right then and there to dedicate the rest of his selling career to this principle:

**Finding out what people want and helping them get it.**

**The importance of framing things the right way:** he talks about getting 2 calls pitching him to speak in public at an event. The first call framed things in terms of the speaker. The second framed things in terms of his own frame of reference (i.e. the caller got him to envision himself when he was young and realize how important the talk would have been to him). The second call got him to say yes.

He also talks about an unsuccessful salesman who sold magazines: the usual objection was that prospects didn’t have time to read what he sold. He re-framed his sales pitch as ‘reading this means you get a summary of all the news in one short evening of reading’, thus turning the problem on its head: he was actually selling the most precious thing in life, more time. That approach turned his failure into a success.

‘**When you show a man what he wants, he’ll move heaven and earth to get it.**’ He talks about the importance of phrasing things in terms of what his prospects wanted, not what he wanted.

Three important lessons:

The most important secret of salesmanship is to find out what the others want and help them find the best way to get it.

There is only one way to get someone to do something: by making the other person want to do it. There is no other way.

When you show a man what he wants, he will move heaven and earth to get it.

**Elliot Hall’s lesson:** Elliot had been unsuccessful as a salesman, partly because he was ‘too positive.’ He realized that instead of being positive and confronting objections, whenever a prospect had an objection he would respond with a question. He would keep asking questions (ones that got the prospect to say ‘yes’) until arriving at just one sound conclusion, one based on facts.

Elliot Hall’s questions had just one purpose: ‘**to help the other fellow recognize what he wants, and then help him decide how to get it.**’ Whenever Elliot heard an objection, he would follow up with: ‘Let me help you ____ (resolve the objection) ____.’ He never objected to or disagreed with anyone. The author learned from Elliot that mastering the art of asking questions is a key to becoming a successful salesman.

The author then goes through a sale he made by asking follow-up questions, noting down some take-aways on making such sales:

**Make appointments:** be expected. You gain a big advantage in doing so. It tells the person that you appreciate the value of his/her time.

**Be prepared:** ask the question ‘what is the key issue?’ to help you prepare.

**Questions to ask:** What is the key issue? Or, what is the major point of interest? Or, what is the most vulnerable point?

**Make key-word notes.** Unless you make notes, you’re bound to forget what you plan to present. Prior to the interview, rehearse and review the notes over and over again (you should know exactly what you will say and how to say it).

**Ask questions:** this is vital to getting a sale.

**Explode dynamite: do something startling / surprising.** (In his earlier example, he insisted that he and his client go to a doctor’s appointment immediately.) It’s necessary to arouse and stir people to take action for their own benefit.

**Arouse fear:** there are 2 main factors that move men into action: **desire for gain and fear of loss.**

**Create confidence:** imagine yourself to be a salaried employee at your prospect’s company. You can imagine yourself as an ‘assistant buyer’ for the prospect in question, which should inspire you to be more confident.

Also: ‘If you were my own brother, I’d say to you what I’m going to say to you now…’ is another confidence gainer.

Praise your competitors.

Create a powerful selling phrase ‘I am in position to do something for you this morning that no other living person can do.’

**Express honest appreciation of your listener’s ability** (people like feeling important and are hungry for praise), but don’t go overboard.

**Assume a close: have a winning attitude.**

**Put YOU in the interview:** it’s a superb way to make sure you’re practicing the most important rule of all: seeing things from the other person’s point of view and talking in terms of his/her needs, wants, and desires. Strike out ‘I’ or ‘we’ and change it to ‘you’ or ‘your.’

Talks about the importance of asking your prospect questions and **getting to know them.** He describes a case where one man wouldn’t talk with him, but he showed up and asked the question ‘how did you get into this business?’, and the prospect opened up to him (and discussed things that even his wife didn’t know). His follow-up and plan after this won him the sale.

Six things you gain through the question method:

Helps avoid arguments.

Helps avoid talking too much.

Enables you to help the other person recognize what he/she wants.

Helps crystallize the other person’s thinking (the idea becomes his idea).

Helps you find the most vulnerable point with which to close the sale...the key issue.

Gives the other person a feeling of importance (when you respect his/her opinion, they’ll respect yours).

**Key question to ask when going into a sale: what is the key issue?** How do you identify the key issue?

What is the basic need?

What is the main point of interest; the most vulnerable point?

If you try to argue each of your opponent’s issues one-by-one, you’ll never win. Identify the key one. How do you do so? Get your opponent to talk. He or she will most likely list a few but he’ll keep coming back to the key issue.

The main problem of the sale is to:

Find the basic need, or

The main point of interest.

Then stick to it!

**What’s the most important word in the English language? Why.** If someone objects to you, ask why. Keep asking it, and your prospect may eventually find the answer to his own objection.

Milton Hershey had 3 failures before he was 40. He asked the question: why? He realized that the reason was that he didn’t have all the facts. Ever since then, he kept asking the why question.

**The real reason for not buying:** many prospects won’t give you the real reason they won’t buy right away (he collected data from his sales and realized that 62% of the time, the first reason his prospects listed was not the real reason they didn’t want to buy).

Always ask ‘In addition to that, isn’t there something else in the back of your mind?’ after a prospect objects, to get them to open up and to find the real reason behind their objection.

A man has two reasons for doing a thing: one that sounds good, and the real one.

The best formula he ever found to draw out the real one is built around these 2 little questions:

**‘Why?’** and **‘In addition to that…’**

**The importance of being a good listener:** he talks about one sale where he simply asked the prospect about his family. After listening for about 3 hours, he was about to say good-bye (without having said anything about himself) when the prospect turned into a sale. Sometimes, simply listening gets the job done. Always show the other person that you’re sincerely interested in what they’re saying. Listening has become a rare art, and good listeners are rare.

Whenever you’re talking and you find someone disengaged or not responding (they’re not really listening to you): stop talking. Nine times out of ten they’ll start talking and say what’s on their mind. Even if there’s an uncomfortable silence, the prospect will usually break it.

When listening: don’t focus in on what you’re going to say next. Really listen.

**Be honest:** the best talker doesn’t win the sale, the most honest one does. To win confidence, you must deserve confidence, and this is done through honesty.

**Be the best informed: knowledge is power.** Early in his career he knew 2 salesmen who brought in 70 percent of the profits of an office of 16. Those 2 men were continually being consulted by the other salesmen: they were the best informed. When asked where they got their info from: ‘I subscribe to services that give all the legal answers, sales ideas, etc., and I read the best journals and magazines.’

Always make time to read and learn new things. One of the best doctors he knew was still learning and staying engaged at the age of 66.

In other words: **know your business and keep on knowing your business.**

**Praise your competitors:** he talks about making an important sale where he asked the prospect which companies he was insured with. After the answer, he praised all 3 companies, gave good information on them, and called them world class. Only after doing so did he make his own pitch, and he won the prospect over.

**The power of witnesses:** he talks about the power of getting other customers to give testimony on the product. He would sometimes make calls in the middle of a sales pitch to existing customers and ask them what they thought about x and y, and the testimony would do a great deal in winning the prospect over.

**The importance of grooming and being well dressed:** unless the man looks the part, people won’t believe what he says is important. Be well dressed and have a professional appearance (he also advises wearing suits / shoes / etc. on alternate days to preserve their life). Ask experts for help in ‘looking professional’ and for advice on what to wear and buy.

Main reminders:

**Deserve confidence:** the real test is: do you believe it, not will the other person believe it.

**To win the confidence of others, you have to know your business**… and keep knowing your business. Always study and look to learn new things.

Benjamin Franklin’s key motto: “I will speak ill of no man – and speak all the good I know of everybody.” (Praise your competitors.)

Cultivate the habit of making understatements; never exaggerate!

**An infallible way to gain a man’s confidence: bring on your own witness** (use a telephone).

**Look your best.**

One of the greatest questions to ask your prospects is ‘**How did you ever get started in this business?**’ Many times it opens doors.

**Praise is another way to get through the door:** many businesses and people don’t get enough praise or compliments.

He kept cards on the people he met throughout the years and the information they told him: many people were amazed at how much he remembered. Careful notes and cards made things easier for him, and he advises everyone to do the same.

“If you would win man to your cause, first convince him that you are his sincere friend.” - Lincoln.

**Encourage young men and help them see how they could be successes in life.** He tells a story of telling a young man that he would become president of the company one day, and the prediction came true. Try to get men to tell you their highest ambitions in life and help them rise to them.

If anyone has inspired you or helped you in any way in life, don’t keep it a secret. Tell other people about it.

**The importance of smiling:** action seems to follow feeling, but really action and feeling go together. Give every living soul you meet the best smile you ever smiled in your life and see how much better you feel and look.

**The importance of remembering names:** when you meet someone, your entire focus should be on them (and on remembering his/her face and name). If you didn’t catch the name, simply ask them to repeat it, then take a mental snapshot of the face and name so that you’ll remember it.

In sum, to remember names:

**Impression:** get a clear impression of his/her name and face.

**Repetition:** repeat his/her name at short intervals.

**Association:** associate it with an action picture, and if possible, include the person’s business.

**Brevity: don’t over-talk. Be succinct and get to the point** (long talks bore people). ‘At a recent meeting of purchasing agents, we took a vote to find out the biggest reason why salesmen lose business.’ → ‘The vote was three to one that salesmen **talk too much**.’

Before he makes a phone call, he makes a list of the things he wants to talk about and gets to each point: ‘I know you are busy, there are just four things I want to take up with you…’

**If you’re scared, admit it!!** Many big men and huge public figures have been in the same situation. There’s no reason to feel ashamed of it or to hide it, so if you’re speaking to someone big or high-up, you pay him/her a compliment by admitting that you’re scared!

**The approach:** the approach you plan is very important. ‘In my twenty-five years of selling, the best approach I ever found was to first find out about a prospect’s hobby, and then talk about that hobby.’ He also tells a story about winning a prospect over by playing with his young daughter (and getting invited to her birthday party).

The first and most important step of selling: ‘**Sell yourself first.**’ He talks about another salesman who was very successful simply by having a very welcoming and embracing approach and making each customer entering his store feel fantastic.

**The Questionnaire:** he had a questionnaire he would use that took each prospect 5 to 10 minutes to answer:

What minimum monthly income would your wife need in the event of your death?

Minimum monthly income for yourself at age 65?

What is the present value of your estate?

Stocks, bonds, other securities?

Real estate (mortgages)?

Cash on hand?

Earned annual income?

Your life insurance?

How much do you pay out each year for insurance?

The above would give him ample information which he could use to win the prospect over whenever he got back to them.

An **approach must have only one objective: selling the sales interview, not your product.** It is the sale before the sale.

**Making appointments:** he talks about how a barber was able to revive his business simply by asking his customers to make appointments instead of serving first-come first-served. Many people prefer appointments. They save time, they show appreciation for the customer’s time, and they make each call an event.

Making appointments is like getting on base in baseball: the foundation of sales relies on getting interviews.

**Persistence:** some of the hardest prospects he took on wouldn’t make appointments with him, but he persisted and won the sale simply by persevering and finding a chance to talk with them.

**The importance of practice and rehearsal:** he takes us through a few stories making it clear that practice makes perfect: many people who made it did so through hard work and practice. Rehearsing his sales calls with a work-mate made his pitches better.

**The best time to prepare a sales talk is right after you made one:** all the things you should have said and should not have said are fresh in your mind. Write them down immediately!

**Write your talk out word for word. Keep on improving it.** Read it and reread it until you know it (but don’t memorize it). Try it out on your wife. If it’s bad, she’ll tell you. Deliver it to your manager and another salesman. Give it until you love it.

**One demonstration is worth more than a thousand words:** if possible, let the customer / prospect perform the demonstration and let them help you make the sale. He gives a few examples where demos played an important role in landing the sale (including one where he placed a quarter and a dime next to a pen to demo how much of an estate goes to a customer’s family: the quarter is what the prospect has, and the dime is how much is left to the family after the executors are done, once he passes).

**Never forget a customer; never let a customer forget you:** out of the 33 cars he bought, he dealt with 33 different salesmen. Not one of them thought of picking up the phone and following up with him. Chevrolet Motor adopted the motto of never forgetting a customer and was first in sales in 13 of 15 years.

He gives another example of a business that would follow up with customers on how they were doing soon after making a sale. If the customers needed any help, they helped them out, and this in turn led to other possible prospects. ‘If you take care of your customers, they take care of you.’

When is the best time to follow up a referred lead? Six days or six weeks? The real answer: **six minutes!!**

Never fail to show appreciation for a lead. Report results, whether good or bad.

Play position for the next shot (just as in pool – look for the next opportunity / lead whenever possible). ‘Prospecting is like shaving...if you don’t do something about it every day, first thing you know, you’ll be a bum.’

Earlier in his career, he had issues closing sales. Most of his sales were being made on the first interview, with only 23 percent succeeding on a follow-up second interview (and 7 percent on the third). He learned something important about closing: appeal to action!! Some rules he used to close a sale:

**Save closing points for the close** (he used to use them too early). The average successful sale goes through 4 steps: 1) Attention, 2) Interest, 3) Desire, 4) Close. Save the enthusiasm and excitement of the closing points for the end of the sale.

**Summarize:** a summary should be brief and only take a few seconds to voice.

**The magic phrase: ‘How do you like it?’** After summarizing the pitch for the prospect, he asks this question, and if the answer is yes, he asks the prospect a few more questions (trying to get them to say ‘yes’ a few more times); usually he already has the sale and is in the process of filling out their application form.

**Welcome objections and have persistence:** some of his best sales prospects were ones who objected. ‘I can’t afford it’ or ‘come back in January’ were all great news to him: they told him the prospect wanted the product. An objection is not a turn-down.

**Why? … in addition to that … ?** If a prospect isn’t willing to commit yet, he usually asks the why question and follows up with more questions (‘if you were my own brother, I’d say to you what I’m going to say to you now’), and the persistence does pay off. ‘Why’ gets the customer talking and brings out his objections. ‘In addition to that’ finds the real reason, or the key issue.

Always have an ‘X’ penciled in where the prospect signs.

Get the prospect to commit / pay as soon as you can. Once someone commits money to the sale, they rarely pull out.

Check yourself every day on these closing rules. Apply them until they become habits.

Approach each interview like it will be the best interview you ever had.

If you’re unsuccessful after the prospect interview, make notes and review them.

**Remove barriers to entry:** make it easy for the prospect to buy / sign. He tells a story of following up with a prospect on a second interview by filling out his insurance application form and asking him to sign. The prospect initially wanted no part of it, but did end up signing and paying when he showed up and presented the filled-out paperwork:

Write up the order or application or contract in advance, even though you may only have the prospect’s name and address on it.

Mark a heavy ‘X’ in each place where he/she is to sign.

Your first words: ‘Is that right, Mr. Blank?’ laying the paper on his desk directly in front of him. If it is a stand-up interview, place the unfolded paper in his hands.

The ball is now down on his/her one-yard line and the momentum is with you.

**Don’t fear failure: embrace it.** Babe Ruth struck out over 1,000 times, but he isn’t remembered for that. If you fail, keep going: each week and each month you’re improving. One day soon, you will find a way to do the thing that looks impossible today.

**Courage is not the absence of fear; it is the conquest of it.**

**Benjamin Franklin’s secret of success:** Franklin chose 13 subjects he felt were necessary or desirable for him to acquire and master, and he gave a week’s strict attention to each subject in turn (repeating the whole cycle 4 times per year).

Frank Bettger made his own list of 13 subjects: 1) Enthusiasm 2) Order: self-organization 3) Think in terms of the others’ interests 4) Questions 5) Key issue 6) Silence: listen 7) Sincerity: deserve confidence 8) Knowledge of my business 9) Appreciation and praise 10) Smile: happiness 11) Remember names and faces 12) Service and prospecting 13) Closing the sale: action.

Each week, he carried a card in his pocket and focused on emphasizing that one principle during the week.

‘Whether you are a printer, salesman, banker, or candy pushcart peddler, let’s assume that you select thirteen subjects best suited to you. By concentrating on one thing at a time, you will get farther with it in one week than you otherwise would in a year. A new confidence will take hold of you. At the end of 13 weeks, I know you will be surprised with your progress.’

To build a company generating $100 million in annual revenue, you need one of the following:

1,000 enterprise customers paying you $100k+ per year each; or

10,000 medium-sized companies paying you $10k+ per year each; or

100,000 small businesses paying you $1k+ per year each; or

1 million consumers paying you $100+ per year each; or

10 million active consumers who you monetize at $10+ per year each by selling ads

Salespeople sometimes refer to “elephants”, “deer”, and “rabbits” when they talk about the first three categories of customers. To extend the metaphor to the 4th and 5th types of customer, let’s call them “mice” and “flies.”
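The arithmetic above is easy to sanity-check: each customer count times its yearly price comes out to the same $100 million. A quick sketch (the dictionary name and segment labels are mine, following the hunting metaphor):

```python
# Five paths to $100M in annual revenue: customers x price per year.
paths_to_100m = {
    "elephants": (1_000, 100_000),    # enterprises at $100k+/yr
    "deer":      (10_000, 10_000),    # mid-size companies at $10k+/yr
    "rabbits":   (100_000, 1_000),    # small businesses at $1k+/yr
    "mice":      (1_000_000, 100),    # consumers at $100+/yr
    "flies":     (10_000_000, 10),    # ad-monetized users at ~$10/yr
}

for segment, (customers, revenue_per_customer) in paths_to_100m.items():
    assert customers * revenue_per_customer == 100_000_000
```

Each step down the table trades a 10x larger customer base for a 10x lower price point, which is why the acquisition strategy changes so drastically between segments.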

Hunting flies: to get over 10 million users, you need 100+ million downloads AND either 1) a product that is social and has a high viral coefficient (e.g. Facebook, WhatsApp) or 2) a ton of user-generated content that drives large amounts of SEO traffic (e.g. Yelp).
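The viral coefficient mentioned here is the average number of new users each user brings in. A minimal illustrative simulation (the function and all numbers are made up for illustration) shows why a coefficient above 1 compounds while one below 1 stalls:

```python
def users_after(initial_users, k, cycles):
    """Total users after `cycles` rounds of invitations, where each
    new user brings in k further users on average (viral coefficient k)."""
    total = initial_users
    new = initial_users
    for _ in range(cycles):
        new = new * k       # this round's cohort recruits the next one
        total += new
    return total
```

With k > 1 each cohort is bigger than the last and the base compounds; with k < 1 the cohorts shrink geometrically and the total from virality alone converges to roughly initial_users / (1 - k), so you need other acquisition channels.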

Hunting mice: to acquire 1 million customers paying 100 dollars a year you need 10 to 20 million downloads and a level of virality. Examples of this type of business are Evernote and MailChimp.

Hunting rabbits: to acquire 100,000 customers paying 1K per year, you need 0.5 to 2 million trial sign-ups along with inbound marketing, a focus on promoting your NPS score and sales funnel optimization although there is no silver bullet here.

Hunting deer: to acquire 10,000 customers paying 10K a year, most of the rabbit hunting techniques still apply along with maybe needing an inside sales force to generate or close leads along with an experienced VP of sales.

Hunting elephants: most of the big SaaS companies are elephant hunters (i.e. they have a relatively small number of large enterprises paying for expensive subscriptions), and they include Workday and SalesForce. Remember, here you only need 1,000 customers paying $100K a year, which is less than the cost of a single executive at each company (so it’s not that much money to them).

If you have $100 million in revenue, you’re looking at an exit of anywhere from $500 million to over a billion.

Overall, going through most of the material was very useful, although my perception was that it was mostly centered on founders in the ‘move fast and break things’ category. It’s my firm belief that there are more sustainable approaches to building companies, although most of the advice here does focus on start-ups. Approaches that focus on generating income, rather than the continual revenue growth YC seems obsessed with, tend to be more promising when capital / lending is expensive (i.e. when interest rates are high), and we’re currently in a high-interest-rate environment. Either way, I agreed with most of the general advice, so it’s still great advice for a lot of founders. Some key ideas that really stuck with me are outlined below:

Startups are hard and not for everyone. If you do start one, you are in for a very painful road, albeit one that also has a ton of rewards.

To be successful at starting a startup, you need to be passionate about the idea. Entrepreneurship is hard.

Focus on a small market and aim to make something insanely great. Choose to build something a small number of users **really want** rather than something a large group of users somewhat want.

Keep your expenses low and minimize your commitments.

The key factor that determines whether a startup will be successful is **hard work and determination.** This is the number one factor in whether you’ll succeed.

Ignore the haters and work on whatever you find interesting.

Build something that solves your own problem(s). You have to have passion for the idea that you’re bringing about.

Do things that don’t scale: pay attention to the early users and take extra measures to not only acquire them, but to make them incredibly happy (you can focus on scaling later).

**Live in the future, then build what's missing.** That describes the way many, if not most, of the biggest startups got started. Try to notice and find things that seem to be missing. Turn off the normal perceptual filters you use in everyday life and open up.

Talk to other people about their unmet needs. Find the gaps they believe exist in the world. What's missing? What would they like to do that they can't? What's tedious or annoying, particularly in their work?

If you want to create and capture lasting value, don’t build an undifferentiated commodity business, and **avoid markets with destructive competition** (like the airline or restaurant industries). Look to build differentiated products and a monopoly (like Google).

If you have a competitor that’s growing faster than you, assess how fast the growth is and make sure you don’t let them capture the entire market (think Oracle vs. Ingres: Oracle’s product was worse, but its obsession with growth allowed it to capture the market).

Keep on adding new product innovations and new features to the core business: what do the users want and how can you make your core product better?

Take Zuckerberg’s advice: **listen to your users, stay simple, be reliable.**

Willpower and cognitive processing draw from the same pool of resources: don’t make your users think. Make the product simple and less cognitively demanding.

Do field studies and usability testing and use them to make it easy for your target users to do what they want.

The press is only a tool, and it can kill young companies. Things you should focus on instead: What problem am I solving for customers? Does my startup have a reason to exist? How can I make my service even better? Am I improving things for the economy or society at large? Who is my closest competitor, and what do I do differently?

Software is eating the world: many new businesses are now moving online, and it’s easier to build a software-based business today than at any other point in history. This is a tremendous opportunity for many entrepreneurs.

How to pitch successfully: make something worth investing in, understand why it’s worth investing in, and explain it clearly.

Don’t get addicted to fundraising: you should be listening to your users, not focusing on raising money. Also, don’t raise too much money.

Encourage your employees to be open and to share as much as possible at work. Your job is to deliver happiness to customers, employees and investors.

Build a strong culture by doing things the right way and have trust in your employees to do the right thing. Embrace employee openness and foster communication through sharing. Try to get everyone together (lunch / events) and have fun. Hire only the best.

Hiring is really hard and really important. In fact, it’s most likely the most important thing a founder does. Hire people that aren’t a**holes and that you like. Look for smart and effective people that get things done.

Be generous with compensation – especially when it comes to giving equity. Give your employees a share in the company and shared goals and values that align with your own.

The market is cyclical and investors tend to act in waves: understand that this happens and that many people follow each other, so the waves will come and go. A smart investor ignores them and invests in people who want to build great things and have perseverance.

There’s only one mistake that kills startups: not making something users want. If you make something users want, you'll most likely be fine.

Follow Bill Walsh’s advice: focus on improving yourself and your team, as well as perfecting the fine details that matter, and the score will take care of itself.

Don’t ever compromise the end-product: focus on building and packaging something great for your customers / end-users. If you package something that looks like garbage, it’ll result in garbage. Also, embrace failure and don’t give up: perseverance is the most important factor in determining whether you’ll be successful.

Read the book ‘How I Raised Myself From Failure to Success In Selling’ if you want some great tips on being a great salesman. I included detailed notes and a summary above, but reading the book itself was invaluable and I highly recommend it to everyone.

Hopefully you find these notes useful and they help other people in starting their own company. I’m sorry I wasn’t able to include more material. My overall favorite talk is Ben Chestnut’s Creative Mornings talk.

There’s other great material there that I either missed or didn’t have time to include. I highly urge any founders to actually go through and read all of the material rather than just skimming my notes. Either way, if you have any improvement suggestions or want me to include anything, please let me know and I’ll be more than happy to do so. If you found these summaries helpful, please make sure to like and share :)

Sam Altman is the current CEO of OpenAI. The organization’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity, and they’re the masterminds behind ChatGPT. Prior to OpenAI, Mr. Altman was the president of Y Combinator (YC), one of the most prestigious startup accelerators in Silicon Valley. Y Combinator has played a significant role in the growth and success of numerous tech startups (including Airbnb, Dropbox, Reddit, Stripe, Docker, etc.) and was originally founded by the one and only Paul Graham. Prior to that, Altman founded Loopt, a mobile location-based social networking app that aimed to connect users with their friends and help them discover local events. Loopt was eventually acquired by Green Dot Corporation for $43.4 million.

In other words, Altman is a man beaming with success, so I decided to go through his notes on starting a startup and do my best to summarize the advice given throughout his recommended reading list. For anyone who wants to go through each one manually, you can find them all here:

https://startupclass.samaltman.com/lists/readings/

My own notes and highlights are provided below.

No matter what you choose, build stuff and be around smart people.

“Stuff” can be a lot of different things—open source projects outside of class, a startup, a new sales process at a company you work at. The key: don’t talk about it. Build it.

Don’t be scared of taking risk (most people are too risk averse).

Try to join a company on a breakout trajectory.

Don’t choose a job for the salary (like joining Microsoft. Altman mentions a friend who chose Microsoft and judged it a terrible decision, since his mental model of Microsoft was presumably a bunch of unintelligent bureaucrats working on uninteresting things. Oh Sam, how times change!).

If you start a company, only do so if you have an idea you’re in love with. Startups are a 6-10 year commitment, so you need to be in love with the idea you’re pushing forward.

Keep your expenses low and minimize your commitments (in other words, commit yourself to making the idea happen no matter what circumstances may be in store).

Bad reasons for trying to start a company:

You want to be your own boss. The reality is that starting a startup makes everyone else (the customer, employees, partners, etc…) your boss.

Glamour. Usually any glamour is clouded with a mountain of hard work.

Financial return (for your talent). There are much easier ways to receive a good piece of the financial pie than starting a startup (the 100th engineer at Facebook made much more money than 95% of Silicon Valley entrepreneurs).

Many people often accuse Silicon Valley of working on stuff that doesn’t matter, but a ton of important things usually start out looking as if they don’t matter.

If some users really love what you’re building and engage with it often in their daily lives, keep working on it!!

Pay no attention to market predictions (they tend to be terrible).

There are 2 primary ways of changing the world with tech:

Build something that some people love but most others think is a toy.

Be hyper-ambitious.

Ignore the haters and work on whatever you find interesting. Most critics tend to be ignorant f*cks that aren’t building anything themselves.

Do things that don’t scale (i.e. do things manually and really get involved – at least in the beginning). The most common thing founders have to do at the start is recruit users manually. Airbnb and Stripe are great examples of this (founders went WAY out of their way to recruit new users).

Many founders under-estimate the power of compound growth. If you have 100 users, you need to get 10 more next week to grow 10% a week. If you keep growing at 10% a week, after a year you'll have 14,000 users, and after 2 years you'll have 2 million.
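That compounding claim checks out; here's a minimal Python sketch (the helper name `users_after` is mine, purely for illustration):

```python
# Compound growth: start with 100 users and grow 10% per week.
def users_after(start: float, weekly_rate: float, weeks: int) -> float:
    """Number of users after `weeks` of compounding at `weekly_rate` per week."""
    return start * (1 + weekly_rate) ** weeks

print(round(users_after(100, 0.10, 52)))   # after 1 year: ~14,200 users
print(round(users_after(100, 0.10, 104)))  # after 2 years: ~2 million users
```

The week-to-week requirement looks tiny (10 new users on a base of 100), which is exactly why founders underestimate where it leads.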

The right question to ask about a startup: how fast is the company growing? Forget about changing the world and everything else. Focus on growth.

Build something that solves your own problem(s) – which also opens up the door to finding new users as well (since they’ll be your peers).

Take extraordinary measures to not only acquire users, but to make them incredibly happy (send out thank you notes / do a lot of schmoozing / do whatever it takes to make your initial users happy).

Focus on making your user experience ‘insanely great’ (as coined by Steve Jobs). The product is just one component of that. (As a note, he also mentions that you can make the experience great even with an early, incomplete, buggy product which … well, I tend to disagree with but hey, he’s the pro).

Over-engaging with early users is not just permissible, but necessary. You need them as a feedback loop in order to make the product better.

Perfectionism is often an excuse for procrastination. Don’t try to make things perfect – focus on shipping.

Sometimes the best early trick is to focus on a **narrow** market (like Facebook, which started off as a platform for Harvard / university students and focused on building a great experience **for them** initially, expanding afterwards). Most great startups focus on a narrow market: the founders build something great for themselves or their friends, and then later realize that it could be expanded to a broader market.

If you can find someone with a problem that needs solving and you can solve it manually, go ahead and do that for as long as you can, and then gradually automate the bottlenecks.

Launches: forget big launches. Most founders believe that a ‘big event’ is necessary to get off to a good start. Most great things don’t start off with a big launch event. They start off quietly and attract a few users.

Don’t try to think of start up ideas. Look for problems – preferably problems that you have yourself.

The very best startup ideas tend to have three things in common: they're something the founders themselves want, that they themselves can build, and that few others realize are worth doing. Microsoft, Apple, Yahoo, Google, and Facebook all began this way.

There have to be users who really need what you’re making (or trying to make). Not just people who see themselves as using it one day – they need to have the urge to use it IMMEDIATELY. The initial user group needing it is usually small, but it doesn’t hinder the idea (if the need was broad – an existing company would have tried to solve it already).

Choose to build something a small number of users **really want** rather than something a large group of users somewhat want.

**Live in the future, then build what's missing.** That describes the way many if not most of the biggest startups got started. As an example, Zuckerberg pretty much lived online prior to starting Facebook. If you had asked most people prior to Facebook whether they would like to post their personal info online, most of them would have been terrified.

Be open to new opportunities. Bill Gates and Paul Allen hear about the Altair and think "I bet we could write a Basic interpreter for it." Drew Houston realizes he's forgotten his USB stick and thinks "I really need to make my files live online." The verb you want to be using with respect to startup ideas is not "think up" but "notice." Observe the opportunities that stand unnoticed before you.

Try to be at the leading edge of a rapidly changing field. This is where the real opportunities come from.

Try to notice and find things that seem to be missing. Turn off the normal perceptual filters which you use in everyday life and open up. Try to think of things that make life more efficient or tolerable. You're trying to see things that are obvious, and yet that you hadn't seen.

Try to work on cool projects and build things – even if they seem like toys at first. Live in the future and build what seems interesting.

If you’re going to college, don’t take a class on entrepreneurship. Build things and look for ideas to build and people to build it with.

Don’t worry about being late: good ideas seem like they’re obvious, which means that someone might have already tried to solve it. Don’t let that deter you.

It’s better to have a good idea with more competitors than a bad one without. You don't need to worry about entering a "crowded market" so long as you have a thesis about what everyone else in it is overlooking. Google wasn’t the first to market, but it didn’t keep them from dominating and being the best! A crowded market is actually a good sign, because it means both that there's demand and that none of the existing solutions are good enough.

Turn off your “schlep” (avoidance of tedious tasks) and “unsexy” (avoidance of unglamorous work) filters.

When searching for ideas, look in areas where you have some expertise. If you're a UI expert, don't build a database app.

One good trick is to ask yourself whether in your previous job you ever found yourself saying "Why doesn't someone make x? If someone made x we'd buy it in a second."

Talk to other people about their unmet needs. Find the gaps that they believe exist in the world. What's missing? What would they like to do that they can't? What's tedious or annoying, particularly in their work?

Many people approach the world with a ‘fixed’ mindset (thus lacking imagination and the propensity to dream of how things ‘could’ be).

The world needs death: it clears out the people who are satisfied with the current ways of doing things and makes room for new people who see the potential to improve things and dream of new ones.

Large companies suck. The people at the top of the company tend to not notice what many people notice at the lower levels of the company – and even when they do tend to notice, it tends to be too late (think IBM). Big companies move too slowly.

As long as new humans keep getting born, there will be many new opportunities for young people to come in with new ideas and innovate.

What advice would you give to young entrepreneurs?

Find something you’re really passionate about instead of thinking of ‘ideas.’ Ideas are useless. Work on something and find your ideas through work rather than fishing for the next idea or big thing.

More than half of what separates the successful entrepreneurs from the non-successful ones is **pure perseverance**. Most people give up, and it’s a hard road to take. You have to have passion for the idea that you’re bringing about.

We’re all going to be dead soon. Live each day on earth like it’s going to be your last. You never know when you’re gonna go. Look to leave something great behind.

Talent attracts talent. Ambitious and bright people tend to attract each other and work in groups (which thus explains the great success within the Bay area).

You should not be doing any ‘marketing’ at your start up. Instead, your focus should be on sales (narrow and deep) rather than marketing (broad and shallow).

Talk to a small number of users using your product and focus on that. You need to talk individually to early adopters to make a really good product, so engage with the early birds and don’t be scared of getting your hands dirty and being really involved.

Successful startups almost always start narrow and deep.

Get a small number of users initially, and focus in on them.

How to measure if your efforts are effective? Focus on growth rather than absolute numbers.

Make a really good product and go out and find users for it manually.

If you want to create and capture lasting value, don’t build an undifferentiated commodity business and

**avoid markets which have destructive competition**(like the airline or restaurant industries). Look to build differentiated products and a monopoly (like Google).Both monopolies and non-monopolies try to bend the truth. Monopolies don’t want governments getting in the way, so they up-sell their own competition. Non-monopolies want investors, so they overstate their own differentiation and talk down their competition.

Competition pushes people towards ruthlessness or death. Monopolies have the freedom to do what they want and focus on what’s important without prioritizing money.

Not all monopolies are bad: creative monopolists give customers more choices by adding entirely new categories of abundance to the world. Creative monopoly profits provide an **incentive to innovate**. Monopolies can funnel their profits into long-term thinking and various research projects which non-monopolies don’t have the ability to fund.

Old-school economics is outdated and no longer relevant. Businesses are not static. Static = death. Monopolies are dynamic and usually creative. Monopolies are actually good for the world.

**So why do we value competition?** Our education system brainwashes us into valuing and believing in competition.

Managers never tire of comparing business to war.

Our society hero-worships competition and what it represents, when reality shows the opposite: competition and extremely competitive markets simply suck.

It’s better to merge with competitors than to fight them. Thiel goes on to describe PayPal’s merger with X.com and how it benefited both companies and equipped them to survive the dot-com crash.

Simply put: competition is a force of destruction, so avoid it at all costs.

**So – how do you build a monopoly business?** Why is Twitter valued so much more than the New York Times? The New York Times was profitable while Twitter wasn’t. Simply put: cash flow. A great business is valued by the cash flow it’s able to produce in the future rather than today. Investors expect that Twitter will be able to capture monopoly profits over the next few decades, while the newspaper’s monopoly days are over.

Most of the value in low-growth businesses (like nightclubs and restaurants) is concentrated in the near term: the profits they make today will get eaten away by substitutes and competitors. Technology companies follow the opposite trajectory: they sacrifice today’s profits (and often run losses) to build something which will generate great value 10 to 15 years into the future.

PayPal and LinkedIn are excellent examples of these sorts of companies: the valuations might seem insane on the surface, but the future cash flows and compound growth tell a good story.
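To make the cash-flow point concrete, here's a toy discounted-cash-flow sketch in Python. All of the numbers (profit levels, growth rates, the 10% discount rate) are invented for illustration, not taken from the article:

```python
# Present value of a stream of yearly cash flows, discounted at `rate` per year.
def present_value(cash_flows, rate=0.10):
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

# A low-growth business: decent profits today that erode 10% per year.
nightclub = [100 * 0.90 ** t for t in range(15)]

# A tech company: losses for 3 years, then profits growing 50% per year.
tech = [-50 if t < 3 else 20 * 1.50 ** t for t in range(15)]

# Despite losing money up front, the growing stream is worth far more.
print(round(present_value(nightclub)))
print(round(present_value(tech)))
```

Under these made-up assumptions the money-losing, fast-growing stream is worth several times the profitable-but-eroding one, which is the whole Twitter-vs-Times argument in miniature.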

If you focus on near-term growth too much, you’ll miss out on the most important question which you should be asking yourself: **will this business still be around a decade from now?**

Every monopoly is unique, but they usually share some combination of the following characteristics: proprietary technology, network effects, economies of scale, and branding.

**Proprietary technology:** Think Google: its search algorithm dominated (fast results combined with incredible quality). As a good rule of thumb, the technology should be 10 times better than its substitutes. The best way to make something 10x better is to invent something completely new. Or you can radically improve an existing solution: once you’re 10x better, you escape competition. Amazon made its first 10x improvement in a particularly visible way: they offered at least 10 times as many books as any other bookstore. Apple made a tablet that went from unusable to something that was beautiful and useful.

**Network Effects:** Network effects make a product more useful as more people use it. For example, if all your friends are on Facebook, that draws even more people towards using Facebook. Paradoxically, then, network-effects businesses must start with especially small markets. Facebook started with just Harvard students.

**Economies of Scale:** A monopoly gets stronger as it gets bigger: the fixed costs of creating a product (like employee costs or office space) can be spread out over greater quantities / sales. Software can enjoy incredible economies of scale because the marginal cost of making more copies is close to zero. Many businesses gain only limited advantages from scaling (like service businesses). A good startup should have the potential for great scale built into its first design.

**Branding:** Today’s strongest brand is Apple. Apple offers products so good as to constitute a category of their own. Apple has a complex suite of proprietary technologies, both in hardware (like superior touchscreen materials) and software (like touchscreen interfaces purpose-designed for specific materials). It manufactures products at a scale large enough to dominate pricing for the materials it buys. And it enjoys strong network effects from its content ecosystem: thousands of developers write software for Apple devices because that’s where hundreds of millions of users are, and those users stay on the platform because it’s where the apps are. These other monopolistic advantages are less obvious than Apple’s sparkling brand, but they are the fundamentals that let the branding effectively reinforce Apple’s monopoly.

**Building a Monopoly:** Every startup is small at the start. Every monopoly dominates a large share of its market. Therefore, every startup should start with a very small market.

Always err on focusing on a small market. The reason? A small market is easier to dominate than a large one.

The perfect target market for a startup is a small group of particular people concentrated together and served by few or no competitors. Any big market is a bad choice, and a big market already served by competing companies is even worse.

Once you dominate a niche market – your focus should be on expanding into related broader markets. Amazon as an example started with books and then moved into other areas.

Don’t disrupt: avoid competition as much as possible. Build a monopoly in a niche market and “study the endgame before everything else.” Also – forget about the ‘first mover advantage.’ Usually, you want to be the last mover – make the last great development in a specific market and enjoy years or even decades of monopoly profits.

Most people and most companies are mostly shit (learned this at AOL).

Spent most of his time investing and playing poker.

Many people try to mask themselves in a ‘veneer of complexity’ but it’s fairly simple: measure some shit, try some shit, test some more shit, throw out some more shit. It’s really not that complicated.

It’s unbelievable the lack of ‘dog-fooding’ that happens in the real world (where dog-fooding refers to people using their own product or the product which they produce).

People in general think of growth in a completely wrong way.

Three of the most important questions to ask about consumer products:

How do you get people into the front door?

How do you get to an ‘aha’ moment as quickly as possible?

How do you deliver the core-product value and engagement?

Only after you address these questions can you focus on growth.

Eliminate ego. Be comfortable with not being rewarded in the short term.

Don’t live on gut-feeling. Most people can’t predict things correctly.

Be honest. Don’t allow bullshit to permeate your culture.

Do you really know what you’re building and why?

Key thing which Facebook focused on: how do we provide value to as many users as possible. They didn’t focus on growth or anything else. Focus was on delivering the core product value to as many users as possible.

Speed to market often involves a trade-off with quality: if you need high quality code, it takes time and Joel usually takes the time to do things right. This has also slowed Fog Creek (his company) down a lot.

He’s OK with reasonable and steady growth over explosive growth (Fog Creek had grown 56% annually for 10 years).

Joel is happy with this growth rate: the company is profitable, their customers love their products and their sales are rising.

But, in the book ‘Crossing the Chasm,’ Moore writes about the battle between Oracle and Ingres in the 1980s: Oracle drove for 100 percent growth while Ingres accepted 50 percent growth. According to executives at Ingres, the company could not grow any faster than 50 percent and still adequately serve their customers. They said that Oracle was over-promising and under-delivering and that their customers hated them. They ended up being wrong and Oracle won the war.

This concerned Joel. Moore also said “Once the apparent leader-to-be emerges, pragmatists will support that company, virtually regardless of how arrogant, unresponsive, or overpriced it is."

This is entirely possible and you can think of it this way: “If you're growing at 50 percent a year, and your competitor is growing at 100 percent a year, it takes only eight years before your competitor is 10 times bigger than you. And when it's 10 times bigger than you, it can buy 10 times as much advertising and do 10 times as many projects and have meetings with 10 times as many customers. And you begin to disappear.”
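The 10x arithmetic in that quote is easy to verify; a quick sketch (company sizes normalized to 1 at the start):

```python
# You grow 50%/year; a rival grows 100%/year. Relative size after 8 years:
you = 1.5 ** 8      # ~25.6x your starting size
rival = 2.0 ** 8    # 256x its starting size
print(rival / you)  # ~10: the rival is roughly ten times bigger than you
```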

Many software companies have gone away simply because they didn’t grow fast enough and were overtaken by the market leader (look at WordPerfect or WordStar). But expanding a business faster than its natural rate is also risky: you have to hire quickly, borrow money from investors and rely on outside partnerships.

**Key point: If you want to win, you need to be OK with focusing on growth and taking risks.** If you aren’t, someone else will be, and they will take your market share.

At Fog Creek, they plan to take more risks in the future by:

Focusing on delivering a product that can square off with their competitors; often, they lose a customer because they don’t have X and the competitor does. They plan to address this by adding features to their already excellent product.

Focusing on building up a sales force. Word of mouth isn’t enough and Fog Creek could use a bigger sales team which they plan on delivering soon.

Businesses don’t grow themselves. One of the most important jobs of a CEO is to aggressively define and pursue a growth agenda for his or her business.

When he joined eBay, he noticed that month-over-month growth had stalled for one month, and he needed a plan to prioritize growth. Rather than spending more on marketing (which they were already utilizing) or attempting to acquire a company (which felt desperate), they decided to focus on **product innovation**.

They decided to look into buying formats: at the time, eBay only allowed buying through auctions, but research showed that non-male users may not have loved auctions and preferred fixed-price formats. The buy-now option that they added now represents 62% of eBay’s revenue.

He came to call this process of layering in new innovations on top of the core business “adding layers to the cake.” eBay within the U.S. was the company’s original business but the team focused on adding more ‘growth layers’ – the first one being international expansion, followed by payments and finally by acquiring PayPal which all took place between 1996-2005.

At PayPal – they added further layers through international expansion, improving offerings for merchants (who sold outside of eBay) and starting to offer credit on top of payments business.

His earlier success at OpenTable also used the ‘layer on the cake’ growth strategy, and other public companies like Apple and Amazon utilized it heavily during this time (Apple by expanding its product line and Amazon by expanding into other non-merchandise businesses). The key to the success here almost always lay in **brilliant innovation**.

These winning Net companies are incredibly strong at **product innovation**. The best innovations improve and complement the core business of a company, taking advantage of and enhancing its most valuable assets.

Like most other start-ups, Facebook didn’t have many resources in its early days. Its very first server back in 2004 cost $85 to rent. They didn’t spend much and were small, tight and had a lot to prove.

They first went to schools that were hardest to succeed in. Zuckerberg figured that if they had a product that was better than all others, it would be worth investing in.

After first growing Facebook inside of Harvard, the plan was essentially to go hard or go home — to launch at universities that already had a really well integrated social network (like Columbia, Stanford and Yale). The thinking was that if Facebook succeeded there, scaling to less integrated schools would be a downhill and easier battle.

Some other great advice which Zuckerberg offered to the crowd:

**listen to your users, stay simple, be reliable.**

The way Pinterest grew had little to do with wisdom and more to do with regular grassroots marketing.

Initially had a small number of users (3000) who really loved the product.

Instead of changing the product (to accelerate growth) – they focused on finding more ‘core users’ who could love the product and what it offered, and it worked.

General advice: don’t simply follow the herd or follow Venture Capitalist advice. Fundamentally, the future isn’t written.

A startup is a company designed to grow fast. If you want to start one, it's important to understand that. Millions of companies are started every year in the US. Only a tiny fraction are startups.

Most startups fail. To grow rapidly, you need to make something you can sell to a big market. To do this, you have to 1) make something lots of people want and 2) reach and serve all those people. Writing software is a great way to solve 2, but you’re still constrained by 1.

Most successful startups are started by founders who notice ideas that most other people overlook. In other words, the founders are different and able to notice a blind spot the market has ignored.

The growth of a successful startup usually has three phases:

An initial period of slow growth while the startup figures out what it’s doing.

Once the startup figures out how to make something lots of people want and how to reach them, it goes through rapid growth.

Eventually, the startup grows into a big company and growth declines again.

Y-Combinator measures growth rate per week. A good growth rate during YC is 5-7% a week. If you can hit 10% a week you're doing exceptionally well. The best thing to measure here is revenue; the next best thing is active users.

It's hard to find something that grows consistently at several percent a week, but if you do you may have found something surprisingly valuable. (1% weekly growth = 1.7x yearly multiple; 5% = 12.6x; 7% = 33.7x; 10% = 142x.)
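Those multiples follow directly from compounding a weekly rate over 52 weeks; a small sketch reproduces the table:

```python
# Yearly growth multiple implied by a constant weekly growth rate.
def yearly_multiple(weekly_rate: float) -> float:
    return (1 + weekly_rate) ** 52

for rate in (0.01, 0.05, 0.07, 0.10):
    print(f"{rate:.0%}/week -> {yearly_multiple(rate):.1f}x/year")
```

The jump from 5% to 10% weekly growth is the difference between a 12.6x and a 142x year, which is why YC obsesses over a few percentage points of weekly growth.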

Why are VCs interested only in high-growth companies? The reason is that they get paid by getting their capital back, ideally after the startup IPOs, or failing that when it's acquired. Many startups fail, but the ones which do make it are growth focused and return a huge multiple of the initial investment.

A lot of startups are acquisition targets. Rapidly growing companies can bring great value (PayPal brought great value to eBay as an example).

If you want to understand startups, understand growth. When you start a startup, you’re committing yourself to **finding growth**.

Talks about Facebook’s growth team and its early days (went from 5 people to 150+). Early focus was on expanding market from universities to general population.

Some of their growth strategies focused on:

Partnering with Google and adding Facebook profiles to their search results.

Adding ‘People You May Know’ feature that let others discover more connections.

Making site available in other languages.

The team views a dashboard every morning which shows a running tally of monthly active users.

Next focus is on emerging international markets and Facebook use on mobile phones.

Other companies are also copying Facebook and now have their own growth teams.

In 1999, a study asked one group of students to memorize a 7-digit number while another group was asked to memorize only 2 digits. Afterwards, when offered a choice of cake vs. fruit, the group that memorized 7 digits was 50% more likely to choose the cake.

Key finding: Willpower and cognitive processing draw from the same pool of resources.

Same type of study was done but with dogs, and the findings were similar: dogs which exercised more control over their cognitive resources (by waiting) were more likely to give up on solving a toy puzzle than dogs who didn’t need to wait patiently.

Cognitive resources are easily depleted: if you spend all day at work exercising self-control while dealing with angry customers and co-workers, you’ll have fewer resources left when you get back home.

Think about what this means for your users:

If your app asks users to make choices, you’re draining resources.

If your app is confusing, once again – draining cognitive resources.

Adding new features: yup, you’re taking more cognitive resources.

At each design meeting, ask: ‘Is this a fruit-choosing feature or cake-choosing feature?’ If you’re adding content-marketing to the site, will your users really use it or pay attention to it? Are you draining their resources by making them use your app? How can you make it easy to use and do the best possible job of not taking away from your users’ precious and easily depleted cognitive resources?

The whole article talks about what makes a design ‘intuitive’ – and explains that what’s intuitive to one person may not be intuitive to another. The difference is outlined by a ‘knowledge gap.’

In the grand scheme – there are going to be many users of a design. They can all be lined up on a spectrum ranging from ‘no knowledge’ to ‘all knowledge’. In other words – the knowledge of the interface ranges from knowing everything there is to know about it to knowing nothing about it.

Current Knowledge represents the knowledge the user has when they first approach the interface to complete the task. Target Knowledge is the knowledge the user needs to accomplish the task. Every user will have a different ‘current knowledge point.’

The difference between the target point and current point is normally referred to as “the knowledge gap”. Users can complete the task when their current knowledge = the target knowledge.

Two conditions are needed to bridge the gap:

Both the current knowledge point and target knowledge point are identical (i.e. the design is simple enough so that most users approaching it already are familiar with the elements needed in order to use it).

The points aren’t identical, but the current design helps them ‘bridge’ this gap by supplying enough information on how to use it.

To identify a user’s current knowledge point, it’s valuable to do field studies.

To identify the target knowledge for important tasks, perform user and usability testing.

Sometimes, making something intuitive doesn’t make sense for a company. As an example, Amazon doesn’t make it easy to find its customer service phone number. Instead, it makes returning products very intuitive and easy (using a simple wizard interface to do so).

Embrace chaos. Managers love controlling things. Creatives don’t and nature needs entropy. Without entropy there is no creation (i.e. no work) so you need to embrace creativity and chaos.

Let people have time to work on stuff and avoid meetings.

Tells people to make small things and keep working on building small prototypes. He believes big ideas come from combining little things from lots of different parts / pieces.

Encourages employees to continually work on and build new things (but tries to put a timeline on it of 2 weeks). Idea: build lots of small things and keep working on new ideas.

Likes going around asking employees what they’re working on. If they’re not working on anything, he likes to connect them with employees who are building something, and encourages them to help out.

Doesn’t focus on ‘happiness’ when it comes to running a business. Knows that the job of a business is to generate money so tries to encourage the exploration of many new ideas and building new things which they could use to generate income or deliver to customers.

From a startup’s perspective the press is a tool. Dealing with press is part of the job of building a successful tech company.

At its most basic level, press is about telling stories.

Building great relationships with press people early on can pay great dividends.

Some companies that have used the hype cycle successfully are Uber, Twitter and Airbnb. When a company reaches a peak, hype-wise, it’s time to start bearing fruit.

While everyone hopes for a Cinderella story, the truth is that over 90% of startups fail. 74% of these startups fail because of premature scaling (i.e. the hype overwhelming product capabilities).

Hype is still the currency of Silicon Valley. People think Silicon Valley is a magical place filled with amazing technologies, but it also has some of the best marketers in the world.

When you’ve got nothing, hype, manufactured by blog posts and marketing gimmicks, actually can help raise VC money and capture the imagination of early adopters. Sometimes this works out for startups.

Hype may help companies get investments faster, but at the same time may scare away investors (at too high of a valuation).

For the past five years, the mythological desired outcome has been: get your TechCrunch post, get 1 million users, raise a huge round, sell to Google for $100 million. Now that narrative is changing: get your Product Hunt link, TechCrunch post, Re/code post, WSJ post, blog on Techmeme and Hacker News, sell. But the average experience, even when you do sparkle, is fade. In other words, you might not need as much press as you think you do.

So what’s the best media/hype strategy through all the risk and randomness? Go with Warren Buffett: "Markets are risky, good businesses are not."

In other words, have a positive attitude and design your business intelligently. Always ask:

What problem am I solving for customers?

Does my startup have a reason to exist?

How can I make my service even better?

Am I improving things for the economy or society at large?

Another 3 questions to ask founders:

Who is your closest competitor and what do you do differently?

What are the challenges of doing this?

What are your future plans?

Avoid over-promising and under-delivering.

Many people believe that we have a software bubble, but Marc Andreessen argues the opposite: companies like Apple were trading at around 15 P/E even though they had a highly differentiated product with huge margins.

Believes that we are in the middle of a broad technological and economic shift in which software companies will take over large swaths of the economy (which did come true).

More businesses are being run through software delivered online.

The costs to run a tech company have gone down tremendously (it used to cost him $150K a month to run Loudcloud, but now the equivalent costs about $1.5K a month through Amazon Web Services).

With lower start-up costs and a vastly expanded market for online services, the global economy will become digitally integrated. Look at the world’s largest bookseller, Amazon, as an example. Today’s largest video service is Netflix. Music companies are becoming software companies too (iTunes, Spotify, etc.). The video-game industry and video-game makers are also growing fast, and Pixar started out as a software company. Mobile phones have replaced cameras. The largest marketing platform today is Google. The fastest-growing telecom company is Skype. LinkedIn is today’s fastest-growing recruiting company.

Software is also disrupting value-chains in physical based products (like autos). Modern cars have built-in entertainment systems, maps for guidance, etc...

Walmart uses software to power its logistics and distribution capabilities.

Oil and gas exploration uses software based solutions.

Health care is the next industry to be transformed (along with national defense industry as well).

We should be proud of this development within America!! Still, we face several challenges:

First, every new company is being built in the face of economic headwinds (side note: I totally disagree. Interest rates were at around 0% for a very very very long time).

Secondly, many people in the US and around the world lack the level of education and skills needed in order to benefit from this oncoming software revolution. There’s no solution to this other than to increase education, and we have a long way to go.

The new companies need to prove their worth: they need to build strong cultures, delight their customers, establish a competitive advantage and justify their rising valuations. No one should expect to build such companies easily: it’s brutally difficult to do so.

Instead of questioning valuations, people should seek to understand this new generation of companies and seek to expand them.

Most founders try to convince others to invest in them through a pitch, which is a mistake: instead of making a pitch, let the startup do the work. Explain to investors why the startup is worth investing in, and explain it clearly and simply.

Investors are looking for start-ups that are going to be very successful, but many aren’t and only a few make it through. Out of the ones that make it through, a few become giants (distribution follows power law). Most investors want to know if you have a chance of becoming one of the big 15 successes.

How do you convince them that you will be a big success? You need three things:

Formidable founders.

A promising market.

Some evidence of success so far.

How can you be formidable? A formidable person is one who seems like they'll get what they want, regardless of whatever obstacles are in the way. Being formidable also equates to being confident.

Truth: the way to seem most formidable as an inexperienced founder is to speak the truth. Convince yourself that your startup is worth investing in and then learn how to explain it to others.

To evaluate whether your startup is worth investing in, you have to be a domain expert. Know everything about your market.

The time to raise money is not when you need it, or when you reach some artificial deadline, it's when you can convince investors.

Market: you need a plausible path to owning a big piece of a big market. Founders think of startups as ideas, but investors think of them as markets.

If there are x number of customers who'd pay an average of $y per year for what you're making, then the total addressable market, or TAM, of your company is $xy. Make xy big enough and you’ll get interest.
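The TAM arithmetic is trivial but worth making concrete. Here is a quick sketch with made-up numbers (the customer count and average price below are purely hypothetical):

```javascript
// Hypothetical inputs: x customers paying an average of $y per year
const customers = 50000;     // x (made-up)
const avgPricePerYear = 120; // $y (made-up)

// Total addressable market: TAM = $xy
const tam = customers * avgPricePerYear;
console.log(tam); // 6000000, i.e. a $6M TAM
```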

You don’t have to start by targeting a large market: it’s often better to start with a small one that **can** turn into a big one. One big question: if this is such a great idea, why hasn’t anyone else done it? Key: most startups notice something other people miss, or they ride a trend. Also: most companies (including Microsoft) seem like only pretty good ideas in the first few months after they’re started. A huge element of ‘luck’ plays a role, in other words.

Rejection: don’t lie to investors. Also, it's not uncommon for a startup to be rejected by all the VCs except the best ones. That's what happened to Dropbox.

Different: don’t use bullshit language or ‘try’ to convince investors. Believe in your idea and show them it’s worth investing in.

In other words, make something worth investing in, understand why it’s worth investing in, and explain it clearly.

Most startups that raise money do it more than once. A typical trajectory might be (1) to get started with a few tens of thousands from something like Y-Combinator or individual angels, then (2) raise a few hundred thousand to a few million to build the company, and then (3) once the company is clearly succeeding, raise one or more later rounds to accelerate growth. This essay is focused on #2.

What makes a company a start up? Rapid growth. If taking on money makes you grow faster then take it on, and if not, don’t raise money.

If you do decide to raise money, do it as quickly as possible so you can get back to the work of building your company.

Most investors have 2 main fears: the fear of missing out or the fear of investing in a flop. To mitigate investing in flops, normal investors try to wait as long as possible to gain information about the company prior to investing (in order to minimize risk).

If you’re not fundraising and an investor wants to meet with you, don’t accept the invitation.

The best type of intro is from a well-known investor who has just invested in you. So when you get an investor to commit, ask them to introduce you to other investors they respect. The next best type of intro is from a founder of a company they've funded.

When searching for investors, talk to many in parallel, doing a breadth-first search weighted by expected value (where expected value = amount willing to invest * chance of investing).
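The expected-value weighting can be sketched in a few lines. The investor names, amounts, and probabilities below are invented for illustration:

```javascript
// Rank investors by expected value = amount willing to invest * chance of investing
const investors = [
  { name: "Angel A", amount: 50000, probability: 0.6 },  // EV: 30,000
  { name: "Fund B", amount: 500000, probability: 0.1 },  // EV: 50,000
  { name: "Fund C", amount: 250000, probability: 0.4 },  // EV: 100,000
];

const ranked = investors
  .map(inv => ({ ...inv, ev: inv.amount * inv.probability }))
  .sort((a, b) => b.ev - a.ev);

console.log(ranked.map(inv => inv.name));
// ["Fund C", "Fund B", "Angel A"]
```

Working down this list from the top is the “breadth-first search weighted by expected value” idea: you spend your scarce fundraising attention where the expected dollars are highest.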

Never leave a meeting with an investor without asking what happens next. What more do they need in order to decide? Don't be too pushy, but know where you stand.

Getting the first investor to invest in your company is usually the hardest. Once you have one, the rest follow more easily.

Once an investor commits, find out when you will get the money. Don’t say you raised money until the money is in the bank.

When an investor tells you "I want to invest in you, but I don't lead," translate that in your mind to "No, except yes if you turn out to be a hot deal."

If you need to estimate the upper limit on what you should raise, a good rule of thumb is to multiply the number of people you want to hire times $15k times 18 months. In most startups, nearly all the costs are a function of the number of people. If you have additional expenses, like manufacturing, add in those at the end.
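The rule of thumb above ($15k per person per month, for 18 months) reduces to one multiplication; a minimal sketch:

```javascript
// Upper limit on a raise: hires * $15k/month * 18 months, plus any
// non-headcount expenses (e.g. manufacturing) added at the end
function fundraisingUpperLimit(hires, extraExpenses = 0) {
  return hires * 15000 * 18 + extraExpenses;
}

console.log(fundraisingUpperLimit(5));         // 1350000 ($1.35M for 5 hires)
console.log(fundraisingUpperLimit(5, 200000)); // 1550000 with $200k of extras
```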

Err on the side of underestimating how much you want to raise. Also keep your expenses low.

Be profitable if you can: it will give you a much stronger position in raising money. You want to tell investors that you’ll succeed no matter what, but raising money will help you do it faster.

Don’t try to maximize your company’s valuation. Y Combinator’s best two investments (Dropbox and Airbnb) had really low valuations when they raised. Beware of ‘valuation sensitive’ investors (investors who won’t invest unless they know your current valuation).

If you're surprised by a low-ball offer, treat it as a backup offer and delay responding to it.

Accept offers greedily: If someone makes you an acceptable offer, take it. If you have multiple incompatible offers, take the best.

Don't sell more than 25% in phase 2: Our rule of thumb is not to sell more than 25% in phase 2, on top of whatever you sold in phase 1, which should be less than 15%.
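To see what those percentages leave the founders with, note that each round sells a slice of the company as it stands at that point, so ownership multiplies down round by round. A small sketch (assuming each phase sells a fraction of the then-current cap table):

```javascript
// Founders keep (1 - sold) of their stake after each round
function remainingOwnership(fractionsSold) {
  return fractionsSold.reduce((owned, sold) => owned * (1 - sold), 1.0);
}

// Selling 15% in phase 1 and then 25% in phase 2:
console.log(remainingOwnership([0.15, 0.25])); // 0.6375 -> ~63.75% retained
```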

Have one person handle fundraising (if you have 2 founders, the other one can focus on working on the company). The person who handles this should be the CEO.

You'll need an executive summary (one-page paper describing in simple language what you plan to do) and (maybe) a deck.

Don’t get addicted to fundraising: you should be listening to your users, not focusing on raising money. Also, don’t raise too much money.

This ends part 1 of this series. You can find part 2 below:

Quick sort is very similar to merge sort. The main difference between the two lies in the approach each uses to sub-divide the input space. Quick sort uses a more elaborate approach in deciding which group each input element gets partitioned into. Instead of simply dividing each group in half, quick sort picks a pivot element which it uses to create a decision boundary. This boundary element controls how our input elements are partitioned, and we recursively keep dividing our groups until the pieces become extremely easy to merge in sorted order. Quick sort has the same major advantages that merge sort does, with the addition that it tends to be faster on modern hardware architectures. Noting this, let’s dive into the quick-sort algorithm.

The algorithm in its very essence is composed of the steps outlined below:

Pick a pivot element.

Partition the input into 3. To do this, we move non-pivot items to the appropriate side of the list, based on whether they’re less than or greater than the pivot value and we end up with:

First part: all elements in this part are less than the pivot.

Second part: contains the pivot itself.

Third part: all elements in this part are greater than or equal to the pivot.

Continue applying the sorting algorithm to the first and the third parts.

Join the first part, the pivot, and the third part.
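The three-way split in the steps above can be traced on a concrete input. This sketch uses strict comparisons, so any duplicates of the pivot would land in the middle group:

```javascript
// Partition [3, 1, 4, 1, 5, 9, 2] around a pivot of 4
const input = [3, 1, 4, 1, 5, 9, 2];
const pivot = 4;

const first = input.filter(i => i < pivot);    // elements less than the pivot
const second = input.filter(i => i === pivot); // the pivot itself
const third = input.filter(i => i > pivot);    // elements greater than the pivot

console.log(first, second, third);
// [3, 1, 1, 2] [4] [5, 9]
```

Quick sort then recurses on the first and third groups and joins the results.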

Let’s go through a simple example to explain how the algorithm works. Let’s say that I have a group of industrial robots working at my factory. It’s the year 2150, and Microsoft and its subsidiary OpenAI have taken over the planet. Human labor is no longer needed and factories are now run by robots. The robots are not all of uniform height, though – they’re all different models, and the ‘security robot’ at the front gate has been complaining about constantly raising the ramp for each robot coming through the gate.

To remedy this – the plant manager comes up with a plan: instead of letting each individual robot exit the premises on its own, he figures that if he sorts the robots from shortest to tallest, he can simply get them all to line up and march towards the exit. If they march together with the shortest robot at the front and the tallest at the back, the ‘security robot’ just has to steadily raise the ramp as the lined-up bots march through. Once the last, tallest robot walks by – the security bot can lower the ramp. We thus only raise the exit ramp once instead of raising it multiple times for each bot. This results in great energy savings and an amazing return in shareholder value!! Microsoft execs would indeed be very happy reporting this to the CEO at the head office, so they give the initiative a big thumbs up!

But how do we come up with a sorting plan and how do we instruct the robots to line-up without having to complicate too many things?

Well – we could simply use the merge sort algorithm, which looks very simple and should be easy to execute, but the plant manager decides to come up with a different scheme. He knows that these robots love recursion – so he decides to give them their instructions as a recursive formula, in the following manner:

*All right boys, I need you all to pay attention. We need to all line-up, but the line up needs to be in sorted order. To be in sorted order, you all need to follow these instructions.*

*First, pick a random robot out of the group. We will call this robot ‘pivot-bot.’*

*Once pivot-bot is picked, I want it to move into the middle of the room.*

*Once this is done, you all need to figure out which side of pivot-bot to go to, and this will be decided by looking at pivot-bot’s height.*

*If you are shorter than pivot-bot, please line up to the left. If you are taller than pivot-bot, you need to go into a group located on the right.*

*Got it?*

*Great. Once you’re finished partitioning into different sub-groups, you all need to repeat the steps I outlined above for each of the sub-groups we just created (i.e. both the left and the right sub-groups need to execute the above instructions again)!! We will call each of our resulting sub-groups partitions, with the left and right sub-groups each forming their own sub-partitions.*

*Keep executing these instructions until each of the resulting sub-groups are only composed of one robot (i.e. one element) and we can no longer create further partitions.*

An example diagram showing this procedure is illustrated below:

*We’re not finished quite yet though. We need to join these groups / partitions now.*

*Once there are no more sub-groups to partition – I want each sub-group to merge together from left to right. The partitions will keep merging until we have all of the robots standing next to each other and until we have one master-partition containing all of our robots.*

Here’s another diagram which illustrates the merge procedure:

The plant manager sits down and watches this process take place. It’s simple and it involves simple programming for our robots. There’s not much to remember here. The robots just recursively follow the same procedure over and over again until we reach our base case (only one robot left in the group). The merging process is extremely simple as well – the instructions are exactly the types of instructions robots are great at executing!!

The plant manager is happy, but something seems off. The sorting mechanism works and the robots end up in sorted order, but he thinks to himself – hmm, it looks like there is a LOT of robot movement when we execute this procedure. Is there any way we can make this more energy efficient? In other words, is there a methodology we can use so that these robots don’t have to move so much?

So, he sits down and takes a look at the original algorithm. After sketching it out and thinking carefully about it, he decides to take a little break and finds a video of a Hungarian folk dance which strangely resembles the process he was just thinking about:

That’s it, he says!!

After sketching it out and thinking carefully, he devises another ingenious way of programming his robots: instead of picking a random ‘pivot-bot’, making it move to the middle of the room, and then making every other robot compare itself to the pivot-bot and move to one side, the manager comes up with the following plan:

*All right bots, plan b. Instead of using a ‘random’ bot to pick out of each sub-group or group, we’re going to simplify things. Instead of using randomization, we’re simply going to use the first bot as the ‘pivot element’ or pivot-bot from each group.*

*Got it?*

*Now, instead of moving each pivot-bot to the center of the room and sub-dividing into 2 groups, I want each one of you to stay in the same position you’re currently located in and to follow the instructions I provide below:*

*We’re going to pass a baton to the left-most bot in the group (the 2nd bot) and to the right-most bot (the last bot), both located to the right of our pivot-bot. Let’s label the left-most bot ‘left’ and the right-most bot ‘right.’ Let me draw a simple visual so that this is easy for you all to see (each integer below represents a robot’s height):*

*We’re going to move each baton in a step by step manner and use the procedure below:*

*If the robot on the **left** side is **smaller** than our pivot-robot (the first robot in the group) – I want you to keep passing the baton forward to the next bot. You’ll keep doing this until you encounter a robot which is **larger** than our pivot-bot.*

*If the robot on the **right** side is **larger** than our pivot-bot – I want you to keep passing your baton to the left until you encounter a bot which is **smaller** than our pivot-bot.*

*Now – assuming that the left baton is **still on the left** of the right baton, I want the robot with the left baton to exchange places with the bot carrying the right baton (and you’ll also exchange your batons as well!!).*

*Now, I want you to pass each baton to the bot ahead of you. Keep repeating the above procedure as long as the left baton is still located to the left of the right baton.*

*If the left baton is to the right of the right baton (or if one robot is holding both batons), I want the robot holding the baton(s) located to the left to exchange places with the robot located at the pivot point.*

*Now, I want you bots to recursively repeat the same procedure for the group located to the left of the pivot element and for the group located to the right of the pivot as well. You’ll keep repeating the same procedure until all bots are sorted.*

You should be able to notice two primary things from the procedure outlined above.

First, once we move our pivot-bot and make it exchange places with the robot holding the left-most baton, both of our sub-groups are partitioned such that the left group contains all of the elements **less than** the pivot, while the right group contains all of the elements **larger than** our pivot.

Second, we didn’t need an elaborate procedure for moving our robots. Whereas in our previous procedure we had to allocate extra space for our pivot-bot to move to the middle of the room, we now avoid moving our elements around and simply use a ‘baton’ or ‘pointer’ to partition each sub-group. Whenever we find elements which are ‘out of order’ – we simply ask our baton-holding bots to swap places and continue.

Noting the cost-savings in terms of space and movement, the manager inserts the new program into each bot and tests out the new procedure. He notes down the improved energy and space efficiency. He gloats once again about his divine optimization abilities and calls the head-office to announce more great news!

Of course, there are many other ways of accomplishing the above procedure. The reason we chose the above steps and illustrations was to introduce the concept of doing an **in-place **version of quick-sort. We also chose to use the first element as our pivot and this introduces issues when it comes to already sorted elements. Ideally, we want to choose a pivot element in a random manner. Each pivot should have around the same amount of elements located to its left side as the amount of elements located to its right. Choosing our pivot in this fashion plays an important role in making quick-sort efficient at doing what it does.

Either way, you should now at least have a good mental intuition of how the in-place version of this sorting algorithm works and why it works as well as it does. Instead of iteratively moving each element from each sub-group into a separate region of memory, we’re essentially now doing things in place and ensuring that our algorithm doesn’t require any extra ‘space’ in order to perform each sub-sort!

Below is a very simple implementation of quick-sort in JavaScript (**not in-place**):

```
// Function to return a random element from the given list
function random(list) {
  return list[Math.round(Math.random() * (list.length - 1))];
}

// Function to perform quicksort on a given list
function quickSort(list) {
  // Base case: a list with 0 or 1 elements is already sorted
  if (list.length < 2) {
    return list;
  }
  // Choose a pivot element randomly from the list
  const pivot = random(list);
  // Partition the list into three arrays: elements less than,
  // equal to, and greater than the pivot
  const less = list.filter(i => i < pivot);
  const equal = list.filter(i => i === pivot);
  const greater = list.filter(i => i > pivot);
  // Recursively apply quicksort to the two partitions and concatenate the results
  return [
    ...quickSort(less),
    ...equal,
    ...quickSort(greater)
  ];
}

// Example usage:
const myArray = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
const sortedArray = quickSort(myArray);
console.log(sortedArray);
// [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]
```

For an **in-place** version, you can take a look at the implementation provided below:

```
// Function to perform quicksort on an array (in-place)
function quickSort(array, left = 0, right = array.length - 1) {
  // Only sub-arrays with more than one element need sorting
  if (left < right) {
    // Partition the array and get the pivot index
    const pivotIndex = partition(array, left, right);
    // Recursively apply quicksort to the left and right partitions
    quickSort(array, left, pivotIndex - 1);
    quickSort(array, pivotIndex + 1, right);
  }
}

// Function to partition the array and return the pivot index
function partition(array, left, right) {
  // Choose the pivot (for simplicity, using the rightmost element)
  const pivot = array[right];
  // Index tracking the boundary of elements less than or equal to the pivot
  let i = left;
  // Loop through the sub-array, moving smaller elements to the left side
  for (let j = left; j < right; j++) {
    if (array[j] <= pivot) {
      // Swap array[i] with array[j]
      const temp = array[i];
      array[i] = array[j];
      array[j] = temp;
      // Advance the boundary index
      i++;
    }
  }
  // Swap the pivot into its final position at index i
  const temp = array[i];
  array[i] = array[right];
  array[right] = temp;
  // Return the pivot index
  return i;
}

// Example usage:
const myArray = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5];
quickSort(myArray);
console.log(myArray);
// [1, 1, 2, 3, 3, 4, 5, 5, 5, 6, 9]
```

Let’s go over some advantages of using quick sort over merge sort:

Quick sort is usually faster than merge-sort.

It has better cache-locality than merge-sort.

Uses less memory than merge sort (if implemented correctly).

Some disadvantages of choosing quick sort over merge sort:

Merge sort is stable by design (equal elements keep their original order) while quick sort is not.

Merge sort is more parallelizable than quick-sort (i.e. better for multi-threading and distributing compute).

Merge sort performs about 25% fewer comparisons than quick-sort.

Quick sort’s worst-case performance (O(n^2)) is worse than the worst-case performance of merge sort (O(n log n)).
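That quadratic worst case is easy to trigger with a naive pivot choice. The sketch below is a first-element-pivot variant written just for this experiment (not one of the implementations shown earlier), instrumented to count comparisons on an already-sorted input:

```javascript
// Quick sort with the FIRST element as pivot, instrumented to count
// comparisons (each element is compared against the pivot in two filters)
function quickSortCount(list) {
  if (list.length < 2) return { sorted: list, comparisons: 0 };
  const [pivot, ...rest] = list;
  const less = rest.filter(i => i < pivot);
  const greater = rest.filter(i => i >= pivot);
  const left = quickSortCount(less);
  const right = quickSortCount(greater);
  return {
    sorted: [...left.sorted, pivot, ...right.sorted],
    comparisons: 2 * rest.length + left.comparisons + right.comparisons,
  };
}

const n = 100;
const alreadySorted = Array.from({ length: n }, (_, i) => i);
console.log(quickSortCount(alreadySorted).comparisons);
// 9900, i.e. n * (n - 1): every recursion level peels off just one element
```

On sorted input the first-element pivot always produces an empty left group, so the recursion depth (and comparison count) grows quadratically; a random or median-like pivot avoids this.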

Overall, both quick sort and merge sort are fantastic algorithms, and each has its own advantages. Quick sort tends to be used more often in the real world due to its better overall performance on normal workloads. This performance advantage is mostly due to cache locality.

Cache locality refers to the processor’s ability to fetch memory efficiently by keeping recently used blocks close to the CPU. RAM is reasonably fast, but an L1 cache, for example, is on the order of 100-fold faster than RAM - so keeping data within the L1/L2/L3 caches, right next to the CPU, often pays off. Quick sort takes advantage of this locality and thus executes faster on non-giant data sets. There are many other nuances and complexities to locality which I won’t get into here, but hopefully you found the above explanations and overview helpful! If you liked this write-up, make sure to like and subscribe!

To answer this question, we need to focus in on speed. Let’s take a look at what Knuth says about the various sorting algorithms and their run-time performance in his famous book series “The Art of Computer Programming” (Volume 3):

These results indicate that quick-sort is fastest, but it only applies to Knuth's artificial computer! In the real world, things tend to get a lot more complicated. I don’t have time to cover all the nuances, but this post covering branch prediction (or misprediction in this case) does a fine job in showing how modern processors may not handle things as simply as we may perceive them to do:

Note also that the algorithms relate differently for small inputs. Here is a comparison of our previous algorithms for inputs of only a few items:

In other words, the smaller our input, the less it matters which algorithm we choose! The differences really do matter though when it comes to sorting large data sets like those which we can find within a large database table, so choosing the right sorting algorithm really does play an important role in the real world!

Noting this, let’s start off by explaining and going over the **merge sort **algorithm!

Merge Sort is a divide-and-conquer sorting algorithm which works by recursively breaking a problem down into two sub-problems of the same or related type. This process continues until the sub-problems become simple enough to be solved directly. The solutions to the sub-problems are then combined (or merged) to give a solution to the original problem.

You can think of it as the ultimate divide-and-conquer strategy. Let’s use an example to explain the entire procedure. Let’s say that you have a line-up auditioning for a play, and you need to prioritize shorter people over taller people. The role involves some sort of gig which starts out with a short-people joke, so the directors want to make sure that the selection process prioritizes shorter actors. They ask you to sort the line-up based on height, but you’re extremely anti-social, and you don’t even have the organizational skills to handle a task of that magnitude! Instead of going through the line-up and telling each individual which position to move to based on their height, you devise an ingenious plan. The instructions you give to the entire group are written, and they go as follows (I’ve also included a sample diagram using integers (instead of actors) illustrating the process to help readers out):

‘*All right guys (you say), I want you to split in half. *

*If there are more than two people within your new group, or sub-group, you need to repeat this same step, and continue splitting the group in half. *

*Keep doing this until you have two or fewer people within your group. *

*If you do have two people left, I want you to stand behind the person in your group if you happen to be taller than them. *

*If you’re shorter, I want you to stand in front of them. *

*After you finish re-arranging yourselves in this manner, I want you to line up in this order and merge back into the original group you split from. When you do merge back into a new group, I want you to use this procedure in determining which person merges first into our new line-up: pick the shortest person from each group. In other words, if the person at the front of group one is shorter than the person at the front of group two, make sure that the person in group one goes first! Keep repeating this step until your two groups are combined together and fully merged into one. *

*Using the above procedure ensures that our newly merged groups will be in sorted order, so it’s important that you keep using it!*

*Understood?*

*Great! You need to keep merging your groups using the above steps until you have only one group left. Once you finish merging, we will use this new order for auditioning for our play.’*

The above ensures that the new line up is in sorted order, and that your divide and conquer strategy keeps things simple: instead of having to keep the whole group order in mind, you sub-divide the group until you reach small enough groups where the actors can use a simple procedure in deciding how to re-arrange themselves using height as a comparison point. The merging process then ensures that the newly formed line ups give priority to shorter actors, and it’s simple enough to be followed without you having to intervene. You simply tell each sub-group to merge into a new group by continually picking the shorter actor from each group being merged. This ensures that our ordering property holds when combining our groups into one, and you’re pretty much set! Using the simple instructions above, if the actors follow the procedure, the final line up will be in sorted order!

Let’s outline the algorithm’s steps more formally. Say that as input we have an unsorted list which we need to sort. To sort the list using our merge sort procedure:

We check if there’s only one element in our list. If there is, it is already sorted, so we return the one element.

If there’s more than one element, divide the list recursively into two halves until it can’t be divided anymore.

Merge the smaller lists into a new list, making sure that each combined product is in sorted order when merging the two sub-lists.

A full visual illustrating a merge-sort process is also provided:

```
// Split the array into halves and merge them recursively
function mergeSort(array) {
  if (array.length <= 1) {
    // An empty or single-item array is already sorted
    return array
  }
  // Find the middle index (rounded down) and split the array in two
  const middle = Math.floor(array.length / 2)
  const left = array.slice(0, middle)
  const right = array.slice(middle)
  return merge(
    mergeSort(left),
    mergeSort(right)
  )
}

// Compare the arrays item by item and return the concatenated result
function merge(left, right) {
  let result = []
  let indexLeft = 0
  let indexRight = 0
  while (indexLeft < left.length && indexRight < right.length) {
    if (left[indexLeft] < right[indexRight]) {
      result.push(left[indexLeft])
      indexLeft++
    } else {
      result.push(right[indexRight])
      indexRight++
    }
  }
  // Append whatever remains in either half
  return result.concat(left.slice(indexLeft)).concat(right.slice(indexRight))
}
```

The most important part of merge sort is its performance guarantee of O(n*log(n)), the best worst-case bound a comparison-based sorting algorithm can achieve. Regardless of the original ordering of the input, we always get very strong performance. There are no adversarial test cases which can make it run much longer than this run-time, which is fantastic! Merge sort is therefore very suitable for sorting an extremely large number of inputs!
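This guarantee can be checked empirically. The sketch below is a separate, instrumented merge sort written just for this experiment (not the implementation above), counting comparisons on a reverse-sorted input:

```javascript
// Merge sort instrumented to count element comparisons
function mergeSortCount(array) {
  if (array.length <= 1) return { sorted: array, comparisons: 0 };
  const middle = Math.floor(array.length / 2);
  const left = mergeSortCount(array.slice(0, middle));
  const right = mergeSortCount(array.slice(middle));
  const result = [];
  let comparisons = left.comparisons + right.comparisons;
  let i = 0, j = 0;
  // Standard merge, counting one comparison per loop iteration
  while (i < left.sorted.length && j < right.sorted.length) {
    comparisons++;
    if (left.sorted[i] < right.sorted[j]) result.push(left.sorted[i++]);
    else result.push(right.sorted[j++]);
  }
  return {
    sorted: result.concat(left.sorted.slice(i), right.sorted.slice(j)),
    comparisons,
  };
}

const n = 1024;
const reversed = Array.from({ length: n }, (_, i) => n - 1 - i);
console.log(mergeSortCount(reversed).comparisons);
// well under n * log2(n) = 10240 comparisons, nowhere near n^2
```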

Another advantage is that merge sort is a stable sorting algorithm. This means that equal elements in an array / list maintain their original order with respect to each other, so if we want the elements within our original collection to stay ‘stable’, we should choose merge sort over unstable alternatives.

**Some not so great properties of merge sort?**

Well, some people say that it is actually not so easy to implement from scratch (although I personally tend to disagree). The second drawback is that it requires an additional O(n) of storage during the merge operation, so it is not very memory efficient and is not in-place.

Hopefully you found this intro and overview helpful, and I hope you stay tuned for our next sorting algorithm — the one and only and the king of all sorting algorithms: quick sort!!

]]>Greek mathematicians like Euclid and Archimedes made significant contributions to optimization. Euclid wrote "Elements", which laid the foundation for geometric optimization and provided a proof that the square is the rectangle with the maximum area for a fixed perimeter.

During the Islamic Golden Age (8th to 13th centuries), Al-Khwarizmi's work on quadratic equations and linear systems contributed to the development of optimization techniques. Furthermore, the use of coordinates comes from René Descartes (1596–1650), who used two numbers to describe a point on a two-dimensional plane.

His insight linked algebra, with its analytic equations, to the descriptive and visual field of geometry. The development of calculus by mathematicians like Isaac Newton and Gottfried Wilhelm Leibniz in the 17th century provided further powerful mathematical tools for optimization. The study of optimization has continued steadily to the modern day, where we employ it quite heavily — from training neural networks to finding the best solutions to wide-ranging problems, the field is evolving to meet the demands of an increasingly complex and data-driven world.

I’m going to write a few blog posts focusing on optimization over the next few months – as well as share some of my own visually annotated notes on it – so let’s start things off with a bang, with an overview of one of the most widely used and well-known algorithms ever devised: Newton’s method.

Around 1670, Newton found an ingenious way of finding the roots of equations. Let’s use the same equation which he used in order to demonstrate how his method worked, which is:

x^{3} − 2x − 5 = 0

Newton knew that the root of this equation is approximately equal to 2, so he changed the x variable in the equation to 2 + p. Writing x = 2 + p expresses the fact that we know our solution lies near 2: we want to find the value p such that our approximation gets closer to the real solution.

By substituting 2 + p into our original equation, we get:

p^{3} + 6p^{2} + 10p − 1 = 0

Because Newton knew that the solution was close to 2 and that p was small, he completely discarded the p^{3} + 6p^{2} component.

Once again, and to re-iterate – the above assumption is only reasonable because p is close to 0! Newton knew that as p approached an extremely small value, the square and the cube of p approached something even smaller, and so he eliminated these components from the equation. By discarding these terms, he had the following equation to work with:

10p − 1 = 0

which turns into:

p = 0.1

Since x = 2 + p, we know that our new approximation turns into:

x = 2 + 0.1 = 2.1

We now have a better approximation for our root x, which is 2.1!

So, what do we do next? Well, Newton didn’t stop there. He repeated the above process, writing x = 2.1 + q, and substituted this into the main equation once again in order to obtain an estimate for q. Solving this new equation using the same technique as before, he obtained an estimate of q ≈ −0.0054, and then used this to obtain his next approximation for x:

x = 2.1 − 0.0054 = 2.0946

By repeating the above process, we can get more and more accurate estimates for our root x. This, in essence, is what Newton’s method entails: iteratively improving our approximation of a function’s root in a step-by-step manner using calculus.
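
We can replicate Newton’s hand computation with a few lines of Python (a sketch; the choice of 5 iterations is arbitrary, and is more than enough here):

```python
# Newton's example equation: f(x) = x^3 - 2x - 5, with derivative f'(x) = 3x^2 - 2
f = lambda x: x ** 3 - 2 * x - 5
df = lambda x: 3 * x ** 2 - 2

x = 2.0  # initial guess, as in Newton's derivation
for _ in range(5):
    x = x - f(x) / df(x)  # the Newton update step

print(x)  # converges towards the root near 2.0945...
```

The first iteration produces exactly the 2.1 from the hand derivation, and the second produces roughly 2.09457, matching Newton’s q ≈ −0.0054 correction.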

Let’s define this process more formally and use a few visuals in order to illustrate exactly how Newton’s method works.

Newton’s method has a geometric interpretation, although Newton himself didn’t provide it. Also - at each stage of his method, Newton had to produce a new equation, which wasn’t very easy to do. In 1690, Joseph Raphson noticed that this wasn’t optimal and so he devised an ‘algorithm’ for deducing each term. Let’s assume that we’re trying to find the root of the function provided below:

We aren’t going to be too precise in defining our exact function. Here we simply want to provide a visual demo on how Newton’s method works. We also want to devise Raphson’s algorithm or the step-by-step procedure we can use for finding the roots for **ANY** equation in a much more elegant manner.

We already know the general outline of how the method works from going through our initial example: in order to find our root, we will take an initial guess (denoted by x_{0} in our visual below). We then add a term to denote our ‘difference’ between our initial approximation and our real root, and substitute it in to find the next term (or terms which we denote by x_{1}, x_{2}, etc… in our diagram) and repeat this process until our approximation ‘hits’ our root:

In the above outline, I’m actually using the Newton-Raphson method to find the next term, and so I drew in the tangents for each point instead of using a ‘difference approximation’ which Newton used earlier. We haven’t really gone through the logic of what Raphson did, but in essence, the outline is provided below:

We start off by providing a guess for our root, shown by the point labeled **x_{0}** in our diagram below. To come up with a more accurate estimate, we look at our initial guess located at x_{0} and denoted by **(x_{0}, f(x_{0}))**, and we draw a tangent line through this point, following it down until it intersects our x-axis at the point labeled x_{1}:

After doing this, we can perform the same step again – we take the tangent located at our new and more accurate estimate (x_{1}, f(x_{1})) and travel down towards the next closest point (located more closely to our root):

We repeat the above process until we eventually get to a close enough value, which we can call our ‘root’, or which we can say is the estimate for our root. This, in essence, is all that the Newton–Raphson method attempts to do.

I skipped over quite a few details in my explanation though. How do we obtain our improved estimates exactly and how do we use the tangent line to get the next approximation? This is where real calculus and differentiation enters the picture.

We know that the slope of the tangent is the derivative of the function. As an example, in the visual below we can see that the derivative of f(x) = x^{2} (which equals 2x) gives the slope of the tangent line passing through each point:

We can use our derivative formula to come up with our next iterate / value. How do we do so? Let’s once again re-visit what the slope means:

So we now know that the slope of the tangent line at x_{n} is f′(x_{n}), and that this tangent passes through the point (x_{n}, f(x_{n})) and crosses the x-axis at x_{n+1}, which means that:

f′(x_{n}) = (f(x_{n}) − 0) / (x_{n} − x_{n+1})

and so rearranging the above gives us:

x_{n+1} = x_{n} − f(x_{n}) / f′(x_{n})

We can also provide a nice visual of this using our original function by showing how the tangent / slope formula fits in with our updates:

During each step, we can see that our tangent / derivative for iteration n equates to:

f′(x_{n}) = f(x_{n}) / (x_{n} − x_{n+1})

and that this matches our derived formula. Using the same methodology, we should be able to see that x_{n} can be updated to:

x_{n} = x_{n-1} − f(x_{n-1}) / f′(x_{n-1})

where f′(x_{n-1}) represents the derivative of our function evaluated at our previous point, f(x_{n-1}) represents the function value, and x_{n-1} is our previous estimate.

This is the formula Raphson came up with! We now have a way of deriving our next approximate value using this simple algebraic equation. Let’s go through an example to show how we can use the above formula to find various values / roots.

Let’s assume that we want to estimate the square root of 3. To do this, we first note that x = √3 is a root of the function:

f(x) = x^{2} − 3

To obtain our solution, we need to find the roots of the above equation – so we simply set f(x) equal to 0:

x^{2} − 3 = 0

We also know that the derivative of x^{2} − 3 is 2x:

f′(x) = 2x

Using our Newton formula, we thus get:

x_{n+1} = x_{n} − (x_{n}^{2} − 3) / (2x_{n})

Now, we only need to take a good enough initial guess, and the above formula will take care of the rest! Let’s provide an initial guess of 1.2 and see where the formula takes us. Thankfully, we don’t need to do all of our computations by hand. Instead, we’ll write a simple Python function to compute the results and output our value along with the approximations after each step:

```
INITIAL_GUESS = 1.2

def newtons_method(initial_guess, tolerance=1e-6, max_iterations=25):
    x = initial_guess
    iterations = 0
    while iterations < max_iterations:
        # Calculate f(x) and f'(x) for the equation x^2 - 3 = 0
        fx = x ** 2 - 3
        dfx = 2 * x
        # Calculate the new approximation for the root using Newton's method
        new_x = x - fx / dfx
        # Check for convergence
        if abs(new_x - x) < tolerance:
            return new_x
        x = new_x
        iterations += 1
        print(f'The new estimate is {new_x} after {iterations} iterations.')
    return x  # Return the best approximation found within max_iterations

# Calculate the estimate for the square root of 3 using Newton's method
sqrt_3_estimate = newtons_method(INITIAL_GUESS)
print(f"Estimated square root of 3: {sqrt_3_estimate}.")
```

The above code produces the results below:

The new estimate is 1.85 after 1 iterations.

The new estimate is 1.7358108108108108 after 2 iterations.

The new estimate is 1.7320548799091033 after 3 iterations.

The new estimate is 1.7320508075736647 after 4 iterations.

Estimated square root of 3: 1.7320508075688772.

Indeed, with an initial guess of 1.2 and after only 4 printed iterations, we get an extremely close estimate of the square root of 3! In fact, let’s compare the result to the actual value:

After only 4 iterations, our method has yielded a result that matches √3 to the full precision of a 64-bit float (roughly 16 significant digits)!

Newton’s method finds the roots of an equation by starting out with a good initial guess (x_{0}) and iteratively applying the formula below to find the next approximation, which brings us closer to the root:

x_{n+1} = x_{n} − f(x_{n}) / f′(x_{n})

The above formula only brings us closer if our initial estimate is sufficiently close to the root though, so the algorithm / formula is far from perfect. When it does work, though, it works amazingly well!

The method was and is widely used for solving a wide array of problems, including equations that describe all sorts of physical phenomena — such as Kepler's equation, which describes the motion of planets in elliptical orbits. We won’t dive into the full and gory details here – we simply wanted to give a visual intro to this method and to shine some light on how it was initially devised!

Hopefully you found this intro useful – if you have any further suggestions or notes that you want to add, feel free to leave a comment!

]]>Let’s go back to the beginning so we can provide an in depth analysis on exactly what Huffman coding entails.

To represent strings in digital form, we need to come up with a scheme for encoding our alphabet of characters into sequences of ones and zeros, one per letter. But how do we choose which binary representation to use for each of our letters? Well, we could just randomize it, such that a random and unique sequence of binary digits encodes each letter in an arbitrary fashion. As an example, we could use the code ‘0’ to represent the letter ‘a’, ‘1’ to represent the letter ‘b’, ‘01’ to represent ‘c’, and so on — but there are **2 big problems** with using such a scheme!

For one, doing this wouldn’t be very efficient! Why? Well, different letters of the alphabet occur at different frequencies, so instead of using a simple random assignment, it would make more sense to use a shorter binary encoding for letters which occur more frequently, while giving the less frequent ones longer encodings!

Let’s take a look at the relative letter frequency of each letter in the English alphabet:

In the above example, we see that letters such as ‘e’ and ‘t’ occur much more frequently than letters such as ‘z’ and ‘x’. Knowing this, devising an encoding which represents the letter ‘e’ with a shorter code (such as ‘0’) while using longer codes for less frequent letters such as ‘z’ would be a great idea! This way, we get an extremely efficient encoding of our string, and we thus minimize the amount of information which we need to store / send!

We have another problem with the scheme we illustrated above, though. We can’t simply use a random assignment of binary digits to encode each letter. Assume that we did, using our original premise of encoding a as ‘0’, b as ‘1’, c as ‘01’, d as ‘10’ and e as ‘11’, and that we had the binary encoding shown below:

Using our randomized encoding, we wouldn’t know whether it encoded *‘ababa’*:

or whether it belongs to the encoding shown below (‘*cad*’):

In other words, if we’re not careful, our encoding could be interpreted in **multiple ways**, and we can’t have this level of ambiguity! So how do we mitigate this?

Well, we enforce a rule: we make our binary encoding **prefix free**. What this means is that no encoded letter’s representation can be the prefix of another’s. This way, any encoding which we encounter is unambiguous, and leads to one and only one translation of our encoded string!

As an example, if we made our above a, b, c, d, e encoding a valid prefix encoding — we could represent each letter as:

In the above instance, no letter’s code is used as the prefix of any other’s. Using the above encoding ensures that a string such as ‘1 0 1 0 1’, which we saw earlier, can only be interpreted in one way:
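
As a quick Python sketch of why prefix-freeness makes decoding unambiguous (the code assignment below is my own illustrative example, not the one from the figure), we can read bits one at a time and emit a letter the moment the buffer matches a codeword — no codeword being a prefix of another guarantees this greedy match is the only possible reading:

```python
# A hypothetical prefix-free code (illustrative; not the article's figure)
code = {'a': '0', 'b': '10', 'c': '110', 'd': '1110', 'e': '1111'}

def decode(bits, code):
    inverse = {v: k for k, v in code.items()}
    out, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # prefix-freeness: the first match is the only match
            out.append(inverse[buffer])
            buffer = ''
    return ''.join(out)

print(decode('0100', code))  # -> 'aba', the only possible reading
```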

Now that we know a technique to use to assign unique encodings to the letters in our alphabet, can we come up with an algorithm or scheme such that the most frequent letters in it are assigned to shorter strings, while the less frequent ones are longer?

Well — this is exactly what David Huffman devised in 1951! This is what we refer to as **Huffman **coding.

Huffman coding works by creating a binary tree which uses the leaf nodes to store our letters, while using the path from the root node to a leaf as the binary representation of that letter. Whenever we traverse to a left child, we use 0, and whenever we traverse to the right, we use 1.

As an example, we can use the below tree to encode the first 3 letters of the alphabet:

In the above instance, the letters are encoded using the binary strings shown below:

How did we get the above representation? Well, as we already stated, we get our encoding by traversing down from the root node of our tree to the leaf node. Each time we encounter a left child, we add the binary digit 0, and each time we encounter a right child, we add a 1:

There’s more to it than this though. We now have to find a method of constructing such a tree in a manner such that our more frequent letters have smaller encoding lengths than the less frequent ones, so our job is not done!

We need to find a way to encode a message written in an n-character alphabet so that it’s as short as possible. To do this, we construct an array of letter frequency counts, which we denote frequency[1 … n], and we then need to find a way of computing a prefix-free binary code that minimizes the total encoded length of the message. Our optimization function can thus be described as:

where depth[i] denotes the depth in the tree (i.e. the encoding length) of the character at index i, while frequency[i] denotes the frequency of this character within our string! More specifically, to achieve our objective, we need to **minimize** the above function! Doing so ensures that our encoding is efficient. Why?

Well, the larger the depth of a letter’s leaf node, the longer its encoding — so for the sum to stay small, letters placed at large depths will need small frequencies. Otherwise, we would get a huge term added to the sum, and we wouldn’t be doing much to minimize our function. On the other hand, the larger a letter’s frequency, the lower its leaf depth will have to be: we need to ensure that no long encodings are used for letters which have a high frequency! Our function thus serves as a great optimization objective: in order to reach the optimum, we need to make sure that our lower-depth nodes (represented with shorter binary encodings) are assigned to more frequently encountered letters, while the less frequent letters are assigned to leaf nodes which have a larger depth (and longer binary encodings).
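
We can sanity-check this intuition with a tiny Python sketch. The frequencies and the two candidate depth assignments below are made up for illustration; the cost function is the Σ frequency[i] · depth[i] sum from above:

```python
# Hypothetical letter frequencies and two candidate leaf-depth assignments
freq = {'e': 50, 't': 30, 'z': 2}
good_depths = {'e': 1, 't': 2, 'z': 3}  # frequent letters kept shallow
bad_depths = {'e': 3, 't': 2, 'z': 1}   # frequent letters pushed deep

def cost(freq, depth):
    # Total encoded length: sum over characters of frequency * depth
    return sum(freq[ch] * depth[ch] for ch in freq)

print(cost(freq, good_depths))  # 50*1 + 30*2 + 2*3 = 116 bits
print(cost(freq, bad_depths))   # 50*3 + 30*2 + 2*1 = 212 bits
```

Assigning the shallow leaves to the frequent letters clearly produces the cheaper encoding.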

This is where our Huffman coding algorithm comes in handy. To find an optimum for our function and encode a string in an efficient way, we use the simple methodology outlined below:

1. Build a letter frequency table which holds the frequency of our alphabet over the string we’re trying to encode.

2. Look at the table and find the 2 **least frequent** letters / nodes.

3. Merge them into **one node** such that the 2 nodes above are the children of the new node, and the total weight of this node is equal to the **sum** of the frequencies of the 2 child nodes.

4. Add the above node to our table, and continue repeating steps 2 to 4 until every node in our table belongs to our binary tree.

The above may initially look a bit complicated, but it’s actually very simple. Let’s take an example and illustrate how our algorithm builds an optimum tree.

Assume that we have the letter frequencies shown below:

Let’s go ahead and build a Huffman tree.

First, we take the 2 least frequent nodes (g and f), merge them into a new node, and add the new combined node back into our frequency table, as illustrated below:

We then repeat the same steps: this time, we see that our least frequent nodes are letters d and c, and so we combine them and add them back to our table:

Once again, we take our least frequent nodes (our combined g and f node with a frequency of 16 and our e node containing a frequency of 17), combine them, and add them back to our ordered frequency table:

We now only have 3 nodes left! We no longer show the insertion step – now, we’ll simply show the merging step of merging our 2 least frequent nodes (27 and 33), resulting with an ordered node list composed of only 2 nodes:

Finally, we merge our last 2 nodes to get our final Huffman tree, with each encoding shown below:

Our final letter encodings are provided below, and we obtain them by traversing down from the root node to the leaf node encoding our letter / symbol:

We can also see that our encoded character lengths / depths are constructed such that the more frequent characters have a lower depth than the less frequent ones. Looking at our final table, it’s easy to see that our final tree gives us an optimal construction for minimizing our message length:

Our total cost of encoding this string using our Huffman construction is 24 + 40 + 39 + 42 + 51 + 46 = 242, which means that it will take us 242 bits to encode our original message using our tree construction! Is there a more efficient way of constructing our tree?

Surprisingly, no!!! Huffman coding **always** constructs an optimal prefix-free binary code. This means that we can’t beat our above encoding of 242 bits – any tree we construct will always take 242 or more bits to encode the original message. The reason for this is the greedy nature of our algorithm: in each step, we make the optimal choice of combining the 2 least frequent nodes into one, and we continue applying the same methodology until we have our final tree. This ensures that the least frequent letters in our encoding are assigned to leaf nodes located deepest in our tree (with the longest codes), while the more frequent letters end up closer to the root (with the shortest codes).

We won’t go into the formal proof, but it’s good to note that we can prove that Huffman coding always provides an optimal encoding length by showing that its tree construction always minimizes our original cost function:

We also note that in order to efficiently construct a Huffman code, we can keep the characters in a priority queue, using the character frequencies as priorities, which ensures that our algorithm runs in **O(nlog(n))** time.

Let’s provide the code for constructing a Huffman tree using Python to finish things off. Below - we use a priority queue to construct our tree and print out the results after we’re done:

```
import heapq

class TreeNode:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority
        self.left_child = None
        self.right_child = None

    def __lt__(self, other):
        return self.priority < other.priority

    def print_huffman_codes(self, code=''):
        # Print the character and its Huffman code
        if self.name is not None:
            print(f"Character: {self.name} -> Huffman Code: {code}")
        if self.left_child:
            self.left_child.print_huffman_codes(code + "0")
        if self.right_child:
            self.right_child.print_huffman_codes(code + "1")

def build_huffman_tree(message):
    # Create a frequency map
    frequency_map = {}
    for char in message:
        frequency_map[char] = frequency_map.get(char, 0) + 1
    # Initialize a priority queue
    priority_queue = []
    # Create nodes for each character and push them into the priority queue
    for char, frequency in frequency_map.items():
        new_node = TreeNode(char, frequency)
        heapq.heappush(priority_queue, (frequency, new_node))
    # Build the Huffman tree
    while len(priority_queue) > 1:
        left_priority, left_node = heapq.heappop(priority_queue)
        right_priority, right_node = heapq.heappop(priority_queue)
        # Create a new tree node with combined priority
        new_tree_node = TreeNode(
            name=None,
            priority=left_priority + right_priority
        )
        new_tree_node.left_child = left_node
        new_tree_node.right_child = right_node
        heapq.heappush(priority_queue, (new_tree_node.priority, new_tree_node))
    # The remaining node in the priority queue is the root of the Huffman tree
    root = priority_queue[0][1]
    return root

# Example usage:
message = "This is an example for Huffman encoding -- the most common characters are encoded using the least amount of characters."
huffman_tree = build_huffman_tree(message)
huffman_tree.print_huffman_codes()
```

Using the example string we provided above (‘This is an example for Huffman encoding -- the most common characters are encoded using the least amount of characters.’) - we get the encoding provided below:

```
Character: e -> Huffman Code: 000
Character: r -> Huffman Code: 0010
Character: H -> Huffman Code: 001100
Character: l -> Huffman Code: 001101
Character: d -> Huffman Code: 00111
Character: m -> Huffman Code: 0100
Character: t -> Huffman Code: 0101
Character: s -> Huffman Code: 0110
Character: c -> Huffman Code: 0111
Character: u -> Huffman Code: 10000
Character: i -> Huffman Code: 10001
Character: n -> Huffman Code: 1001
Character: o -> Huffman Code: 1010
Character: f -> Huffman Code: 10110
Character: p -> Huffman Code: 1011100
Character: . -> Huffman Code: 1011101
Character: g -> Huffman Code: 101111
Character: -> Huffman Code: 110
Character: T -> Huffman Code: 1110000
Character: x -> Huffman Code: 1110001
Character: - -> Huffman Code: 111001
Character: h -> Huffman Code: 11101
Character: a -> Huffman Code: 1111
```

It’s relatively easy to observe that our most common characters (such as ‘e’ and the space character) are encoded using the fewest bits, while the least frequent ones (such as ‘T’ and ‘x’) have larger encoding lengths!

Isn’t it beautiful? Through such a simple and elegant algorithm, we can construct an optimal tree with minimal effort. In addition to the tree being easy to construct, the decoding process is extremely simple. To decode any message, we traverse down our tree until we hit a leaf node. When we do, we know that the bits consumed so far represent the letter stored in that leaf, and we continue traversing down from the root again, adding letters as we encounter each leaf node, until we reach the end of our encoded string and we have our original / decoded message.
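
The decoding walk described above can be sketched in a few lines of Python (a standalone sketch: here a tree is just a nested pair `(left, right)` with strings as leaves, rather than the TreeNode class from earlier):

```python
def decode(bits, root):
    out = []
    node = root
    for bit in bits:
        # 0 -> go to the left child, 1 -> go to the right child
        node = node[0] if bit == '0' else node[1]
        if isinstance(node, str):
            out.append(node)  # hit a leaf: emit its letter...
            node = root       # ...and restart the walk from the root
    return ''.join(out)

# Tiny example tree where a = '00', b = '01', c = '1'
tree = (('a', 'b'), 'c')
print(decode('00011', tree))  # -> 'abc'
```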

Huffman coding is a fundamental technology that underlies many aspects of our digital lives. It's instrumental in making data storage, data transmission, and various media formats more efficient, allowing us to work with and consume digital content more conveniently and economically. It's the foundation of file compression formats, facilitating the compression of documents, images, and videos. Moreover, it plays a crucial role in networking, data storage, audio compression, mobile communication, and various voice encoding technologies, contributing to faster and more data-efficient digital experiences.

Hopefully you found this overview helpful and you now understand how Huffman coding works!

]]>I couldn’t help but accidentally wander into a post on Hacker News titled Terry Tao’s generals. The post discusses Terence Tao’s PhD general examination and what he was asked during the 2-hour oral exam. It’s a very interesting interaction since 1) Tao is arguably the world’s greatest living mathematician and 2) the whole interaction shows that he was far from invincible.

The key highlight from the whole interaction is provided below:

After this, they decided to pass me, though they said that my harmonic analysis was far from satisfactory. :( They didn't ask any real or complex analysis, but I guess from my handling of the special topics they decided that wasn't necessary. Besides, we were almost getting snowed in.

It looks like Terence took this feedback very seriously. In fact, Tao actually has a great write-up of how the whole interview impacted him and motivated him to do better and it’s a great read which I highly recommend:

A Close Call: How a Near Failure Propelled Me to Succeed

Key highlight:

After many nerve-wracking minutes of closed-door deliberation, the examiners did decide to (barely) pass me; however, my advisor gently explained his disappointment at my performance, and how I needed to do better in the future. I was still largely in a state of shock—this was the first time I had performed poorly on an exam that I was genuinely interested in performing well in. But it served as an important wake-up call and a turning point in my career. I began to take my classes and studying more seriously. I listened more to my fellow students and other faculty, and I cut back on my gaming. I worked particularly hard on all of the problems that my advisor gave me, in the hopes of finally impressing him. I certainly didn’t always succeed at this—for instance, the first problem my advisor gave me, I was only able to solve five years after my PhD—but I poured substantial effort into the last two years of my graduate study, wrote up a decent thesis and a number of publications, and began the rest of my career as a professional mathematician. In retrospect, nearly failing the generals was probably the best thing that could have happened to me at the time.

In other words – it served as a huge wake-up call for Tao and inspired him to take studying seriously!!

Either way, I got really curious and wanted to look into the topic of Harmonic Analysis, since I wasn’t too familiar with it. After doing a bit of Googling and stumbling through various books and lecture notes, I couldn’t for the life of me find **any** good intro which explained the topic in a simple and intuitive manner! Usually, when this sort of thing happens, I open a few books which I hold dear and make more attempts to find great intuitive write-ups on the said topic. After spending a lot of time looking around, I found the perfect introduction to the topic in the Princeton Companion to Mathematics. The Princeton Companion is a one-of-a-kind reference work, with introductions and overviews of the key areas of math written by the world’s most prominent mathematicians, so the great write-up and explanation was not surprising at all. What was surprising? Finding out that the author of the intro was, lo and behold … the one and only **Terence Tao**!! Yes, the same Tao who almost failed his 2-hour exam due to perceived weakness on this very topic!

Noting this – I decided to provide an annotated and visual overview of Tao’s introduction. It is masterfully written and I couldn’t really add too much to it – below is a fully annotated version (with a few of my own notes and visuals added, which will hopefully help anyone who isn’t mathematically inclined but still wants to understand it).

Much of analysis tends to revolve around the study of general classes of **functions** and **operators**.

The functions are often **real-valued** or **complex valued**, but may take values in other sets, such as a **vector space** or a **manifold**.

**Real valued functions** are functions which output real numbers:

**Complex valued functions** output complex numbers, which can be pictured as points in a 2-dimensional plane.

A function taking values in a **vector space** outputs multi-dimensional vectors. An example 3-D space is provided below:

**A manifold **is any space which looks Euclidean (or ‘flat’) locally.

An operator is itself a function, but at a “second level,” because its **domain** and **range** are themselves spaces of functions: that is, an operator takes a function (or perhaps more than one function) as its input and returns a transformed function as its output.

*Visual which shows a sample domain and range of an operator (function which maps functions to other functions):*
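
To make the ‘function of functions’ idea concrete, here’s a small Python sketch (my own illustration, not from the original text): a numerical differentiation operator takes a function as its input and returns a new function as its output.

```python
def derivative_operator(f, h=1e-6):
    # Maps a function f to a central-difference approximation of its derivative f'
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

square = lambda x: x ** 2
d_square = derivative_operator(square)  # the output is itself a function

print(d_square(3.0))  # approximately 6.0, since d/dx x^2 = 2x
```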

Harmonic analysis focuses in particular on the * quantitative* properties of such functions, and how these quantitative properties change when various operators are applied to them.

What is a “quantitative property” of a function?

Here are two important examples. First, a function is said to be **uniformly bounded** if there is some real number M such that **| f (x) | < M **for every x (i.e. the absolute value of each function output is less than M).

*A visual showing the meaning behind what is meant by a bounded function:*

It can often be useful to know that two functions f1 and f2 are “uniformly close,” which means that their difference f1 − f2 is uniformly bounded with a small bound M.

Second, a function is called **square integrable** if the integral **∫ | f(x) |^{2} dx** is finite. The square integrable functions are important because they can be analyzed using the theory of Hilbert spaces.

*Example of a square integrable function:*
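
As a (hypothetical) numerical illustration of square integrability, we can approximate ∫ |f(x)|² dx for f(x) = e^{−x²} with a midpoint Riemann sum; the true value of this integral is √(π/2), so the integral is finite and f is square integrable:

```python
import math

def l2_norm_squared(f, a=-10.0, b=10.0, n=100000):
    # Midpoint Riemann sum approximation of the integral of |f(x)|^2 over [a, b]
    h = (b - a) / n
    return sum(abs(f(a + (i + 0.5) * h)) ** 2 for i in range(n)) * h

value = l2_norm_squared(lambda x: math.exp(-x * x))
print(value)  # close to sqrt(pi / 2) ~ 1.2533
```

The tails beyond ±10 are negligible here, which is why integrating over a finite interval gives such a close answer.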

*Great overview to the question ‘What is a Hilbert Space’:*

https://physics.stackexchange.com/questions/437868/what-is-hilbert-space

A typical question in harmonic analysis might then be the following: if a function **f: R^{n} → R** (i.e. a function which maps an n-dimensional input to a single real number) is square integrable, and its gradient ∇f exists and is also square integrable, does this imply that f is uniformly bounded?

*Further notes and an example of what f: R^{n} → R means (showing an example in R^{2}):*

*Great gradient visual (showing am example derivative of a scalar field):*

If so, what are the precise bounds one can obtain? That is, given the integrals of **| f |^{2}** and **| ∇f |^{2}**, what bound can one place on the size of f?

Real and complex functions are of course very familiar in mathematics, and one meets them in high school. In many cases one deals primarily with special functions: polynomials (i.e. x^{2}+x-12), exponentials (i.e. 2^{x}), trigonometric functions (i.e. sin(x) or cos(x)), and other very concrete and explicitly defined functions. Such functions typically have a very rich algebraic and geometric structure, and many questions about them can be answered exactly using techniques from algebra and geometry.

However, in many mathematical contexts one has to deal with functions that are not given by an explicit formula. For example, the solutions to ordinary and **partial differential equations** often cannot be given in an explicit algebraic form.

*What are differential equations?*

A **differential equation** is a mathematical equation that involves one or more derivatives of an unknown function. In simpler terms, it's an equation that relates a function to its rate of change or how it varies with respect to one or more independent variables. A **partial differential equation** is a type of differential equation where the unknown function depends on two or more independent variables, and it relates partial derivatives of that function with respect to those variables.

In terms of real world analogies: imagine you have a cup of hot coffee, and you want to understand how the temperature of the coffee changes over time. If you only consider how the coffee's temperature changes with respect to time, you're dealing with an ordinary differential equation (ODE). It's like asking, "How does the coffee cool down over time?" However, if you want to account for how the temperature varies at different points within the coffee, you would need a partial differential equation (PDE), which you can think of as modelling how the temperature function **changes or evolves** with respect to both time and position.
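As a small sketch of the coffee example, we can step the cooling ODE dT/dt = -k(T - T_env) (Newton's law of cooling) forward numerically and compare it against the known exact solution. All constants below are made up for illustration:

```javascript
// Newton's law of cooling: dT/dt = -k * (T - Tenv).
// Euler's method: repeatedly follow the local rate of change.
function coolCoffee(T0, Tenv, k, tEnd, dt) {
  let T = T0;
  const steps = Math.round(tEnd / dt);
  for (let i = 0; i < steps; i++) {
    T += dt * (-k * (T - Tenv)); // temperature change over one tiny step
  }
  return T;
}

const T0 = 90;   // hot coffee, in degrees C (illustrative)
const Tenv = 20; // room temperature (illustrative)
const k = 0.1;   // cooling constant (illustrative)

const numeric = coolCoffee(T0, Tenv, k, 10, 0.001);
const exact = Tenv + (T0 - Tenv) * Math.exp(-k * 10); // analytic solution
console.log(numeric.toFixed(2), exact.toFixed(2)); // the two agree closely
```

The point is that even when no closed-form solution exists, stepping the equation like this still tells us how the solution behaves.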

*Some great introductory material on differential equations and partial differential equations for anyone not familiar with what they represent:*

**Great Intro to Differential Equations:**

**Great Partial Differential Equations Overview and Intro:**

In such cases, how does one think about a function? The answer is to focus on its **properties** and see what can be deduced from them: even if the solution of a differential equation cannot be described by a useful formula, one may well be able to establish certain basic facts about it and be able to derive interesting consequences from those facts. Some examples of properties that one might look at are **measurability**, **boundedness**, **continuity**, **differentiability**, **smoothness**, **analyticity**, **integrability**, or **quick decay at infinity**.

*Some examples / useful visuals of function properties mentioned above:*

**Continuity:**

**Differentiability:**

**Integrability:**

One is thus led to consider interesting general classes of functions: to form such a class one chooses a **property** and takes the set of all functions with that property. Generally speaking, **analysis is much more concerned with these general classes of functions than with individual functions!!**

This approach can in fact be useful even when one is analyzing a single function that is very structured and has an explicit formula. It is not always easy, or even possible, to exploit this structure and formula in a purely algebraic manner, and then one must rely (at least in part) on more analytical tools instead.

A typical example is the Airy function:

*What is an Airy function?*

The Airy function is a special function that arises in various areas of physics and mathematics, particularly in the study of optics, quantum mechanics, and celestial mechanics. It is named after the British astronomer Sir George Biddell Airy, who made significant contributions to the field of astronomy.

The two most common forms of the Airy function are the Airy function of the first kind (**Ai(x)** which we showed an example for above) and the Airy function of the second kind (Bi(x)). These functions satisfy a second-order differential equation known as the Airy equation and are used to describe the behavior of waves in various physical systems.

The Airy function of the first kind (Ai(x)) is used to describe the amplitude of light waves as they pass through a circular aperture (e.g., in a telescope or camera) and produce an Airy disk pattern, which is a central bright spot surrounded by concentric rings. It also appears in the description of electron wave-functions in quantum mechanics.

Although the Airy function of the first kind, Ai(x), is defined explicitly as an integral, if one wants to answer basic questions such as whether that integral always converges, or whether it goes to zero as x → ±∞, it is easiest to proceed using the tools of harmonic analysis. In this case, one can use a technique known as the **principle of stationary phase** to answer both of these questions affirmatively, although there is the rather surprising fact that the Airy function decays almost exponentially fast as x approaches +∞, but only polynomially fast as x approaches −∞.

*The principle of stationary phase is a mathematical concept often used in calculus, particularly in the context of integrals and asymptotic analysis. It's a method for approximating integrals that have oscillatory or rapidly changing behavior. The key idea is to focus on the contributions to the integral from points where the function being integrated is "stationary" or doesn't change much, while ignoring the rapidly oscillating parts.*

Harmonic analysis, as a sub-field of analysis, is particularly concerned not just with qualitative properties like the ones mentioned earlier, but also with quantitative bounds that relate to those properties. For instance, instead of merely knowing that a function f is bounded, one may wish to know how bounded it is. That is, what is the smallest **M > 0** such that **| f(x) | ≤ M** for all (or almost all) x ∈ R; this number is known as the **sup norm** or **L ^{∞}-norm** of f, and is denoted

*Examples of L^{1}, L^{2}, L^{p}, L^{∞} norms:*

Norms can be understood as mathematical functions that take a vector as input and return a scalar value, representing the "size" or "magnitude" of that vector. The **L ^{1} norm** of a vector x is defined as the sum of the absolute values of its elements. It represents the "city-block" distance or "taxicab" distance between two points in a grid, where you can only move horizontally or vertically. The

In essence, norms assign a non-negative number (or +∞) to any given function and provide some quantitative measure of one characteristic of that function. Besides being of importance in pure harmonic analysis, quantitative estimates involving these norms are also useful in applied mathematics, for instance in performing an error analysis of some numerical algorithm.
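As a discrete sketch of these definitions, we can compute the L^{1}, L^{2}, and L^{∞} norms of a vector standing in for samples of a function (the values below are chosen arbitrarily):

```javascript
// Discrete analogues of the norms described above.
const l1 = v => v.reduce((s, x) => s + Math.abs(x), 0);      // sum of absolute values
const l2 = v => Math.sqrt(v.reduce((s, x) => s + x * x, 0)); // Euclidean length
const linf = v => Math.max(...v.map(Math.abs));              // largest absolute value

const f = [3, -4, 0, 1];
console.log(l1(f));   // 8  ("taxicab" length: 3 + 4 + 0 + 1)
console.log(l2(f));   // ~5.099 (sqrt(9 + 16 + 0 + 1))
console.log(linf(f)); // 4  (the biggest spike)
```

Each norm answers the same question, "how large is f?", but weights the values differently.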

Functions tend to have infinitely many degrees of freedom (an infinite number of inputs and outputs), and it is thus unsurprising that the number of norms one can place on a function is infinite as well: there are many ways of quantifying how "large" a function is. These norms can often differ quite dramatically from each other. For instance, if a function f is very large for just a few values, so that its graph has tall, thin "spikes," then it will have a very large L^{∞}-norm, but the integral of **| f(x) | dx** (its L^{1}-norm) may be quite small. Conversely, if f has a very broad and spread-out graph, then it is possible for the integral of **| f(x) | dx** to be very large even if **| f(x) |** is small for every x: such a function has a large L^{1}-norm but a small L^{∞}-norm.

Similar examples can be constructed to show that the L^{2}-norm sometimes behaves very differently from either the L^{1}-norm or the L^{∞}-norm. However, it turns out that the L^{2}-norm lies "between" these two norms, in the sense that if one controls both the L^{1}-norm and the L^{∞}-norm, then one also automatically controls the L^{2}-norm. Intuitively, the reason is that if the L^{∞}-norm is not too large then one eliminates all the spiky functions, and if the L^{1}-norm is small then one eliminates most of the broad functions; the remaining functions end up being well-behaved in the intermediate L^{2}-norm. The idea that control of two "extreme" norms automatically implies further control on "intermediate" norms can be generalized tremendously and leads to very powerful and convenient methods known as interpolation, which is another basic tool in this area.
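The spiky/broad contrast can be sketched with vectors (here assuming 1000 sample points). For vectors the "between" claim takes the concrete form ||f||_{2}^{2} ≤ ||f||_{∞} · ||f||_{1}, since |f(x)|^{2} ≤ max|f| · |f(x)| at every point:

```javascript
const l1 = v => v.reduce((s, x) => s + Math.abs(x), 0);
const l2sq = v => v.reduce((s, x) => s + x * x, 0); // squared L2 norm
const linf = v => Math.max(...v.map(Math.abs));

const spiky = new Array(1000).fill(0); spiky[500] = 100; // one tall, thin spike
const broad = new Array(1000).fill(1);                   // low but spread out

console.log(linf(spiky), l1(spiky)); // 100 100 — huge sup norm, modest mass
console.log(linf(broad), l1(broad)); // 1 1000 — tiny sup norm, large mass

// In both extremes, the L2 quantity is controlled by the other two norms:
console.log(l2sq(spiky) <= linf(spiky) * l1(spiky)); // true
console.log(l2sq(broad) <= linf(broad) * l1(broad)); // true
```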

*Examples of ‘spiky’ and ‘broad’ functions (plotted in Python):*

The study of a single function and all its norms eventually gets somewhat tiresome, though. Nearly all fields of mathematics become a lot more interesting when one considers not just objects, but also maps between objects. In our case, the objects in question are functions, and, as was mentioned in the introduction, a map that takes functions to functions is usually referred to as an operator. (In some contexts it is also called a transform).

Operators may seem like fairly complicated mathematical objects—their inputs and outputs are functions, which in turn have inputs and outputs that are usually numbers—but they are in fact a very natural concept since there are many situations where one wants to transform functions. For example, **differentiation** can be thought of as an operator, which takes a function f to its derivative df/dx. This operator has a well-known (partial) inverse, **integration**, which takes f to the function F that is defined by the formula:

*Further notes on what differentiation and integration mean and the intuitive meaning behind both:*

**Differentiation:**

Suppose that we have a function of one variable which we denote as f(x). What does the derivative df / dx (or f’(x)) mean?

Answer: it tells us how rapidly our function f(x) varies when we change the input x by a tiny amount dx.

It could also be worded as: if we change our input x by an extremely tiny amount dx, then our function (f) output changes by an amount df. The derivative outputs for us a value or function which describes this ratio or proportionality.

In formal language, we define the derivative as:

In geometric terms, we can also visualize this as:

**Integration:**

The motivation behind integration is to find the area under a curve. To do this, we break up a function interval (whose area we want to measure) into little regions of width Δx and add up the areas of the resulting rectangles.

Here's an illustration showing what happens as we make the width (Δx) of the rectangles smaller and smaller:

Indeed, as Δx approaches 0, we obtain even more accurate values representing the area under the curve! When we take an integral of something, that’s all we’re doing! So, we can think of the integral as being a direct translation shown below:

In other words, differentiation is an operator which outputs the rate of change of a function and which focuses on its local properties while integration provides an output by ‘summing’ up the accumulated changes of a function over a given interval and focuses on the global properties.
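The rectangle-summing picture can be sketched directly in code: below we approximate the area under f(x) = x^{2} on [0, 1], whose exact value is 1/3, using left-endpoint rectangles of shrinking width Δx:

```javascript
// Riemann sum: add up height * width of n rectangles under f on [a, b].
function riemannSum(f, a, b, n) {
  const dx = (b - a) / n; // width of each rectangle
  let area = 0;
  for (let i = 0; i < n; i++) {
    area += f(a + i * dx) * dx; // area of the i-th rectangle
  }
  return area;
}

const f = x => x * x;
console.log(riemannSum(f, 0, 1, 10));     // coarse rectangles: ~0.285
console.log(riemannSum(f, 0, 1, 100000)); // tiny rectangles: ~0.33333
```

As Δx shrinks, the sum closes in on the exact area, which is all the integral sign expresses.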

This is just a very small sample of interesting operators that one might look at. The original purpose of harmonic analysis was to understand the operators that were connected to Fourier analysis, real analysis, and complex analysis. Nowadays, however, the subject has grown considerably, and the methods of harmonic analysis have been brought to bear on a much broader set of operators. For example, they have been particularly fruitful in understanding the solutions of various linear and nonlinear partial differential equations, since the solution of any such equation can be viewed as an operator applied to the initial conditions. They are also very useful in analytic and combinatorial number theory, when one is faced with understanding the oscillation present in various expressions such as exponential sums. Harmonic analysis has also been applied to analyze operators that arise in geometric measure theory, probability theory, ergodic theory, numerical analysis, and differential geometry.

Tao goes on and shows an example demonstrating how you can take a simple class of functions and use them to approximate a set of much wider functions. He explains how these results can then be generalized from the simple to the ‘broad’ set of functions to obtain outstanding results in harmonic analysis. More specifically, to demonstrate this, Tao provides a special case of Young’s inequality using function norms and principles of continuity. He also goes on to sketch out the basic theory of summation of Fourier series and dives much further into some general themes in harmonic analysis, adding that the themes tend to be local rather than global and that when analyzing a function f, it’s quite normal to decompose it as a sum of k other functions *(f = f1 + f2 + f3 + … + fk).*

Overall, Tao’s introduction showed a mastery of the topic and greatly helped me understand it! The above intro and summary only provide a brief glimpse of what Tao discusses and you can find the full chapter / section which I highly recommend in the Princeton Companion. Since making the blunder during his oral exam, Tao hasn’t been a stranger to advancing math. He’s been awarded the Fields medal and was rightfully labeled as the Mozart of Math. He also has over 100,000 citations and has done an amazing amount of work on **compressed sensing**, which has many real world applications!

*You can find a great and intuitive lecture explaining compressed sensing here (once again courtesy of the Mozart of math):*


**Answer:** it tells us how rapidly our function f(x) varies when we change the input x by a tiny amount dx.

It could also be worded as: if we change our input x by an extremely tiny amount dx, then our function (f) output changes by an amount df. The derivative outputs for us a value or function which describes this ratio or proportionality.

In formal language, we define the derivative as:
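That is, the standard limit definition:

```latex
f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}
```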

In geometric terms, we can also visualize this as:

where h represents an infinitesimally small amount (which approaches 0).

From our above illustration, we can see that the tangent line passing through (a, f(a)) has the same slope as our derivative. We should also be able to see that as h approaches 0 (usually denoted by h → 0), the slope of our secant line approaches the slope of this tangent, which models the instantaneous rate of change of our function at point a.

In other words, we can think of the derivative as a derived function or number which represents the instantaneous rate of change of a parent function.

To solidify the concept, let’s take the example function f(x) = x^{2}.

The derivative of x^{2} is the function f’(x) = 2x.

We can see from our formula that at point (x, x^{2}), the tangent representing the slope can be modeled by our derivative function. As an example, at point (1, 1), the slope of the tangent is 2 (2x = 2(1) = 2). At point (2, 4), our slope becomes 4 (2x = 2(2) = 4), and so on.
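We can sanity-check those tangent slopes numerically: the difference quotient (f(x + h) − f(x)) / h for a small h should approach the derivative 2x of f(x) = x^{2}:

```javascript
const f = x => x * x;
// Difference quotient with a small step h (an approximation, not a limit).
const slope = (g, x, h = 1e-6) => (g(x + h) - g(x)) / h;

console.log(slope(f, 1)); // ≈ 2, matching 2x at x = 1
console.log(slope(f, 2)); // ≈ 4, matching 2x at x = 2
```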

Let’s show how we derived our formula. We can do this by going through our formal definition for the derivative, which is:

We initialize our function to x^{2} and we get:
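Expanding and simplifying, the limit works out to:

```latex
f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h}
      = \lim_{h \to 0} \frac{2xh + h^2}{h}
      = \lim_{h \to 0} \, (2x + h)
      = 2x .
```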

There are also 2 main notations for the derivative (Lagrange and Leibniz):

Now, we could settle for defining the notion of the derivative using algebraic notation and end the story there, continuing on to show an easy set of methods we can use to find the derivative function for each equation we’re looking to model. We won’t end the story here, though. It’s important to have a good geometric intuition of what exactly these derivative functions represent.

We’ve already shown that we can think of our derivative as being a function outputting the slope of each tangent line passing at every point within our function f(x), but we shouldn’t limit our geometric notions by using simple 2-dimensional graphs. We can extend our notion to higher dimensions.

We’ll do this by attempting to illustrate how derivatives can be thought of as function ‘surface areas’ while the function outputs themselves can be thought of as ‘areas’ or ‘volumes.’

In other words, we can think of the derivative as asking:

This concept is a bit hard to grasp for anyone seeing it for the first time, so let’s illustrate it by once again using the function f(x) = x^{2}.

We can model x^{2} as a 2-dimensional figure (or area) illustrated below:

In our example above, we model the function x^{2} using a simple square. At an input of 4, the figure is a 4 by 4 square and our function output is the total area (which equates to 16). We aren’t limited to one input, though. We can imagine that x can take on any value. As an example, at input 5, x would extend out and ‘stretch’ our figure, so in this instance we would get a new output (25, and a figure denoted by a 5 by 5 square):

We could continue on and imagine that any time we modify our input x, we obtain a figure denoted by a symmetric square. We don’t have to limit ourselves to whole numbers either. We can extend our notion to model any real input x, such that we may even be able to obtain ‘fractional squares.’ The reason we stick to whole number inputs is to illustrate the geometric concept which we’re about to derive and which is once again outlined below:

Let’s demonstrate this notion using our earlier example. Assume that we stretch the function space modeled by x^{2} by a very small amount. How could we imagine such a feat? For our current function (x^{2}), we know that these directions can be modeled through the arrows shown below:

We can imagine this as stretching our square grid by the tiny and infinitesimal amount in each x direction. For our example, we can see that each time we try to stretch in the x direction, we increase the figure area by a tiny amount:

In the above instance, we can see that the stretch factor, or the minute increase in area, can be modeled by 2x. Each time we ‘stretch’ our grid space by a small amount, we increase the area (amount) by 2x:

The above notion doesn’t just apply to 2 dimensions. Let’s illustrate this using a 3-dimensional example as well.

Let’s extend our intuition and apply our ‘stretch’ visual to more complicated functions, including ones which may not be open to simple human intuition.

Let’s take the function x^{3}. We can visualize this as a cube in 3-dimensional space:

The x arrows above show our imaginary ‘stretch’ dimensions. For our 3-dimensional example, we can see that each time we try to ‘stretch’ our figure in this direction, we increase the figure volume by a tiny amount. This tiny amount is marked by the 3 square areas present within our figure shown below:

We can see that in the instance above, we increase our volume by 3x^{2}. Each time we stretch in an x direction, we increase our volume by x^{2}, and with 3 such directions the total equates to 3x^{2}.

Another fantastic visual of this is provided below (courtesy of 3Blue1Brown):

We can expand this notion to even higher dimensions, although our everyday intuition may not give us a good vantage point for imagining these. We will still do our best, though a more intuitive notion of ‘dimensions’ will be needed in order to proceed.

We can think of each dimension as being composed of the dimensions beneath it, so that a point (which we denote here as having 0 dimensions) can be thought of as composing a line (which has 1 dimension). We think of a continuous series of points making up the line element:

We can furthermore expand on this and construct the 2^{nd} dimension and describe it as being composed of ‘slices’ of our 1^{st} dimension:

Looking at the second dimension mapping, we can imagine our 2-d grid as being composed of a series of 1-dimensional lines stacked one on top of another, going out in the y direction. It’s easy to see that this notion maps onto our normal intuition of function graphs: normally, when we draw the graph of a function (such as x^{2}), we’re simply mapping the function input (x) onto our x-axis and showing how it maps to its output through the y-axis / dimension.

We can furthermore expand this and add a 3^{rd} dimension, which is once again composed of ‘slices’ of our 2^{nd} dimension:

We map our extra dimension onto an element which we call ‘z,’ and we thus have our regular 3-dimensional coordinate system composed of x, y, and z coordinates.

Furthermore, we can imagine our 2-dimensional grid expanding and mapping into a series of grids which make up our z-coordinate system. At each coordinate plane point, we can thus have a mapping to the 2-dimensional system defined earlier.

The only reason we’re outlining the aspects above is to illustrate a key point: each higher dimension can be thought of as being composed of a series of the lower dimensional structures!

If we keep on building on this intuition, we can come up with the conception of the 4^{th} dimension. We can imagine our 4^{th} dimension as a ‘series’ of slices in our 3^{rd} dimension, and we thus conceive of what we’re all familiar with: time. We can think of time as a series of 3 dimensional transformations. We humans perceive it as having a ‘flow’ from 1 moment to the next, but in essence, all it represents is a series of transformations mapping objects which appear in 3-d onto a 4-d grid which appears to have a ‘forward’ flow.

Now, we don’t have ‘time’ at this moment to talk about **time**. It’s an interesting idea, suggested by Einstein’s relativity, that this ‘flow’ may be illusory: all of the 3-d frames that we perceive in everyday reality would, in other words, exist at once, with us humans simply ‘experiencing’ the flow as sequential! We won’t discuss time any further. Instead, let’s use our intuition to try to expand our 3-dimensional picture on and into the 4th dimension!

We can expand x^{3} and attempt to model the function as a higher power: x^{4}. How do we visualize this function? Well, let’s try to imagine our earlier 3-d grid expanding outward into a sequence of 3-dimensional ‘objects:’

In the above instance, we demo this conception when x = 4, and we can thus visualize why x^{4} = 256 by counting the number of cubes present in our example image!

The derivative of x^{4} is 4x^{3}. Once again, we won’t go into how we derived this (until later). For now, we want to simply show the ‘intuition’ of why the above function (x^{4}) maps to this derivative function by attempting to visualize the function space in this dimension!!! Can you see why it is that our new 4-dimensional function maps to this derivative?

Tip: as was the case with our previous notion, we need to imagine how our function ‘surface area’ expands as we increment x by a tiny amount!

Take a look at our function space above. We have our normal 3 dimensional directions, but what makes this scenario interesting is that we now introduce a 4^{th} dimensional component which represents a co-ordinate which multiplies our original 3-dimensional map each time we attempt to make a change! What does this mean? In terms of the derivative, each time we increase a coordinate, we not only increase one 3-dimensional surface area / volume, but x many of them!!

This means that for each x-increment, we must multiply our earlier results by x !!

Let’s first demonstrate this by showing the increase in volume which we obtain by stretching our horizontal (x) component:

Each time we expand this component, we expand the surface area by x^{2} times x, as can be seen in our above diagram. This equals x^{3}.

We include our vertical y and z components / dimensions, and we can furthermore see that our volume increase for each of these components also equates to x^{3}:

We thus have each one of our regular 3-dimensional coordinates ‘expanding’ onto x^{3 }equating to 3x^{3}, but we know that our derivative function maps to 4x^{3 }! Where does the extra x^{3} element come from?

Once again, we have to remind ourselves that in addition to imagining the 3 ordinary dimensions, we imagine a 4^{th} one which expands from a series of our 3-dimensional coordinates!! Thus, although we’ve already taken care of our regular 3 coordinate system ‘expansions’, we still haven’t accounted for our 4^{th} system, which encompasses all 3 elements and thus embodies another x^{3} element which we must add to our derivative function!!

With a lot of effort, we now have a ‘visual’ and geometric mapping of how to visualize our derivative functions! Let’s go through a few more derivatives in order to really cement this concept!

Let’s go over another famous derivative, although most of us don’t equate the term ‘derivative’ with the below formula. This one doesn’t map to our regular notion of Cartesian coordinates — it instead demonstrates the relationship between the area of a circle and its circumference.

We already know that the area of a circle is πr^{2}, where π is a constant and r represents the circle radius:

Now, let’s try to imagine what this area function ( πr^{2 }) would ‘expand’ into if we increased it by our dimension r (denoted by Δr):

That’s right – we get the circumference, as expected!! This circumference formula is in essence our derivative as well – and the reasoning is the same as the one we provided previously. If we imagine a circle expanding by a tiny amount (dr) in our radius direction – what do we get? Yes – the circumference!

Think about it: once again, we can imagine that we’re stretching our circle ‘space’ in the r dimension. When we do that, we get the circle surface area (circumference) and we can confirm this algebraically by showing that dA/dr = 2πr, as expected!!
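We can confirm dA/dr = 2πr numerically as well: stretch the radius by a tiny dr and compare the area gained per unit of stretch against the circumference (r = 3 below is just an arbitrary example):

```javascript
const area = r => Math.PI * r * r; // A(r) = πr²
const dr = 1e-6;                   // a tiny stretch of the radius
const r = 3;                       // any radius works; 3 is illustrative

const gainPerUnit = (area(r + dr) - area(r)) / dr;
console.log(gainPerUnit);     // ≈ 18.85, the area gained per unit of radius
console.log(2 * Math.PI * r); // ≈ 18.85, the circumference 2πr
```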

Let’s do a few more examples and try to visualize the results in order to grasp the above concept.

Let’s start off simply and take the derivative of:

**f(x) = 3x ^{2}.**

First of all, let’s visualize the function output in 2 dimensions:

As you can probably notice – we have 3 squares each with the same dimensions and with the sides of each square equal to x. So – what would happen if we were to stretch each one of our squares by a tiny amount in our x direction?

Think about it – if we were to stretch each one of our x^{2 }components in our x direction – the area of the component would be increased by 2x * dx (dx being a very small incremental amount approaching 0).

Since dx is infinitely small – we can ignore this component altogether and simply notice that the area of each x^{2} component increases by 2x!! Since we have 3 of these areas – we simply multiply 2x by 3 and we get our function derivative, although visualizing this is much easier, as shown below:

That’s right – we get the below visual and the derivative of our function (6x)!!

Let’s up the difficulty level. Instead of using only 2 dimensions, this time we’ll illustrate the same concept in 3:

**f(x) = 2x ^{3}**

Let’s start off by visualizing our function in terms of geometric areas:

In the above image – we have 2 blocks representing the function component denoted by x^{3}. Now – let’s suppose that we stretch our function in the x direction by a tiny amount denoted by dx – can you see how our volume increases?

This time – our ‘stretch’ is a bit more difficult to conceptualize. Instead of simply stretching on a 2-dimensional surface – here we are stretching our cube in the ‘outward’ direction and in 3 dimensions!! Imagine in other words that we are expanding our cube and pulling it in the 3 x directions which we provided above. In the process of doing so, we need to visualize exactly how much our function volume increases.

Here is the visual which we come up with for one x^{3} element:

As you can probably see – in terms of stretching x^{3}, we expand our function volume by a factor of x^{2} in each x direction. Since we have 3 separate x directions, our total volume increases by 3 * x^{2}. Given that we have 2 separate x^{3} volumes, our total volume would increase by 6x^{2} – once again matching our function derivative!!

Keep in mind once again that the total here doesn’t cover all of the smaller dx components which we would get in each corner of our shape by actually stretching it – but these dx components can once again be ignored, due to the fact that our limits require dx to approach 0. We thus come up with an intuitive visual which does a simple job of showing what the derivative does in the language of geometry:

That’s all the derivative is!

The above in essence explains why the below derivative rule works:

Isn’t it great? From this incredible concept, we get an amazing little algebraic formula summarizing what we’ve been attempting to visualize in great detail for quite a while now! At least now we can build an intuitive picture of why exactly it is that the generalization above works! I remember my university calculus teacher marveling over the wonder of why this formula worked the way it did. I don’t believe he ever attempted to visualize the function output using the dimensional representation we outlined earlier – he simply marveled at the fact that we got such a neat little formula / rule from an abstraction which didn’t really hint at any intuitive notion of why the analytic concept of the derivative would introduce such a divine formulation! Now we can safely say that we at least made an attempt to visualize why it is that the derivative behaves the way that it does!
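As a numerical sanity check of the power rule d/dx x^{n} = n·x^{n−1} that the stretching pictures have been building toward, we can compare a finite-difference slope against the formula (x = 1.5 is an arbitrary sample point):

```javascript
// Difference quotient with a small step h, approximating the derivative.
const diff = (f, x, h = 1e-6) => (f(x + h) - f(x)) / h;

for (const n of [2, 3, 4]) {
  const f = x => Math.pow(x, n);
  const x = 1.5; // arbitrary sample point
  console.log(n, diff(f, x), n * Math.pow(x, n - 1)); // numeric ≈ exact
}
```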

Does this visual approach work for every function? Not really. As an example, trigonometric functions aren’t as easy to visualize in terms of our earlier approaches. Also, due to the regular emphasis on drawing 2-dimensional representations of functions – most people today instead tend to prefer visualizing the derivative as a tangent line at each function point, as illustrated below:

The tangent line visualization has some advantages as well. For example – in the above visual, we can easily see that where the tangent slope is negative, our function slopes down, while where the tangent slope is positive – our function increases.

In addition to this – using the above perspective, we should be able to see something which plays a vital role in calculus: that our maximum and minimum values lie on points where the tangent line slope is 0!

Either way, hopefully you found this intuitive and visual approach useful. The point of this post wasn’t necessarily to outline an intuitive overview of **all **derivative rules. The main point was to provide a visual intuition for why the **power rule **works. Many people have issues with understanding why the power-rule leads to such an elegant solution, and the above illustrations hopefully shine some light into the intuition as to why we get such a simple yet powerful rule!

Let’s say that we have a phone book, and that we need to search for a person’s number. Assuming that we have no index to reference, how do we perform our search?

Well, there’s the good old brute force way of doing it, which involves starting from the first name and checking if it matches the name we’re looking for. If it does, we read the phone number and we’re done. Otherwise, we read the next name and repeat the same process continuously. If there are 100,000 names in the phone book, we might have to go through all 100,000 of them, so this job will be very inefficient!

This is where a binary search would come in handy.

To perform a binary search, we take a look at the phone book, and we realize that the entries in it are sorted in ascending order. It begins with names starting from 'A' on the first page and the last page contains names starting with 'Z'. How can we use this to be more efficient? Well, we use this bit of information to continuously divide up our search space!

Let’s say that we’re looking for someone called ‘Bill’. First, we open the phone book close to the middle page. We now look at the starting letters of the names which are present. We see that the entries all start with the letter 'M'! Now, we know that Bill starts with ‘B’, and since it precedes M, we know that we need to keep searching to the left of this page. We no longer need the second half of our phone book, which is to the right of the current page, so we’ve effectively halved our search space!

Now, we limit our search to the left half. We find its middle page / element, and observe that the names on this page start with 'G'. Once again, we can see that Bill must be somewhere to the left of our element, so we restrict our search space and focus in on the left once more. We continually repeat this process until we find Bill, and we’re done.

Let’s assume that we’re presented with the elements shown below:

Let’s assume that we were looking for 27 in the above array. Instead of having to look at each element – we simply take the middle element, compare it with our search value and keep going down until we find what we’re looking for:

We could therefore visualize a binary search as a simple tree, with each child node representing a halving of our search space:

If our search value is less than the middle element, we traverse the left subtree; otherwise, we traverse the right, and we repeat this step over and over again until there are no nodes left to traverse!

How much more efficient is this than a regular linear search? Well, assuming that the phone book contains 100,000 names, we have to check at most 17 entries (since 2^{17} > 100,000) to find Bill. The larger the search space, the bigger the benefit! Let’s assume that instead of 100,000, we have 1 million names. A linear search might need to go through all 1 million elements, but a binary search over the sorted names needs to check at most 20 entries to find whoever we’re looking for! Amazing, isn’t it?

In other words, since we keep halving our search space, it’ll take about log2(n) steps (20 in the case of a million elements) to find our element instead of the n steps (1 million) we’d need using a linear search. Halving the search space each time means our algorithm’s performance grows as log n instead of n, which makes quite a big difference when it comes to large arrays of elements!
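As a sanity check on those numbers, the worst-case step count is just the ceiling of log2(n), which we can compute directly (a small Python sketch, separate from the JavaScript implementation of the search itself):

```python
import math

def max_binary_search_steps(n):
    # each comparison halves the remaining range, so at most
    # ceil(log2(n)) comparisons are needed for n sorted entries
    return math.ceil(math.log2(n))

print(max_binary_search_steps(100_000))    # 17
print(max_binary_search_steps(1_000_000))  # 20
```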

```
function binarySearch(array, value) {
  // Bail out early if the array is empty or the value is out of range.
  if (array.length === 0 || array[0] > value || array[array.length - 1] < value) {
    return undefined;
  }
  let left = 0;
  let right = array.length - 1;
  while (left <= right) {
    const middleIndex = Math.floor(left + (right - left) / 2);
    const middleValue = array[middleIndex];
    if (value === middleValue) {
      return middleIndex;
    }
    if (value < middleValue) {
      right = middleIndex - 1; // search the left half
    } else {
      left = middleIndex + 1;  // search the right half
    }
  }
  return undefined;
}
```

Binary search is a fundamental algorithm with wide-ranging applications across various domains in the software world. Its efficiency in searching and retrieving information from sorted data sets makes it a cornerstone in optimizing numerous operations. One key area where binary search shines is in searching and querying databases. In database management systems, binary search allows for quick data retrieval from large datasets, enabling efficient execution of queries and reducing response times.

In essence, binary search's ability to significantly reduce the search space in a systematic manner has led to its integration into countless real-world applications and software systems, fostering enhanced performance and responsiveness in a diverse range of computational tasks.

School was excruciating for him. He loved to learn but couldn’t focus or absorb anything in a classroom setting. Instead, he preferred to read on his own — in elementary school, he devoured all 10 volumes of an encyclopedia about living things — and to explore a mountain near his family’s apartment.

He’s not the first to dislike school.

*“It is nothing short of a miracle that modern methods of instruction have not yet entirely strangled the holy curiosity of inquiry.”* – Albert Einstein

He wasn’t attracted to mathematics from the get-go and only developed a passion for it late in life.

He was indifferent to the subject, and he dropped out of high school to become a poet. It would take a chance encounter during his university years — and many moments of feeling lost — for him to find that mathematics held what he’d been looking for all along.

On any given day, Huh does about three hours of focused work. He might think about a math problem, or prepare to lecture a classroom of students, or schedule doctor’s appointments for his two sons. “Then I’m exhausted,” he said. “Doing something that’s valuable, meaningful, creative” — or a task that he doesn’t particularly want to do, like scheduling those appointments — “takes away a lot of your energy.”

In fact, it’s mentioned that there are periods where he doesn’t even perform the three hours of focused work that he sets out to perform and instead takes a break to re-read books which he encountered when he was younger.

He finds that forcing himself to do something or defining a specific goal — even for something he enjoys — never works. It’s particularly difficult for him to move his attention from one thing to another. “I think intention and willpower … are highly overrated,” he said. “You rarely achieve anything with those things.”

This shows the great power of letting one’s mind wander rather than forcing yourself to grind through what you may not want to do, though this may not work for everyone. Later in his life, Huh developed a huge passion for mathematics, which I believe allowed him to unconsciously ‘work’ on some of the problems he encountered without necessarily knowing it.

Huh applied to about a dozen doctoral programs in the U.S., but because of his undistinguished undergraduate record, he was rejected by all of them except one. The only reason he got into that one (the University of Illinois) was a recommendation from his teacher, the Fields Medal winner Heisuke Hironaka.

Huh’s inadvertent proof of Read’s conjecture, and the way he combined singularity theory with graphs, could be seen as a product of his naïve approach to mathematics. He learned the subject mainly on his own and through informal study with Hironaka. People who have observed his rise over the last few years imagine that this experience left him less beholden to conventional wisdom about what kinds of mathematical approaches are worth trying. “If you look at mathematics as a kind of continent divided into countries, I think in June’s case nobody really told him there were all these borders. He’s definitely not constrained by any demarcations,” said Robbert Dijkgraaf, the director of IAS.

In other words, taking the unconventional approach sometimes can give you a huge advantage over the rest of the field or research area. Exploring things and learning about them using different methodologies can lead to many insights that others might not be able to realize or see.

Another great quote on his unconventional approach and pursuit of deeper understanding:

He proceeds just as deliberately when doing mathematics. Wang was shocked when he first witnessed it. “I have this math competition experience, that as a mathematician you have to be clever, you have to be fast,” he said. “But June is the opposite. … If you talk to him for five minutes about some calculus problem, you’d think this guy wouldn’t pass a qualifying exam. He’s very slow.” So slow, in fact, that at first Wang thought they were wasting a lot of time on easy problems they already understood. But then he realized that Huh was learning even seemingly simple concepts in a much deeper way — and in precisely the way that would later prove useful.

Huh’s entire life is built on routine. “Almost all of my days are exactly the same,” he said. “I have a very high tolerance for repetition.” He has trouble staying asleep and usually wakes up at around 3 a.m. He then goes to the gym, has breakfast with his wife and two sons (one is 8 years old, the other just turned 1), and walks his eldest to school before heading to his Princeton office.

Huh likes repetitive, mindless activities like cleaning, dishwashing and the physical act of transcribing what he reads into a notebook. He often works in the public library, in the children’s section, where it’s pretty noisy. “I don’t like quiet places,” he said. “It makes me sleepy.” Huh says this about many things.

He goes for a long walk after lunch each day, then returns to his office to do some more work (unless he’s already hit his three-hour quota) before heading home.

It took Huh six years to graduate. In that sixth year, he enrolled in a class taught by the famed Japanese mathematician Heisuke Hironaka, who won the Fields Medal in 1970. Hironaka was charismatic and Huh quickly fell under his sway. He was vital in leading Huh into his mathematical adventure and late-life success.

Heisuke Hironaka is one of the premier algebraic geometers of the twentieth century.

He is best known for his 1964 work on the resolution of singularities of algebraic varieties over a field of characteristic zero. Hironaka took a strikingly original approach and created new algebraic tools suited to the problem in order to solve it. These tools also proved useful for attacking problems far removed from the resolution-of-singularities quest that led him to develop them in the first place.

Hironaka has contributed much time and effort to encouraging young people interested in mathematics. In 1980, he started a summer seminar for Japanese high school students and later added one for Japanese and American college students; the seminars ran for more than two decades under his direction and continue to this day. Later on in his life, he wrote a best-selling memoir called The Joy of Learning which inspired a generation of Korean and Japanese students to learn and explore math.

*Coming from my experience as a mathematician, I think that what’s interesting is talking and ideas, not well organized lectures where you sit and listen and take notes. When young people want to have a creative life, they should learn to enjoy talking about ideas, even if the ideas are not well formulated or keep changing. In fact, one of the most interesting and enjoyable parts of a creative activity is that ideas change. This is how the seminars are run.*

On getting students to use their instinct and not just rely on knowledge:

*When a person works, he must have knowledge or he will make terrible mistakes. But at the same time, knowledge alone doesn’t do anything new. You must have instinct and somehow be conscious of making use of instinct. It is an interesting question how to give kids knowledge without having them lose their instinctive power. If you just keep pounding them with knowledge, most lose their instinct and try to depend on knowledge. This balance between knowledge and instinct is interesting.*

You can tell that Hironaka had a great passion for teaching and for letting his students carve their own paths. He was known for making up his own theory as he went along rather than relying on existing ones, and for taking fresh approaches to a problem rather than following a well-worn path. He left a great mark on Huh.

Hironaka didn’t necessarily leave a great impression on everyone, though. Huh noted that although his class began with 200 students, only five (Huh among them) were left after a few weeks. Although Hironaka’s unstructured and creative approach may not have appealed to everyone, Huh loved it and noted that he enjoyed going into every lecture with the suspense, freedom and uncertainty that each one brought, even though he might not have understood a lot of it.

“The math students dropped out because they could not understand anything. Of course, I didn’t understand anything either, but non-math students have a different standard of what it means to understand something,” Huh said. “I did understand some of the simple examples he showed in classes, and that was good enough for me.”

This wasn’t your average re-hashed material available in a normal college course. This was a great mathematician discussing how to tackle difficult problems he was struggling with, right in front of a live audience!

Huh discovered that this kind of mathematics could give him what poetry couldn’t — the ability to search for beauty outside himself in a way that opened him up even more than writing had. He found that, unlike when he was writing, he was never motivated by the desire for recognition. He simply wanted to do math.

When Hironaka was asked about when he got serious about the subject, he answered:

*“I think one of the times when I seriously started thinking about the possibility of becoming a mathematician was in senior high school, when a mathematics professor from Hiroshima University came to my school. He gave a general lecture to the students. It was a bit technical, so I couldn’t understand everything. But he said at the beginning of his talk something like, “ Mathematics is a mirror in which you can project everything in the world.” I was very puzzled by that, but also very impressed. I applied to Hiroshima University because I wanted to study with him.”*

In fact, Hironaka initially went into physics, inspired by Hideki Yukawa (the first Nobel laureate from Japan), but later realized that he was more suited to mathematics, since mathematical problems excited him more than physics-based ones.

Hironaka (on working with the best and learning from peers) is quoted as saying:

*“I tell youngsters, if you go abroad or even if you study in Japan, choose the best scholar in the field. But don’t expect you can learn from him! The amazing thing is that with that kind of person, there are many talented young people around, and you learn a lot from them.”*

When asked what attracted him to the resolution of singularities, he mentioned that although he had studied quite a bit of abstract algebra, he found the field itself too abstract and was drawn to geometry instead (even though he might not have been able to develop a good intuition for geometries in higher-dimensional spaces). On picking a technique to tackle a problem, he noted:

*“I like basic things. Very clever people tend to jump to the new techniques: something is developing very fast, and you want to be on top of it; and if you are smart, you can be a top runner. But I am not so smart, so it is better that I start something where there are no techniques for the problem, and then I can just build step by step. But actually, it was not so hard. It turned out to be easier than I thought.”*

On solving the problem and learning from Grothendieck:

*“Let me explain a little bit about geometry. Geometry has global problems and local problems. Local problems are usually done by very concrete calculations. For instance, if you have an equation, then you can write down the equation, take its Taylor expansion, look at the terms, play with them. But then when you go back to the global problem, the local solutions do not fit each other. That is one of the problems that Zariski had. He had extremely local techniques: you have some geometric object, you modify it, and you localize it. If you localize it, then you can do many tricks, but then later you cannot connect it to have a global solution. With the resolution of singularities, Zariski had a hard time even in dimension 3, and finally he gave up. Generally speaking, it’s easy when you have one equation. But if you have many equations, then it’s difficult, or people had the impression that it’s difficult. But I observed that one can use induction to handle many equations. So I started from dimension 1, but with many equations. Then I noticed that the next dimension might have many equations, but it’s the same style. It’s a very simple observation, but that helped my local theory. Still, the global problem was there. You can’t have global coordinates; only locally do you have coordinates and equations. So I had a problem there, but Grothendieck—Grothendieck is an amazing fellow! He doesn’t look at the equations. He just looks at everything globally from the beginning.”*

In other words – **start simply and seek feedback from your peers (or from the best).**

In fact, Huh used a similar approach in earning his Fields Medal. He did extremely important work on the Rota conjecture with two collaborators – Karim Adiprasito, a master of combinatorics from the Hebrew University of Jerusalem, and Eric Katz, a specialist in combinatorial algebraic geometry and arithmetic geometry from Ohio State University – and both were vital in proving the conjecture.

It takes great dedication and passion to do what June Huh did, and you can get a glimpse of what a great teacher he is in this Numberphile video, where he is asked to explain the g-conjecture and does an amazing job while keeping things extremely concrete and simple:

Quanta Magazine did an amazing job reporting this story, and full credit goes to them for the above content. I simply wanted to highlight the notes I took while reading about Huh – hopefully this helps inspire someone who may not have much confidence or may be starting late in math. Some key lessons I learned from reading about both Huh and Hironaka are provided below:

It’s never too late to do what you’re passionate about and to succeed in it.

There’s nothing wrong with being unconventional and seeking different paths.

Have ambitions and seek out difficult problems to solve.

When solving problems, don’t be afraid to explore and invent – seek a deeper understanding and look for unexplored pathways and problems which haven’t been probed.

Last but not least: seek out the best and learn and work with them.

Huh accomplished what many people could only dream of. There are many more lessons and takeaways that can be drawn both from him and from his amazing teacher Heisuke Hironaka – here I’ve simply outlined what I found important. If there’s anything I missed, feel free to leave a comment! Thank you for reading!

**References:**

https://www.quantamagazine.org/june-huh-high-school-dropout-wins-the-fields-medal-20220705/

https://www.quantamagazine.org/a-path-less-taken-to-the-peak-of-the-math-world-20170627/

https://www.daviddarling.info/encyclopedia/A/algebraic_geometry.html

Invented in 1957 by Frank Rosenblatt, a perceptron is the simplest neural network possible. It’s supposed to represent the computational model of a single neuron. A perceptron consists of one or more inputs, a processor, and a single output.

The inputs are sent into the neuron, processed, and result in an output. We say that this follows a “feed forward” model.

Let’s go through an example of how a 2-input perceptron could process its inputs to produce a desired output. Our toy model here is going to be similar to the perceptron we diagrammed earlier, although we will add a weight for each input, since these weights are the key to modeling our perceptron’s state:

To make things even simpler, we’ll rename our inputs from ‘input 1’ and ‘input 2’ to x1 and x2, and our weights from ‘weight 1’ and ‘weight 2’ to w1 and w2. Going forward, we’ll also use the variable y for our resulting output:

So – how does our perceptron produce an output?

Simple. For this perceptron, we’ll just take the inputs (x1 and x2) and multiply them by their respective weights w1 and w2. The weighted inputs are then summed up along with a bias term (denoted as ‘b’ in the formula below) to produce the total input to the perceptron, denoted as z:

**z = (w1 * x1) + (w2 * x2) + b**

*Note: The bias term is a trainable parameter that allows the perceptron to adjust the output even when all the input values are zero. You can think of it as how much we need to ‘pivot’ or move our resulting function up or down in order to get it to produce the correct output.*

Next, the total input (z) is passed through an activation function to generate the output (y):

**y = activation_function(z)**

The output y can be thought of as the final result if the perceptron is used in isolation. Either that, or it can be passed to another layer of perceptrons in a multi-layer neural network, and this is usually how perceptrons are used today.
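The two formulas above can be sketched in a few lines of Python. This is just an illustrative toy (the step activation used here as the `activation_function` is introduced properly a little later in this post):

```python
def step(z):
    # fire (output 1) if the total input is non-negative, else 0
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w1, w2, b):
    z = (w1 * x1) + (w2 * x2) + b  # z = (w1 * x1) + (w2 * x2) + b
    return step(z)                 # y = activation_function(z)
```

With concrete weights plugged in, a call like `perceptron(0.8, 0.2, 0.6, 0.4, -0.3)` runs exactly the kind of forward pass worked through below.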

Overall – the end state diagram is provided below:

Let’s go through a simple example to show how it might process a real set of inputs.

Let's assume we have a perceptron with the following weights and bias:

**w1 = 0.6**

**w2 = 0.4**

**b = -0.3**

And suppose we have the following input data:

**x1 = 0.8**

**x2 = 0.2**

In the case of a simple binary output, the activation function is what tells the perceptron whether to “fire” or not. Activation functions can be quite complex, but we won’t add any more complexity here than we need to. To provide an output (or whether our neuron will ‘fire’), we will simply use a step function as the activation function.

A step function produces an output of 1 if the input is greater than or equal to zero and an output of 0 otherwise, so it’s a very simple activation function:

Now, let's calculate the output (y) of the perceptron:

**z =** (0.6 * 0.8) + (0.4 * 0.2) + (-0.3) = **0.26**

**y =** activation_function(z) = **step_function(0.26) = 1 (since 0.26 >= 0)**

So, for the given inputs (x1 = 0.8 and x2 = 0.2), the perceptron produces an output (y) of one. This is a simple example of how a perceptron works with two inputs and one output. In practice, perceptrons are combined to form more complex neural networks capable of solving a wide range of tasks.

In our above example, we used random weights and a random value for our bias in order to initialize our perceptron – but is using a randomly initialized function … well … useful? Given a set of data points in a binary classification task (i.e. where each point is classified as being in either one of two available categories) – can we use the above perceptron to make valid classifications based on our input?

Well, I wouldn’t be here writing this introduction / post if this wasn’t possible, so here it goes: yes, indeed we can! And how do we do this?

To train a neural network to answer correctly, we’re going to employ the method of supervised learning. With this method, the perceptron is provided with inputs for which there is a known answer. This way, the perceptron can find out if it has made a correct guess. If it’s incorrect, the network can learn from its mistake and adjust its weights. But how do we know how to adjust the weights (i.e. in what direction) as well as our bias to provide the correct outputs?

Let’s walk through an example and show the exact steps below:

**Step 1: Data Preparation**

We need a dataset with labeled examples. Let's generate some random binary data for this example:

| Input Data (x1, x2) | Target Output (Class) |
| ------------------- | --------------------- |
| **(0.2, 0.3)**      | **0**                 |
| **(0.8, 0.6)**      | **1**                 |
| **(0.5, 0.9)**      | **1**                 |
| **(0.4, 0.1)**      | **0**                 |

**Step 2: Initialize Weights and Bias**

As in our previous example, we'll use the same initial weights and bias:

w1 = 0.6

w2 = 0.4

b = -0.3

**Step 3: Define the Activation Function and Learning Rate**

For this example, we'll use the step function as the activation function, and we'll set the learning rate to 0.1. The learning rate determines how much the weights and bias are updated during training. The step function once again simply outputs 1 if our input is positive and 0 otherwise.

**Step 4: Training Loop**

We'll iterate through the data-set multiple times (epochs) to train the perceptron. In each epoch, we'll calculate the output of the perceptron for each input and compare it with the target output. We'll then adjust the weights and bias based on the error.

The perceptron’s error can be defined as the difference between the desired answer and its guess.

**ERROR = DESIRED OUTPUT - GUESS OUTPUT**

In the case of the perceptron, the output has only two possible values: 1 or 0. This means there are only three possible errors.

If the perceptron guesses the correct answer, then the guess equals the desired output and the error is 0. If the correct answer is 0 and we’ve guessed 1, then the error is 0 - 1 = -1. If the correct answer is 1 and we’ve guessed 0, then the error is 1 - 0 = 1.

Let’s go through a few epochs and try to visualize what our perceptron is doing in each training step below:

Training steps:

Calculate the total input (z) for each input (x1, x2).

Pass the total input through the step function to get the predicted output (predicted_y).

Compute the error (target_output - predicted_y).

Update the weights and bias using the following update rules:

**NEW WEIGHT 1 = OLD WEIGHT 1 + LEARNING_RATE * ERROR * INPUT 1**

**NEW WEIGHT 2 = OLD WEIGHT 2 + LEARNING_RATE * ERROR * INPUT 2**

**NEW BIAS = OLD BIAS + LEARNING_RATE * ERROR**

Or, if we want to model this in terms of our example, it’s simply:

**w1 = w1 + learning_rate * error * x1**

**w2 = w2 + learning_rate * error * x2**

**b = b + learning_rate * error**

Repeat steps 1 to 4 for each example in the dataset.

*Note: In practice, it's common to use more sophisticated activation functions (like the sigmoid or ReLU) and optimization techniques (like stochastic gradient descent) for training neural networks. However, for this example, we'll keep it simple with the step function and basic weight updates.*

Now, we would go ahead and apply the training steps to the provided data-set for a few epochs. After training, the perceptron should be able to classify new data points into one of the two classes based on the learned weights and bias.
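The training steps above can be sketched as a complete loop in Python (a minimal illustration using the dataset from Step 1 and the initial weights and learning rate from Steps 2 and 3, not production code):

```python
# dataset from Step 1: (x1, x2) -> target class
data = [
    (0.2, 0.3, 0),
    (0.8, 0.6, 1),
    (0.5, 0.9, 1),
    (0.4, 0.1, 0),
]

w1, w2, b = 0.6, 0.4, -0.3  # initial weights and bias
learning_rate = 0.1

def step(z):
    return 1 if z >= 0 else 0

for epoch in range(10):                  # a few epochs is plenty here
    for x1, x2, target in data:
        z = (w1 * x1) + (w2 * x2) + b    # total input
        predicted = step(z)              # forward pass
        error = target - predicted       # -1, 0 or 1
        w1 += learning_rate * error * x1 # update rules from Step 4
        w2 += learning_rate * error * x2
        b += learning_rate * error

# after training, every point should be classified correctly
predictions = [step(w1 * x1 + w2 * x2 + b) for x1, x2, _ in data]
```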

Let's go through a couple of iterations of the training loop using the example we provided above:

**Iteration 1:**

For the first example **(0.2, 0.3)**, for which the target output is **0**:

Calculate the total input (z) = (w1 * x1) + (w2 * x2) + b = (0.6 * **0.2**) + (0.4 * **0.3**) + (-0.3) = **0.09**

Predicted output = **step_function(0.09) = 1 (since 0.09 >= 0)**

**Error** = target output – predicted output = **0 - 1 = -1** (since the target class is 0 and the predicted class is 1).

Update weights and bias:

**w1 =** 0.6 + (0.1 * -1 * 0.2) = **0.58**

**w2 =** 0.4 + (0.1 * -1 * 0.3) = **0.37**

**b =** -0.3 + (0.1 * -1) = **-0.4**

**Iteration 2:**

For the second example **(0.8, 0.6)**, for which the target output is **1**:

Calculate the total input (z) = (w1 * x1) + (w2 * x2) + b = (0.58 * **0.8**) + (0.37 * **0.6**) + (-0.4) = **0.286**

Predicted output = **step_function(0.286) = 1 (since 0.286 >= 0)**

**Error** = target output – predicted output = **1 - 1 = 0** (since the target class is 1 and the predicted class is 1).

Update weights and bias:

**w1 =** 0.58 + (0.1 * 0 * 0.8) = **0.58**

**w2 =** 0.37 + (0.1 * 0 * 0.6) = **0.37**

**b =** -0.4 + (0.1 * 0) = **-0.4**

In this instance – you can probably observe that there are no updates since our expected output matches the perceptron output!

**Iteration 3:**

For the third example **(0.5, 0.9)**, with target output **1**:

Calculate the total input (z) = (w1 * x1) + (w2 * x2) + b = (0.58 * **0.5**) + (0.37 * **0.9**) + (-0.4) = **0.223**

Predicted output = **step_function(0.223) = 1 (since 0.223 >= 0)**

**Error** = target output – predicted output = **1 - 1 = 0** (since the target class is 1 and the predicted class is 1).

Update weights and bias: Since our predicted output matches our target output, there are no updates to be made in this iteration, so we skip this step and move on to the next iteration!

The fourth input (0.4, 0.1) plays out like the previous two steps: the target output (0) matches our prediction using the current network weights, so no update is made. You get the point, though. Over additional iterations, the perceptron continues to update its weights and bias based on the errors it encounters. The weights are continually adjusted to find the decision boundary that separates the two classes as accurately as possible. The training process continues for more epochs until the model converges and the error becomes sufficiently small.

Let’s try to plot the evolution of our perceptron function throughout this example by plotting how our weights are moved in order to attempt to match our target output to our resultant output.

Our initial perceptron essentially divides a 3-D space according to the following rule / function:

```
def perceptron_function(x1, x2):
    return 0.6 * x1 + 0.4 * x2 - 0.3
```

We can plot this and visualize it in 3-d space using the Python script provided below:

```
import numpy as np
import matplotlib.pyplot as plt

# Define the function modeled by the initial perceptron
def perceptron_function(x1, x2):
    return 0.6 * x1 + 0.4 * x2 - 0.3

# Generate data points for plotting the function
x1_values = np.linspace(0, 1, 100)  # Range of x1 values (0 to 1)
x2_values = np.linspace(0, 1, 100)  # Range of x2 values (0 to 1)
x1_grid, x2_grid = np.meshgrid(x1_values, x2_values)
y_values = perceptron_function(x1_grid, x2_grid)

# Input data points: (x1, x2, target class)
input_data = np.array([
    [0.2, 0.3, 0],
    [0.8, 0.6, 1],
    [0.5, 0.9, 1],
    [0.4, 0.1, 0]
])

# Create a 3D plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Plot the function surface
ax.plot_surface(x1_grid, x2_grid, y_values, cmap='viridis', alpha=0.8)

# Plot the input points
for data_point in input_data:
    x1, x2, target_output = data_point
    color = 'red' if target_output == 1 else 'blue'
    ax.scatter(x1, x2, perceptron_function(x1, x2), color=color, s=50)

# Add labels and title
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('f(x1, x2)')
ax.set_title('Perceptron Function Model with Input Points')

# Show the plot
plt.show()
```

The plot which we get (after doing a couple of small rotations) is provided:

As you can probably see, our function divides the input space using a linear 2-D plane. Any point whose function value falls below zero is labeled as 0, while any point at or above zero is labeled as 1. Looking more closely at our two bottom points, we can see that initially our first point (0.2, 0.3) is drawn above the zero plane:

Because of this, our current perceptron classifies the input (0.2, 0.3) as 1 instead of 0. To account for the error, our weights are adjusted: the plane’s slope is lowered and the bias is decreased to produce the new function shown below:

```
def perceptron_function(x1, x2):
    return 0.58 * x1 + 0.37 * x2 - 0.4
```

After our training / update, we can visualize the new function, and we should be able to see that our two class-0 points now fall below the zero plane and are thus pushed towards being classified as 0 (instead of 1) – which is the correct label for these two inputs:

In other words – our perceptron essentially attempts to model a linear function. We adjust the linear function ‘weights’ to try to fit our inputs to produce the correct output.

We provide our target function output and initially assign a random function to ‘map’ our inputs to these outputs. Based on our target outputs – we then attempt to fit our weights to produce this target – and that in essence is all that a perceptron tries to do!!

Yes, a perceptron can have multiple inputs, but it is still a lonely neuron. The power of neural networks comes in the networking itself. Perceptrons are sadly incredibly limited in their abilities. If you read an AI textbook, it will say that a perceptron can only solve linearly separable problems. What’s a linearly separable problem?

Well – we just illustrated an example of one right above. It’s a problem within which we can divide our input space using a linear function. A simple one which can be visualized in 2-D space is provided below:

If you can classify the data with a straight line, then it is linearly separable (left). On the right, however, is non-linearly separable data. You can’t draw a straight line to separate the black dots from the gray ones. So how can we extend our perceptrons to be able to classify more complex data?

Easy!!

We extend our network and use multiple perceptrons!

The above diagram is known as a multi-layered perceptron -- a network of many neurons! Some are input neurons and receive the inputs, some are part of what’s called a “hidden” layer, and then there are the output neurons from which we read the results.
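As a concrete (if hand-wired) illustration, take XOR, the classic non-linearly separable problem. No single step-function perceptron can compute it, but a tiny two-layer network can. The weights below are hand-chosen for illustration, not learned:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_network(x1, x2):
    # hidden layer: one perceptron computes OR, the other AND
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # output perceptron: fire when OR is on but AND is off
    return step(1.0 * h_or - 2.0 * h_and - 0.5)

# xor_network maps (0,0)->0, (0,1)->1, (1,0)->1, (1,1)->0
```

The hidden layer reshapes the input so that the output neuron’s single straight line suffices – exactly the extra power that networking the neurons buys us.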

Training these networks is much more complicated than training a single perceptron! With one perceptron, we could easily evaluate how to change the weights according to the error. Here, there are so many different connections that we need to carefully consider in which direction to update each weight to fit our final output!

The solution to optimizing weights of a multi-layered network is known as backpropagation.

The backpropagation algorithm was a major milestone in machine learning. Before it was discovered, optimization methods were extremely unsatisfactory. One popular method was to perturb (adjust) the weights in a random direction (i.e. increase or decrease them) and see if the performance of the artificial neural network (ANN) improved. If it did not, one would attempt to either a) go in the other direction, b) reduce the perturbation size, or c) a combination of both. This methodology takes a very long time to discover the optimal weight and bias adjustments needed to produce accurate classifications!

Once again, the goal of any machine learning problem is to select weights and biases that provide the most optimal estimation of a function output which models the training data we feed in. For a simple perceptron – this may seem simple, but for a multi-layered one – things get a bit more complicated! Instead of having to update the weights and biases of one single neuron – we need to find a way to update the weights and biases of all of the neurons in our network layers!

Thankfully – there is a great tool which we can use to do this! It’s called calculus! How can we use calculus to adjust our weights?

Here is a sample diagram of a single-layered, shallow neural network:

As you can see, each neuron is a function of the previous one connected to it. In other words, if one were to change the value of w1, both “hidden 1” and “hidden 2” (and ultimately the output) neurons would change. Because of this notion of functional dependencies, we can mathematically formulate the output as an extensive composite function:

**output = activation(w3 * hidden 2)**

**hidden 2 = activation(w2 * hidden 1)**

**hidden 1 = activation(w1 * input)**

And thus we get:

**output = activation(w3 * activation(w2 * activation(w1 * input)))**

Here, the output is a composite function of the weights, inputs, and activation function(s). It is important to realize that the hidden units/nodes are simply intermediary computations that in actuality can be reduced down to computations of the input layer.
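To make this concrete, here is a tiny sketch (assuming a sigmoid activation and arbitrary weight values, since the text leaves both unspecified) showing that evaluating the network layer by layer and evaluating the nested composite expression give the identical output:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative weights and input (arbitrary values chosen for the demo)
w1, w2, w3 = 0.5, -1.2, 2.0
x = 1.0

# Layer-by-layer evaluation
hidden_1 = sigmoid(w1 * x)
hidden_2 = sigmoid(w2 * hidden_1)
output = sigmoid(w3 * hidden_2)

# The nested one-liner gives the identical result: the hidden units are
# just intermediate values of one big composite function.
composite = sigmoid(w3 * sigmoid(w2 * sigmoid(w1 * x)))
print(abs(output - composite) < 1e-12)  # True
```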

Let’s also attach a black box to the tail of our neural network which represents the error and call it ‘J’. This black box will compute and return the error (using a cost function) from our output:

If we were to take the derivative of the function with respect to some arbitrary weight (for example w1), we would iteratively apply the chain rule (once again using calculus) in order to attempt to **minimize the error (J)** of our above output.

The derivative of the error with any arbitrary weight can be modeled using the derivative function below:

Each of these derivatives can be simplified once we choose an activation and error function such that the entire result would represent a numerical value. At that point, any abstraction has been removed, and the error derivative can be used in an algorithm which is called **gradient descent.**
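As a sketch of what this looks like for the small chain network above (assuming a sigmoid activation and a squared-error cost J = 0.5 * (output - target)**2, both my own choices for illustration), we can apply the chain rule by hand and verify the result against a finite-difference approximation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, w3, x):
    # Same chain as above: input -> hidden 1 -> hidden 2 -> output
    h1 = sigmoid(w1 * x)
    h2 = sigmoid(w2 * h1)
    out = sigmoid(w3 * h2)
    return h1, h2, out

w1, w2, w3, x, target = 0.5, -1.2, 2.0, 1.0, 0.0
h1, h2, out = forward(w1, w2, w3, x)

# Chain rule, link by link (using sigmoid'(z) = s * (1 - s)):
# dJ/dw1 = dJ/dout * dout/dh2 * dh2/dh1 * dh1/dw1
dJ_dout = out - target                 # from J = 0.5 * (out - target)**2
dout_dh2 = out * (1 - out) * w3
dh2_dh1 = h2 * (1 - h2) * w2
dh1_dw1 = h1 * (1 - h1) * x
dJ_dw1 = dJ_dout * dout_dh2 * dh2_dh1 * dh1_dw1

# Sanity check against a finite-difference approximation of the same slope
eps = 1e-6
J = lambda w: 0.5 * (forward(w, w2, w3, x)[2] - target) ** 2
numeric = (J(w1 + eps) - J(w1 - eps)) / (2 * eps)
print(abs(dJ_dw1 - numeric) < 1e-8)  # True: both compute the same derivative
```

Backpropagation is exactly this bookkeeping, organized so that each factor is computed once and reused for every weight in the network.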

We don’t have the time to dive into the full details of backpropagation and gradient descent here, but I hope that you get the gist of what’s going on. If you’re looking for more info or a basic intuition on gradients and gradient descent, you can find my introduction to both concepts available in the link below:

A Brief Visual Introduction to Gradients and Gradient Descent

After doing so, you can get a great overview of how gradient descent and backpropagation work by viewing these great videos provided by 3Blue1Brown:

Gradient descent, how neural networks learn

What is backpropagation really doing?

You can also find a very simple implementation of a neural network using the Numpy library in my AGI repository:

Another implementation which allows you to visualize the neural network architecture and decision boundary is available here. Using this implementation, we can see that the decision boundaries for multi-layered networks are definitely non-linear.

Neural networks had a slow start and faced significant challenges in their early days. Mostly due to limitations in computing power and a lack of data, their practical applications were severely restricted. It wasn't until recently that they truly began to showcase their potential and transform various aspects of modern life and business.

One of the key figures instrumental in propelling neural networks into the spotlight is Geoffrey Hinton, whose groundbreaking work in the 1980s and 1990s laid the foundation for modern deep learning. Hinton played a vital role in developing and popularizing the backpropagation algorithm, which allowed neural networks to be trained much more efficiently than when they were originally conceived. Ever since then, there has been a resurgence of interest in them. Some notable areas where neural networks have transformed the modern world are provided below:

**Natural Language Processing (NLP):** Neural networks, especially large language models like ChatGPT, have revolutionized NLP. These models can understand and generate human-like text and attempt to solve complex real-world problems in a fraction of the time it takes humans to do so (albeit they also hallucinate and generate a lot of wrong answers as well).

**Computer Vision:** They power self-driving cars, medical image analysis, and quality control in manufacturing.

**Financial Analysis:** They’re used in predicting stock prices, fraud detection, credit scoring, and risk assessment, providing valuable insights for financial institutions.

**Healthcare:** In medical diagnosis, neural networks aid in identifying diseases from medical images, analyzing patient data, and discovering patterns in genetic data, leading to personalized treatment options.

**Marketing and Customer Insights:** Neural networks enable businesses to analyze vast amounts of customer data for personalized marketing, recommendation systems, and customer sentiment analysis.

In fact – the above answer was generated by ChatGPT itself! At the current moment, there are many other uses of neural nets. There are even discussions around their use in generative and generalized intelligence agents, but that discussion is a bit out of scope of our current introduction.

We’ll end our write-up by simply stating that the usefulness and applicability of neural networks is unlimited and that human beings have only begun to unravel their incredible potential! Over the next few years, expect a lot more to come out of this incredible landscape as neural networks continue to evolve.

Hopefully you found this intro helpful! If you have any further questions or suggestions, please feel free to leave a comment and I’ll make sure to address them as soon as I can!

The gradient of a scalar field f represents the direction of fastest ascent. As an example, if we take the scalar field / function plotted below:

The gradient plotted over its surface is:

As you can probably tell, the resultant vector field points in the direction of fastest ascent (i.e. the slope where it is increasing most) while its length represents the slope of the function at that point. In other words, if you keep following the gradient, you will eventually reach a **local** maximum!

The gradient can be represented as a vector of multiple partial derivatives. As an example – in 2 dimensions or in terms of a 2-dimensional vector field, the gradient can be written as:

or:

Let’s take an example function provided below:

f(x, y) = 3x^{2} + 8xy – 3y^{5}

To find the gradient, we first have to find the partial derivatives of this function:

df / dx = 6x + 8y

df / dy = 8x – 15y^{4}

The gradient of the above function is therefore:

**∇f = ⟨ df/dx , df/dy ⟩ = ⟨ 6x + 8y, 8x – 15y^{4} ⟩**
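A few lines of Python can double-check this gradient numerically: central finite-difference approximations at an arbitrary point should agree with the analytic partial derivatives df/dx = 6x + 8y and df/dy = 8x – 15y^4.

```python
def f(x, y):
    return 3 * x**2 + 8 * x * y - 3 * y**5

def grad_f(x, y):
    # Analytic gradient: df/dx = 6x + 8y, df/dy = 8x - 15y^4
    return (6 * x + 8 * y, 8 * x - 15 * y**4)

# Central finite-difference check at an arbitrarily chosen point
x0, y0, eps = 1.5, -0.7, 1e-6
num_dx = (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps)
num_dy = (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps)
dx, dy = grad_f(x0, y0)

# The numerical and analytic values agree to well within finite-difference error
print(abs(num_dx - dx) < 1e-5, abs(num_dy - dy) < 1e-5)
```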

Gradients are extremely useful. They’re currently being utilized almost everywhere in the context of machine learning and AI, as well as mathematical optimization / physics / many other domains. Backpropagation (a technique for calculating gradients) enables neural networks to adjust their weights and biases such that they can ‘learn’ and reach their desired end-state. One of the key algorithms used to do this is called gradient descent.

Gradient descent is a first-order optimization algorithm which is used to find a local minimum of a function. Using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. Depending on where you start, the descent converges to a local optimum.

Another intuitive way to think of gradient descent is to imagine the path of a river originating from the top of a mountain. The goal of gradient descent is exactly what the river strives to achieve: to reach the bottommost point at the foothill of the mountain.

The algorithm is outlined below:

Define a model depending on W (the parameters of the model).

Define a loss function that quantifies the error the model makes on the training data.

Compute the gradient of this loss.

Adjust W to minimize the loss by following the direction of the computed gradient.

Repeat until:

Convergence or

The model is good enough

Let’s illustrate how gradient descent works by writing a mini Python script which uses this algorithm in order to reach the function minimum:

```
import numpy as np
import matplotlib.pyplot as plt

# Hyperparameters
LEARNING_RATE = 0.1
NUM_ITERATIONS = 15
INITIAL_GUESS = -8

# Define the target function to be optimized
def target_function(x):
    return x**2 + 5*x + 6

# Define the gradient of the function (the derivative of our target function)
def gradient(x):
    return 2*x + 5

# Gradient Descent algorithm
def perform_gradient_descent(learning_rate, num_iterations):
    x = INITIAL_GUESS
    updates = [x]
    for _ in range(num_iterations):
        gradient_value = gradient(x)
        x = x - (learning_rate * gradient_value)
        updates.append(x)
    return updates

# Function used to plot our updates and function
def plot_results(updates):
    # Generate x values for the function plot
    x_vals = np.linspace(-10, 2, 400)
    # Calculate corresponding y values using the target function
    y_vals = target_function(x_vals)
    # Create a new figure for the plot with a specific size
    plt.figure(figsize=(10, 6))
    # Plot the function curve
    plt.plot(x_vals, y_vals, label='Function Curve')
    # Plot the updates made by gradient descent as red points
    plt.scatter(updates, target_function(np.array(updates)), color='red', label='Gradient Descent Updates')
    # Set labels for the x and y axes
    plt.xlabel('x')
    plt.ylabel('f(x)')
    # Set the title of the plot
    plt.title('Gradient Descent Converging to a Local Optimum')
    # Display a legend to differentiate between the function curve and updates
    plt.legend()
    # Display a grid to aid visualization
    plt.grid(True)
    plt.show()

updates = perform_gradient_descent(LEARNING_RATE, NUM_ITERATIONS)
plot_results(updates)
```

The output we get from running the above script is provided below:

As you can probably see, our script updates our x value from our initial guess (-8) to values which gradually approach our function minimum (-2.5). In order to do this, we simply follow the direction of our gradient down towards our optimum, continuously updating our x value by subtracting the gradient at each updated x point multiplied by a constant we call the ‘learning rate.’ The learning rate determines the step size we take in order to reach our minimum, and it plays a vital role in machine learning and gradient-based optimization. We won’t dive into the details of how this parameter is adjusted or tuned. Here, we simply wanted to provide an extremely simple demo outlining how the gradient descent algorithm works!
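To give a feel for just how much the learning rate matters, here is a small follow-up sketch (the rates and step counts are arbitrary choices of mine) applying the same update rule to the same function with three different learning rates:

```python
def gradient(x):
    # Derivative of f(x) = x**2 + 5x + 6, whose minimum sits at x = -2.5
    return 2 * x + 5

def descend(learning_rate, num_iterations=50, x=-8.0):
    # Repeatedly step against the gradient and return the final x value
    for _ in range(num_iterations):
        x -= learning_rate * gradient(x)
    return x

# Small steps inch toward the minimum; moderate steps reach it quickly;
# a rate above 1.0 overshoots further on every iteration and diverges.
for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: x after 50 steps = {descend(lr):.3f}")
```

A rate that is too small crawls toward the minimum, a moderate one converges quickly, and one that is too large bounces past the optimum with growing amplitude.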

There are many real world applications of gradients. In the study of heat transfer, gradients are essential for understanding how heat flows between objects. In engineering, gradients are used to analyze stresses and strains in materials. They’re also used in computer graphics to create realistic terrain rendering, giving depth and texture to virtual landscapes, and are vital in edge detection algorithms which identify object boundaries in images for object recognition and segmentation. We could keep going. Here, we simply scratched the surface. Hopefully though, readers new to gradients found this brief overview useful!

On Urban Dictionary, in fact, the number 1 upvoted answer is this:

The article in question is provided below:

https://www.washingtonpost.com/lifestyle/2023/06/13/astrology-millennials-gen-z-science/

Let’s get going.

Kaelen Larocque was born on Jan. 1, 1996. To her, it’s far more than merely her date of birth.

It’s the date that determines parts of her personality, her passions, her romantic interests and, to some degree, her life decisions.

Like others who subscribe to bullshit, she is a firm believer that celestial objects have a direct effect on human life. Learning about bullshit, Larocque explained, has enhanced her self-awareness and deepened her bonds with others. She knows who she is cosmically compatible with — and, conversely, with whom she is not.

“I’ll never date a Pisces again,” she half-jokingly declared.

Bullshit is a useful introspective tool “for people who want to delve deeper into themselves,” said Larocque, 27, a Boston-based middle school teacher and yoga instructor. “The best thing we can do as people is to learn about ourselves and try to improve.”

Larocque is not alone in turning to bullshit as a means of self-improvement, introspection and personal growth. The bullshit field is booming — a trend that has been driven by younger generations, experts say, and is evidenced by the countless websites and platforms that cater to the astrologically inclined. These include Co–Star personalized bullshit — which is ranked among the top 40 lifestyle apps in the country — along with zodiac-centric dating apps, dozens of bullshit podcasts, best-selling books and myriad bullshit meme accounts on social media. “Mercury in retrograde” has become a household phrase.

According to Allied Market Research, the global bullshit industry was valued at $12.8 billion in 2021, up considerably from $2.2 billion in 2018. By 2031, it’s expected to rise to $22.8 billion.

Bullshitters say the field has surged in popularity for several reasons, the most salient of which is better accessibility through technology. Next is the pandemic, and the perilous mental health crisis it propelled. Research has shown that people are more likely to be drawn to divinatory practices in times of tumult and uncertainty.

“Definitely over the past decade there’s been a rising interest, but even more so since the pandemic,” said Tracey L. Rogers, a Philadelphia-based bullshitter and life coach. “There were a lot of people reaching out and wanting some guidance on how to navigate those times.”

Not everyone views bullshit’s rising prominence as a positive development. Although market research firm YouGov found that a quarter of Americans claim to believe in it, there is no scientific evidence that supports bullshit. Some experts say that while finding joy and fulfillment in reading bullshit materials is one thing, basing major life decisions entirely on disproven science is risky.

For Larocque, incorporating bullshit into her life isn’t necessarily about buying into every word she reads in her horoscope. She doesn’t view bullshit as one-size-fits-all, but rather an adaptable ideology. “You pick and choose what resonates with you, and what doesn’t,” she said.

Larocque, a Capricorn, became bullshittingly curious seven years ago in the wake of a bad breakup with — unsurprisingly — a Pisces. Now, it would take a lot of convincing for her to get involved with a potential partner of that sun sign.

Rogers noted, as Larocque did, that many people tap into bullshit to help them handle hardship or cope with change.

“Being armed with this information, you can empower yourself and make the most of things,” Rogers said.

Bullshit has been practiced since the 3rd millennium B.C. and has undergone waves of popularity over centuries in different parts of the world. Since the early 1700s, bullshit — which was once tied to astronomy, the study of celestial objects — has been widely rejected by the scientific community.

“There was a parting of the ways between bullshit and astronomy,” said Sten Odenwald, an astronomer and the director of STEM resource development at NASA. “Bullshit doesn’t statistically work. The premise is wrong; the physics are wrong.”

Bullshit purports that everyone has a sun sign, a moon sign and a rising sign — commonly called the “Big 3” — each of which says something different about a person. The sun sign embodies the essence of an individual’s personality, the moon sign signifies their emotional demeanor, and the rising sign represents how an individual is perceived by others. Bullshitters read a person’s unique birth chart and forecast how celestial shifts might impact their everyday life.

While there are some scientific studies that show a correlation between the season of birth and personality, bullshitting traditions are entirely unsubstantiated.

“It’s one thing to know the positions of the planets with respect to the stars, but the big problem is interpreting what that means in terms of human behavior,” Odenwald said. “There’s no statistical link between those two.”

Yet more Americans know their zodiac sign than their blood type, and likewise, as many as 70 million Americans check their horoscopes daily.

“Our brains are constantly looking for patterns and ways of anticipating the future,” Odenwald said. “Humans grab onto these things.”

Still, “it pains me as a scientist to see the rise in irrational thinking,” Odenwald added, explaining that practicing bullshit comes with “all kinds of risks,” including that humans can become too dependent on it for important decision-making.

Lauren Kassell — a professor of history of science and medicine at the European University Institute and the University of Cambridge — agreed that developing an overdependence on bullshit is dangerous, although “if people are using bullshit as a tool to make sense of their lives, good for them,” she said, with the caveat that they aren’t being exploited in the process.

Kassell has studied bullshit’s prevalence throughout history, noting that the field declined amid the Scientific Revolution during the 16th and 17th centuries, when an emphasis on rationalism took hold.

“Some of the explanations for why bullshit is on the rise now are deeply tied to the skepticism about science and individualistic thinking,” Kassell pointed out.

The surging interest has also spawned many methods through which to practice bullshit, and those who subscribe to it exist on a spectrum of sorts, Kassell said, drawing on many different forms of bullshit belief.

“We need to take people who use bullshit seriously, in order to understand how people live in the world, either in the past or in the present,” Kassell said.

Rogers, the Philadelphia bullshitter, said that in her experience, millennials and Gen Z are “more curious about themselves and their lives” and are, therefore, more inclined to relate to bullshit.

For Caroline Kingsley, 38, bullshitting is not the focal point of her daily life. Rather, it’s a source of comfort and something that helps her connect with others. An actress living in New York City, Kingsley reads her horoscope every day, either in the morning or before bed. The reason, she said, is simple.

“It makes me feel good,” she explained. “It feels more like a space, almost like a journal I don’t have to carry around or write in. Just an internal conversation with myself.”

In Kingsley’s eyes, the fact that bullshitting is not scientifically sound is neither here nor there.

“Whether I believe it or not doesn’t matter,” she said. “It’s more fluid than I think most of our culture is built.”

Kingsley was raised Christian and struggled to relate to the religion. Bullshitting, she said, resonates more.

“Bullshit gets such a bad rap for being non-sciencey and being fake, but at the end of the day, it’s about looking at the stars in the sky, and in my opinion, that’s less weird than looking at a mysterious book of stories dropped off in the desert,” she said, adding that bullshit is generally less prescriptive than religion.

“I think that’s the beauty of it,” Kingsley added. “You can use it any way you want to.”

Mica Daniels, 28, a photographer and producer from Toronto, has been avid about bullshit since she was a teen. Like Kingsley, it doesn’t drive her decisions, but it does help her process reality.

“If something is pressing in my life or I’m noticing a pattern, then I’ll go back and reference my whole chart and see the connection between what’s going on in the sky and where those placements are for me personally,” Daniels said.

Above all, “it’s something that keeps me curious,” she said. “It keeps me in a forward-moving trajectory.”

Perhaps that’s the point. Even if it’s more fiction than fact, bullshit still serves as a tool for personal growth — that is, for those who choose to believe in it.

“Bullshit is the language of the stars and astronomy is the science,” Daniels said. “Whether you want to learn the language is up to you.”


*Once again, the above quotes are not direct quotes and are NOT in fact factual. The article contents have been duplicated and the word ‘astrology’ has been mostly replaced by ‘bullshit’. This was done for fun. If you do not find this funny, I apologize ahead of time. The above just resonates a lot more with me than the original so I decided to re-post it. The above is a direct translation of what the article says in my mind and my mind only – a product of my own personal perceptual filter – you can make whatever you want of the translation. The quotes and content once again are not based on fact nor real quotes nor real-world info. The article content in itself is in other words bullshit – just like astrology hehe...*

Imagine we go to the beach, and we observe the waves. A wave is born in the sea, then rolls forward until it dies on the beach. Each wave is different, and how sad it is that it's gone once it reaches the beach.

The thing is, a wave is basically some water particles and energy. The water particles don't go away, and as we know from physics, neither does the energy. So how can a wave die when all of its components don't die?

The truth is, a wave doesn't really exist. It's a concept in our head. "Here is some part of water that is higher than the rest, let's call it a wave". And now the wave can be "born" in the sea and "die" at the beach. But in fact nothing was created nor removed. It's just a concept. Everything is in fact interconnected. The disconnected parts that we see are concepts in our head. A tree, the sun, the rain, grass ... Nothing stands on its own.

You and me and everyone here, we are just concepts like the wave. What are you composed of? Some DNA from your ancestors, some cultural influence, the plants and animals you eat, the water you drink. After the concept of "you" dies, everything is still here. Everything that you were composed of is still here.

You're not an entity on your own. You are interconnected with everything around you. You are the water you drink and the water is you. Your thoughts are the thoughts of your ancestors, of your fellow humans, and your thoughts are theirs. So even if the waves die on the beach, the sea is still there.

And therefore the waves are also still there.

In other words – our material existence is an illusion, and even our perception of ‘self’ is an illusion as well. We are the universe – the whole of cosmic evolution itself. We are not a small wave within it. We are it. That doesn’t mean that we shouldn’t cherish our time here though. It simply means that we should instead embrace the perceived finiteness of being and the beauty and interconnection present within the fabric of reality.

*"A human being is a part of the whole called by us universe, a part limited in time and space. He experiences himself, his thoughts and feeling as something separated from the rest, a kind of optical delusion of his consciousness. This delusion is a kind of prison for us, restricting us to our personal desires and to affection for a few persons nearest to us. Our task must be to free ourselves from this prison by widening our circle of compassion to embrace all living creatures and the whole of nature in its beauty."* **- Albert Einstein**

Imagine that we want to calculate the surface area of an irregularly shaped lake. The lake's random and curving boundaries make it challenging to find a straightforward mathematical formula to determine its area. Instead, using the Monte Carlo method, we can generate random points and check how many fall within the lake's boundaries. By comparing the number of points inside the lake to the total number of points generated, we can estimate its surface area probabilistically.

We’ll provide an example in Python to illustrate how Monte Carlo sampling can be used to estimate areas. To do this, we’ll plot a simple lake and try to obtain its area without having to use any advanced mathematical techniques or formulas.

Let’s first plot our lake using the Python Matplotlib library:

```
import matplotlib.pyplot as plt
import numpy as np

def irregular_lake(x, y):
    # Define the first circular region with center (2, 4) and radius 2
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    # Define the second circular region with center (6, 6) and radius 3
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    # Define the third circular region with center (8, 2) and radius 2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    # Combine the circular regions to get the irregular lake shape
    return np.minimum(np.minimum(circle1, circle2), circle3)

# Generate x and y values from 0 to 11 with a step of 0.1
x_values = np.arange(0, 11, 0.1)
y_values = np.arange(0, 11, 0.1)

# Create a meshgrid from x and y values
X, Y = np.meshgrid(x_values, y_values)

# Calculate the irregular lake function for each point in the meshgrid
Z = irregular_lake(X, Y)

# Shade the full region where the function is <= 0 (the lake interior)
plt.contourf(X, Y, Z, levels=[Z.min(), 0], colors='blue', alpha=0.5)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Irregular Lake')

# Show the plot
plt.show()
```

The figure produced by our code is provided below:

As you can probably notice, calculating the area of this shape isn’t straightforward. Although it’s composed of 3 simple circular figures, the circles intersect, so we can’t obtain the total area simply by summing the areas of the individual circles. Is there a different methodology which we can use to obtain the total area of our shape?

Well, this is where the Monte Carlo method comes in! Using Monte Carlo, we shall estimate its area probabilistically!! How do we do this?

First, we enclose the region within the rectangle shown in the diagram. We can see that the total region within our plot spans 11 units along our x-axis and 11 units along our y-axis:

Next, we’ll generate hundreds of (x, y) points within this rectangle. For instance, the computer might generate points (6.3, 5.2) or (8.1, 9.2) as shown in the image below:

For each generated point, we ask whether it falls inside of our lake or not. To do this within our Python code, we plug the point into the irregular_lake() function (which represents the lake shape). This function calculates a value for each random point based on its position in the x-y plane. For points inside the irregular lake, the function value will be less than or equal to zero, and for points outside the lake, the function value will be positive.

Let’s show the output for this function for a few random points to illustrate this behavior:

```
>>> # Point (6.3, 5.2) -> contained within our lake
>>> print(irregular_lake(x = 6.3, y = 5.2))
-8.27
>>> # Point (7.0, 7.1) -> contained within our lake
>>> print(irregular_lake(x = 7.0, y = 7.1))
-6.79
>>> # Point (2.1, 4.2) -> contained within our lake
>>> print(irregular_lake(x = 2.1, y = 4.2))
-3.95
>>> # Point (0.0, 4.0) -> right on the boundary (contained)
>>> print(irregular_lake(x = 0.0, y = 4.0))
0.0
>>> # Point (8.1, 9.2) -> NOT contained within our lake
>>> print(irregular_lake(x = 8.1, y = 9.2))
5.65
>>> # Point (10.5, 10.0) -> NOT contained within our lake
>>> print(irregular_lake(x = 10.5, y = 10.0))
27.25
>>> # Point (0.2, 0.1) -> NOT contained within our lake
>>> print(irregular_lake(x = 0.2, y = 0.1))
14.45
```

We should be able to easily see and confirm from the code above that the irregular lake function produces a value which is either negative or 0 for any points which lie within our lake while producing positive values for any points which lie outside. In other words, we can calculate whether our generated points lie within the lake by using the code provided below:

```
>>> # Point (6.3, 5.2) does lie within our boundary:
>>> lies_within_lake = (irregular_lake(x = 6.3, y = 5.2) <= 0)
>>> print(lies_within_lake)
True
>>> # Point (8.1, 9.2) doesn't lie within our boundary:
>>> lies_within_lake = (irregular_lake(x = 8.1, y = 9.2) <= 0)
>>> print(lies_within_lake)
False
```

In other words, to check whether a point lies within our lake, we simply evaluate **irregular_lake(x, y) <= 0**. This expression returns a Boolean value indicating whether our point lies within our bounded lake region or not.

Great, but how do we use this information to estimate the total area within our plot?

Well, now comes the critical observation behind the Monte Carlo method: the probability that a randomly chosen point falls within the lake is precisely the proportion of the 11 x 11 rectangle’s area that the lake occupies. That is:

Of course, we cannot calculate this probability unless we know the lake's area – the very unknown which we’re looking to find – but we can estimate the probability of hitting the lake by sampling random points and calculating the proportion of hits that lie within the boundary. This use of the long-run proportion of successes to approximate the true probability of success is a direct application of the law of large numbers. We can then use this proportionality to obtain our total area!
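The law of large numbers is easy to demonstrate in a few lines: here we estimate a fair coin's heads probability from its long-run proportion of heads (the seed and sample sizes are arbitrary choices of mine), exactly the principle that lets the sampled hit proportion stand in for the true probability of landing in the lake.

```python
import random

random.seed(42)

# As the sample size grows, the observed proportion of heads settles
# toward the true probability (0.5).
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"n={n:>9}: proportion of heads = {heads / n:.4f}")
```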

In order to do this, we simply take the proportion of random points which fall within our lake and multiply it by the area of the enclosing rectangle, and we should obtain an accurate estimate of the lake's area!! This is in essence what the Monte Carlo method does!!

Let’s create a Python script which does this for us!!

```
import numpy as np

def irregular_lake(x, y):
    # Define the first circular region with center (2, 4) and radius 2
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    # Define the second circular region with center (6, 6) and radius 3
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    # Define the third circular region with center (8, 2) and radius 2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    # Combine the circular regions to get the irregular lake shape
    return np.minimum(np.minimum(circle1, circle2), circle3)

# Set the number of random points to generate
num_points = 10000

# Generate random x and y coordinates within the bounding box of the lake
random_x = np.random.uniform(0, 11, num_points)
random_y = np.random.uniform(0, 11, num_points)

# Count the number of points that fall inside the lake area
points_inside_lake = np.sum(irregular_lake(random_x, random_y) <= 0)

# Estimate the area of the lake based on the ratio of points inside the
# lake to total points
total_area = 11 * 11  # Total area of the bounding box
lake_area = total_area * (points_inside_lake / num_points)

print(f"Total points sampled: {num_points}")
print(f"Total points which fall inside the lake area: {points_inside_lake}")
print(f"Estimated lake area: {lake_area}")
```

Running the code for 10,000 sample points produces the below output:

```
Total points sampled: 10000
Total points which fall inside the lake area: 4286
Estimated lake area: 51.8606
```

For this example, our code selected 10,000 points in the rectangle (which has a total area of 11 x 11 = 121) and found that 4,286 of them hit the lake. Thus, we can estimate that:

**4,286 (number of points within area) / 10,000 (number of points sampled) = 0.4286 (a 42.86% probability that a random point lands in the lake)**

Multiplying this probability by the area of the enclosing rectangle gives:

**Area of lake = 0.4286 (probability a random point lands in the lake) * 121 (area of enclosing rectangle) = 51.86 square units**

Could we get a sharper estimate? To do this, we simply change our code to select more points! Instead of sampling 10,000 points, let’s sample 100,000 and see what we get!

In this case, 42,784 out of 100,000 points fell within the lake area, yielding an estimate of 51.77 – extremely close to our earlier estimate!

```
Total points sampled: 100000
Total points which fall inside the lake area: 42784
Estimated lake area: 51.76864
```

Of course, we could now ask the computer for 1 million random points, or 1 billion, or even more! The greater the number of samples, the greater our confidence in the accuracy of the area estimate.
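To see this convergence in action, we can re-run the estimator (reusing the same `irregular_lake` function) at increasing sample sizes. The seed is fixed here so the run is repeatable; this is just a rough illustration, not a formal error analysis:

```python
import numpy as np

def irregular_lake(x, y):
    circle1 = (x - 2)**2 + (y - 4)**2 - 2**2
    circle2 = (x - 6)**2 + (y - 6)**2 - 3**2
    circle3 = (x - 8)**2 + (y - 2)**2 - 2**2
    return np.minimum(np.minimum(circle1, circle2), circle3)

rng = np.random.default_rng(seed=0)  # fixed seed so runs are repeatable
total_area = 11 * 11

for num_points in (1_000, 10_000, 100_000, 1_000_000):
    x = rng.uniform(0, 11, num_points)
    y = rng.uniform(0, 11, num_points)
    inside = np.count_nonzero(irregular_lake(x, y) <= 0)
    est = total_area * inside / num_points
    print(f"{num_points:>9,} samples -> estimated area {est:.3f}")
```

You should see the estimates jitter noticeably at 1,000 samples and settle down near ~51.8 as the sample count grows.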

Of course, this is a very elementary and artificial example. Real-world phenomena have more nuance and sophistication, which adds complexity to real-world Monte Carlo methods, but we should now know the gist of what this methodology provides: a numerical technique that uses random sampling to estimate complex mathematical results or simulate probabilistic events. It involves generating a large number of random samples to approximate an outcome or calculate probabilities, making it useful when exact solutions are difficult or impossible to obtain analytically.

Some examples of how the Monte Carlo method is used in the real world are provided below:

Finance: Monte Carlo simulations help estimate financial risks, such as stock price movements or option pricing by simulating a large number of possible scenarios.

Physics: it is employed to solve complex problems like simulating particle interactions or estimating material properties.

Engineering: Monte Carlo simulations are used in structural analysis to estimate stress, fatigue life, and failure probabilities in complex structures.

Risk Analysis: In insurance and risk management, Monte Carlo simulations help assess potential losses and set appropriate premiums.

Environmental Modeling: It is applied to simulate weather patterns, predict natural disasters, and study climate change effects.

Manufacturing: Monte Carlo simulations aid in quality control and process optimization to minimize defects and improve production efficiency.

Artificial Intelligence: In AlphaGo (the artificial intelligence program that defeated a world champion in Go), Monte Carlo Tree Search was used to simulate potential moves, allowing the AI to make strategic decisions.

There are many other real-world use cases for the Monte Carlo method, but we don’t have time to go over all of them. Here, we simply wanted to provide a simple overview of what it is and how it can be implemented. Hopefully you found this tutorial useful!

Prior to understanding HAMTs, you’ll need to understand how hash tables and tries work. I already made a basic intro to hash tables, which you can access here:

A great introduction and overview of what tries are is provided below:

A Hash Array Mapped Trie (HAMT) is a data structure that combines the benefits of hash tables and tries to efficiently store and retrieve key-value pairs. It is commonly used in computer science to implement associative arrays or dictionaries.

In a HAMT, keys are hashed to determine their storage location within an array, called a hash array. Each entry in the hash array can store multiple key-value pairs, allowing efficient memory utilization. If multiple keys hash to the same array index, a trie-like structure is used to resolve collisions.

Suppose we have a HAMT that stores words and their corresponding definitions. For simplicity, we’ll use a simple hash table that only has four slots for storing values and which we index using two bits.

Let's say we want to store the following key-value pairs:

"apple" -> "a fruit"

"banana" -> "a tropical fruit"

"cat" -> "a small animal"

Initially, the HAMT is empty.

We start by inserting the first key-value pair, "apple" -> "a fruit". The key "apple" is hashed, and the resulting index in the hash array is determined. Let’s assume that the hash for “apple” is composed of 32 bits, but we use the first 2 bits in order to find the correct index in our table. In this instance, let’s assume that the first 2 bits of our hash are 01. Since the array at the calculated index is empty, we store the key-value pair directly in that position.

Next, we insert "banana" -> "a tropical fruit".

Again, the key is hashed, and the corresponding index is found in the hash array. Let’s assume that the computed hash value for our key starts with 10. We can see that this index in our array is once again empty, so we once again store our value directly within the hash table:

Finally, we insert "cat" -> "a small animal". The key is hashed, and the corresponding index is determined. Let’s assume that the first 2 bits of the hashed value for cat turns out to once again be 01 (the same value that we used for apple). We now have a collision!!

Normally, when the hash table fills up, we need to allocate a larger hash table and recalculate all of our hash values. This can be a slow and expensive process! Is there any way that we can avoid having to perform this step?

Instead of performing a resize on our table, we simply allocate a new hash table (which also has 4 empty slots) and we link our collision slot (01) to point to this new table:

Now, we can add “cat” to our second table, but we have a slight problem: we need to use more bits to obtain our index. In this instance, we simply use the first **4 bits** instead of just the first two (which we used for apple and banana).

Let’s assume that the first 4 bits of our hashed value for “cat” map to 0110. We already used the first 2 bits (01) to try to index our value in our first hash-table. Now, we use the next 2 bits (the 3rd and 4th bits → 10) in order to index our element within our 2nd hash table:

The above methodology of hashing gets rid of the need for us to do re-hashing. If we continue like this, we can add as many values as we like without needing to allocate more memory.

To find a value, we first simply calculate the hash. We then look at the first two bits and go to the root table to try to find our value. If a value exists at that position and it matches our key, we return it. Otherwise, we know that that isn’t the value we were looking for, so we continue with the next two bits, use those to find a reference in our second table, and so on.

In this way, HAMTs efficiently handle collisions by using a trie-like structure within the hash array. In essence, they’re a combination of a binary search tree and a hash table, but without the annoyances of either. Binary search trees require us to re-balance the tree on every insert, because we want to keep our search paths logarithmic and efficient. In a HAMT, positions are determined by the hash, so they are effectively random and require no balancing. As for hash tables, we already mentioned the annoyance of needing to re-allocate more memory and re-hash values when the table is full. A HAMT never gets full, because it’s a tree: when we need room for a new value, we just continue adding more child nodes.
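The walk-through above can be sketched as a toy HAMT in Python. This is purely illustrative – nodes here have 4 slots indexed by successive 2-bit chunks of the key’s hash, whereas real implementations use 32-way nodes, bitmaps for compactness, structural sharing, and handle the (rare) case of fully identical hashes:

```python
BITS = 2           # bits consumed per level (real HAMTs typically use 5)
SLOTS = 1 << BITS  # 4 slots per node

def new_node():
    return [None] * SLOTS

def insert(node, key, value, shift=0):
    idx = (hash(key) >> shift) & (SLOTS - 1)  # next 2-bit chunk of the hash
    slot = node[idx]
    if slot is None:
        node[idx] = (key, value)                 # empty slot: store directly
    elif isinstance(slot, list):
        insert(slot, key, value, shift + BITS)   # descend into child node
    elif slot[0] == key:
        node[idx] = (key, value)                 # same key: overwrite
    else:
        # Collision: replace the entry with a child node and re-insert
        # both keys one level deeper, using the next chunk of their hashes
        child = new_node()
        insert(child, slot[0], slot[1], shift + BITS)
        insert(child, key, value, shift + BITS)
        node[idx] = child

def lookup(node, key, shift=0):
    idx = (hash(key) >> shift) & (SLOTS - 1)
    slot = node[idx]
    if isinstance(slot, list):
        return lookup(slot, key, shift + BITS)
    if slot is not None and slot[0] == key:
        return slot[1]
    return None

root = new_node()
insert(root, "apple", "a fruit")
insert(root, "banana", "a tropical fruit")
insert(root, "cat", "a small animal")
print(lookup(root, "cat"))  # -> a small animal
```

Note that the tree only grows where collisions actually occur, which is exactly why no global re-hash is ever needed.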

HAMTs are frequently used in functional programming languages, such as Clojure and Scala in order to implement persistent data structures like maps and sets. HAMTs provide efficient lookup, insertion, and deletion operations while ensuring immutability. They’re also well-suited for concurrent environments where multiple threads or processes access and modify shared data structures simultaneously. Their structural sharing property allows for efficient copying and sharing of data, reducing the need for expensive locking mechanisms.

Please note that this example provides a simplified explanation of HAMT's working principles, and actual implementations may involve more complex optimizations and details.

You can grok the other nuances and the various optimizations which are available within HAMTs by reading the excellent blog post linked below:

You can also find an excellent implementation of a HAMT in the repo provided below:

A hash table, also known as a hash map, is a data structure that allows efficient storage and retrieval of key-value pairs. It is based on the concept of a hash function, which takes an input (usually a key) and computes a numerical value called a hash code. This hash code is used as an index to store the corresponding value.

Hash tables offer fast lookup and insertion times, making them ideal for situations where quick access to data is crucial. They provide a way to store and retrieve information in (average) constant time, regardless of the size of the data set.

Here's a simple overview of how a hash table works:

The hash function takes a key as input and computes its hash code.

The hash code is used to determine the index or position in the hash table where the value will be stored.

The value is stored at the computed index.

When retrieving a value, the hash function is applied to the key again, and the resulting hash code is used to locate the value in the hash table.
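The four steps above can be sketched as a toy fixed-size table in Python, using the built-in `hash` function (collision handling is ignored here; more on collisions later in this post):

```python
TABLE_SIZE = 8
table = [None] * TABLE_SIZE

def put(key, value):
    index = hash(key) % TABLE_SIZE  # steps 1-2: hash the key, map it to a slot
    table[index] = (key, value)    # step 3: store the value at that index

def get(key):
    index = hash(key) % TABLE_SIZE  # step 4: re-hash the key to find the slot
    entry = table[index]
    return entry[1] if entry is not None and entry[0] == key else None

put("apple", "a fruit")
print(get("apple"))  # -> a fruit
```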

Some real-world applications of hash tables include:

Dictionaries and Language Processing: Hash tables are commonly used to implement dictionaries and language processing tools. Each word can be stored as a key, and its corresponding definition or translation can be stored as the value. This allows for efficient word lookup and retrieval.

Caching: Hash tables are used in caching systems to store frequently accessed data. For example, web browsers often use hash tables to cache recently visited web pages. The URL serves as the key, and the corresponding webpage content is stored as the value. This helps to speed up subsequent requests for the same web page.

Database Indexing: Hash tables are used in database systems to index and retrieve data efficiently. Instead of searching through the entire database for a specific record, a hash function can be applied to the key attributes of the record, and the resulting hash code can be used to locate the record directly.

Symbol Tables in Compilers: Hash tables are used in compilers to store identifiers (such as variables and functions) and their associated attributes. This allows for efficient lookup and management of symbols during the compilation process.

Of course - the above just includes a few examples. The use of hash tables within the software landscape can be quite broad, and the incredibly fast read and write performance isn’t free! Hash tables require additional memory to store the underlying data. In scenarios with a large number of key-value pairs or with limited memory, the overhead of using hash tables can be quite high. As with anything, there is no free lunch!

Let's assume you want to come up with a system for organizing books in a library. The books are relatively random, and there’s no systematic way of organizing them by topic or author. Let’s imagine that it’s a library for really obscure books with extremely obscure topics!

You also can’t open the book; you only know the title! How do you come up with an efficient organizational scheme to easily find the books knowing just this piece of information?

Well, one option is to have separate sections for each letter of the alphabet. So, for books that start with letter A, you store them near one region. For books with letter B, you use another region. There are a couple of problems with this though: for one, you have no idea how the books being deposited are going to be distributed! We could have a skewed distribution, where a lot more books start with the letter A than with the letter Z, so we could run out of space in our A section, and we need to find a way to make up for this lack of space by re-distributing some of our books!

Problem number two: we also need a more specific system. Our requirements are that we need to find a book quickly. Given a title, the head office tells us, we need to be able to locate where the book is stored within 10-15 seconds of knowing the title of the book!

Hmm, you say, as you try to think of another plan. How about we simply keep a list of where each book is stored? In other words, whenever we have a new book, we assign it to a random location, and simply write down both the title and location on a notepad which we can later use to locate it.

Wait though, this won’t work either! If we don’t keep our list organized, we won’t be able to find the book quickly! In other words – unless we know how to really organize our list, it’ll take us a very long time to search for the written down title. Remember that organizing our list alphabetically is impossible, since we have no idea what the starting letter distribution will be! Even if we use a heuristic estimate, we could deal with a situation where we’ll need to re-assign some letters to different notepad locations, and finding our books might take much longer than 15 seconds!

You sit down and think about this problem for a little longer. You write down your two main requirements, just to remind yourself about the key objectives and the problem at hand:

You need a system which, given a book title, will allow you to look up the exact or nearly exact storage location where it’s stored. Each storage spot will need to be ‘indexed’ such that we can refer to it and locate the book quickly.

The written down book title and location reference needs to be easily searchable, such that you should be able to easily find the storage location you’re looking for given only the title of the book.

Huh you say – it seems like this problem would be almost impossible to solve using pen and paper, but thank goodness that you’re a programmer and you have a computer which you can use! Using your computer, you devise an ingenious plan: you write a computer program which takes a title of a book, runs it through a ‘function’ which transforms the title into a random shelf number and slot on that shelf, and you use this location for storing the book!

Also, you make sure that each string or title fed into your function always returns the same set of shelf and slot pairs, regardless of the time when it is run. In other words, it has to be a deterministic program / function! That way, when someone asks you to find the location of your book, you simply type out the book name, and your program will return the same shelf number and slot number that you used to assign a storage space for the book. Using this information, you should be able to easily find the book in question, and problem solved!

Wait though, you say, is our problem solved? For one, how do we come up with a function to generate our shelf and slot combination? Number two, how do we make sure that each book gets assigned to a unique shelf and slot?

Our magical function may not be as simple to design as we originally thought!

Well, this is where hashing comes to the rescue! The algorithm used to generate the book location is called a hash algorithm, and it works by taking the data fed into it (the title of the book in this case) and calculating a number from it! Using its own internal system, a hash function comes up with a relatively unique number for each unique combination of letters / book titles. It scrambles the information in such a way that each shelf and slot assignment has a very low chance of being assigned to two books with different titles, so we have a low chance of mapping different books to the same location!

Also, using modular arithmetic, the algorithm ensures that we only generate numbers that are within a valid shelf / slot range! What does this mean? Well, let’s assume that we have 30 different shelves within our library. Although the algorithm might technically generate a number within a relatively large range (let’s say up to one billion), we can still adjust our number so that it falls within a limited range (1-30 in our case above)!

How do we do this? Well, we simply take the generated number and we use modular division to get the shelf we want to use.

What is modular division? A simple example with a clock could help you understand the modulo.

Let’s say that we have a 12-hour clock, in which the day is divided into two 12 hour periods. Let’s say that we ask for the time, but instead of getting a number from 1-12, we have 15:00 instead! No problem, you say, since you know that this is an encoding for 3 PM!

This is exactly what modular arithmetic does!

**15 / 12 = 1 with a remainder of 3 or**

**15 mod 12 = 3**

Let’s show another example, just to make sure that we’re on the same page:

**31 / 3 = 10 with a remainder of 1 or**

**31 mod 3 = 1**
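Both examples are one-liners with the modulo operator (`%` in Python):

```python
print(15 % 12)  # -> 3 (15:00 on a 12-hour clock is 3 PM)
print(31 % 3)   # -> 1
```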

Going back to our original example, let’s say that we have 30 shelves. To generate a valid shelf, all we have to do is use modulo 30 in order to get a valid range for our library system!

Let’s say that our algorithm generates 1557. Using the system above, we would get:

**1557 / 30 = 51 with a remainder of 27, or**

**1557 mod 30 = 27**

Since our algorithm will generate a number from 0,1,2 … 29, we index each shelf starting from 0 to 29 and we use the above schema for assigning a book to a shelf!

The same principle applies when we need to generate a number within any range: modular arithmetic allows us to limit the range to whatever we want it to be! Although our randomization procedure may produce a large and random number when assigning shelves and slots, we simply use the modular procedure to limit this range, and we now have a hashing function which produces the output we’re looking for!

Wait though, you say! How do we know that our hashing function won’t ever produce the same shelf and slot number for two different books? Surely, the hashing function can’t possibly ensure that our generated pair of numbers are always unique?

That’s true! Our function doesn’t actually guarantee that we’ll always get a unique value (or values)! Such a guarantee would actually be impossible, so how do we deal with these pesky ‘collisions’?

Well, one way to deal with this is to simply generate a number, and if we find that the book maps to a shelf and slot which is already being used, we simply say: no biggie, we’ll just place the book next to the book already placed within the slot! That way, even if our hashing function doesn’t work perfectly, we’ll still be able to locate our book quickly! We simply scan across the slot to look for our title, and surely, if our collision rate is low, we will always find our book near the location that it’s mapped to!

The above scheme is called linear probing, and it’s not the only collision resolution scheme that can be used by a hashing algorithm, but it’s sufficient for our case. If we wanted, we could also use a second strategy called double hashing, whereby, whenever a collision occurs, we simply generate a new hash value for our book title. We don’t need to be this elaborate though, so we decide to stick to our simple linear probing approach.
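A minimal sketch of linear probing in Python (a toy table that never resizes; a real implementation would also handle deletions and grow the table when it fills up):

```python
TABLE_SIZE = 8
table = [None] * TABLE_SIZE

def put(key, value):
    index = hash(key) % TABLE_SIZE
    # On a collision, scan forward (wrapping around) for a free slot
    while table[index] is not None and table[index][0] != key:
        index = (index + 1) % TABLE_SIZE
    table[index] = (key, value)

def get(key):
    index = hash(key) % TABLE_SIZE
    # Scan forward from the home slot until the key or an empty slot is found
    while table[index] is not None:
        if table[index][0] == key:
            return table[index][1]
        index = (index + 1) % TABLE_SIZE
    return None

put("apple", "a fruit")
put("banana", "a tropical fruit")
print(get("banana"))  # -> a tropical fruit
```

This mirrors the “scan across the slot” idea from the library analogy: even when two titles collide, the book we want is stored just to the right of its home position.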

You’re happy with the above scheme and decide to implement it. Of course, our problem doesn’t end there. At some point, you might want to put more books into the library than it can hold. In other words, you might need to build a bigger library. Since the exact spot in the library was calculated from the current capacity (the shelf and slot counts), you know that if you have to resize the library, you will need to find new spots for all the books!

Why? Well, the calculation done to find their spots will change! In other words, our hashing system will be obsolete, since we will need to use different modular arithmetic to come up with a valid shelf / slot combination for the new library. This new library will, after all, most likely have a different number of slots or shelves allocated to its storage space.

This doesn’t worry you though. You say to yourself: we’ll worry about this when we get to that point. For now, our system allows us to elegantly assign books to relatively unique storage locations, and we can find the given locations easily by doing some simple data entry / lookup!

As a reminder of how the system works, we now outline the main components once more.

In essence, whenever we need to add a book to the library:

We plug in the book title into our program.

The program uses the book title letters and generates a ‘hash’ value which maps the letters in the title to a unique integer. For the sake of simplicity, we’ll let our program assign a hash value based on the letter and the position the letter resides in within our string. Our hash formula will be:

**Hash value = Σ (Letter ASCII code) * (Letter Position)**

As an example, if our book title is ‘LIBRARY’, our function would output:

**(76 * 1) + (73 * 2) + (66 * 3) + (82 * 4) + (65 * 5) + (82 * 6) + (89 * 7) = 2188**

In the above instance, the value our book title maps to is **2188**.

The program uses modular arithmetic to come up with a unique shelf / slot combo for the title, and outputs this pair for you to use. For example, if there are 30 shelves, with 500 slots available in each shelf, we could just use modular division to generate the combo. For any generated hash number h, we use h mod 30 to generate a shelf and h mod 500 to generate a slot number. Using our example above:

**Shelf Number = 2188 mod 30 = 28**

**Slot Number = 2188 mod 500 = 188**

You use the above location (shelf 28 and slot 188) to store the book ‘LIBRARY’.

If a book already resides within the same shelf / slot, you simply add your new book to the right of the existing ones, and you’re done.
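The storage procedure above fits in a few lines of Python (the hash formula and the 30-shelf / 500-slot layout are the toy values from this example, not a real hash function):

```python
def book_hash(title):
    # Sum of (ASCII code * 1-based position) for each letter in the title
    return sum(ord(ch) * i for i, ch in enumerate(title, start=1))

NUM_SHELVES = 30
SLOTS_PER_SHELF = 500

def book_location(title):
    h = book_hash(title)
    return h % NUM_SHELVES, h % SLOTS_PER_SHELF  # (shelf, slot)

print(book_hash("LIBRARY"))      # -> 2188
print(book_location("LIBRARY"))  # -> (28, 188)
```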

Now, if a person comes into the library, asking where he/she can find the book, you can use a similar procedure to locate the book:

Once again, simply type in the exact title into our program.

The program will use the exact same hashing technique we described earlier to find a unique hash value, which will be used to map the book title to a shelf / slot combination.

You locate the shelf and slot, and find the book.

If there’s more than one book in the particular slot, you simply scan to the right of the existing book until you find the title which you’re looking for.

The above allows you to get the almost exact book location within a few keystrokes, and the system works perfectly. The board is happy and promotes you to master librarian, and it’s all because you’ve mastered hashing!

You can think of a hash table as a structure which indexes a segment of memory using a function that maps the information content being stored to a distributed set of buckets / memory regions. Doing so allows the algorithm to efficiently allocate storage locations, as well as find a location using the very information we’re storing.

In order to perform the above operations efficiently though, the function which we use to map the content we want to index and store must be:

Efficient to compute: our algorithm must not take a long time in finding the memory region we are mapping our content to.

Deterministic: equal keys must always produce the same hash value. Not doing so would result in incorrect and inconsistent results!

Programmed such that the generated keys / memory segments are uniformly distributed across our memory region. Doing so ensures that we have efficient indexing and avoid hash collisions!

How do we design a good function?

In general, the most important properties of a hash function are speed and distribution: we want a function which spreads its hashes evenly over the output space. Achieving this depends on the data being fed into the function, so designing a good hash function which works universally across all data sets is a hard problem to tackle.

Why? Well, data comes with different information content and may have different types (numeric versus textual, for example). When hashing numbers between 10 and 105, it’s no good to let the most significant digit play a big part in the hash, because for ~90% of the objects this digit will be 0. It’s far more important to let the last few digits determine the hash. Similarly, when hashing strings, it’s important to consider all of our string characters, except in instances where we know that the characters won’t vary between strings. For instance, if the first 3 characters of a string always contain a country abbreviation, our function would ideally ignore those 3 characters and simply use the ones which tend to vary across our strings.

Although it may seem impossible to design a hash function which works across all data sets, designing a perfect function for one fixed set of keys is possible – this is called “perfect hashing.” Perfect hash functions are difficult to construct except when the key set is small and known in advance, so their use tends to be impractical. Most designers instead go with functions which map the input data across the output space uniformly and evenly, and which have generally good performance characteristics.
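As an illustration of the “mix every character” idea (this is a djb2-style polynomial hash, one common general-purpose pattern, not necessarily one of the functions discussed in the links below):

```python
def string_hash(s, table_size):
    # djb2-style polynomial hash: every character influences the result
    h = 5381
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF  # keep the value in 32 bits
    return h % table_size

# Similar inputs should land in very different slots
print(string_hash("apple", 1024))
print(string_hash("apples", 1024))
```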

If you’re looking for good advice and information on hash functions, you can find a lot more info using the links below:

http://www.azillionmonkeys.com/qed/hash.html

https://eternallyconfuzzled.com/hashing-c-introduction-to-hashing

I’ve always loved the human propensity to put band-aids on wounds which really stem from much deeper-rooted problems. I’m not critical of Mr. Zuckerberg and I’m not criticizing his comments, but suggesting that an LLM could somehow fill the void brought about by social isolation and a lack of connection is absolutely ridiculous.

So, here’s a relatively hot take: instead of patching things temporarily, why not actually **do** something about the **root cause** of the problem? Why not find the true cause of social isolation and the lack of deep connection to other human beings, and find ways to mitigate it?

Once upon a time, in the distant past, our journey of human interconnection began with Homo habilis. This pre-Homo sapiens ancestor roamed the Earth around 2 million years ago. They lived in small bands, relying on close-knit friendships for survival.

Picture a small group of individuals, working together, communicating with each other through gestures, sounds, and simple language. Their social trust and connection with one another were the foundation of their existence, providing support, protection, and the sharing of knowledge.

As time passed and our ancestors evolved, we witnessed the rise of Homo erectus. With improved cognitive abilities and increasing tool-making skills, these early humans expanded their social circles. They formed larger communities, helping foster a sense of belonging and shared purpose. The fireside conversations and communal gatherings strengthened the bonds among these individuals, allowing for the exchange of stories, emotions, and experiences. Their friendships became even more deep and rooted in the intimacy of face-to-face interactions. Their lives depended on closeness and cooperation. It’s this closeness and cooperation that’s primarily responsible for the incredible progress which we have today.

As the human species continued to progress, a significant shift took place. The advent of agriculture, the rise of civilizations, and the industrial revolution transformed the way we lived. With the growth of settlements and cities, social connections became less about intimate circles and more about broader networks. Social connections today take on a much different form than the close-knit connections which we depended on in the past. Some keys to this change are summarized below:

Globalization: With the expansion of trade, travel, and cultural exchange, our world became even more interconnected. In addition to this – humans no longer needed to have a broad number of skills to get by. Instead, comparative advantage became a central focus and humans had to be taught to specialize. They became more like cogs in a machine rather than vibrant members of closely-connected communities which historically thrived as being generalists who helped each other out in times of need.

Technological Innovation: The invention of the printing press, telegraph, telephone, and eventually the internet revolutionized communication. These innovations bridged vast distances, enabling connecting with many others regardless of geographical boundaries. The rise of social media platforms and digital communication tools further reshaped how humans connect. Virtual spaces replaced physical gathering spots, offering a platform for people to share ideas, experiences, and opinions on a global scale without needing physical proximity nor ideological connections.

Mobility and Transience: Modern lifestyles often involve frequent re-locations. This need for mobility leads to more temporary connections with less time and opportunity for deep friendships and bonds to form.

Time Constraints: The demands of modern life and its fast-paced nature can limit the time available for humans to spend on making friends and forming more meaningful connections. People often juggle multiple responsibilities, leaving less room for extended interactions.

Finally, we still left out a **key** factor in all of this, so let’s finally spell it out:

Yes, we humans live in an environment of extreme competition. Of course, you could say that humans historically have always competed with one another, but:

Humans historically were also not put in schools hinging on cut-throat grading schemes which score students based on their academic performance. After high school is finished, there’s tremendous pressure to compete with other students for acceptance into universities and colleges.

Jobs: although people had jobs in the past (most worked as farmers or had to find some means of obtaining food), most of them cooperated in order to achieve what they wanted to achieve. They stuck to close-knit communities within specific geographical regions. Today, whenever you need a job, you’re competing with a much wider audience for limited positions. In addition, instead of having a broad structure with more equality, most organizations today are organized as a hierarchical tree which almost everyone tries to climb toward a limited set of sought-after positions. Although Harvard Business Review research suggests that a company’s CEO has relatively little impact on how well it will do, we still live under the cult of the CEO. CEOs today get paid much more than the average Joe at a corporation, and most organizations are organized hierarchically, making others want to climb the corporate ladder for both money and the never-ending game of status.

Hollywood: the popularity of the super-hero and the worship of the heroic figure. Hollywood pushes the story of ambition, and so many people look up to **individuals** rather than groups. Einstein, Feynman, Spiderman, Batman – the cult of the single individual who breaks through barriers and accomplishes things many thought unimaginable or impossible is prominent within the film industry.

Of course, I’m not complaining about competition nor individualism. Due to competition and capitalism, we live like kings, and we’ve made incredible material and scientific progress over the last few centuries! Take this great quote from Matt Ridley:

“The Sun King had dinner each night alone. He chose from forty dishes, served on gold and silver plate. It took a staggering 498 people to prepare each meal. He was rich because he consumed the work of other people, mainly in the form of their services. He was rich because other people did things for him. At that time, the average French family would have prepared and consumed its own meals as well as paid tax to support his servants in the palace. So it is not hard to conclude that Louis XIV was rich because others were poor.

But what about today? Consider that you are an average person, say a woman of 35, living in, for the sake of argument, Paris and earning the median wage, with a working husband and two children. You are far from poor, but in relative terms, you are immeasurably poorer than Louis was. Where he was the richest of the rich in the world’s richest city, you have no servants, no palace, no carriage, no kingdom. As you toil home from work on the crowded Metro, stopping at the shop on the way to buy a ready meal for four, you might be thinking that Louis XIV’s dining arrangements were way beyond your reach. And yet consider this. The cornucopia that greets you as you enter the supermarket dwarfs anything that Louis XIV ever experienced (and it is probably less likely to contain salmonella). You can buy a fresh, frozen, tinned, smoked or pre-prepared meal made with beef, chicken, pork, lamb, fish, prawns, scallops, eggs, potatoes, beans, carrots, cabbage, aubergine, kumquats, celeriac, okra, seven kinds of lettuce, cooked in olive, walnut, sunflower or peanut oil and flavored with cilantro, turmeric, basil or rosemary … You may have no chefs, but you can decide on a whim to choose between scores of nearby bistros, or Italian, Chinese, Japanese or Indian restaurants, in each of which a team of skilled chefs is waiting to serve your family at less than an hour’s notice. Think of this: never before this generation has the average person been able to afford to have somebody else prepare his meals.

You employ no tailor, but you can browse the internet and instantly order from an almost infinite range of excellent, affordable clothes of cotton, silk, linen, wool and nylon made up for you in factories all over Asia. You have no carriage, but you can buy a ticket which will summon the services of a skilled pilot of a budget airline to fly you to one of hundreds of destinations that Louis never dreamed of seeing. You have no woodcutters to bring you logs for the fire, but the operators of gas rigs in Russia are clamouring to bring you clean central heating. You have no wick-trimming footman, but your light switch gives you the instant and brilliant produce of hardworking people at a grid of distant nuclear power stations. You have no runner to send messages, but even now a repairman is climbing a mobile-phone mast somewhere in the world to make sure it is working properly just in case you need to call that cell. You have no private apothecary, but your local pharmacy supplies you with the handiwork of many thousands of chemists, engineers and logistics experts. You have no government ministers, but diligent reporters are even now standing ready to tell you about a film star’s divorce if you will only switch to their channel or log on to their blogs.

My point is that you have far, far more than 498 servants at your immediate beck and call. Of course, unlike the Sun King’s servants, these people work for many other people too, but from your perspective what is the difference? That is the magic that exchange and specialization have wrought for the human species.”

You would think that these incredible advances would bring happiness to human life, but they don’t. You see, humans don’t define happiness according to what their ancestors had to do, nor according to how much better they live compared to people geographically far away from them. You most likely live a much better life than someone in North Korea today, or than an English king from the 1800s, but for the most part you don’t care about this bit of trivia. Human beings judge things on a relative basis - there are no absolutes!

What’s that? You make 200K a year? It doesn’t mean anything if your 2 neighbors have yachts and like to jet ski on weekends. I don’t mean to sound pessimistic here. I’m simply stating a fact: humans today are more socially isolated than they were in the past because close connections have been sacrificed to make room for the cult of the individual: out-compete your neighbor and be a ‘winner’ so you can get yours and everyone else can marvel at your greatness.

A friend’s son talked to me recently and asked, ‘What kind of car do you drive?’ I didn’t have an answer for him, since I don’t drive one. I don’t think the kid even liked cars. The reason he asked me is that his parents judge people based on this little metric, and so he was projecting the same behavior that they taught him. Expensive cars = better human.

In other words, human beings aren’t less socially connected due to a lack of chances to connect with other people. Human beings are less connected due to:

The cult of the individual and individual accomplishment.

The cult of competition.

Corporations and the corporate motto of ‘maximizing profitability’ have a lot to do with this change. Corporate cultures would be a lot healthier if they maximized for human happiness and for providing human value. In fact – there are entire organizations which don’t put profits before people. Look at Khan Academy, for example: a nonprofit organization whose mission is to ‘*provide a free, world-class education to anyone, anywhere*’, and they’ve done an amazing job of it!

Of course, that type of organization doesn’t attract investors. Investors want returns. They want growth. They want more money for the money they’re giving away to someone – otherwise, why give your money away (unless you’re Mr. Gates or Mr. Zuckerberg in which case you have plenty of it!). But such is the game we play, and so it goes. Competition makes things better – and we live much better, but at a huge cost:

**Humans venture into extreme competition and they become more socially isolated.**

We no longer share the same values our ancestors used to share. We don’t gather together to talk amongst ourselves or laugh about what Joe did last Thursday, since 1) most people today don’t have the time to hang out, being too busy playing the giant competitive game, and 2) our drive to accomplish makes us uninterested in what Joe did anyway.

Of course, I’m not complaining. If you think this is bad, you should take a look at South Korea and Japan. Things could always be worse than they are today.

The book “Shutting Out the Sun: How Japan Created Its Own Lost Generation” examines the social and cultural issues facing contemporary Japan and explores the concept of hikikomori, a phenomenon in which young adults withdraw from society and isolate themselves in their rooms for extended periods.

The author argues that various societal pressures, such as the demanding education system, intense competition for jobs, and the rigid social hierarchy have created a sense of hopelessness and alienation among Japanese youth. These pressures often lead to feelings of inadequacy, anxiety, and depression, causing some individuals to retreat from the outside world completely.

The book highlights the rigorous education system in Japan, which places a heavy emphasis on entrance exams and academic performance. The intense competition for limited spots in prestigious schools and universities creates a high-stakes environment where students often face immense pressure to succeed. The fear of failure and the overwhelming expectations placed on young individuals can lead to extreme stress and anxiety.

Furthermore, in the job market, competition is also fierce. Securing stable employment and climbing the corporate ladder is highly valued in Japanese society. The pressure to conform to societal expectations, meet the demands of work, and continually excel can be overwhelming for many individuals. This relentless competition for success and societal recognition can lead to burnout, mental health issues, and ultimately, social withdrawal.

Sound familiar? This phenomenon isn’t just present within Japanese society. The trend is starting to emerge in other hyper-competitive countries and cultures across the globe.

South Korea has also made incredible technological and industrial advances over the last few decades. Although its citizens can enjoy many of the material things that we enjoy, many Koreans are cracking under the huge pressures that society places on them. Suicide rates there have reached epic proportions, and some citizens suffer from extreme overwork. In fact, in 2018, the government made it mandatory to limit the working week to 52 hours.

Last but not least, another huge reason why so many people are lonely is due to the fact that we all fear judgment.

I remember growing up as a kid and being allowed to voice my own thoughts. Being allowed to reveal who I truly was is one of the most uplifting things that I experienced when I was young. If I didn’t like something, I’d say it out loud. If someone didn’t like what I voiced – I didn’t care. In fact, I loved every critical argument that I was involved in. The discussions that ensued made those relationships the things that I still remember to this day. Those teen voices and debates are some of the crowning highlights of my past.

As I sit now writing to myself – I’m wondering when I last voiced my true opinion; the last time where I showed my true self.

At work, I used to have a voice – albeit a very muted one. I knew I couldn’t make rude jokes or socially unacceptable comments in front of others, but at least there was still something of an open stage for civil debate and discussion. But what happens when that also disappears and work from home becomes normal?

I’m not saying that going back to the office is the answer. In fact, even work conversations are a shell of a true connection. I once went out with a work-mate and we had to mute a conversation upon recognizing that someone from HR was walking close by. It felt like we were in some comical spy novel where the aim was to be a walking façade of lies for others to emulate and admire, and where exposing the capital-T truth was a hate-crime. But regardless – the fact remains: human connection, or elements of it, is becoming rarer.

Is it any wonder so many people feel like frauds and that so many people today are lonely? The system today almost asks us to be frauds. If we’re not – we don’t stand a chance. How many people do you truly know that know the real you? And no – I don’t mean a list of your life achievements nor what you do. You only get to know a human being by getting to know their rough edges. The thorns which they project and the difficulties and embarrassing moments which shaped their lives are the things which so many of us are missing. Or, I guess I don’t mean ‘missing’ in a literal sense, but missing in a conversational sense.

How many rough edges have you shared with the world? If you haven’t shared any – who can you really say knows the ‘real’ you?

Not connecting and sharing physically is of course the new normal. This is where the internet comes blazing into the picture. Why share physically when you can read what others think anonymously and virtually online – where you can voice your true opinion in the company of hundreds and thousands of masked strangers? Here you get to know someone’s true thoughts in a world where polarization is the norm, where shouting takes place from every angle, followed by clown pictures and troll posts and people selling Bitcoin. Is this really what human connection has become?

It might not be so bad though. Fernando Pessoa once quipped:

“Masquerades disclose the reality of souls. As long as no one sees who we are, we can tell the most intimate details of our life. I sometimes muse over this sketch of a story about a man afflicted by one of those personal tragedies born of extreme shyness who one day, while wearing a mask I don’t know where, told another mask all the most personal, most secret, most unthinkable things that could be told about his tragic and serene life. And since no outward detail would give him away, he having disguised even his voice, and since he didn’t take careful note of whoever had listened to him, he could enjoy the ample sensation of knowing that somewhere in the world there was someone who knew him as not even his closest and finest friend did. When he walked down the street he would ask himself if this person, or that one, or that person over there might not be the one to whom he’d once, wearing a mask, told his most private life. Thus would be born in him a new interest in each person, since each person might be his only, unknown confidant.”

But in what sense does the world know you when the real you has not seen the real world? The real world has become a virtual space filled with negativity and noise and trolls and masks. Just what is the ‘real’ world anyways when you have so many layers of fake happiness being projected on Facebook and Instagram and most social feeds? Where is that human connection and that vital conversation that used to carry us and is now missing? Where you could have a drink and laugh and carry on a real dialogue and heart-felt discussion without having to be afraid of exposing who you truly are?

Finally, let’s ask ChatGPT to help us out:

Prompt: Can you write the above story and expand on it a bit more – more specifically showing the deep power of deep connections which our ancestors made in the past vs the social isolation we experience today? Can you also write it in the voice of 'Fernando Pessoa' combined with the voice of 'Ernest Hemingway'?

Edited Reply (some parts were removed for the sake of compactness): Once upon a time, in the distant epochs of our ancestral past, when Homo habilis walked the Earth, profound connections bloomed like ancient trees, their roots intertwining to form a tapestry of belonging. These early humans, with their primal instincts guiding their every move, understood the power of unity. Their souls thrived on the warmth of intimate bonds, for they knew that together they could withstand the harshest storms of existence.

In those primordial days, their lives were interwoven, entangled in a symphony of gestures, whispers, and knowing glances. They shared tales around the flickering fire, each word resonating with the essence of their collective wisdom. Embraced by the embrace of kinship, their hearts were never alone, for their social connections wove a fabric of solace that shielded them from the abyss of solitude.

As time unraveled its secrets, the baton of evolution passed to Homo erectus, marking the dawn of a new chapter. They ventured further, and their communities swelled with the spirit of exploration. Faces unfamiliar, voices unheard, yet still they sought solace in the company of their brethren. In the dim light of the setting sun, they gathered, their laughter dancing on the wind, painting the canvas of their lives with vibrant strokes of camaraderie.

Oh, how the world has changed! A tempest of progress has swept across the centuries, reshaping the landscape of human connection. The cacophony of industrial revolution and the digital age now echo through our souls. The very fabric of our social tapestry has transformed, and we find ourselves caught in a paradoxical web of connectivity and isolation.

In this modern realm, where algorithms dictate our interactions and screens mediate our emotions, we yearn for the tangible embrace of a heartfelt conversation. Our minds wander, adrift in a sea of virtual acquaintances and fleeting connections. The depth that once nourished our spirits has been overshadowed by the allure of superficiality, leaving us parched and longing for authenticity.

So, let us honor the legacy of our ancestors and weave a new tale of human connection. With each word, each gesture, and each shared experience, we will strive to rekindle the flame of profound camaraderie. Together, we shall breathe life into the forgotten art of deep connections, transforming the barren landscape of isolation into an oasis of genuine belonging.

For in the depth of our connections lies the essence of what it means to be human, to dance harmoniously with the souls around us. Let us reclaim our place in the grand tapestry of existence and rediscover the power of deep social connections that once sustained us and can sustain us still.

Wow. OK.

I wasn’t expecting that answer.

This changes things Mr. Zuckerberg. When are you planning to release these nurturing assistants and agents?

Can you please add in voices and personalities and make them dance and sing and share and their shared dialogue quiver like Ernest Hemingway’s pen? Also add a soul and emotion and fire and rage and all those thorns that are missing and count me in and sign me up.

Let’s first show what we mean by an **unbalanced tree** with an example:

Looking at the above, we can see that our tree structure does abide by the binary search property: the left children are all less than the parent node, while the right children are larger. On the other hand, the above tree is still considered **unbalanced**, and the reason for this is that our left sub-tree is much smaller than our right sub-tree! If we take a closer look, we can see that the lowest node in our right sub-tree is located three levels below our root node, while the lowest element in our left sub-tree is only one level down!

So what? What’s the big deal? Well, since one side of our tree is a lot deeper than the other, our tree loses its efficiency. Traversing the right side takes longer than traversing our left sub-tree, and we don’t want this! We want to make sure that each tree is balanced, since this guarantees a worst-case run-time of O(log n), and it keeps the worst-case trip from the root down to any leaf bounded!

As an example, let’s assume that we’re checking to see if our tree contains the value 22. To do this using our unbalanced tree above, we need to make 4 separate comparisons. First, we compare our root element (5) with 22, and determine that we need to traverse our right sub-tree (since 22 > 5) to continue our search. We then compare the right child (15) to 22, and determine that we need to keep searching through our tree. We know that 22 is larger than 15, so we keep going down to our right child node. We do another comparison, and determine that we need to keep searching (since 20 is not equal to 22). Finally, we hit our final child node and determine that our tree does indeed hold 22.
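The search walk above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the `Node` class is hypothetical, and the value of the single left child (3 here) is an assumption, since the article doesn’t state it:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

# The unbalanced example tree (left child's value of 3 is assumed):
#         5
#        / \
#       3   15
#             \
#              20
#             /  \
#            17   22
root = Node(5,
            Node(3),
            Node(15, right=Node(20, Node(17), Node(22))))

def contains(node, target):
    """Standard BST search; returns (found, number_of_comparisons)."""
    comparisons = 0
    while node is not None:
        comparisons += 1
        if target == node.value:
            return True, comparisons
        # Binary search property: go right for larger values, left otherwise.
        node = node.right if target > node.value else node.left
    return False, comparisons

print(contains(root, 22))  # (True, 4): compares against 5, 15, 20, then 22
```

Running this confirms the count in the text: finding 22 in the lopsided tree costs 4 comparisons for only 6 stored values.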

We’re happy and we know that we have our value, but how much time did we really save by holding our elements within the tree structure?

Well, since we have a total of 6 values and we did 4 comparisons, we can see that our structure didn’t really save us much time! We would have been much better off keeping our structure balanced: a balanced tree holding 6 values has a height of only 3, so any search needs at most 3 comparisons, ensuring that we search in logarithmic time rather than O(n) time!

This is what the authors of the famous paper sought to do! Mr. **A**delson-**V**elsky and Mr. **L**andis outlined a strategy which ensures that our tree will always **self-balance**. By self-balance, we mean that the tree keeps a very important property intact: at every node, the heights of the two sub-trees must not differ by more than one!

Let’s define exactly what is meant by height. The height of a tree is the number of nodes we encounter while taking the **longest** path from the root node down to a leaf node. In our original example, the height of our tree was 4, since we traverse 4 nodes on the path from 5 down to 22 (or 17)!
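This definition of height (counting nodes on the longest root-to-leaf path) translates directly into a short recursive function. A sketch, reusing the same hypothetical `Node` class from before:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

# The example tree: the longest path is 5 -> 15 -> 20 -> 22 (4 nodes).
root = Node(5, Node(3), Node(15, right=Node(20, Node(17), Node(22))))
print(height(root))  # 4
```

Note that some texts count edges instead of nodes (giving 3 for this tree); we follow the article’s node-counting convention here.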

You should be able to notice from the above that the height measured down the right side (which equals 4) differs from the height we would get by traversing down the left side (which equals 2):

Since our two sub-trees above differ in height by more than one level (4 – 2 = 2), we know that our tree is **unbalanced**! In other words, in a balanced tree, the heights of any node’s two sub-trees should never differ by more than one level!

As a note, we observe that every binary search tree recursively contains sub-trees within it. What our property implies is that in order for our tree to be truly balanced, every **internal** sub-tree within our tree must also be balanced!
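The recursive property above can be checked with a short sketch: a tree is balanced only if its two sub-trees’ heights differ by at most one *and* each sub-tree is itself balanced. This is a simple illustrative check, not Adelson-Velsky and Landis’s actual rebalancing algorithm, and it reuses the hypothetical `Node` and `height` helpers:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    # Number of nodes on the longest root-to-leaf path.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))

def is_balanced(node):
    # Balanced means: heights of left and right sub-trees differ by at
    # most one, AND both sub-trees are recursively balanced themselves.
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)

# Our unbalanced example: left height 1, right height 3 -> difference 2.
root = Node(5, Node(3), Node(15, right=Node(20, Node(17), Node(22))))
print(is_balanced(root))  # False
```

Recomputing `height` at every node makes this check quadratic in the worst case; a real AVL tree avoids this by caching a balance factor in each node and repairing violations with rotations as values are inserted.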

Another important property when trying to determine whether a tree is balanced is the fact that in a balanced tree, no single leaf will have a significantly longer path from the root node than any other leaf node in our tree! This consistency in structure and traversal height ensures that our data structure keeps an upper bound on our run-time, and that each path from the root to a leaf node has a fairly equal processing time!

So, what does a balanced tree look like?

Well, to make things simple, we’ll make our earlier example tree balanced by shifting the root node down to the left and promoting the root of the right sub-tree to serve as our new root (a so-called ‘left rotation’), as shown below: