The journey to becoming a developer

My future is created by what I do today, not tomorrow.

Computer Science/Crash Course

The First Programming Languages: Crash Course Computer Science #11

Millie 2021. 10. 13. 05:41

Last Episode

So far, for most of this series, we’ve focused on hardware -- the physical components of computing -- things like: electricity and circuits, registers and RAM, ALUs and CPUs.

Today's Episode

But programming at the hardware level is cumbersome and inflexible, so programmers wanted a more versatile way to program computers - what you might call a “softer” medium. That’s right, we’re going to talk about Software!


Machine Language (Machine Code)

As we've seen, computer hardware can only handle raw, binary instructions.

This is the “languagecomputer processors natively speak.

In fact, it’s the only language they’re able to speak.

It’s called Machine Language or Machine Code.

In the early days of computing, people had to write entire programs in machine code.

Pseudo Code

More specifically, they’d first write a high-level version of a program on paper, in English.

For example,

“retrieve the next sale from memory, then add this to the running total for the day, week and year, then calculate any tax to be added” ...and so on.

An informal, high-level description of a program like this is called Pseudo-Code.

Then, when the program was all figured out on paper, they’d painstakingly expand and translate it into binary machine code by hand, using things like opcode tables.

After the translation was complete, the program could be fed into the computer and run.

Mnemonics

As you might imagine, people quickly got fed up with this process.

So, by the late 1940s and into the 50s, programmers had developed slightly higher-level languages that were more human-readable.

Opcodes were given simple names, called mnemonics, which were followed by operands, to form instructions.

So instead of having to write instructions as a bunch of 1’s and 0’s, programmers could write something like “LOAD_A 14”.

We used this mnemonic in Episode 8 because it’s so much easier to understand!

Of course, a CPU has no idea what “LOAD_A 14” is. It doesn’t understand text-based language, only binary.

Assembler

And so programmers came up with a clever trick.

They created reusable helper programs, in binary, that read in text-based instructions, and assemble them into the corresponding binary instructions automatically.

This program is called an Assembler.

It reads in a program written in an Assembly Language and converts it to native machine code.

“LOAD_A 14” is one example of an assembly instruction.

Over time, Assemblers gained new features that made programming even easier.

One nifty feature is automatically figuring out JUMP addresses.

This was an example program I used in episode 8:

Notice how our JUMP NEGATIVE instruction jumps to address 5, and our regular JUMP goes to address 2.

The problem is, if we add more code to the beginning of this program, all of the addresses would change.

That’s a huge pain if you ever want to update your program!

And so an assembler does away with raw jump addresses, and lets you insert little labels that can be jumped to.

When this program is passed into the assembler, it does the work of figuring out all of the jump addresses.

A new level of Abstraction

Now the programmer can focus more on programming and less on the underlying mechanics under the hood enabling more sophisticated things to be built by hiding unnecessary complexity.

The limitation of Assembly Languages

However, even with nifty assembler features like auto-linking JUMPs to labels, Assembly Languages are still a thin veneer over machine code.

1. In general, each assembly language instruction converts directly to a corresponding machine instruction – a one-to-one mapping – so it’s inherently tied to the underlying hardware.

2. And the assembler still forces programmers to think about which registers and memory locations they will use. If you suddenly needed an extra value, you might have to change a lot of code to fit it in.

Grace Hopper

This problem did not escape Dr. Grace Hopper. As a US naval officer, she was one of the first programmers on the Harvard Mark 1 computer, which we talked about in Episode 2.

This was a colossal, electro-mechanical beast completed in 1944 as part of the allied war effort.

Programs were stored and fed into the computer on punched paper tape.

By the way, as you can see, they “patched” some bugs in this program by literally putting patches of paper over the holes on the punch tape.

The Mark 1’s instruction set was so primitive, there weren’t even JUMP instructions.

To create code that repeated the same operation multiple times, you’d tape the two ends of the punched tape together, creating a physical loop. In other words, programming the Mark 1 was kind of a nightmare!

After the war, Hopper continued to work at the forefront of computing.

Arithmetic Language Version 0 (A-0)

To unleash the potential of computers, she designed a high-level programming language called “Arithmetic Language Version 0”, or A-0 for short.

Assembly languages have direct, one-to-one mapping to machine instructions. But, a single line of a high-level programming language might result in dozens of instructions being executed by the CPU.

To perform this complex translation, Hopper built the first compiler in 1952.

This is a specialized program that transforms “source” code written in a programming language into a low-level language, like assembly or the binary “machine code” that the CPU can directly process.


So, despite the promise of easier programming, many people were skeptical of Hopper’s idea.

She once said, “I had a running compiler and nobody would touch it… they carefully told me, computers could only do arithmetic; they could not do programs.”

But the idea was a good one, and soon many efforts were underway to craft new programming languages -- today there are hundreds!

 

Sadly, there are no surviving examples of A-0 code, so we’ll use Python, a modern programming language, as an example.

Let’s say we want to add two numbers and save that value.

Remember, in assembly code, we had to fetch values from memory, deal with registers, and other low-level details.

But this same program can be written in python like so: Notice how there are no registers or memory locations to deal with -- the compiler takes care of that stuff, abstracting away a lot of low-level and unnecessary complexity.

The programmer just creates abstractions for needed memory locations, known as variables, and gives them names.

So now we can just take our two numbers, store them in variables we give names to -- in this case, I picked a and b but those variables could be anything - and then add those together, saving the result in c, another variable I created.

It might be that the compiler assigns Register A under the hood to store the value in a, but I don’t need to know about it! Out of sight, out of mind!

It was an important historical milestone, but A-0 and its later variants weren’t widely used.

FORTRAN

FORTRAN, derived from "Formula Translation", was released by IBM a few years later, in 1957, and came to dominate early computer programming.

On average, programs written in FORTRAN were 20 times shorter than equivalent handwritten assembly code.

Then the FORTRAN Compiler would translate and expand that into native machine code.

The community was skeptical that the performance would be as good as hand written code, but the fact that programmers could write more code more quickly, made it an easy choice economically: trading a small increase in computation time for a significant decrease in programmer time.

 

Of course, IBM was in the business of selling computers, and so initially, FORTRAN code could only be compiled and run on IBM computers.

And most programming languages and compilers of the 1950s could only run on a single type of computer.

So, if you upgraded your computer, you’d often have to re-write all the code too!

COBOL

In response, computer experts from industry, academia and government formed a consortium in 1959 -- the Committee on Data Systems Languages, advised by our friend Grace Hopper -- to guide the development of a common programming language that could be used across different machines.

The result was the high-level, easy to use, Common Business-Oriented Language, or COBOL for short.

To deal with different underlying hardware, each computing architecture needed its own COBOL compiler.

But critically, these compilers could all accept the same COBOL source code, no matter what computer it was run on.

This notion is called write once, run anywhere. It’s true of most programming languages today, a benefit of moving away from assembly and machine code, which is still CPU specific.

The biggest impact of all this was reducing computing’s barrier to entry.

Before high level programming languages existed, it was a realm exclusive to computer experts and enthusiasts. And it was often their full time profession.

But now, scientists, engineers, doctors, economists, teachers, and many others could incorporate computation into their work .

Thanks to these languages, computing went from a cumbersome and esoteric discipline to a general purpose and accessible tool.

At the same time, abstraction in programming allowed those computer experts – now “professional programmers” – to create increasingly sophisticated programs, which would have taken millions, tens of millions, or even more lines of assembly code.

Programming Languages (1960~2000)

Now, this history didn’t end in 1959.

In fact, a golden era in programming language design jump started, evolving in lockstep with dramatic advances in computer hardware.

In the 1960s, we had languages like ALGOL, LISP and BASIC.

In the 70’s: Pascal, C and Smalltalk were released.

The 80s gave us C++, Objective-C, and Perl.

And the 90’s: python, ruby, and Java.

And the new millennium has seen the rise of Swift, C#, and Go.

Anyway, some of these might sound familiar -- many are still around today.

It’s extremely likely that the web browser you’re using right now was written in C++ or Objective-C.

That list I just gave is the tip of the iceberg.

And languages with fancy, new features are proposed all the time.

 

Each new language attempts to leverage new and clever abstractions to make some aspect of programming easier or more powerful, or take advantage of emerging technologies and platforms, so that more people can do more amazing things, more quickly.

Many consider the holy grail of programming to be the use of “plain ol’ English”, where you can literally just speak what you want the computer to do, it figures it out, and executes it. This kind of intelligent system is science fiction… for now. And fans of 2001: A Space Odyssey may be okay with that.

Now that you know all about programming languages, we’re going to deep dive for the next couple of episodes, and we’ll continue to build your understanding of how programming languages, and the software they create, are used to do cool and unbelievable things.


이번 에피소드에서는 하드웨어보다 소프트웨어, 그 중에서도 프로그래밍 언어의 역사에 대해서 간략하게 알아볼 수 있었다. 하드웨어의 동작 원리를 파헤치는 에피소드보다는 수월했다. 

다만 Assembly Code의 동작 원리, COBOL, Fortran 등은 처음 알게 되었고 이 부분을 좀 더 알아봐야겠다.