Concept

Skip ahead to the "polarization filter experiment" section if you already know the double slit experiment and basic quantum mechanics.

double slit experiment

You probably already know the experiment.

The electron's behavior here can't be explained by classical (pre-quantum) physics. That's where quantum physics comes in.

electron is a wave

The probability distribution for where the electron lands on the screen looks like the pattern a water wave makes after going through the two slits. So we conclude that that's exactly what happened: a wave did go through the two slits.

$$\text{The electron is literally a wave.}$$
$$\text{The probability of finding the electron somewhere depends only on the height of the wave there.}$$

Here's an example.

Here's what the evolution looks like:

People say "the electron is at multiple positions at once", because the electron is literally a wave, and waves occupy many positions at once. Scientists say the electron is in a "superposition" over multiple positions.

the screen is special

There's something very special about the screen. The screen causes the electron to occupy a specific position. Scientists say the screen "measures" or "observes" the wave, and causes it to "collapse".

Right after the electron hits the screen, the wave looks like this:

all particles

In the universe, there's light and there are things with mass.

It turns out that all massive particles (like the electron) behave like waves. The electron isn't special.

It turns out that light also behaves the same way. If you know a little physics, you can reason through this.

1. The electron is a wave, occupying multiple positions at once, in a superposition of multiple positions.

2. The screen measures or observes the electron, aka it decides a specific position for the electron based on the wave's height and collapses the wave.

section takeaways

example - half screen

Say we perform the double slit experiment, but with only the right half of the screen.

What happens when the wave hits the screen? Well, the universe needs to decide whether the electron hit the screen or not! We can use similar reasoning as above to determine what happens:

Of course, the electrons that do hit the screen still make the interference pattern.

If you measure the wave in a particular region, the wave either gets localized to a point in that region (chosen based on the wave height there), or ends up with height $0$ everywhere in that region:

section takeaways

starting point

Unfortunately, while the double slit experiment is the best way to see superposition and measurement, it's not the best starting point for describing these things mathematically, because it involves superposition and measurement over position, which has infinitely many possible values.

On the other hand, the "polarization filter experiment" (explained below) has superposition and measurement over just two things, so it's much simpler. So we'll start with this light experiment.

Maxwell's Equations for light

We can use Maxwell's Equations to determine what light looks like.

Light has an electric field ("E-field") and a magnetic field ("B-field"), which are perpendicular to each other and to the direction the light travels. Here's a simple case. Since the two fields oscillate together and are always perpendicular, if we know one field, we can fully determine the other. So we only need to describe one of them; let's just consider the E-field. For light traveling along the $z$ axis, the E-field is in general given by:

$$\vec E(z,t) = a_1 \cos(kz - \omega t + \phi_1)\,\hat x + a_2 \cos(kz - \omega t + \phi_2)\,\hat y \tag{1}$$

This is just two perpendicular cosine waves added together! Both cosines travel together with the same speed and frequency (set by the wave number $k$ and angular frequency $\omega$; $z$ is position along the beam and $t$ is time). We can change each cosine's individual size, or "amplitude", $a_1$ and $a_2$, and its offset, or "phase", $\phi_1$ and $\phi_2$.

We can get "linearly polarized" light by setting $\phi_1 = \phi_2$ (or letting the phases differ by $\pi$, which just flips the sign of one cosine).

We can get "circularly polarized" light by making the waves a quarter cycle ($90^\circ$) out of phase with the same amplitude ($\phi_1 \pm \frac{\pi}{2} = \phi_2$ and $a_1 = a_2$).

Or we can get "elliptically polarized" light, which is anything in between linearly and circularly polarized.
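To see these cases concretely, here is a minimal Python sketch (not from the original note) that samples equation (1) at $z = 0$ over one period and checks the shape the field traces. The helper names `e_field`, `trace`, `is_on_line`, and `has_constant_length`, and all parameter values, are illustrative assumptions; $k = \omega = 1$ is chosen for convenience.

```python
import math

def e_field(z, t, a1, a2, phi1, phi2, k=1.0, omega=1.0):
    """(E_x, E_y) from equation (1)."""
    phase = k * z - omega * t
    return (a1 * math.cos(phase + phi1), a2 * math.cos(phase + phi2))

def trace(a1, a2, phi1, phi2, steps=12):
    """Sample the field at z = 0 over one full period (2*pi when omega = 1)."""
    return [e_field(0.0, 2 * math.pi * n / steps, a1, a2, phi1, phi2)
            for n in range(steps)]

def is_on_line(points, a1, a2, tol=1e-9):
    """True if every point satisfies a2*E_x - a1*E_y = 0, a line through the origin."""
    return all(abs(a2 * ex - a1 * ey) < tol for ex, ey in points)

def has_constant_length(points, tol=1e-9):
    """True if |E| is the same at every sample, i.e. the tip moves on a circle."""
    lengths = [math.hypot(ex, ey) for ex, ey in points]
    return max(lengths) - min(lengths) < tol

# Linear: equal phases, so the field only slides back and forth along one line.
linear = trace(a1=1.0, a2=2.0, phi1=0.3, phi2=0.3)
# Circular: equal amplitudes, phases a quarter cycle apart, so |E| never changes.
circular = trace(a1=1.0, a2=1.0, phi1=0.0, phi2=math.pi / 2)
# Elliptical: anything else, e.g. unequal amplitudes with a quarter-cycle offset.
elliptical = trace(a1=1.0, a2=2.0, phi1=0.0, phi2=math.pi / 2)

print(is_on_line(linear, 1.0, 2.0), has_constant_length(linear))
print(is_on_line(circular, 1.0, 1.0), has_constant_length(circular))
print(is_on_line(elliptical, 1.0, 2.0), has_constant_length(elliptical))
```

Only the linear trace stays on a line, only the circular one keeps a constant length, and the elliptical one does neither.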

Here's a picture of everything:

basis

Your choice of coordinate system (here, the pair of directions $\hat x$ and $\hat y$) is called a "basis". Light behaves the same no matter what coordinates you pick, so it doesn't matter which basis you use.

polarization filter experiment

Now that we know how light works, we can do an experiment with light particles (photons) to try to figure out how quantum mechanics works.

To do the experiment, you send a linearly polarized photon through a "polarization filter". We observe that the photon either passes through the filter, with its polarization now aligned with the filter, or gets blocked by the filter.

It turns out there's a probability that a photon makes it through the filter. Interestingly, it depends only on the angle $\theta$ between the filter and the photon's E-field:

$$P[\text{photon passes through}] = \cos^2\theta$$
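To get a feel for what a probabilistic rule like this means, here is a small Monte Carlo sketch in Python (an illustration, not the note's own code). It simply assumes each photon independently passes with probability $\cos^2\theta$, exactly the rule above, and checks that the fraction of photons that pass settles near $\cos^2\theta$. The function name `simulate_filter`, the photon count, and the fixed random seed are arbitrary choices.

```python
import math
import random

def simulate_filter(theta, n_photons, rng):
    """Send n_photons linearly polarized photons at angle theta (radians) from the filter.

    Each photon independently passes with probability cos^2(theta);
    return the fraction that made it through.
    """
    p_pass = math.cos(theta) ** 2
    passed = sum(1 for _ in range(n_photons) if rng.random() < p_pass)
    return passed / n_photons

rng = random.Random(0)  # fixed seed so runs are reproducible
results = {}
for degrees in (0, 30, 45, 60, 90):
    theta = math.radians(degrees)
    results[degrees] = simulate_filter(theta, 100_000, rng)
    print(f"{degrees:2d} deg: passed {results[degrees]:.3f}, cos^2 = {math.cos(theta) ** 2:.3f}")
```

Any single photon's fate is random; only the fraction that passes is predictable, and it approaches $\cos^2\theta$ as the number of photons grows.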

One more important idea: the filter only allows photons to pass through if they align with the filter, and half a photon isn't allowed to pass through. So either 100% of the photon aligns with the filter and the photon passes through, or 0% aligns and the photon gets blocked. If 0% aligns with the filter, the E-field still needs to be somewhere; specifically, it must be perpendicular to the filter.

In other words, the filter causes the photon to either:

1. align parallel.

2. align perpendicular.

(In case 2, the photon gets blocked.)

superposition and measurement

This experiment is starting to look like the double slit experiment, where the electron became localized to a single position when it hit the screen. Here the photon is becoming localized to a single direction when it hits the filter!

We can reason that the photon's E-field is in a superposition over its two cosines. Choosing our basis so that the filter lies along $\hat y$, when the photon hits the filter it gets measured and collapses either parallel ($\hat y$) or perpendicular ($\hat x$) to the filter.

why does $\cos^2\theta$ show up?

Measurement is probabilistic. The filter needs to pick a probability for the incoming photon to align with x^\hat x, and a probability for it to align with y^\hat y. The only reasonable thing for the filter to do is to look at how much the incoming photon currently aligns with x^\hat x and y^\hat y, and pick the probability based on that.

In other words, decompose the photon into $a_1\hat x + a_2\hat y$, and pick the probabilities based on $a_1$ and $a_2$.


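The two probabilities must add up to $1$ for every angle, and by the Pythagorean theorem $a_1^2 + a_2^2 = a^2$, where $a$ is the length of the whole E-field amplitude. This forces the universe to square the (normalized) amplitudes to get the probabilities, and we can see that $P[\text{pass}] = \text{Prob}_2 = \big(\frac{a_2}{a}\big)^2 = \cos^2\theta$!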

collapse picture

Here's an analogy to the double slit experiment. The photon is a "wave" that collapses when it hits the filter, based on the height of the wave at each location! Same exact thing as the double slit experiment, just over 2 things, not infinitely many!


A photon's E-field is in a superposition over its two cosines. A filter measures the incoming photon, causing its E-field to either collapse into a cosine that aligns with the filter, or a cosine that aligns perpendicular.

$P[\text{pass}] = P[\text{photon is measured to align with the filter}]$.

$P[\text{blocked}] = P[\text{photon is measured to align perpendicular}]$.

section takeaways

what does the filter care about?

To get a better idea of what the filter cares about, we can rewrite (1) as:

$$\vec E(z,t) = \text{Re}\Bigg\{\Big(a_1 e^{i\phi_1}\,\hat x + a_2 e^{i\phi_2}\,\hat y\Big)\, e^{i(kz - \omega t)}\Bigg\} \tag{2}$$

The factor $e^{i(kz - \omega t)}$ just shifts where the E-field is in its cycle, which the filter doesn't care about. The only thing the filter cares about is the "shape" of the photon, given by $a_1 e^{i\phi_1}\hat x + a_2 e^{i\phi_2}\hat y$! This shape is usually called the "state" of the photon.
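As a sanity check on the rewrite, here is a short Python sketch (not from the original note) that evaluates equations (1) and (2) side by side at a few positions and times and reports the largest difference between them. The function names and parameter values are arbitrary illustrations; Python's built-in complex numbers and `cmath.exp` handle the complex exponentials.

```python
import cmath
import math

def field_eq1(z, t, a1, a2, phi1, phi2, k, omega):
    """Equation (1): two perpendicular cosines."""
    phase = k * z - omega * t
    return (a1 * math.cos(phase + phi1), a2 * math.cos(phase + phi2))

def field_eq2(z, t, a1, a2, phi1, phi2, k, omega):
    """Equation (2): Re{ (complex shape vector) * e^{i(kz - omega t)} }."""
    shape = (a1 * cmath.exp(1j * phi1), a2 * cmath.exp(1j * phi2))
    carrier = cmath.exp(1j * (k * z - omega * t))
    return tuple((c * carrier).real for c in shape)

params = dict(a1=0.8, a2=0.6, phi1=0.4, phi2=-1.1, k=2.0, omega=3.0)
max_diff = 0.0
for z in (0.0, 0.5, 1.7):
    for t in (0.0, 0.25, 2.0):
        eq1 = field_eq1(z, t, **params)
        eq2 = field_eq2(z, t, **params)
        max_diff = max(max_diff, *(abs(u - v) for u, v in zip(eq1, eq2)))
print(f"largest difference between (1) and (2): {max_diff:.1e}")
```

Any nonzero difference is floating-point rounding; the factor $e^{i(kz-\omega t)}$ only slides both cosines through their cycle together.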

Let's denote the incoming photon's state as

$$\vec a = a_1 e^{i\phi_1}\,\hat x + a_2 e^{i\phi_2}\,\hat y$$

and the state of the photon that the filter lets through as

$$\vec p = p_1 e^{i\gamma_1}\,\hat x + p_2 e^{i\gamma_2}\,\hat y$$

probability = projection$^2$

Note that when the filter decomposed the light into $\hat x$ and $\hat y$ components, it was really projecting the incoming photon onto $\hat x$ and $\hat y$.

In terms of $\vec a$ and $\vec p$: for linearly polarized light, $\cos^2\theta$ is what we get when we project $\vec a$ onto $\vec p$ and square the result.

This projection idea is how we'll extend our result from linearly polarized light to all polarized light.


normalize

First of all, the filter doesn't care about the magnitude of either state, so we have to ignore the magnitudes when we compute the probability. It's common to write the magnitude of a vector as $a = |\vec a| = \sqrt{a_1^2 + a_2^2}$. Typically we ignore the magnitudes by always setting $a = 1$ and $p = 1$.

projection

Now, the goal is to figure out exactly how to perform the projection of $\vec a$ onto $\vec p$.

The projection's phase is clearly irrelevant to computing probability, so we should ignore it.

Projection is typically done with the dot product.

If we combine these two ideas, we get the initial guess $(\text{projection of } \vec a \text{ onto } \vec p) = |\vec a \cdot \vec p|$.

This works for linearly polarized light, giving $\cos\theta$, but it gives nonsensical results for arbitrary $\vec a$ and $\vec p$. For example, with $\vec a = \vec p = \frac{1}{\sqrt 2}(\hat x + i\hat y)$, a circularly polarized photon hitting a filter that passes exactly that polarization, we get $\vec a \cdot \vec p = \frac{1}{2}(1 + i^2) = 0$, which says the photon is always blocked. But it's an easy fix: all we need to do is conjugate one of the vectors before taking the dot product.

The projection of $\vec v_1$ onto $\vec v_2$ is defined as:

$$|\vec v_1^* \cdot \vec v_2|$$

Defining probability as $|\vec p^* \cdot \vec a|^2$ works for all cases of linearly polarized light and one case of circularly polarized light, and we can reason that it works in general!
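The failure of the first guess is easy to check numerically. This Python sketch (illustrative; `normalize`, `naive_projection`, and `projection` are hypothetical helper names) stores a state as a pair of complex numbers, the $\hat x$ and $\hat y$ components, and compares $|\vec a \cdot \vec p|$ with $|\vec p^* \cdot \vec a|$. `normalize` applies the $a = 1$ convention from the "normalize" step.

```python
import math

def normalize(v):
    """Scale a complex 2-vector so that |v_1|^2 + |v_2|^2 = 1 (the a = 1 convention)."""
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return tuple(c / norm for c in v)

def naive_projection(a, p):
    """First guess: |a . p| with no conjugation."""
    return abs(sum(x * y for x, y in zip(a, p)))

def projection(a, p):
    """Corrected version: |p^* . a|, conjugating the filter's vector first."""
    return abs(sum(complex(y).conjugate() * x for x, y in zip(a, p)))

# Circularly polarized photon hitting a filter that passes that same polarization.
circ = normalize((1, 1j))
naive_circ = naive_projection(circ, circ) ** 2
fixed_circ = projection(circ, circ) ** 2

# Linearly polarized photon at 30 degrees from a filter along y-hat.
theta = math.radians(30)
lin_a = (math.sin(theta), math.cos(theta))
lin_p = (0.0, 1.0)
print(naive_circ, fixed_circ)
print(naive_projection(lin_a, lin_p), projection(lin_a, lin_p), math.cos(theta))
```

For the circular case the naive version gives $0$ and the conjugated one gives $1$; for real (linearly polarized) vectors conjugation does nothing, which is why the first guess looked fine there.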

Now that we're dealing with complex vectors, we don't say "perpendicular", we say "orthogonal". Naturally, vectors $\vec v_1$ and $\vec v_2$ are orthogonal when the projection of one onto the other is $0$:

$$\vec v_1 \text{ is orthogonal to } \vec v_2 \iff \vec v_1^* \cdot \vec v_2 = 0$$

big result

Putting this all together,

$$P[\text{pass}] = P\Bigg[\begin{pmatrix}a_1 e^{i\phi_1}\\ a_2 e^{i\phi_2}\end{pmatrix} \text{ is measured as } \begin{pmatrix}p_1 e^{i\gamma_1}\\ p_2 e^{i\gamma_2}\end{pmatrix}\Bigg] = \Bigg|\begin{pmatrix}p_1 e^{i\gamma_1}\\ p_2 e^{i\gamma_2}\end{pmatrix}^* \cdot \begin{pmatrix}a_1 e^{i\phi_1}\\ a_2 e^{i\phi_2}\end{pmatrix}\Bigg|^2$$

This is a picture of measurement in general, for any incoming light described by $\vec a$ and any filter that lets through light described by $\vec p$ (the blocked light is described by $\vec p_\perp$, which of course is orthogonal to $\vec p$). Note that the overall phase is ignored:

examples

example - lin going into lin filter


example - circ going into lin filter


example - circ going into circ filter

example - lin going into circ filter

example - anti circ going into circ filter
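Each of the example headings above can be worked out with the big-result formula. Here is a short Python sketch (an illustration, not the note's own worked examples) that computes all five. It assumes the circular state $\frac{1}{\sqrt 2}(\hat x + i\hat y)$ (the $\hat y$ component lagging by $90^\circ$), takes "anti circ" to mean the opposite rotation $\frac{1}{\sqrt 2}(\hat x - i\hat y)$, and uses an arbitrary $30^\circ$ angle; `linear`, `p_pass`, `CIRC`, and `ANTI_CIRC` are hypothetical names.

```python
import math

S = 1 / math.sqrt(2)
CIRC = (S, 1j * S)        # (x-hat + i y-hat)/sqrt(2): y lags x by 90 degrees
ANTI_CIRC = (S, -1j * S)  # assumed "anti" circular: rotates the other way

def linear(angle):
    """Linearly polarized state at `angle` radians from x-hat."""
    return (math.cos(angle), math.sin(angle))

def p_pass(a, p):
    """P[pass] = |p^* . a|^2 for a normalized incoming state a and filter state p."""
    amplitude = sum(complex(pc).conjugate() * ac for pc, ac in zip(p, a))
    return abs(amplitude) ** 2

theta = math.radians(30)  # angle between the linear polarization and the filter
examples = {
    "lin into lin": p_pass(linear(theta), linear(0.0)),
    "circ into lin": p_pass(CIRC, linear(theta)),
    "circ into circ": p_pass(CIRC, CIRC),
    "lin into circ": p_pass(linear(theta), CIRC),
    "anti circ into circ": p_pass(ANTI_CIRC, CIRC),
}
for name, prob in examples.items():
    print(f"{name}: {prob:.3f}")
```

Analytically these come out to $\cos^2\theta$, $\frac12$, $1$, $\frac12$, and $0$. The last one says the two circular polarizations are orthogonal.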

The projection of a complex vector onto another one just requires conjugating one of the two vectors before we take their dot product. This gives:

$$\begin{aligned} P[\text{pass}] &= P[\vec a \text{ is measured as } \vec p] \\ &= \Bigg|\begin{pmatrix}p_1 e^{i\gamma_1}\\ p_2 e^{i\gamma_2}\end{pmatrix}^* \cdot \begin{pmatrix}a_1 e^{i\phi_1}\\ a_2 e^{i\phi_2}\end{pmatrix}\Bigg|^2 \end{aligned}$$

We reasoned that "orthogonal" for complex vectors should mean that the projection of one onto the other equals zero, i.e. $\vec v_1 \text{ is orthogonal to } \vec v_2 \iff \vec v_1^* \cdot \vec v_2 = 0$.

section takeaways

definition

$$\text{The wavefunction is defined as a full description of the particle.}$$

wavefunction for photon polarization

It shouldn't come as a surprise that for the light polarization experiment, the wavefunction is defined as:

$$a_1 e^{i\phi_1}\,\hat x + a_2 e^{i\phi_2}\,\hat y$$

wavefunction for double slit experiment

The quantity relevant to measurement in the double slit experiment is:

$$a(x)\,e^{i\phi(x)} \text{ for every position } x$$

Even though positions on the screen are not physically orthogonal directions, we describe measurement as if they were orthogonal, just like in the photon polarization experiment! All the math is the same!

Consider the double slit with only 3 possible positions for the electron. Then, measuring the electron's position would look like this:

Clearly, the wavefunction for the double slit experiment with only 3 possible positions is $a_1 e^{i\phi_1}\hat x_1 + a_2 e^{i\phi_2}\hat x_2 + a_3 e^{i\phi_3}\hat x_3$.

If we generalize this to $10$ possible positions, the double slit's wavefunction is:

$$\sum_{i=1}^{10} a_i e^{i\phi_i}\,\hat x_i \tag{3}$$

In the real experiment we have infinitely many positions, and the wavefunction is:

$$\int_{-\infty}^{\infty} a(x)\,e^{i\phi(x)}\,\hat x(x)\,dx \tag{4}$$

normalization

The basis vectors are orthogonal to each other and normalized, as always.

Below, I write complex numbers as a letter with an underbar, $\underbar a = a e^{i\phi}$, which makes things much cleaner.

Here's how we normalize (3) and (4), respectively:

$$\sum_{i=1}^{10} a_i^2 = \sum_{i=1}^{10} |\underbar a_i|^2 = 1$$
$$\int_{-\infty}^{\infty} a(x)^2\,dx = \int_{-\infty}^{\infty} \big|\underbar a(x)\big|^2\,dx = 1$$
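Here is what normalizing (3) and (4) looks like numerically. This is a sketch under made-up assumptions, not from the note: the ten discrete amplitudes and phases are arbitrary, the continuous $\underbar a(x)$ is taken to be a Gaussian of width `sigma` with an arbitrary phase $\phi(x) = 0.7x$, and the infinite integral is approximated with a hypothetical midpoint-rule helper `integrate` over $[-10, 10]$, outside of which this Gaussian is negligible.

```python
import cmath
import math

# Discrete case, equation (3): ten made-up amplitudes a_i and phases phi_i.
amplitudes = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
phases = [0.1 * i for i in range(10)]
raw = [cmath.rect(amp, phi) for amp, phi in zip(amplitudes, phases)]  # a_i e^{i phi_i}
norm = math.sqrt(sum(abs(c) ** 2 for c in raw))
psi = [c / norm for c in raw]  # normalized complex amplitudes
discrete_total = sum(abs(c) ** 2 for c in psi)

# Continuous case, equation (4): a Gaussian a(x) with an arbitrary phase phi(x) = 0.7 x.
def wavefunction(x, sigma=1.0, phase_rate=0.7):
    magnitude = (2 * math.pi * sigma ** 2) ** -0.25 * math.exp(-x ** 2 / (4 * sigma ** 2))
    return magnitude * cmath.exp(1j * phase_rate * x)

def integrate(f, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

continuous_total = integrate(lambda x: abs(wavefunction(x)) ** 2, -10.0, 10.0)
print(discrete_total, continuous_total)
```

The phase $e^{i\phi(x)}$ drops out of $|\underbar a(x)|^2$, so it has no effect on normalization.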

probability

In the discrete case, what's the probability that we measure $\vec a$ in the third position $\hat x_3$? Well, if all the math is the same in both experiments, then it should be ($\vec a$ projected onto $\hat x_3$)$^2$.

Here's the probability for (3) and (4), respectively:

$$P[\vec a \text{ is measured to be in state } \hat x_i] = \big|\hat x_i^* \cdot \vec a\big|^2 = \big|\underbar a_i\big|^2$$
$$P[\vec a \text{ is measured to be in state } \hat x(x_0)] = \big|\hat x^*(x_0) \cdot \vec a\big|^2\,dx \qquad (= 0\text{, continuous!})$$

The $dx$ makes the continuous case go to $0$, as expected: with infinitely many possible positions, the chance of landing on any single exact point is zero.

probability in a region

We can get the probability that the electron is in a region by just summing up all the probabilities there.

Here's the probability in a region for (3) and (4), respectively:

$$P[\vec a \text{ measured between } x_i \text{ and } x_j] = \sum_{k=i}^{j} \big|\hat x_k^* \cdot \vec a\big|^2 = \sum_{k=i}^{j} \big|\underbar a_k\big|^2$$
$$P[\vec a \text{ measured between } x_1 \text{ and } x_2] = \int_{x_1}^{x_2} \big|\hat x(x)^* \cdot \vec a\big|^2\,dx = \int_{x_1}^{x_2} \big|\underbar a(x)\big|^2\,dx$$
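The region formulas can be checked the same way. In this sketch (illustrative assumptions: the discrete weights are made up, the continuous $\underbar a(x)$ is a Gaussian of width $\sigma = 1$, and `integrate` is a hypothetical midpoint-rule helper), the discrete probability for positions $\hat x_3$ through $\hat x_6$ is a plain sum of $|\underbar a_k|^2$, and the continuous probability of landing between $-\sigma$ and $\sigma$ is compared against the closed form $\text{erf}(1/\sqrt 2) \approx 0.683$ from Python's `math.erf`.

```python
import math

# Discrete case: made-up normalized amplitudes; only |a_k|^2 matters here.
weights = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
total_weight = sum(w ** 2 for w in weights)
probs = [w ** 2 / total_weight for w in weights]     # |a_k|^2 for k = 1..10
p_discrete = sum(probs[k - 1] for k in range(3, 7))  # positions x_3 through x_6

# Continuous case: |a(x)|^2 for a Gaussian a(x) = (2 pi sigma^2)^(-1/4) exp(-x^2 / (4 sigma^2)).
def density(x, sigma=1.0):
    return math.exp(-x ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

p_continuous = integrate(density, -1.0, 1.0)  # probability of landing in [-sigma, sigma]
exact = math.erf(1 / math.sqrt(2))            # closed form for this Gaussian
print(p_discrete, p_continuous, exact)
```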

general measurement

Above, the screen only measured the position of the electron, so it measured in the basis of $\hat x(x)$. But what if we have a measurement apparatus that measures some other state, say $\hat b$?

I haven't given any examples besides position, but I figured I'd include this for completeness.

Here's the probability for (3) and (4), respectively:

$$P[\vec a \text{ is measured to be } \hat b] = |\hat b^* \cdot \vec a|^2 = \Big|\sum_{i=1}^{10} \underbar b_i^*\,\underbar a_i\Big|^2$$
$$P[\vec a \text{ is measured to be } \hat b] = |\hat b^* \cdot \vec a|^2 = \Big|\int_{-\infty}^{\infty} \underbar b^*(x)\,\underbar a(x)\,dx\Big|^2$$

Here are the full details for the continuous case.
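For the continuous case, the general-measurement formula is an overlap integral. The sketch below is illustrative only: the Gaussian shapes, the center offset $d$, the width `sigma`, the phase factor $e^{ikx}$ on $\underbar b(x)$, and the midpoint-rule helper are all assumptions, not from the note. It evaluates $\big|\int \underbar b^*(x)\,\underbar a(x)\,dx\big|^2$ numerically and compares it with $e^{-d^2/(4\sigma^2) - k^2\sigma^2}$, which follows from completing the square for two normalized Gaussians of width $\sigma$ whose centers are a distance $d$ apart.

```python
import cmath
import math

def gaussian(x, center, sigma=1.0, k=0.0):
    """Normalized Gaussian wavefunction centered at `center`, times a phase e^{i k x}."""
    magnitude = (2 * math.pi * sigma ** 2) ** -0.25 * math.exp(-(x - center) ** 2 / (4 * sigma ** 2))
    return magnitude * cmath.exp(1j * k * x)

def overlap(b, a, lo=-20.0, hi=20.0, n=40_000):
    """Midpoint-rule approximation of the integral of conj(b(x)) * a(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0j
    for j in range(n):
        x = lo + (j + 0.5) * h
        total += b(x).conjugate() * a(x)
    return total * h

sigma, d, k = 1.0, 1.5, 0.8

def incoming(x):
    return gaussian(x, 0.0, sigma)     # the state a(x) being measured

def measured(x):
    return gaussian(x, d, sigma, k)    # the hypothetical state b(x) the apparatus tests for

prob = abs(overlap(measured, incoming)) ** 2
exact = math.exp(-d ** 2 / (4 * sigma ** 2) - k ** 2 * sigma ** 2)
self_prob = abs(overlap(incoming, incoming)) ** 2  # measuring a state as itself
print(prob, exact, self_prob)
```

Measuring a normalized state against itself gives probability $1$, as it should.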

bra-ket notation

Rather than using vectors and writing $\vec a^* \cdot \vec p$, we can use bra-ket notation, invented by Paul Dirac. It just changes our notation from vectors to matrices. Once we start writing matrices and not just doing dot products, this notation becomes much easier to use than vectors with stars.

It's called bra-ket notation because:

$$\text{``bracket''} = \text{``bra-ket''} = \braket{\;\;|\;\;}$$
$$\text{``bra''} = \bra{\;\;}$$
$$\text{``ket''} = \ket{\;\;}$$

$$\vec a = \ket a = \begin{pmatrix}\underbar a_1\\ \underbar a_2\\ \underbar a_3\\ \vdots\end{pmatrix}$$

$$\vec a^\dagger = \vec a^{\intercal *} = \bra a = \begin{pmatrix}\underbar a_1^* & \underbar a_2^* & \underbar a_3^* & \cdots\end{pmatrix}$$

$$\vec a^* \cdot \vec b = \bra a \ket b = \braket{a|b} = \begin{pmatrix}\underbar a_1^* & \underbar a_2^* & \underbar a_3^* & \cdots\end{pmatrix}\begin{pmatrix}\underbar b_1\\ \underbar b_2\\ \underbar b_3\\ \vdots\end{pmatrix}$$
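Because a bra is a row and a ket is a column, $\braket{a|b}$ is an ordinary matrix product. The plain-Python sketch below (hypothetical helper names `ket`, `bra`, and `matmul`; arbitrary example vectors; no linear-algebra library assumed) builds both and checks that the matrix product matches $\vec a^* \cdot \vec b$ computed component by component.

```python
def ket(*components):
    """Column vector |a>, stored as a list of one-element rows."""
    return [[complex(c)] for c in components]

def bra(column):
    """Conjugate transpose: turn the column |a> into the row <a|."""
    return [[row[0].conjugate() for row in column]]

def matmul(left, right):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(left[i][k] * right[k][j] for k in range(len(right)))
             for j in range(len(right[0]))]
            for i in range(len(left))]

a = ket(1, 2j, 3 - 1j)
b = ket(0.5, 1j, 2)

braket_ab = matmul(bra(a), b)[0][0]  # <a|b>: a 1x1 matrix, i.e. a single number
braket_ba = matmul(bra(b), a)[0][0]  # <b|a>
dot_form = sum(x[0].conjugate() * y[0] for x, y in zip(a, b))  # a^* . b, component by component
print(braket_ab, braket_ba, dot_form)
```

With these vectors, $\braket{a|b} = 8.5 + 2i$ and $\braket{b|a} = 8.5 - 2i$: swapping the order conjugates the result.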

everything written in bra-ket

If you're confused about a result, just look at the part that's not written in bra-ket notation and compare.

The wavefunction for light polarization was $\vec a = a_1 e^{i\phi_1}\hat x + a_2 e^{i\phi_2}\hat y$. Now we can write it as $\ket\psi = \underbar a_1\ket\leftrightarrow + \underbar a_2\ket\updownarrow$.

For the discrete case of the double slit experiment, the wavefunction was $\vec a = \sum_{i=1}^{10} a_i e^{i\phi_i}\hat x_i$. Now we can write it as $\ket\psi = \sum_{i=1}^{10} \underbar a_i\ket i$.

For the continuous case of the double slit experiment, the wavefunction was $\vec a = \int_{-\infty}^{\infty} a(x)\,e^{i\phi(x)}\,\hat x(x)\,dx$. Now we can write it as $\ket\psi = \int_{-\infty}^{\infty} \underbar a(x)\ket x\,dx$.

All of these states $\ket\psi$ were normalized so that the sum (or, in the continuous case, the integral) of the squared magnitudes $|\underbar a_i|^2$ was $1$. The basis states are normalized too; typically, we normalize everything. A state $\ket\psi$ is normalized if and only if

$$\braket{\psi|\psi} = 1$$

Two states $\ket\phi$ and $\ket\psi$ are orthogonal if and only if

$$\braket{\phi|\psi} = 0$$

Measurement always takes place in an orthogonal basis. For the 3 examples above, that means $\braket{\leftrightarrow|\updownarrow} = 0$, $\braket{i|j} = 0$ for $i \neq j$, and $\braket{x|x'} = 0$ for $x \neq x'$. A measurement basis is also always normalized, i.e. $\braket{i|i} = 1$.

Putting these two ideas together, a measurement basis $\ket{b_1}, \ket{b_2}, \ldots$ always satisfies

$$\braket{b_i|b_j} = \delta_{ij}$$

where $\delta_{ij}$ is $1$ if $i = j$ and $0$ otherwise. And of course, the probability of measuring any state $\ket\psi$ to be in the state $\ket\phi$ is

$$P\Big[\ket\psi \text{ is measured to be in state } \ket\phi\Big] = \big|\braket{\phi|\psi}\big|^2$$
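A quick way to check the condition $\braket{b_i|b_j} = \delta_{ij}$ is to compute every pairwise inner product. In this Python sketch (illustrative; the helper names and the choice of bases are assumptions), the linear basis $\{\ket\leftrightarrow, \ket\updownarrow\}$ and the two circular polarizations $\frac{1}{\sqrt 2}(\ket\leftrightarrow \pm i\ket\updownarrow)$ pass the check, a horizontal/diagonal pair fails it, and measuring $\ket\leftrightarrow$ in the circular basis gives probabilities $|\braket{b_i|\leftrightarrow}|^2$ that add up to $1$.

```python
import math

s = 1 / math.sqrt(2)
# Bases for photon polarization, written as (x-hat, y-hat) components.
linear_basis = [(1, 0), (0, 1)]               # |<->>, |updownarrow>
circular_basis = [(s, 1j * s), (s, -1j * s)]  # the two circular polarizations
not_a_basis = [(1, 0), (s, s)]                # horizontal and diagonal: not orthogonal

def braket(phi, psi):
    """<phi|psi> = sum over components of conj(phi_i) * psi_i."""
    return sum(complex(p).conjugate() * q for p, q in zip(phi, psi))

def is_orthonormal(basis, tol=1e-12):
    """Check <b_i|b_j> = delta_ij for every pair of basis states."""
    return all(abs(braket(bi, bj) - (1 if i == j else 0)) < tol
               for i, bi in enumerate(basis)
               for j, bj in enumerate(basis))

# Measuring a horizontally polarized photon in the circular basis.
psi = (1, 0)
probs = [abs(braket(b, psi)) ** 2 for b in circular_basis]
print(is_orthonormal(linear_basis), is_orthonormal(circular_basis), is_orthonormal(not_a_basis))
print(probs)
```

Each circular outcome comes out with probability $\frac12$.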

A wavefunction is described by a sum or integral over basis vectors:

$$\ket\psi = \sum_i \underbar a_i\ket i$$

We assume every wavefunction is normalized (to be clear, this applies to basis vectors too):

$$\braket{\psi|\psi} = 1$$

We also assume the basis vectors are orthogonal to each other, so that (together with normalization):

$$\braket{i|j} = \delta_{ij}$$

The probability of measuring $\ket\psi$ in state $\ket\phi$ is:

$$P\Big[\ket\psi \text{ is measured to be in state } \ket\phi\Big] = |\braket{\phi|\psi}|^2$$
section takeaways

global phase

$e^{i\phi}(\underbar a_1\ket\leftrightarrow + \underbar a_2\ket\updownarrow)$ gives exactly the same probabilities as $(\underbar a_1\ket\leftrightarrow + \underbar a_2\ket\updownarrow)$, for any measurement.

$e^{i\phi}$ is called a "global phase" because it adds the same phase to every component of the state. The global phase doesn't matter to the probability, because we take an absolute value and $|e^{i\phi}| = 1$.

A common question is whether the global phase is unknowable or just irrelevant. The answer is that it's irrelevant. The global phase for the photon was the place in the E-field cycle, which we can certainly figure out. It just isn't relevant to probability.
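Numerically, the global phase really does drop out. This Python sketch (an illustration with an arbitrary state, arbitrary filter states, and a hypothetical `braket` helper) multiplies a state by $e^{i\phi}$ for several $\phi$ and checks that $|\braket{f|\psi}|^2$ is unchanged for each filter state $f$. As a contrast, it also shifts the phase of only the $\ket\updownarrow$ component (a relative phase rather than a global one), which does change the probability for a diagonal filter.

```python
import cmath
import math

def braket(phi, psi):
    """<phi|psi> for states written as tuples of complex components."""
    return sum(complex(p).conjugate() * q for p, q in zip(phi, psi))

s = 1 / math.sqrt(2)
psi = (0.6, 0.8j)  # illustrative normalized state: 0.36 + 0.64 = 1
filters = {"horizontal": (1, 0), "diagonal": (s, s), "circular": (s, 1j * s)}

# Global phase: multiply every component by the same e^{i*phase}.
max_change = 0.0
for phase in (0.5, 1.0, math.pi / 3, math.pi):
    shifted = tuple(cmath.exp(1j * phase) * c for c in psi)
    for f in filters.values():
        before = abs(braket(f, psi)) ** 2
        after = abs(braket(f, shifted)) ** 2
        max_change = max(max_change, abs(after - before))

# Contrast: a relative phase (rotating only the y-component) does change probabilities.
relative = (psi[0], cmath.exp(1j * math.pi / 2) * psi[1])
diag_before = abs(braket(filters["diagonal"], psi)) ** 2
diag_after = abs(braket(filters["diagonal"], relative)) ** 2
print(max_change, diag_before, diag_after)
```

Here the diagonal-filter probability drops from $0.5$ to $0.02$ under the relative phase, while no global phase changes any probability beyond rounding.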

more intuitions

From (2) you should easily be able to reason why $\frac{1}{\sqrt 2}(\ket\leftrightarrow + i\ket\updownarrow)$ gives circularly polarized light, $\ket\circlearrowleft$. The $\hat y$ or $\ket\updownarrow$ component just lags $90^\circ$ behind the $\hat x$ or $\ket\leftrightarrow$ component!

This idea extends to all wavefunctions. In the double slit experiment, you can think of the phase of each basis state as the relative offset of the cosine wave at that position.
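Here is a numerical version of the circular-polarization argument (a Python sketch, not from the note; `field_at_origin` is a hypothetical helper and $\omega = 1$ is arbitrary). It evaluates equation (2) at $z = 0$ for $\frac{1}{\sqrt 2}(\ket\leftrightarrow + i\ket\updownarrow)$: the field works out to $\frac{1}{\sqrt 2}(\cos\omega t, \sin\omega t)$, so its length stays fixed while its angle grows with time, and $E_y = \frac{1}{\sqrt 2}\cos(\omega t - 90^\circ)$ peaks a quarter period after $E_x$. With $\hat x$ pointing right and $\hat y$ up, that is counterclockwise rotation, matching the $\circlearrowleft$ symbol; which rotation sense gets called "left" or "right" circular polarization is a convention this note doesn't fix.

```python
import cmath
import math

s = 1 / math.sqrt(2)
state = (s, 1j * s)  # (|<->> + i|updownarrow>)/sqrt(2), components along x-hat and y-hat

def field_at_origin(t, omega=1.0):
    """Equation (2) at z = 0: Re{ state * e^{-i omega t} }, componentwise."""
    carrier = cmath.exp(-1j * omega * t)
    return tuple((c * carrier).real for c in state)

# Sample a quarter period and follow the direction of E in the xy-plane.
times = [n * math.pi / 8 for n in range(5)]  # 0 to pi/2
points = [field_at_origin(t) for t in times]
angles = [math.atan2(ey, ex) for ex, ey in points]
radii = [math.hypot(ex, ey) for ex, ey in points]

print([f"{math.degrees(a):.1f}" for a in angles])
print([f"{r:.6f}" for r in radii])
```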

next time

The next note goes over the more physics-y side of things: position, momentum, spin (the Stern-Gerlach experiment), and Schrödinger's equation, which tells you how the wave evolves without measurement, i.e. how it evolves until it reaches the screen. Schrödinger's equation just says that the wavefunction evolves and interferes the way you'd expect: the wave has the same shape as the $\vec E$ field we saw here, and radiates spherically from every point that it occupies.

The interesting thing is that the wavefunction really is a full description of the particle. The wavefunction at one instant in time dictates how it will evolve in the future (assuming no measurement takes place).

There's also the less physics-y side of things: quantum computing. Stay tuned.

That completes the note, although we still have more to go: the Schrödinger equations, the Stern-Gerlach experiment, entanglement, the Bloch sphere, intuitions on quantum teleportation and quantum computing, and more.

Let me know if you liked this note, or if there were any places I should improve (just leave a comment!). If you got stuck somewhere, I urge you to leave a question/comment in that location.

section takeaways