This is basically a summary of ch1 of Kaye's Models of Peano Arithmetic which lays out some basic ideas. Let $\cL_A$ be the language consisting of constant symbols 0 and 1, binary function symbols $+$ and $\cdot$, and a binary predicate symbol $<$. $\N$ can be made in an $\cL_A$-structure in the obvious way. $\Th\N$ denotes the set of all sentences satisfied by $\N$, i.e. $\{\sigma\mid \N\vDash\sigma\}$.
Let $M$ be a model of $\Th\N$, in other words $\N\vDash\sigma\implies M\vDash\sigma$. I will go through some basic properties of $M$. Firstly $\N$ and $M$ are elementarily equivalent, because $M\vDash\sigma\implies \N\vDash\sigma$ is equivalent to the implication $\N\vDash\neg\sigma\implies M\vDash\neg\sigma$. In particular, given a formula $\psi(x)$ we have $M\vDash\exists x\,\psi(x) \implies \N\vDash\exists x\,\psi(x)$. This means that a formula with a solution in $M$ also has a solution in $\N$, even though $M$ might be strictly larger (as explained next).
Secondly, $M$ can contain non-standard or `infinitely large' elements. To be precise, let $\underline{n}$ denote the $\cL_A$-term $(((1+1)+1)+\ldots+1)$ (where 1 is repeated $n$ times), and let $\underline{n}^M$ be its interpretation in $M$. Then there is a model $M$ and an element $a\in M$ such that for $a>\underline{n}^M$ for all $n\in\N$. The proof is a standard application of the compactness theorem: extend $\cL_A$ to $\cL'$ by adding a new constant symbol $c$, and observe that the set of $\cL'$-sentences
$$\Th\N \cup \{c>\underline{n}\mid n\in\N\}$$
is finitely satisfiable, hence satisfied by some model $M$. Then we set $a=c^M$ (the interpretation of $c$ in $M$).
Even though $M$ may be larger than $\N$, there is an obvious embedding $f:\N\to M$ mapping $n\mapsto\underline{n}^M$. It is injective because
$$n\neq m \implies \N\vDash\underline{n}\neq\underline{m} \implies M\vDash\underline{n}\neq\underline{m}.$$
It respects $+$ because
$$n+m=k \implies \N\vDash\underline{n}+\underline{m}=\underline{k} \implies M\vDash\underline{n}+\underline{m}=\underline{k}.$$
And we show similarly that $f$ respects $\cdot$ and $<$, so $f$ is in fact an embedding of $\cL_A$-structures, and we can treat $\N$ as a substructure of $M$. In addition, $f$ is an elementary embedding. To recap the definition: given a formula $\varphi(x_1,\ldots,x_n)$ and $a_1,\ldots,a_n\in\N$, we have
$$\N\vDash\varphi(a_1,\ldots,a_n) \iff M\vDash\varphi(a_1,\ldots,a_n).$$
Furthermore, $\N$ is an initial segment of $M$, or in other words $M$ is an end extension of $\N$. This means every nonstandard element of $M$ must be infinitely large. We prove this by considering the sentence
$$\forall x(x<\underline{k}\to x=\underline0\lor x=\underline1\lor \ldots\lor x=\underline{k-1}).$$
The above discussion is quite trivial if you're comfortable with the basic definitions in logic, so let's move on to more interesting things.
Proposition. Let $\theta(x)$ be a formula with one free variable $x$. Then
$$\N\vDash\theta(\underline{k})\ \text{for arbitrarily large $k\in\N$} \iff M\vDash\theta(a)\ \text{for some nonstandard $a$}.$$
Proof. ($\imp$) We have $\N\vDash \forall x\exists y(\theta(y)\land y>x)$, so the same is true in $M$. Taking $x$ to be nonstandard, there exists some nonstandard $y$ such that $M\vDash\theta(y)$.
($\impd$) For each $n\in\N$ we have $M\vDash \exists x(\theta(x)\land x>\underline{n})$, since $x$ can always be taken to be $a$. Therefore the same is true in $\N$, meaning that $\N\vDash\theta(\underline{k})$ for arbitrarily large $k$. $\qed$
A neat consequence of this proposition is showing that $\N$ is not a definable subset of $M$. That is, there is no formula $\psi(x)$ such that $M\models\psi(a) \iff a\in\N$. For if such a formula existed, than it is true for arbitrarily large naturals, hence true for a nonstandard element.
The next concept is about coding an arbitrary subset of $\N$ as some nonstandard element $c\in M$. Let $S\subseteq\N$ be any subset and $p_n$ denote the $n$th prime. By compactness, there is a model $M$ and an element $c\in M$ such that
$$M\vDash \exists x(x\cdot \underline{p_k}=c) \iff k\in S.$$
Conversely, given some $c$ we have the subset $\{k\mid M\vDash \exists x(x\cdot\underline{p_k}=c)\}\subseteq\N$. There is a surprising application of this correspondence in the following result:
Proposition. There are $2^{\aleph_0}$ non-isomorphic countable models of $\Th\N$.
Proof. There are at most $2^{\aleph_0}$ of them, so it suffices to show the upper bound is attained. If there were only $\kappa<2^{\aleph_0}$ countable models, then the number of subsets coded by elements in all these models is at most $\kappa\cdot\aleph_0$ which is less than $2^{\aleph_0}$. But there are $2^{\aleph_0}$ subsets of $\N$. $\qed$