Network Troubleshooting: A Structured Method to Find and Fix Connectivity Issues

Network troubleshooting can look intimidating at first, because problems rarely come with a clear error message. A user says “the internet is down,” but the internet is actually up; only DNS isn’t working. Or you hear “I can’t connect to the server,” but the server is fine, and the real issue is the wrong VLAN, the wrong route, or a firewall rule. That’s why the core skill in troubleshooting isn’t guessing what the problem looks like, but narrowing it down step by step.

In this article, I want to present troubleshooting not as a “reflex checklist” but as a method, because once you have a good method, you approach problems the same way regardless of the device vendor. My goal is that even a beginner can classify a problem quickly by following a clear order. To get there, we’ll use the OSI model not as something to memorize, but as a fault-finding map. Then we’ll cover the most logical order for practical tests (ping, tracert, nslookup, etc.). Finally, we’ll walk through common real-world scenarios one by one (no access within the LAN, only some services working, intermittent drops, slowness).

The goal is that, after reading this, you won’t panic when a problem happens; you’ll have a roadmap that lets you say “first I verify this, then that.”

1) The 3 basic rules of troubleshooting

Before you start troubleshooting, keep three rules in mind; they speed up the work for most people:

1.1 Define the problem, not the symptom

“Internet is down” is not a problem definition—it’s a symptom. A problem definition should be more concrete:

“The PC is not getting an IP.”
“I can ping the gateway but DNS doesn’t resolve.”
“The intranet doesn’t load only when the VPN is connected.”
“Two devices in the same VLAN within the LAN can’t communicate.”

A concrete problem definition prevents you from looking in the wrong place.

1.2 Change only one variable at a time

If you change the cable and also tweak DNS settings at the same time, you won’t know what actually fixed it. Make one change, test, observe the result.

1.3 Compare

Is the problem on a single device, or on a group of devices? Does another PC in the same location have the same problem? Does wired work while wireless doesn’t? These comparisons shrink the scope of the issue.

2) Use the OSI model like a “fault map”

The OSI model is often taught theoretically: Layer 1, Layer 2… But in troubleshooting, it helps you think: “Which layer might the problem be in?”

Layer 1: Physical

Is the cable intact?
Are the port lights on?
Is link speed/duplex correct?
Is there a Wi-Fi signal?
Does the cable test show errors?

With Layer 1 issues, the connection is usually completely down or extremely unstable.

Layer 2: Data Link

Is the switch port in the correct VLAN?
Is the MAC address being learned?
Did STP put the port into blocking?
Is there a loop/broadcast storm?
Is ARP resolving?

Layer 2 issues often create the feeling of “the network looks up, but it’s not.”

Layer 3: Network

Are IP, subnet mask, and gateway correct?
Is the routing table correct?
Is an ACL/firewall blocking at Layer 3?
Is there an IP conflict?

With Layer 3 issues, ping and route tests become critical.

Layer 4 and above: Transport / Application

Is the port open? (like 80/443/3389)
Is DNS working?
Is the application server running?
Is there a TLS certificate problem?
Is a proxy/firewall blocking at the application level?

The advantage of this approach is simple: you don’t search randomly. You move layer by layer.

3) Before you start: quick information gathering

In troubleshooting, the first minute is critical. If you ask the right questions and collect the right information, you’re already halfway there.

Questions to ask

How long has the problem existed?
Is it happening for everyone, or only one person?
Did it work before? What changed?
Which service is not working? (web, mail, file access?)
Where does it happen? (in the office, at home over VPN?)

Quick checks

Does it get an IP? (ipconfig on Windows; ip a or ifconfig on Linux)
What is the gateway?
What is the DNS?
Wi-Fi or wired?
Do you have VLAN/port information?

4) The most useful basic tests and the right order

One of the biggest beginner mistakes is running tests randomly. A good order narrows the problem quickly.

4.1 IP check

Windows:

ipconfig /all

Linux:

ip a
ip route

Check these:

Is there an IP address?
Is the subnet mask/prefix correct?
Is there a default gateway?
Are DNS servers correct?

If a device has a 169.254.x.x address, it most likely failed to get a DHCP lease and assigned itself a link-local (APIPA) address.
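As a quick illustration of that rule, here is a minimal Python sketch using the standard `ipaddress` module. The function `classify_address` and its messages are hypothetical, written only for this example. It checks link-local before private because `ipaddress` also counts 169.254.0.0/16 as a private range.

```python
import ipaddress


def classify_address(addr: str) -> str:
    """Give a rough first-look verdict on an address copied from ipconfig / ip a."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback: not a real interface address"
    if ip.is_link_local:
        # 169.254.0.0/16: the OS self-assigned this because no DHCP offer arrived
        return "link-local (APIPA): DHCP most likely failed"
    if ip.is_private:
        return "private address: DHCP or static config looks plausible"
    return "public address"


for a in ["169.254.33.10", "192.168.10.57", "127.0.0.1"]:
    print(a, "->", classify_address(a))
```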

4.2 Ping test (to separate layers)

Don’t use ping as “works/doesn’t work”—use it as a test chain:

The loopback address:

ping 127.0.0.1
Is the local TCP/IP stack working? (The packet never leaves the machine, so this says nothing about the NIC or the cable.)

Your gateway:

ping 192.168.10.1
Is local L2/L3 connectivity working?

Another device in the same subnet:

ping 192.168.10.20
Is VLAN/switching healthy?

An external IP (internet test without DNS):

ping 1.1.1.1 or 8.8.8.8
Is routing/NAT egress working?

Ping by name:

ping google.com
Is DNS working?

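If you follow this order, you can classify an “internet is down” complaint very quickly. Keep in mind that some hosts and firewalls drop ICMP, so a failed ping is a strong hint rather than proof; confirm with a port test when in doubt.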
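The chain above can be scripted as a small sketch. It assumes the example addresses from this section (replace them with your own) and shells out to the system `ping` (`-n 1` on Windows, `-c 1` elsewhere). It stops at the first failing step, because every later step depends on the earlier ones. Treat the exit code as a rough signal: Windows `ping`, for instance, can exit with 0 when a router answers “Destination host unreachable.” The names `CHAIN`, `system_ping`, and `first_failing_step` are hypothetical.

```python
import platform
import subprocess
from typing import Callable, Optional

# Hypothetical targets matching this section's examples; replace with your own.
CHAIN = [
    ("loopback / TCP/IP stack", "127.0.0.1"),
    ("gateway (local L2/L3)", "192.168.10.1"),
    ("same-subnet host (VLAN/switching)", "192.168.10.20"),
    ("external IP (routing/NAT egress)", "1.1.1.1"),
    ("name (DNS)", "google.com"),
]


def system_ping(target: str) -> bool:
    """Send one echo request with the OS ping command; True if it exits with 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            capture_output=True,
            timeout=10,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


def first_failing_step(chain, pinger: Callable[[str], bool] = system_ping) -> Optional[str]:
    """Walk the chain in order and return the label of the first step that fails."""
    for label, target in chain:
        if not pinger(target):
            return label
    return None  # every step answered
```

Passing a fake `pinger` makes the logic easy to test offline; `first_failing_step(CHAIN)` runs the real pings.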

4.3 Traceroute / tracert

Windows:

tracert 8.8.8.8

Linux:

traceroute 8.8.8.8

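This test shows which hops the packet passes through and where it stops or where latency jumps, so it’s especially helpful for routing/WAN issues. A few * * * lines in the middle are not necessarily a problem; some routers simply don’t reply to traceroute probes.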

4.4 DNS tests

nslookup example.com
dig example.com (Linux)

You can see whether DNS responds and which IP it returns. Wrong DNS, wrong records, or caching issues often show up here.
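To see what applications actually get, including timing, you can go through the operating system’s resolver. Unlike `nslookup` and `dig`, which query a DNS server directly, `socket.getaddrinfo` also honors the hosts file and local caching, so the two can disagree, and that disagreement is itself a clue. A minimal sketch (the function name is hypothetical):

```python
from __future__ import annotations

import socket
import time


def resolve_with_timing(name: str) -> tuple[list[str], float]:
    """Resolve a name through the OS resolver; return (addresses, elapsed milliseconds).

    Raises socket.gaierror if resolution fails, which is itself a useful signal.
    """
    start = time.perf_counter()
    infos = socket.getaddrinfo(name, None)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return addresses, elapsed_ms
```

For example, `resolve_with_timing("example.com")` returns the addresses and how long the lookup took. A consistently slow lookup while throughput is fine points toward DNS delay rather than bandwidth.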

4.5 Port test (moving to application level)

If ping works but the application doesn’t, you need to check ports:

Windows: Test-NetConnection host -Port 443
Linux: nc -vz host 443

The result helps you separate a firewall/ACL block from a service that simply isn’t running.
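Under the hood, these tools attempt a TCP handshake. The sketch below does the same in Python and keeps the distinction that matters most: a refused connection (an RST came back) usually means the host is reachable but nothing is listening, while a timeout usually means something silently dropped the SYN, which is typical of a firewall or ACL. This is a rule of thumb, not a guarantee, since some firewalls reject with an RST instead. `check_tcp_port` is a hypothetical helper, not part of any tool.

```python
import socket


def check_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # handshake completed: something is listening
    except ConnectionRefusedError:
        # An RST came back: host reachable, but nothing listens on that port
        # (or a device actively rejected the connection).
        return "refused"
    except socket.timeout:
        # No answer at all: typical of a firewall/ACL silently dropping the SYN.
        return "timeout"
    except OSError as exc:
        # e.g. "No route to host" or a name-resolution failure
        return f"error: {exc}"
```

For example, `check_tcp_port("fileserver", 445)` would test SMB; the host name is a placeholder.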

5) Data access within the LAN: the “we’re on the same network but I can’t reach it” scenario

Access problems inside the LAN are very common. A user says “file sharing doesn’t open.” Here you first need to separate the layers.

5.1 Are they in the same subnet?

If two devices are in different subnets, you need routing in between. This is sometimes missed because the addresses look similar. If the subnet mask is wrong, a device may think a host is on its own network when it isn’t, or the opposite.
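A short sketch with Python’s `ipaddress` module shows how the mask decides whether a host sends traffic directly or via the gateway. The addresses are hypothetical examples, and `same_subnet` is an illustrative helper.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix: int) -> bool:
    """Would a host configured as ip_a/prefix treat ip_b as on-link (no gateway needed)?"""
    network = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    return ipaddress.ip_address(ip_b) in network


# Similar-looking addresses, different /24 networks: traffic must go via the gateway.
print(same_subnet("192.168.10.20", "192.168.11.20", 24))  # False
# Same pair with a mistyped /16 mask: the host now ARPs for 192.168.11.20 directly.
print(same_subnet("192.168.10.20", "192.168.11.20", 16))  # True
```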

5.2 ARP and MAC table checks

If there’s no ping within the same subnet, check ARP:

Windows: arp -a
Linux: ip neigh

If there’s no ARP response, the issue may be L2/VLAN related.
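On Linux, the entries without a MAC address are the interesting ones. Below is a minimal parser for `ip neigh` output, assuming the usual line format (`<ip> dev <iface> lladdr <mac> <STATE>`, with no `lladdr` on INCOMPLETE or FAILED entries). The sample lines and the function name are hypothetical, and Windows `arp -a` output would need a different parser.

```python
from __future__ import annotations


def unresolved_neighbors(ip_neigh_output: str) -> list[str]:
    """Return the IPs whose ARP/ND entry never resolved to a MAC address."""
    bad = []
    for line in ip_neigh_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        state = tokens[-1]              # the NUD state is the last field
        has_mac = "lladdr" in tokens    # resolved entries carry "lladdr <mac>"
        if not has_mac or state in {"INCOMPLETE", "FAILED"}:
            bad.append(tokens[0])
    return bad


# Hypothetical sample output
sample = """192.168.10.1 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE
192.168.10.20 dev eth0 INCOMPLETE
192.168.10.30 dev eth0 FAILED
"""
print(unresolved_neighbors(sample))
```

Live output can be fed in with `subprocess.run(["ip", "neigh"], capture_output=True, text=True).stdout`.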

5.3 VLAN/port configuration

In a wired network, if the port is in the wrong VLAN, the device may still get an IP, but from that VLAN’s DHCP scope, so it ends up on the wrong network. This can look like “some things work.”

5.4 Ports for services like file access

If ping works but SMB shares don’t open, the problem may be at the application level (SMB on TCP 445, a host firewall, share permissions). So a successful ping alone doesn’t prove that “the network is fine.”

6) “Some sites open, some don’t” and the role of DNS

This complaint often leads to DNS—but not always. Let’s stay systematic:

Can you ping an external IP? (ping 1.1.1.1)
Does name resolution work? (nslookup)
Is DNS pointing to the correct server? (DHCP distribution, manual settings)
Could DNS cache/TTL be affecting it?
Could a proxy or security device be blocking some domains?

Sometimes a user says “YouTube opens but the company site doesn’t.” In that case:

The DNS record might be wrong
It might be split DNS (different DNS over VPN)
A security policy might be blocking certain domains

That’s why DNS issues often feel like “the internet is kind of working, but something is weird.”

7) Intermittent issues: “it drops sometimes” is the hardest type

Intermittent problems are the hardest because you need to catch them while they happen. The following ideas help:

7.1 Logs and timing information

When does the drop happen? Every day at the same time? That could point to load or a scheduled job.

7.2 Physical layer and cabling

Loose cables, a bad port, or a poor patch cord can cause intermittent drops.

7.3 Wi-Fi interference and roaming

In wireless, short drops during roaming can be normal. But if it happens too often, it may be AP placement or channel overlap/interference.

7.4 Duplex/speed mismatch

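A speed/duplex mismatch, typically one side hard-coded to full duplex while the other auto-negotiates and falls back to half duplex, causes late collisions, CRC errors, and packet loss, which shows up as unstable performance.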

7.5 Signs of a loop or broadcast storm

If the network suddenly becomes slow and then recovers, consider an L2 loop or a broadcast storm. STP logs or high switch CPU utilization can be clues.

In these cases, a continuous ping that tracks packet loss (ping -t on Windows; on Linux, ping runs until you stop it) or basic monitoring can help. Most important is narrowing the time window so you can catch the moment the problem repeats.
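Once you have a log of ping results, two numbers matter: the overall loss percentage and the exact windows when replies stopped. A minimal sketch, assuming one sample per second stored as (timestamp, success) pairs; the sample data and the `summarize` name are made up for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def summarize(samples: list[tuple[datetime, bool]]):
    """Return (loss percentage, list of (first_lost, last_lost) windows)."""
    lost = sum(1 for _, ok in samples if not ok)
    loss_pct = 100.0 * lost / len(samples) if samples else 0.0

    windows = []
    start = None  # timestamp of the first lost reply in the current outage
    prev = None   # timestamp of the previous sample
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                     # an outage begins
        elif ok and start is not None:
            windows.append((start, prev))  # outage ended at the previous sample
            start = None
        prev = ts
    if start is not None:                  # log ended mid-outage
        windows.append((start, prev))
    return loss_pct, windows


# Hypothetical log: one ping per second starting at 09:00:00.
base = datetime(2024, 1, 1, 9, 0, 0)
pattern = [True, True, True, False, False, False, True, True, False, True]
samples = [(base + timedelta(seconds=i), ok) for i, ok in enumerate(pattern)]

loss, windows = summarize(samples)
print(f"loss: {loss:.0f}%")  # loss: 40%
for first, last in windows:
    print(f"no replies {first:%H:%M:%S} - {last:%H:%M:%S}")
```

Matching those windows against switch logs, scheduled jobs, or Wi-Fi roaming events is where the real diagnosis happens.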

8) Slowness issues: “it’s connected, but it crawls”

Slowness is rarely caused by a single factor. It’s safer to think in this order:

Is it only one user, or everyone?
Wired or wireless?
Slow inside the LAN, or slow to the internet?
One specific application, or everything?
Is bandwidth saturated? (high download/upload)
Is there packet loss? (ping with % loss)
Is it DNS delay? (sites open slowly but throughput is fine)

Many “slow internet” complaints turn out to be DNS delays, proxy delays, or Wi-Fi interference. That’s why a speed test alone is not enough.

9) A small “decision tree” approach for troubleshooting

For a beginner, one of the best methods is building a simple decision tree:

9.1 If there is no IP

DHCP? VLAN? Port?

9.2 If there is an IP but no gateway ping

L1/L2 problem (cable, switch port, VLAN, ARP)

9.3 If gateway ping works but external IP ping fails

Routing/NAT/firewall egress problem

9.4 If external IP ping works but names don’t

DNS problem (server, distribution, cache)

9.5 If ping works but the application doesn’t

Port/firewall/service problem

This looks simple, but it classifies a lot of problems.
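The same tree fits in a few lines of Python. This is a sketch: the `Observations` fields and the result messages are hypothetical and simply mirror sections 9.1 to 9.5.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observations:
    has_ip: bool                          # a real address, not 169.254.x.x
    gateway_ping: bool
    external_ip_ping: bool                # e.g. ping 1.1.1.1
    name_resolves: bool                   # nslookup / ping by name
    app_port_open: Optional[bool] = None  # port test result, if you ran one


def classify(o: Observations) -> str:
    """Walk the decision tree top to bottom and stop at the first failing check."""
    if not o.has_ip:
        return "no IP: check DHCP, VLAN, switch port"
    if not o.gateway_ping:
        return "IP but no gateway: L1/L2 (cable, switch port, VLAN, ARP)"
    if not o.external_ip_ping:
        return "gateway OK, external IP fails: routing/NAT/firewall egress"
    if not o.name_resolves:
        return "external IP OK, names fail: DNS (server, distribution, cache)"
    if o.app_port_open is None:
        return "network path OK: test the application's port next"
    if not o.app_port_open:
        return "network OK, application fails: port/firewall/service"
    return "port open: look at the service itself"
```

For internal services such as a file server, the external-IP step may not apply, so adapt the order to the path the traffic actually takes.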

10) Tools: Wireshark and the switch/router side

At some point, looking only at the client side isn’t enough. Switch/router logs, interface states, and—if necessary—packet captures come into play.

10.1 When do you need Wireshark?

Is DHCP getting stuck?
Is an ARP reply arriving?
Is the DNS query going out, and is a reply coming back?
Is the TCP handshake completing?

Wireshark can feel heavy for beginners, but for certain protocols (DHCP/DNS/ARP) it gives very fast answers.

10.2 Where to look on the switch side?

Port status (up/down)
VLAN membership
MAC table
STP state
Error counters (CRC, drops)
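Error counters usually accumulate since the last reboot or counter reset, so a large absolute number may be old news; what matters is whether the counters are still growing. A small sketch that compares two snapshots taken a few minutes apart; the dictionary keys are hypothetical, not any vendor’s output format.

```python
def error_rate_pct(before: dict, after: dict) -> float:
    """CRC errors per received packet, in percent, between two counter snapshots."""
    packets = after["input_packets"] - before["input_packets"]
    errors = after["crc_errors"] - before["crc_errors"]
    if packets <= 0:
        return 0.0  # no traffic, or the counters were cleared in between
    return 100.0 * errors / packets


# Hypothetical snapshots from the same port, taken a few minutes apart.
before = {"input_packets": 1_000_000, "crc_errors": 5_000}
after = {"input_packets": 1_200_000, "crc_errors": 5_400}
print(f"{error_rate_pct(before, after):.2f}% of new packets had CRC errors")
```

A counter that stays flat between snapshots is history; one that keeps climbing points to a live cabling or duplex problem.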

10.3 On the router/firewall side

Routing table
NAT sessions
Firewall logs (blocked connections)
Interface status
VPN tunnel status

Reading logs can feel intimidating, but what you’re looking for is usually simple: Is traffic passing, or is it being blocked?

11) A few realistic scenarios and approaches

Scenario A: “Wi-Fi is connected, but there’s no internet”

Did it get an IP?
Can it ping the gateway?
Can it ping an external IP?
Does DNS resolve?

Results:

If there’s no IP: DHCP/WLAN access problem.
If there’s no gateway ping: it looks connected, but there’s no L2 path (wrong VLAN, AP uplink problem).
If there’s no external IP ping: egress (NAT/firewall) problem, or a captive portal that still needs a login.
If DNS fails: a DNS server assignment (DHCP) issue or a DNS server problem.

Scenario B: “The file server won’t open”

Can you ping the server’s IP?
What is the port test result on the SMB port (TCP 445)?
Is the service running on the server?
Is a firewall blocking it?
Are permissions correct?

If ping works but ports are blocked, the network may be fine and the issue may be service/firewall related.

Scenario C: “The branch office can’t reach the HQ server”

Can the branch reach out to the WAN from its gateway?
Is the VPN/MPLS up?
Are routes present (forward and return)?
Is a firewall rule blocking it?

The most common mistake here is “no return route”: the branch reaches the server, but the server can’t send traffic back.
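The return-route problem is easy to see with a longest-prefix-match lookup, which is how routers choose a route. In this sketch the addressing is hypothetical (branch LAN 10.20.0.0/24, HQ server LAN 10.10.0.0/24), and `lookup` is an illustrative helper. The HQ router has no specific route back to the branch, so replies fall through to the default route instead of the tunnel.

```python
from __future__ import annotations

import ipaddress
from typing import Optional


def lookup(routes: dict[str, str], destination: str) -> Optional[str]:
    """Longest-prefix match: return the next hop for destination, or None if no route."""
    dest = ipaddress.ip_address(destination)
    best = None  # (network, next_hop) with the longest matching prefix so far
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None


# Hypothetical routing tables
branch_router = {"10.20.0.0/24": "connected", "10.10.0.0/24": "vpn-tunnel", "0.0.0.0/0": "isp"}
hq_router = {"10.10.0.0/24": "connected", "0.0.0.0/0": "internet-firewall"}

print("forward:", lookup(branch_router, "10.10.0.5"))  # forward: vpn-tunnel
print("return: ", lookup(hq_router, "10.20.0.7"))      # return:  internet-firewall
```

The forward path looks fine from the branch, yet the replies leave HQ the wrong way, which is exactly why you check routes in both directions.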

12) Closing: good troubleshooting is a good habit

People who troubleshoot well are usually not just “people who know a lot,” but first and foremost “people who check things in a good order.” Speed in troubleshooting comes from asking the right question in the right order. The OSI model is not a memorization exercise here—it’s a fault map. Ping/traceroute/DNS/port tests are the practical tools that let you use that map.

Once you establish this method, problems feel less stressful, because the question “Why isn’t it working?” is no longer vague: is there no IP, no gateway, no route, broken DNS, or a closed port? You narrow it down step by step.