---
title: "2_2_cleaning_ADFG_SE_AK_crab_data"
author: "Aidan Coyle"
date: "8/17/2021"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

In the previous script, we merged several files together to create a single file of all Tanner crab survey data. In this script, we will clean that data

#### Load libraries (and install if necessary)

```{r libraries, message=FALSE, warning=FALSE}
# Add all required libraries here
list.of.packages <- c("tidyverse", "readxl", "lubridate", "rnaturalearth", "rnaturalearthdata", "sf", "rgeos")
# Get names of all required packages that aren't installed
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
# Install all new packages
if(length(new.packages)) install.packages(new.packages)


# Load all required libraries
lapply(list.of.packages, FUN = function(X) {
  do.call("require", list(X))
})
```

Now, read in data

```{r}
crabdat <- read.csv(file = "../data/ADFG_SE_AK_pot_surveys/TC_survey_specimen_data_all_years.csv")
```

Now, let's look at the column names of each

```{r}
colnames(crabdat)

crabdat <- crabdat %>%
  dplyr::rename(Year = "ï..Year")

# We can start by removing several columns.

# Project, Trip Number, and Pot Number are likely to be useful in the future to match skipper data to survey data.

# However, we can remove the following columns:

# Specimen.No: We don't particularly care about this - it seems to be its spot on the page
# Number.Of.Specimens: Indicates the degree of subsampling, which we also don't care about
# Length.Millimeters: Tanner crab size is measured via width, not length (king crabs, which were kept in the same database, are measured with length)
# Width.Spines.Millimeters: Also not used for Tanner crab size, column likely only for Dungeness crabs
# Tag.No: Crab number of each crab is irrelevant
# Tag.Event.Code: Event under which each crab was tagged is irrelevant

crabdat <- crabdat %>%
  dplyr::select(-c(Specimen.No, Number.Of.Specimens, Length.Millimeters,
            Width.Spines.Millimeters, Tag.No,
            Tag.Event.Code))
```

We can now begin cleaning our data. First, we'll examine our variables in order to look for outliers and see if any categories should be removed.

**We'll first go through all non-disease regular columns in order, then Specimen.Comments, then our two disease columns (Parasite and Blackmat)**

### Year

```{r}
# First variable - year
plot(table(crabdat$Year))

# Looks like there were no surveys carried out in 1990 or 1992! Let's confirm this by looking at this in table form 
table(crabdat$Year)
```

Interesting! It appears that there were no surveys in either 1990 or 1992, but surveys in all other years. Furthermore, there has been a definitive rise in the number of crab measured on each survey over the past two decades. This could be either due to an overall rise in population or an increase in the size of the survey. Still, interesting to note!

No apparent outliers, and we should keep data from all surveys - even ones with low overall counts.

### Project

```{r}
# Looking at Project column
table(crabdat$Project, useNA = "ifany")

# Unsurprisingly, we have data from both the RKC and Tanner surveys. Interestingly, a lot more Tanners have been measured on the RKC surveys. Let's look at how this changes over time
ggplot(as.data.frame(table(crabdat$Year, crabdat$Project)),
       aes(x = Var1, y = Freq, fill = Var2)) +
  geom_bar(stat = "identity")

# Interesting - so initially, all Tanner crabs caught were on the RKC survey, and the Tanner crab survey is newer. 
# Let's see the first survey that included Tanner crabs
min(crabdat[crabdat$Project == "Tanner Crab Survey", 'Year'])
```

Appears that Tanner crab surveys only began in 1997! This is important to note, as the location (both on a macro scale and a local scale) could be different for RKC and Tanner crab surveys

### Trip Number
```{r}
table(crabdat$Trip.No, useNA = "ifany")

# Looks like surveys were a maximum of 3 legs, with 1 error code (999). Let's change that 999 to an NA
crabdat <- crabdat %>%
  mutate(Trip.No = na_if(Trip.No, 999))

# Verify we did it correctly
table(crabdat$Trip.No, useNA = "ifany")
```

### Location
```{r}
table(crabdat$Location, useNA = "ifany")

# All these look alright to me!
```

### Pot Number

```{r}
table(crabdat$Pot.No, useNA = "ifany")
# No NAs or clear error codes, all good!
```

### Species

```{r}
table(crabdat$Species, useNA = "ifany")

# Since they're all Tanner crab, this column contains no useful info and can be removed
crabdat <- dplyr::select(crabdat, -Species)
```

### Sex

```{r}
table(crabdat$Sex, useNA = "ifany")

# Alright, looks like we have 7525 with no sex listed. Let's change those to NA
crabdat <-  crabdat %>%
  mutate(Sex = na_if(Sex, ""))

# Check we did it right 
table(crabdat$Sex, useNA = "ifany")

# Let's also check at sex ratio for each year
ggplot(crabdat, aes(fill = Sex, x = as.factor(Year))) +
  geom_bar(position = "fill")
# Huh, some overall variance, but overall getting lots more females
# Also seems like a pronounced drop in females in the last few years. What's up with that?
```

### Carapace Width


```{r}
# Check 10 highest values quickly
crabdat %>%
  arrange(desc(Width.Millimeters)) %>%
  slice(1:10)

# Alright, we can maaaybe accept a 224-mm crab. That's immense, but not impossible.
# There is absolutely no way they have a crab with a carapace width of nearly 2 meters (at least I hope not)
# We'll turn everything with a CW > 400 to NA
crabdat[crabdat$Width.Millimeters > 400 & !is.na(crabdat$Width.Millimeters), ]$Width.Millimeters <- NA

# We'll also look at females separately \
crabdat[crabdat$Sex == "Female", ] %>%
  arrange(desc(Width.Millimeters)) %>%
  slice(1:20)
# Those are definitely big females, but not unreasonably so

# Check 10 lowest values too
crabdat %>%
  arrange(Width.Millimeters) %>%
  slice(1:20)
# These aren't unreasonably small, but they do show some crab with a chela height greater than their width
# We'll keep that in mind for later

# Create some histograms
# All crab
hist(crabdat$Width.Millimeters)
# Male crab
hist(crabdat[crabdat$Sex == "Male", ]$Width.Millimeters)
# Female crab
hist(crabdat[crabdat$Sex == "Female", ]$Width.Millimeters)

# Let's also check how many crabs we have without a measurement for Width
sum(is.na(crabdat$Width.Millimeters))
# 7800 sounds like a lot, but that's just under 5%
```

### Weight.Grams

```{r}
# Check how many have a weight measurement
sum(!is.na(crabdat$Weight.Grams))
# That's a really negligible number

# Let's quickly check the correlation of weight and carapace width
plot(crabdat$Weight.Grams, crabdat$Width.Millimeters)

# As expected, width and weight are pretty dang tightly correlated
# Since only around 5% have weight measurements, we'll remove the column
crabdat <- dplyr::select(crabdat, -Weight.Grams)
```

### Chela.Height.Millimeters

```{r}
# Again, check how many have a measurement
sum(!is.na(crabdat$Chela.Height.Millimeters))
# Around 10% or so of the crabs have a measured chela height

# Check max and min values
crabdat %>%
  arrange(desc(Chela.Height.Millimeters)) %>%
  slice(1:20)

# Most of these are totally unrealistic. Eliminate every chela height over 80 mm and try again
crabdat[crabdat$Chela.Height.Millimeters > 80 & !is.na(crabdat$Chela.Height.Millimeters), ]$Chela.Height.Millimeters <- NA

# Check max and min values
crabdat %>%
  arrange(desc(Chela.Height.Millimeters)) %>%
  slice(1:20)
# Alright, no obvious chela heights that are extremely wrong

# Let's see all rows with a chela height greater than or equal to the carapace width
crabdat %>%
  filter(Chela.Height.Millimeters >= Width.Millimeters)

# We only have 5, all of which have a CW below 20.
# Realistically any crab with a carapace width below 20 mm is too small to get any sort of reliable chela height from
# Remove the chela height of all crabs with a CW below 20
crabdat[crabdat$Width.Millimeters <= 20 & !is.na(crabdat$Width.Millimeters), ]$Chela.Height.Millimeters <- NA

# Histogram
hist(crabdat$Chela.Height.Millimeters)
# Alright, looks solid

# Check what year chela height measurements began
min(crabdat[!is.na(crabdat$Chela.Height.Millimeters), ]$Year)
# Hmm, 1998. Good to know. Not enough to remove either the column or all pre-98 data, but worth knowing.
```
### Recruit Status

```{r}
table(crabdat$Recruit.Status, useNA = "ifany")
# Hmm, interesting that some are labeled with sex. Let's look at those further
table(crabdat$Recruit.Status, crabdat$Sex, useNA = "ifany")

# Alright, all crabs with an NA for sex also have a blank for recruit status, which is a point in favor of the removal of those rows
# For now, let's just leave them be
# However, we'll convert those blanks to NAs
crabdat <-  crabdat %>%
  mutate(Recruit.Status = na_if(Recruit.Status, ""))

# Check everything worked properly
table(crabdat$Recruit.Status, useNA = "ifany")

# Check recruit status was checked in all years
table(crabdat$Year, crabdat$Recruit.Status, useNA = "ifany")
# Looks good, moving on

```




### Shell Condition

```{r}
table(crabdat$Shell.Condition, useNA = "ifany")

# Alright, we'll first change all blanks to NAs
crabdat <-  crabdat %>%
  mutate(Shell.Condition = na_if(Shell.Condition, ""))

# We also want to change the codes from "Light", "New', "Old"... to numerical codes
# Official ADFG codes are available in the ROPs in ../data/ADFG_SE_AK_pot_surveys/survey_information/
# Soft = 1
# Light = 2
# New = 3
# Old = 4
# Very Old = 5
crabdat$Shell.Condition <- recode(crabdat$Shell.Condition, "Soft" = "1",
                                                       "Light" = "2",
                                                       "New" = "3",
                                                       "Old" = "4",
                                                       "Very Old" = "5")
# Check it worked
table(crabdat$Shell.Condition, useNA = "ifany")
# Great! Looks fantastic

# It'd be a huge shocker if shell condition wasn't checked in all years, but let's be safe
table(crabdat$Shell.Condition, crabdat$Year)
# Yep! Moving on 
```

### Egg Condition

```{r}
table(crabdat$Egg.Condition, useNA = "ifany")

# We have a lot of blanks, let's change those to NAs
crabdat <- crabdat %>%
  mutate(Egg.Condition = na_if(Egg.Condition, ""))

# We'll simplify these variable names somewhat
crabdat$Egg.Condition <- recode(crabdat$Egg.Condition, "Normal Eggs" = "Normal",
                                                       "Dead Eggs < 20%" = "Dead_eggs_under_20pct",
                                                       "Dead Eggs > 20%" = "Dead_eggs_over_20pct",
                                                       'Barren With Clean "Silky" Setae' = "Barren_Clean",
                                                       'Barren With "Matted" Setae, Empty Egg Cases' = "Barren_Matted")
# Check that we did it right
table(crabdat$Egg.Condition, useNA = "ifany")

# Check egg condition was used in all years
table(crabdat$Egg.Condition, crabdat$Year)
# Realistically, looks like it wasn't truly checked prior to '86. Good to keep in mind. Let's continue:
```

### Egg.Development

```{r}
table(crabdat$Egg.Development, useNA = "ifany")

# Again, let's change all those blanks to NAs
crabdat <- crabdat %>%
  mutate(Egg.Development = na_if(Egg.Development, ""))
# Like before, we'll also change some of the variable names to play a little easier in R
crabdat$Egg.Development <- recode(crabdat$Egg.Development, "Eyed eggs" = "Eyed", 
                                  "No eggs" = "Barren",
                                  "Uneyed eggs" = "Uneyed")
# Let's cross-reference the Egg Development and Egg Condition tables
table(crabdat$Egg.Condition, crabdat$Egg.Development, useNA = "ifany")

# Okay, we have some eyebrow-raisers
# First, the 9 Barren Clean crab with an NA in Egg Development, and the 2 Barren Matted crab with the same
# We know they're barren, so we'll assign them an Egg.Development of "Barren"
crabdat[crabdat$Egg.Condition == "Barren_Clean" & !is.na(crabdat$Egg.Condition) & is.na(crabdat$Egg.Development),]$Egg.Development <- "Barren"
crabdat[crabdat$Egg.Condition == "Barren_Matted" & !is.na(crabdat$Egg.Condition) & is.na(crabdat$Egg.Development),]$Egg.Development <- "Barren"

# Next, the crab with over 20% dead eggs that's also barren
# First, change the egg development to something arbitrary (like "REMOVE") to mark it
crabdat[crabdat$Egg.Condition == "Dead_eggs_over_20pct" & !is.na(crabdat$Egg.Condition) & crabdat$Egg.Development == "Barren" & !is.na(crabdat$Egg.Development), ]$Egg.Condition <- "REMOVE"
# Now change that crab's egg development to NA
crabdat[crabdat$Egg.Condition == "REMOVE" & !is.na(crabdat$Egg.Condition), ]$Egg.Development <- NA
# Finally change the egg condition to NA as well
crabdat[crabdat$Egg.Condition == "REMOVE" & !is.na(crabdat$Egg.Condition), ]$Egg.Condition <- NA

# Do the same for the 8 crab with Normal egg condition and Barren egg development
crabdat[crabdat$Egg.Condition == "Normal" & !is.na(crabdat$Egg.Condition) & crabdat$Egg.Development == "Barren" & !is.na(crabdat$Egg.Development), ]$Egg.Condition <- "REMOVE"
# Change those crab egg developments to NA
crabdat[crabdat$Egg.Condition == "REMOVE" & !is.na(crabdat$Egg.Condition), ]$Egg.Development <- NA
# Finally change the egg condition to NA as well
crabdat[crabdat$Egg.Condition == "REMOVE" & !is.na(crabdat$Egg.Condition), ]$Egg.Condition <- NA

# Finally, juveniles definitionally can't have eggs. 
# The 94 normals probably were described as "normal juveniles" and the 43 barren juveniles are redundant
# Therefore, for all juveniles, change egg condition to NA
crabdat[crabdat$Egg.Development == "Juvenile" & !is.na(crabdat$Egg.Development), ]$Egg.Condition <- NA

# Let's also check what years egg development was tracked too
table(crabdat$Egg.Development, crabdat$Year)
# Oh wow, all years! Nice. Continuing

```


### Leg.Condition

```{r}
table(crabdat$Leg.Condition, useNA = "ifany")

# First, change all blanks to NAs
crabdat <- crabdat %>%
  mutate(Leg.Condition = na_if(Leg.Condition, ""))
# Also change all "Not Observed" to NAs
crabdat <- crabdat %>%
  mutate(Leg.Condition = na_if(Leg.Condition, "Not Observed"))

# Change to ADFG codes, as these roughly correspond to the severity of the injury
# From the same ROP described above (found in this repo):
# 1 = No legs missing or regenerated
# 2 = 1 leg missing or regenerated
# 3 = 2+ legs missing or regenerated
# 4 = carapace damage
# 5 = combination of conditions
crabdat$Leg.Condition <- recode(crabdat$Leg.Condition, "Normal" = "1",
                                "One leg or claw missing or regenerated" = "2",
                                "Two or more legs/claws missing or regenerated" = "3",
                                "Abnormal carapace" = "4",
                                "Combination of conditions" = "5")

# Check we did it right
table(crabdat$Leg.Condition, useNA = "ifany")

# Check what years leg condition was noted
table(crabdat$Leg.Condition, crabdat$Year)
# Wasn't checked before 1997. Good to know - let's continue.
```

### Legal.Size

We'll just directly remove this table. Tanner crabs have very small spines, and the only difference between the biological Carapace Width measurement and the legality measurement is that when examining legality, you include the spines. Variance between crab is maybe a millimeter or two at most.


```{r}
# Check just in case there's a tonnnn of info here
table(crabdat$Legal.Size, useNA = "ifany")
# Nope, let's remove
crabdat <- dplyr::select(crabdat, -Legal.Size)
```

### Leatherback

Leatherback is a condition that only king crab have, in which the carapace is leathery or rubbery. 

It is not present in Tanners, therefore this column can be removed

```{r}
# Just double check here
table(crabdat$Leatherback, useNA = "ifany")
# Yep, remove
crabdat <- dplyr::select(crabdat, -Leatherback)
```

### Parasite

We'll skip Parasite for now, and will address it at the end to ensure we've eliminated all other problems

### Egg Percent

```{r}
# See how many non-NAs we have
sum(!is.na(crabdat$Egg.Percent))
# Hmm, just under 50k. Let's see when they began to track it
min(crabdat[!is.na(crabdat$Egg.Percent), ]$Year)
# Alright, 1997 at earliest - our original start date. 

# What values do we have?
table(crabdat$Egg.Percent, useNA = "ifany")
# Hmm, alright it's not exactly ideal. But if we treat it as a continuous variable, it should be all OK.
```

### Specimen Comments

Alright, we've finished all our non-disease columns! Let's see if we have any interesting comments we can work with

```{r}
# See if we have any semicolons. In ADFG-speak, semicolons separate comments
crabdat[grep(";", crabdat$Specimen.Comments), ]
# Check if commas were used too for the same purpose. Sometimes done on older surveys
crabdat[grep(",", crabdat$Specimen.Comments), ]
# No crab have a common, boring comment tagged on to an interesting comment.
# e.g. lots of boring comments say "NMFS [tag_no]". We can therefore remove all crabs with that comment, as
# no crab has both a common boring AND interesting comment.

# Remove all comments with a variation of "NMFS" in them
crabdat[grep("nmfs", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with "ws ###" in them. Refers to size with spines
crabdat[grep("^ws ???", crabdat$Specimen.Comments, ignore.case = TRUE),]$Specimen.Comments <- NA
# Remove all comments with "with spines" in them. Also refers to size with spines
crabdat[grep("with spines", crabdat$Specimen.Comments, ignore.case = TRUE),]$Specimen.Comments <- NA
# Remove all comments with "w/s" in them. Also refers to size with spines
crabdat[grep("w/s", crabdat$Specimen.Comments, ignore.case = TRUE),]$Specimen.Comments <- NA
# Remove all comments with "w/ sp" in them. Also refers to size with spines
crabdat[grep("w/ sp", crabdat$Specimen.Comments, ignore.case = TRUE),]$Specimen.Comments <- NA
# Remove all comments with "Spines:" in them. Also refers to size with spines
crabdat[grep("spines:", crabdat$Specimen.Comments, ignore.case = TRUE),]$Specimen.Comments <- NA
# Remove all comments with TG #### in them
crabdat[grep("^TG ????", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with Tag ### in them
crabdat[grep("^Tag????", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with Slide in them
crabdat[grep("Slide", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with PME in them
crabdat[grep("PME", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with NF #### in them
crabdat[grep("NF ????", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with N #### in them
crabdat[grep("N ????", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA
# Remove all comments with (legal_size_code edited from 00) in them
crabdat[grep("legal_size_code edited from 00", crabdat$Specimen.Comments, ignore.case = TRUE), ]$Specimen.Comments <- NA

# Looked through all remaining comments, and found zero interesting enough to change another column's value

# Therefore, remove the column
crabdat <- dplyr::select(crabdat, -Specimen.Comments)
```



### Black Mat

```{r}
table(crabdat$Blackmat)
# Check when they began checking for Black Mat on each survey
table(crabdat$Blackmat, crabdat$Project, crabdat$Year)
# No big differences between surveys - both were checking by '97

# Alright, we'll do the same as before - change Black Mat to a 0/1 column. 
# 1 = observed, 0 = "" or "None Observed
crabdat <- crabdat %>%
  mutate(Blackmat = if_else(Blackmat == "Observed", "1", "0"))

# Alright let's check we didn't mess up
table(crabdat$Blackmat)

# Let's also look at the Black Mat infection rate by year
ggplot(crabdat, aes(x = Year, fill = Blackmat)) +
  geom_bar(position = "fill")
# Seems like there was a wave in the mid-2000s that's since decreased somewhat

# Also looks like Black Mat wasn't checked for prior to the early '80s. We may want to remove those rows, just in case we want to do a future analysis of Black Mat's causes
# First, let's check the earliest year that Black Mat was checked for
min(crabdat[crabdat$Blackmat == "1", ]$Year)
# Alright, it's 1982. Was BCS checked for before then? (I did a more in-depth dive earlier, and the answer is "no" - they weren't checked for before 1997 systematically.)
min(crabdat[crabdat$Parasite == "Bitter crab", ]$Year)

# Alright, we'll eliminate all crab from before 1982
crabdat <- crabdat[crabdat$Year >= 1982, ]
```

### Parasite

```{r}
table(crabdat$Parasite, useNA = "ifany")

# Some entries with parasites are blanks, others are "None present"
# We want to see if the ones with blanks were actually checked
# This would likely be determined by the year in which the survey took place - early years may not have checked for parasites
# Let's graph parasite status by year to determine
ggplot(crabdat, aes(fill = Parasite, x = Year)) +
  geom_bar(position = "fill")
# Okay yeah, earlier surveys didn't check for parasites. This is bad news - surveys from early years aren't useful to us, as the presence of Hematodinium wasn't checked. Let's see the earliest crab that had a parasite noted
min(crabdat[crabdat$Parasite != "" & !is.na(crabdat$Parasite), ]$Year)

table(crabdat$Year, crabdat$Parasite)

# Alright, so the crabs definitely weren't checked for parasites prior to 1993. Between 1993-1997, it's uncertain, as there are zero bitter crab from '94-96. It's possible they didn't encounter any diseased crab, but the column header is "", not None Present (which begin to appear in '97).

# Check that it's not related to a difference in survey protocol
table(crabdat$Year, crabdat$Parasite, crabdat$Project)
# Nope, looks all good. Also looks like the "" vs "None Present" distinction isn't a survey thing either.
# Okay, later we'll eliminate some rows

# Alright, we only have around 65 rows with a parasite other than Hematodinium. Let's remove those rows, as it's quite possible the survey guidelines only allowed for one parasite to be noted at once
crabdat <- crabdat %>%
  filter(Parasite == "" | Parasite == "Bitter crab" | Parasite == "None present")

# We'll now change the column name from Parasite to Bitter
crabdat <- dplyr::rename(crabdat, Bitter = Parasite)

# We'll also recode all blanks or uninfected crab as 0 and all infected crab as 1
crabdat <- crabdat %>%
  mutate(Bitter = if_else(Bitter == "Bitter crab", "1", "0"))

table(crabdat$Bitter)

# Alright, time to remove some rows
# Since we might want to model Black Mat infection status (and have already removed years from before Black Mat was noted), we'll make a copy of the current data table for that purpose.
# BM = Black Mat
BM.crabdat <- crabdat

# To be conservative, we'll assume that all crabs were checked for parasites beginning in '97. Therefore, we'll remove all data prior to '97
# To be clear that it's BCS-specific, we'll give it a new name
BCS.crabdat <- crabdat[crabdat$Year >= 1997 & !is.na(crabdat$Year), ]

# Bummer, there goes a lot of our data. Ah well, nothing we can do about it!

ggplot(BCS.crabdat, aes(fill = Bitter, x = Year)) +
  geom_bar(position = "fill")
```

### Writing out data

NOTE: This is NOT the actual data that will be used inside each model. It includes a whole lot of lines with NA values, for instance. Instead, it is the FULL data that CAN be used in each model. Before actually creating the model, we'll filter out NAs as desired. However, all codes should be accurate.

```{r}
# Black Mat data
write.csv(BM.crabdat, "../output/ADFG_SE_AK pot_surveys/cleaned_data/crab_data/black_mat_cleaned.csv", row.names = FALSE)

# Bitter Crab Syndrome data
write.csv(BCS.crabdat, "../output/ADFG_SE_AK pot_surveys/cleaned_data/crab_data/BCS_cleaned.csv", row.names = FALSE)

```

# END OF SCRIPT

However, just for fun, we can also make some quick graphs to look at how our key variable (Hematodinium infection status) varies with our other variables


``` {r} 
# Let's look a bit further at how infection status changes with a few other variables
ggplot(crabdat, aes(fill = Bitter, x = Location)) +
  geom_bar(position = "fill")
# Definitely a ton of change in different locations
ggplot(crabdat, aes(fill = Bitter, x = Sex)) +
  geom_bar(position = "fill")
# Sex doesn't actually seem to be too different!
ggplot(crabdat, aes(x = Bitter, y = Width.Millimeters)) +
  geom_violin()
# Also not much overlap - perhaps infected crab are slightly larger
ggplot(crabdat, aes(fill = Bitter, x = Shell.Condition)) +
  geom_bar(position = "fill")
# Definitely an effect of shell condition going on here
ggplot(crabdat, aes(fill = Bitter, x = Egg.Condition)) +
  geom_bar(position = "fill")
# The disparity between Barren_Clean and Barren_Matted is interesting
ggplot(crabdat, aes(fill = Bitter, x = Egg.Development)) +
  geom_bar(position = "fill")
# We don't have many eyed eggs, so it's interesting to see that big gap between that and everything else
ggplot(crabdat, aes(fill = Bitter, x = Leg.Condition)) +
  geom_bar(position = "fill")
# Huh, looks like infection rates actually decrease as injury level increases. Weird!
ggplot(crabdat, aes(fill = Bitter, x = Egg.Development)) +
  geom_bar(position = "fill")
ggplot(crabdat, aes(fill = Bitter, x = Egg.Percent)) +
  geom_bar(position = "fill")
ggplot(crabdat, )

```




```{r}
colnames(crabdat)
```