Code to annotate our A. pulchra reference files (the A. pulchra genome and the A. millepora transcriptome) with GO information

1 Genome

1.1 Retrieve genome fasta file

We’re using a new A. pulchra genome file annotated by collaborators, which has not yet been formally published (stored locally at ../data/Apulchra-genome.fa, ../data/Apulcra-genome.gff).

We want to functionally annotate all of the mRNAs annotated in the genome gff (this gff annotates “genes” and “mRNAs” identically). First let’s get a fasta of this gff.

# create gff of only mRNAs
awk -F'\t' '$3 == "mRNA"' ../data/Apulcra-genome.gff > ../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.gff

/home/shared/bedtools2/bin/bedtools getfasta \
-fi "../data/Apulchra-genome.fa" \
-bed "../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.gff" \
-fo "../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.fa"
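
Because bedtools getfasta is run here without -name, each FASTA header is a genomic interval rather than an mRNA ID. As far as I know, bedtools reports the start 0-based (GFF start minus one). Below is a minimal sketch, on a made-up GFF, of building a header-to-ID lookup so annotations can later be tied back to mRNA IDs. The toy records and the assumption that ID= is the first attribute are hypothetical and not taken from the real annotation.

```shell
# Toy GFF with hypothetical records (tab-separated); not the real annotation
workdir=$(mktemp -d)
{
  printf 'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1\n'
  printf 'chr1\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=m1;Parent=g1\n'
  printf 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=e1;Parent=m1\n'
} > "$workdir/toy.gff"

# For each mRNA row, print the interval-style header next to the mRNA ID
header_map=$(awk -F'\t' '$3 == "mRNA" {
  id = $9
  sub(/^ID=/, "", id)      # assumes ID= is the first attribute
  sub(/;.*$/, "", id)      # drop any attributes after the ID
  print $1 ":" ($4 - 1) "-" $5 "\t" id
}' "$workdir/toy.gff")

echo "$header_map"
rm -r "$workdir"
```

Note also that extracting whole mRNA intervals this way includes introns, which is consistent with the long sequence lengths reported further down.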

Let’s check the file

echo "First few lines:"
head -3 ../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.fa

echo ""
echo "How many sequences are there?"
grep -c ">" ../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.fa
## First few lines:
## >ntLink_0:1104-7056
## ATGATGCCACAAGGGTACAAAAACGCCCTTCCAGGCTATAAAGACCTCTATCTTAGCCAAGCAATCACCGAGGAGGTTCACAATGTAAGTTTGTCCTTTTAGCTTGTGCCTGTTtttcctagtttgtttgcttgtttctttctttctttctttctttttagttctattttgtttcattTGTGATTCTTCAAGATATTTGGTGGTTACTTGTTCAGTCTAGCGTAGCCAAACTATTTGAACACTACAAGATAATTCGCAAGAAACAGGTTGAAGAAAATTCCATTTCAAGCGAAATCTTATTTCATTTTTTTACATTCTCTGTAGCTAATATACTGGCAAGTAATTTTATCGCCAACTGAAGGCAAGCACCAAGGACGTTTACATGGTAAGTTTGTCCTTTTGTTACACAAATTGGGTAGTCTGTTTCCTGGTGTTTTTTTTTTTTTTAATGTTTTAATTCACGATTATTCAGGATATCTAGGGATGAATTATCTGTTTATTACTCCAACATAATTAGTAAGAAACCAATTGGGAAGAATTCCAATAGAAGCAAAATCTTAATCTGTCTTTAAACTACTATGTAGTTAATACTTTAGCGAGTGATCTCTTCTCGGTGGCTATTACCATTTGTTGTGAATGCTATGAACGAAGAGCACCTTATTTTACTACCCGACATTATTGTAGCCCTGAGTTCCCTTTAGATCTGACGCTGCCATTTCTTACTTTAAGGCCTCTTCAATTTCTTGGTATTCGTAGATGTTTTCCACTGGCATTGACTGTGGTGTAAGTTCTTTCAAGCACGGGCCGTCGTTGTCTCTTCTAGCACTAGACAAAAAGGTGAGTGAGTTAAAACATGTTGGCTGTTAGTTCTTTCTACTATTTTCAGTTTTGCACTTGAATTCGTTGGAAGCCTCTGTGACAAGGCTTGATTTAATACTCGCTTTTTGGCATTTCAATTTCGCTTTCACCAAGAGAAACTCGATATTTCGTTGACTTCGCAATAAGTTAGTAGTTTTTCTTTCATATTGAGAGCTTCGCATTAAGGGTTCACATCGATGCTCTTGACTAGGCAAAGAAACGAGTCATTGTGCGAATTGCAGCATCTGTTTCCTACAATCACATATCTACACATATTTCTGGTGGTTTTAATTATAGTATCAATCAATCATCTTGGTTGAATATTGGTTAGGCAAAAGAGCATCCCAAAAAAACGTTTCGATCGAGCGTGTGACCAAACGTGTGGTATGACGGGATTCCTGGTATTGCAATGACCACTGATTTAACACATGAAATGGATACCTTAGAAATATTTCCTGTGATAGTGGTTTTTAAGACATTGTGCTAAATGCCACCCTGGATACTTTATCAACAGCAATTTTGGCCTCAGTAGGCACGTTTAGTAACCTCGAATTAACGCCACTTAATTTTTTTGGCAGCTCTGTGTTATCATATTCAAGTTATTACTCCAAAGCCACCTTCCCCTGTTTTGTCTGCGATCGTTGAACACAACGGTGAGTGAGAACTGAATATCAATATTACTATTAATATATCTTGCTTATAGCAGTAATTGAATTTGCTGCGGATGTACAACAATACTTATATTGTTGTTTTGCACCTCAATTAAGCTCAATTTAACGTCACTTAACTTTCTTGGCAGCTCTGTGTCATCGCATTGATCGAGTCATTATTCCAAGTACGTGGCCTTCCCTTCTGCGCACGATCGCTGAACAAAACGGTGAGTGAGAAAACTGAAGCGTTACATGGAAACTAACTGTAGTCAAGTCGCATCAAGGTTTCACCCTTCTAGAACGACTCAGTTAGCTGTCGTTTTGCTTCAAGTTTTTTTCCGGCAGTACTTAGTAGCATAATTTTGACCATTGACCGCTAATTATTGTGCTCAAAATCTCTCCGCAGATTTCTTTGGTGCGTTTGGCTGGGGCATTTCACCGTAAACACGAGCCCCCTTTGACTCCGTGTCCTCGAACGTCGCAAGAAAC
AGTGAGTAAAACACTTTGTCTGTATGATTCAACTCCATCAGCTGCATTTGCATGTCAGAAAAGCGTTAATTATCGTTTCATTTCCCTTTTATCTACTTTCAATAACTTGTTTCTTTAGAAAAGAAATTTTATATTCGATAGGCTATCTCACTATTTAGAAAAGAGACAACTTACCTTTGGCAAGCAATTTGATTTTCACTCCTATCATTCAATTGATCATGCCATGTTAAGTATTATTCATAAAATCAATGACCGCTAATTATTATGCTCAAAATCTCCCCGCAGATTTCTTTAGTGCATTTGGACAGGGCGCCGTTTCACCGTAAACACGAGCCCCCTTTGATTCCGTGTCCTCGAACGTCGCAAGAAACGGTGAGTAAAACACTTTGTCTATATGATTCAACTCCATCAGTTGCATTAAATATGTCAGAAAAGCGTTAATTATCGTTTCATTTCCCTTTTATCTTTTTTCAATAACTTGTTACTTTAGAAAAGAAATTTGCATTCGATAGGCTGTCTGACTATTTAGAAAAGAGACAACTTACCTTTGGCAAGCAATTTGATTTTCACTCCTATCATTCAATTGATCATGCCATGTTAAGTATTATTCATAAAATCAATGACCGCTAATTATTATGCTCAAATTCTTCCCGCAGATTTCTTTAGTGCATTTGGACAGGGCGCCGTTTCACCGTAAACACGAGCCCCCTTTGACTCCGTGTCCTCGAACGTCGCAAGAAACGGTGAGTAAAACACTTTGTCTATATGATTTAACTCCATCAGCTGCATTTATATGTTAGAAAAGCTTTAATTATCGTTTCATTTTCCCTTTTATCTATTTTCAATAAGTTGTTGCTGTAGAAAAAAAAAAACTTGTATTCAATAGGCTATCTGACTATTTAGAAAAGAGTCAAGTTACCTTTAGTAAGCAATTTGATTTTTGCACCCATGCAACATTCATCTGATCATGCCAAGTTAAGTATTGGTGATCAACTGCAAAAGGCAATTGCTGGAGATTTTTTAGGCTTGAGCAAAGCTTATTTACCTAGAAAGAAGTCACGTTCTTCCTCACACTGAAAACATGCCAACTTTCTTTCTTTTTCTGACAAGAACGAACGCGTTTTTGACCAGAACAAAAAAAAAAGAGAAGCCTTTTTTGACCGCCTATTGTCGCTTGAAACATTGTTTTGAATGGTTGGTAATAAACTTCACTTTCCTCTCGGTAGATTTCGTTTGCTGGCTTTGACTCAGACGATTCCCTTAGACCATCTTGATTTTCGCACAAAAACAAGATTTCCGCTCCCATGTGCGCCATATTGAACATGGTGACAATGCTGCCTTAATTCACAAGGAACTCTTCTTTTTTTTTCTGCTTTCAGCTGATATAACGAAAAGCTCGCGTTACGATTTCTGACTGCCTGACGCATCTTATATTTTGGTCCTCGCCTGGTTAACAAACACTGACTACCAAATGGAGCGGATTCTCCAAGCTGTTCAAGAAATGTAAAACACCTATTTTCATTTATGTACGTAGTGATCATCGTAATTAGGATAAATTAGCTATCATTTCATTAGGCTTTTACCTGCTTTTCCGTCTTTTTAAGTTCTGCATTAAGCGACAAAAATGGCCTCTTTACTATTTGGCGATTTGTGTTCAACTAGGCGGGGAAAGGAATTGTTTTTCTGTTTTATGCAAGCAAAGGAAATTTTATTCTCTAGTTATAGTGACCAACATATCTGCAAGGCATTATCTTATTTAAGTGTGAACCTCGTTCCGTGTACGGATCCAGAACTCGATTATTTTCAGCAAAAACTTTCATCCCGTTTCTACCCCTTCGGTTTAATTAGTGAACACAATTTGCTCTACCTGAGGACTTTTCTGCAAGCGAATCGTGGCATTCAATTATCCATCCCAAACATTGACAACCAGGCTTATTTACAGGCTCTTTCGACGCAATCTAGTTTGATGTCGCCGCATTATCTTCCATTACAAAGCAATC
AGCTCGGCCAAGTCAAACCAACTTTTCAGCTTTCGGAACCAGGTCGCTCTGATGAAAATTTCAGCGACACTGACCCAAAATTCATTTGCAAGCATCCGAGAATTCACGTTCCAACATCAGTCGGTGTTGTACAGCCTAGTGTAAGACGTCGGACTAGCGACAATCCTGTTAGTCCAACTGAAAGTCGATCTGAATCACCTCTATTTTCACTACTACACGAGCCTGAGACAAGTGTCGCTACAACCACTCTAGGTGATCCGACCAATCAATTAGTTTCGCGAACTCTGGCCAAAAGCAATGTCAACCAACTCTCCGCGCAAGATATTCCTAACGATTCCTCCGTTCAGCAAAACAGCCTCGAAAGTCACCTTACCCCTTTGAACCAACCTGTTGATCCTGATCCGCTATTACCCGAATCCATTGATGGTACCAGCAGCATTGAGATCGATTCTTCTAAAGAAAGTACGAAAAGCAGTAGAACTGTGACTCTTTCCGAATCGGAGATGTCACCCCAGTTGCGTTTAGATTTGGAGGAAATACGAAAATTTTATTCTCTTCCAATTAACCTCAATCGTGACGGAGGTGTTCTGCAAGATGTCTCGATAGGGAAAATGTTGGAAAGGATAAAAGGGTTCTTGTGGTTTTTAAAGAAGGTAAAAGGCGTCGAGCCTGCTTTGACTTATTGTATCAATCCGGAAGTCTTACAACAGTTTGTCGAATTTATGATGAAAAATCGTGGTATCAAAGCCATTACTTGTAGCCGGTATGTGACGTCCTTAATAAGTGCCTGCAAAGTGCCACTCGCGTGCACACAAGATGAACAAAAAGAAGAGTCTCTTGAAAAAATTAGGGCCATTCAGAGGCAACTTGAGCGATTGTCCAGACAGGAAAAAATTGATTCCGACAGTCTTAATCCTCAGACAGACAAAGTAGTTTACTCTGAATTGCTAGAATTATGCAGAGAATTCAAATGGGAGGTTTCGGAAAAAACAGGTGCTGATCGTGCACGAAGTTGTATGAATTTGTGCTTGCTTCTCATGTACTGTGCGGTTAACCCGGGCCGAGTCAAAGAATACATCTCACTGAGAATTTATAAAGATCAAAGCGGCGACCAATTGAAAGATCAAAATTTTATCTGGTTCAAGGAGGACGGTGGCATAGTATTGTTGGAAAATAATTACAAGACCAGAAATACTTACGGCCTAAACACCACTGACGTGAGCTCAGTCACATACTTGAATTACTATCTGCAACTATACAAGTCTAAGATGAGATCACTTTTGCTACACGGCAATGACCACGACTTTTTTTTCGTTGCTCCGAGGGGAAATCGTTTCTCGCATGCCTCTTACAACTATTATATATCCGGACTATTCGAAAAGTACTTATCTCGGAGATTGACAACGGTTGACCTTCGAAAAATTGTTGTTAATTACTTTTTGTCGCTTCCAGAAAGTGGCGATTATTCCTTAAGGGAATCGTTTGCGACTCTCATGAAACATTCTATCAGAGCGCAACAAAAATATTACGATGAACGTCCGTTAACCCAAAAAAAAGATAGAGCGCTCGATTTGTTAACCTCTGTGGCTAGACGAAGTCTAGACGAAGATGAACCTGAGATTGTAAGTGATGAAGACCAGGAAGGATATCTCGACTGCTTACCGGTCCCGGGAGATTTTGTGGCCTTGGTCGCAGCCAATTCTACCGAAAAGGTTCCGGAAGTTTTTGTGGCTAAGGTACTGAGACTTTCCGAGGACAAAAAAACTGCTTATCTTGCCGATTTTGCGGAAGAAGAGCCAGGAAGATTTAAATCGAAAGCGGGAAAAAGTTATAAAGAAAATACAAATTCTCTAATTTTCCCAATTGACATCGTCTTTTCGCATTCGGACGGTCTATATGAATTGAGAACGCCAAAAATTGACCTTCATCTTGTGACAGTTCAAAAGAAAAGTTAA
## >ntLink_0:10214-15286
## 
## How many sequences are there?
## 36447
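
grep -c ">" counts any line containing ">", so anchoring the pattern to the start of the line is slightly safer. The sketch below, on a made-up FASTA, counts records and prints per-record lengths with awk, even when a sequence wraps over several lines. File names and sequences are hypothetical.

```shell
workdir=$(mktemp -d)
printf '>seqA\nACGT\nAC\n>seqB\nGGG\n' > "$workdir/toy.fa"

# Count header lines only (pattern anchored to the start of the line)
n_records=$(grep -c '^>' "$workdir/toy.fa")

# Sum wrapped sequence lines per record and print "name<TAB>length"
lengths=$(awk '
  /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
       { len += length($0) }
  END  { if (name != "") print name "\t" len }
' "$workdir/toy.fa")

echo "records: $n_records"
echo "$lengths"
rm -r "$workdir"
```
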
# Read FASTA file
fasta_file <- "../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.fa"  # Replace with the name of your FASTA file
sequences <- readDNAStringSet(fasta_file)

# Calculate sequence lengths
sequence_lengths <- width(sequences)

# Create a data frame
sequence_lengths_df <- data.frame(Length = sequence_lengths)

# Plot histogram using ggplot2
ggplot(sequence_lengths_df, aes(x = Length)) +
  geom_histogram(binwidth = 100, color = "black", fill = "blue", alpha = 0.75) +
  labs(title = "Histogram of Sequence Lengths",
       x = "Sequence Length",
       y = "Frequency") +
  theme_minimal()

summary(sequence_lengths_df)
##      Length      
##  Min.   :   153  
##  1st Qu.:  1162  
##  Median :  2850  
##  Mean   :  5887  
##  3rd Qu.:  6748  
##  Max.   :199145
# Calculate base composition
base_composition <- alphabetFrequency(sequences, baseOnly = TRUE)

# Convert to data frame and reshape for ggplot2
base_composition_df <- as.data.frame(base_composition)
base_composition_df$ID <- rownames(base_composition_df)
base_composition_melted <- reshape2::melt(base_composition_df, id.vars = "ID", variable.name = "Base", value.name = "Count")

# Plot base composition bar chart using ggplot2
ggplot(base_composition_melted, aes(x = Base, y = Count, fill = Base)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Base Composition",
       x = "Base",
       y = "Count") +
  theme_minimal() +
  scale_fill_manual(values = c("A" = "green", "C" = "blue", "G" = "yellow", "T" = "red"))

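# Count CG motifs in each sequence
count_cg_motifs <- function(sequence) {
  cg_motif <- "CG"
  # gregexpr() returns -1 when there is no match, so count only real match positions
  matches <- gregexpr(cg_motif, as.character(sequence), fixed = TRUE)[[1]]
  return(sum(matches > 0))
}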

cg_motifs_counts <- sapply(sequences, count_cg_motifs)

# Create a data frame
cg_motifs_counts_df <- data.frame(CG_Count = cg_motifs_counts)

# Plot CG motifs distribution using ggplot2
ggplot(cg_motifs_counts_df, aes(x = CG_Count)) +
  geom_histogram(binwidth = 1, color = "black", fill = "blue", alpha = 0.75) +
  labs(title = "Distribution of CG Motifs",
       x = "Number of CG Motifs",
       y = "Frequency") +
  theme_minimal()

1.2 Database Creation

1.2.1 Obtain Fasta (UniProt/Swiss-Prot)

Already done during transcriptome annotation.

cd ../../data
curl -O https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
mv uniprot_sprot.fasta.gz uniprot_sprot_r2024_11.fasta.gz
gunzip -k uniprot_sprot_r2024_11.fasta.gz

1.2.2 Making the database

Already done during transcriptome annotation.

/home/shared/ncbi-blast-2.11.0+/bin/makeblastdb \
-in ../../data/uniprot_sprot_r2024_11.fasta \
-dbtype prot \
-out ../../data/blastdb/uniprot_sprot_r2024_11

1.3 Running Blastx

/home/shared/ncbi-blast-2.11.0+/bin/blastx \
-query ../data/02-Apul-reference-annotation/Apulcra-genome-mRNA.fa \
-db ../../data/blastdb/uniprot_sprot_r2024_11 \
-out ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx.tab \
-evalue 1E-20 \
-num_threads 40 \
-max_target_seqs 1 \
-outfmt 6
echo "First few lines:"
head -2 ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx.tab

echo "Number of lines in output:"
wc -l ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx.tab
## First few lines:
## ntLink_4:1155-1537   sp|P35061|H2A_ACRFO 100.000 125 0   0   5   379 1   125 4.96e-86    249
## ntLink_4:2660-3441   sp|P84239|H3_URECA  99.265  136 1   0   371 778 1   136 5.03e-93    273
## Number of lines in output:
## 16190 ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx.tab
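
The 16190 lines above are alignment rows, not necessarily distinct queries. With -outfmt 6, BLAST writes one row per HSP, so a query can appear more than once even with -max_target_seqs 1. Below is a sketch on made-up rows that counts distinct query IDs and keeps the highest-bitscore row per query. It assumes the default outfmt 6 column order (query in column 1, bitscore in column 12), and all rows are hypothetical.

```shell
workdir=$(mktemp -d)
# Three made-up outfmt 6 rows: q1 has two HSPs against the same subject
{
  printf 'q1\tsp|P1|A_X\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n'
  printf 'q1\tsp|P1|A_X\t80.0\t50\t10\t0\t400\t550\t120\t170\t1e-25\t90\n'
  printf 'q2\tsp|P2|B_Y\t95.0\t80\t4\t0\t1\t240\t1\t80\t1e-40\t150\n'
} > "$workdir/toy_blastx.tab"

n_rows=$(wc -l < "$workdir/toy_blastx.tab" | tr -d ' ')
n_queries=$(cut -f1 "$workdir/toy_blastx.tab" | sort -u | wc -l | tr -d ' ')

# Sort by query, then by bitscore (numeric, descending); keep the first row per query
best_hits=$(sort -t "$(printf '\t')" -k1,1 -k12,12nr "$workdir/toy_blastx.tab" \
  | awk -F'\t' '!seen[$1]++')
n_best=$(printf '%s\n' "$best_hits" | wc -l | tr -d ' ')

echo "rows: $n_rows, distinct queries: $n_queries, best hits kept: $n_best"
rm -r "$workdir"
```

Comparing the distinct-query count against the 36447 input sequences gives the fraction of mRNAs with at least one Swiss-Prot hit.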

1.4 Joining Blast table with annotations

1.4.1 Prepping Blast table for easy join

tr '|' '\t' < ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx.tab \
> ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx_sep.tab

head -1 ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx_sep.tab
## ntLink_4:1155-1537   sp  P35061  H2A_ACRFO   100.000 125 0   0   5   379 1   125 4.96e-86    249
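
tr replaces every "|" on the line, which works here because pipes only occur in the subject ID. A slightly more defensive sketch, on a made-up row, splits only column 2 with awk and leaves the other columns untouched:

```shell
# One made-up BLAST row, truncated to four columns for brevity
row=$(printf 'q1\tsp|P35061|H2A_ACRFO\t100.000\t125')

# Replace pipes with tabs in column 2 only; awk rebuilds the line with tab separators
split_row=$(printf '%s\n' "$row" \
  | awk 'BEGIN { FS = OFS = "\t" } { gsub(/\|/, "\t", $2); print }')

echo "$split_row"
n_fields=$(printf '%s\n' "$split_row" | awk -F'\t' '{ print NF }')
echo "fields after split: $n_fields"
```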

1.4.2 Could do some cool stuff in R here reading in table

bltabl <- read.csv("../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-uniprot_blastx_sep.tab", sep = '\t', header = FALSE)

spgo <- read.csv("https://gannet.fish.washington.edu/seashell/snaps/uniprot_table_r2023_01.tab", sep = '\t', header = TRUE)

datatable(head(bltabl), options = list(scrollX = TRUE, scrollY = "400px", scrollCollapse = TRUE, paging = FALSE))
datatable(head(spgo), options = list(scrollX = TRUE, scrollY = "400px", scrollCollapse = TRUE, paging = FALSE))
datatable(
  left_join(bltabl, spgo,  by = c("V3" = "Entry")) %>%
  select(V1, V3, V13, Protein.names, Organism, Gene.Ontology..biological.process., Gene.Ontology.IDs) 
 # %>% mutate(V1 = str_replace_all(V1,pattern = "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed", replacement = "Ab"))
)
annot_tab <-
  left_join(bltabl, spgo,  by = c("V3" = "Entry")) %>%
  select(V1, V3, V13, Protein.names, Organism, Gene.Ontology..biological.process., Gene.Ontology.IDs)

write.table(annot_tab, file = "../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-IDmapping-2024_12_12.tab", sep = "\t",
            row.names = TRUE, col.names = NA)
head -n 3 ../output/02-Apul-reference-annotation/Apulcra-genome-mRNA-IDmapping-2024_12_12.tab
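
One thing worth checking after the join: the BLAST database is the r2024_11 Swiss-Prot release, while the annotation table is from r2023_01. Accessions added in between would come through the left join with empty annotation columns. Below is a sketch with made-up tables showing a left join in awk that flags unmatched accessions. File names, columns, and values are all hypothetical.

```shell
workdir=$(mktemp -d)
# Hits: query <TAB> accession ; annotations: accession <TAB> GO IDs (all made up)
printf 'q1\tP1\nq2\tP2\nq3\tP9\n' > "$workdir/hits.tab"
printf 'P1\tGO:0000001\nP2\tGO:0000002; GO:0000003\n' > "$workdir/annot.tab"

# The first file loads the lookup; each row of the second is printed with its match or "NA"
joined=$(awk 'BEGIN { FS = OFS = "\t" }
  NR == FNR { go[$1] = $2; next }
  { print $1, $2, (($2 in go) ? go[$2] : "NA") }
' "$workdir/annot.tab" "$workdir/hits.tab")

n_unmatched=$(printf '%s\n' "$joined" | awk -F'\t' '$3 == "NA"' | wc -l | tr -d ' ')

echo "$joined"
echo "hits without annotation: $n_unmatched"
rm -r "$workdir"
```
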
# Read dataset
#dataset <- read.csv("../output/blast_annot_go.tab", sep = '\t')  # Replace with the path to your dataset

# Select the column of interest
column_name <- "Organism"  # Replace with the name of the column of interest
column_data <- annot_tab[[column_name]]

# Count the occurrences of the strings in the column
string_counts <- table(column_data)

# Convert to a data frame, sort by count, and select the top 10
string_counts_df <- as.data.frame(string_counts)
colnames(string_counts_df) <- c("String", "Count")
string_counts_df <- string_counts_df[order(string_counts_df$Count, decreasing = TRUE), ]
top_10_strings <- head(string_counts_df, n = 10)

# Plot the top 10 most common strings using ggplot2
ggplot(top_10_strings, aes(x = reorder(String, -Count), y = Count, fill = String)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Top 10 Species hits",
       x = column_name,
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none") +
  coord_flip()

#data <- read.csv("../output/blast_annot_go.tab", sep = '\t')

# Rename the `Gene.Ontology..biological.process.` column to `Biological_Process`
colnames(annot_tab)[colnames(annot_tab) == "Gene.Ontology..biological.process."] <- "Biological_Process"

# Separate the `Biological_Process` column into individual biological processes
data_separated <- unlist(strsplit(annot_tab$Biological_Process, split = ";"))

# Trim whitespace from the biological processes
data_separated <- gsub("^\\s+|\\s+$", "", data_separated)

# Count the occurrences of each biological process
process_counts <- table(data_separated)
process_counts <- data.frame(Biological_Process = names(process_counts), Count = as.integer(process_counts))
process_counts <- process_counts[order(-process_counts$Count), ]

# Select the 20 most predominant biological processes
top_20_processes <- process_counts[1:20, ]

# Create a color palette for the bars
bar_colors <- rainbow(nrow(top_20_processes))

# Create a staggered vertical bar plot with different colors for each bar
barplot(top_20_processes$Count, names.arg = rep("", nrow(top_20_processes)), col = bar_colors,
        ylim = c(0, max(top_20_processes$Count) * 1.25),
        main = "Occurrences of the 20 Most Predominant Biological Processes", xlab = "Biological Process", ylab = "Count")

# Create a separate plot for the legend
png("../output/02-Apul-reference-annotation/GOlegend.png", width = 800, height = 600)
par(mar = c(0, 0, 0, 0))
plot.new()
legend("center", legend = top_20_processes$Biological_Process, fill = bar_colors, cex = 1, title = "Biological Processes")
dev.off()
## png 
##   2
knitr::include_graphics("../output/02-Apul-reference-annotation/GOlegend.png")

rm ../output/02-Apul-reference-annotation/GOlegend.png
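
The biological-process tally can also be cross-checked outside R. Below is a sketch on made-up values that splits a semicolon-delimited column, trims whitespace, and counts each term. The terms are illustrative and not taken from the real table.

```shell
# Made-up Biological_Process values, one line per annotated sequence
terms='translation [GO:0006412]; DNA repair [GO:0006281]
translation [GO:0006412]
DNA repair [GO:0006281]; translation [GO:0006412]'

# One term per line, trim surrounding whitespace, then count and rank
top_terms=$(printf '%s\n' "$terms" \
  | tr ';' '\n' \
  | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
  | sort | uniq -c | sort -k1,1nr)

echo "$top_terms"
top_count=$(printf '%s\n' "$top_terms" | head -n 1 | awk '{ print $1 }')
top_name=$(printf '%s\n' "$top_terms" | head -n 1 | awk '{ print $2 }')
```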

2 Transcriptome

2.1 Retrieve transcriptome fasta file

We will likely only make use of the annotated genome, since we now have an A. pulchra genome (instead of relying on A. millepora). If we do need the A. millepora transcriptome, though, the code below annotates it.

We’ll be using the A. millepora NCBI rna.fna file, stored here and accessible on the deep-dive genomic resources page.

curl https://gannet.fish.washington.edu/acropora/E5-deep-dive/Transcripts/Apul_GCF_013753865.1_rna.fna \
-k \
> ../../data/Apul_GCF_013753865.1_rna.fna

Let’s check the file

echo "First few lines:"
head -3 ../../data/Apul_GCF_013753865.1_rna.fna

echo ""
echo "How many sequences are there?"
grep -c ">" ../../data/Apul_GCF_013753865.1_rna.fna
# Read FASTA file
fasta_file <- "../../data/Apul_GCF_013753865.1_rna.fna"  # Replace with the name of your FASTA file
sequences <- readDNAStringSet(fasta_file)

# Calculate sequence lengths
sequence_lengths <- width(sequences)

# Create a data frame
sequence_lengths_df <- data.frame(Length = sequence_lengths)

# Plot histogram using ggplot2
ggplot(sequence_lengths_df, aes(x = Length)) +
  geom_histogram(binwidth = 1, color = "black", fill = "blue", alpha = 0.75) +
  labs(title = "Histogram of Sequence Lengths",
       x = "Sequence Length",
       y = "Frequency") +
  theme_minimal()

summary(sequence_lengths_df)
# Calculate base composition
base_composition <- alphabetFrequency(sequences, baseOnly = TRUE)

# Convert to data frame and reshape for ggplot2
base_composition_df <- as.data.frame(base_composition)
base_composition_df$ID <- rownames(base_composition_df)
base_composition_melted <- reshape2::melt(base_composition_df, id.vars = "ID", variable.name = "Base", value.name = "Count")

# Plot base composition bar chart using ggplot2
ggplot(base_composition_melted, aes(x = Base, y = Count, fill = Base)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Base Composition",
       x = "Base",
       y = "Count") +
  theme_minimal() +
  scale_fill_manual(values = c("A" = "green", "C" = "blue", "G" = "yellow", "T" = "red"))
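# Count CG motifs in each sequence
count_cg_motifs <- function(sequence) {
  cg_motif <- "CG"
  # gregexpr() returns -1 when there is no match, so count only real match positions
  matches <- gregexpr(cg_motif, as.character(sequence), fixed = TRUE)[[1]]
  return(sum(matches > 0))
}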

cg_motifs_counts <- sapply(sequences, count_cg_motifs)

# Create a data frame
cg_motifs_counts_df <- data.frame(CG_Count = cg_motifs_counts)

# Plot CG motifs distribution using ggplot2
ggplot(cg_motifs_counts_df, aes(x = CG_Count)) +
  geom_histogram(binwidth = 1, color = "black", fill = "blue", alpha = 0.75) +
  labs(title = "Distribution of CG Motifs",
       x = "Number of CG Motifs",
       y = "Frequency") +
  theme_minimal()

2.2 Database Creation

2.2.1 Obtain Fasta (UniProt/Swiss-Prot)

cd ../../data
curl -O https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
mv uniprot_sprot.fasta.gz uniprot_sprot_r2024_11.fasta.gz
gunzip -k uniprot_sprot_r2024_11.fasta.gz

2.2.2 Making the database

/home/shared/ncbi-blast-2.11.0+/bin/makeblastdb \
-in ../../data/uniprot_sprot_r2024_11.fasta \
-dbtype prot \
-out ../../data/blastdb/uniprot_sprot_r2024_11

2.3 Running Blastx

/home/shared/ncbi-blast-2.11.0+/bin/blastx \
-query ../../data/Apul_GCF_013753865.1_rna.fna \
-db ../../data/blastdb/uniprot_sprot_r2024_11 \
-out ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx.tab \
-evalue 1E-20 \
-num_threads 40 \
-max_target_seqs 1 \
-outfmt 6
echo "First few lines:"
head -2 ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx.tab

echo "Number of lines in output:"
wc -l ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx.tab

2.4 Joining Blast table with annotations

2.4.1 Prepping Blast table for easy join

tr '|' '\t' < ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx.tab \
> ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx_sep.tab

head -1 ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx_sep.tab

2.4.2 Could do some cool stuff in R here reading in table

bltabl <- read.csv("../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-uniprot_blastx_sep.tab", sep = '\t', header = FALSE)

spgo <- read.csv("https://gannet.fish.washington.edu/seashell/snaps/uniprot_table_r2023_01.tab", sep = '\t', header = TRUE)

datatable(head(bltabl), options = list(scrollX = TRUE, scrollY = "400px", scrollCollapse = TRUE, paging = FALSE))
datatable(head(spgo), options = list(scrollX = TRUE, scrollY = "400px", scrollCollapse = TRUE, paging = FALSE))
datatable(
  left_join(bltabl, spgo,  by = c("V3" = "Entry")) %>%
  select(V1, V3, V13, Protein.names, Organism, Gene.Ontology..biological.process., Gene.Ontology.IDs) 
 # %>% mutate(V1 = str_replace_all(V1,pattern = "solid0078_20110412_FRAG_BC_WHITE_WHITE_F3_QV_SE_trimmed", replacement = "Ab"))
)
annot_tab <-
  left_join(bltabl, spgo,  by = c("V3" = "Entry")) %>%
  select(V1, V3, V13, Protein.names, Organism, Gene.Ontology..biological.process., Gene.Ontology.IDs)

write.table(annot_tab, file = "../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-IDmapping-2024_08_21.tab", sep = "\t",
            row.names = TRUE, col.names = NA)
head -n 3 ../output/02-Apul-reference-annotation/Apul_GCF_013753865.1_rna-IDmapping-2024_08_21.tab
# Read dataset
#dataset <- read.csv("../output/blast_annot_go.tab", sep = '\t')  # Replace with the path to your dataset

# Select the column of interest
column_name <- "Organism"  # Replace with the name of the column of interest
column_data <- annot_tab[[column_name]]

# Count the occurrences of the strings in the column
string_counts <- table(column_data)

# Convert to a data frame, sort by count, and select the top 10
string_counts_df <- as.data.frame(string_counts)
colnames(string_counts_df) <- c("String", "Count")
string_counts_df <- string_counts_df[order(string_counts_df$Count, decreasing = TRUE), ]
top_10_strings <- head(string_counts_df, n = 10)

# Plot the top 10 most common strings using ggplot2
ggplot(top_10_strings, aes(x = reorder(String, -Count), y = Count, fill = String)) +
  geom_bar(stat = "identity", position = "dodge", color = "black") +
  labs(title = "Top 10 Species hits",
       x = column_name,
       y = "Count") +
  theme_minimal() +
  theme(legend.position = "none") +
  coord_flip()
#data <- read.csv("../output/blast_annot_go.tab", sep = '\t')

# Rename the `Gene.Ontology..biological.process.` column to `Biological_Process`
colnames(annot_tab)[colnames(annot_tab) == "Gene.Ontology..biological.process."] <- "Biological_Process"

# Separate the `Biological_Process` column into individual biological processes
data_separated <- unlist(strsplit(annot_tab$Biological_Process, split = ";"))

# Trim whitespace from the biological processes
data_separated <- gsub("^\\s+|\\s+$", "", data_separated)

# Count the occurrences of each biological process
process_counts <- table(data_separated)
process_counts <- data.frame(Biological_Process = names(process_counts), Count = as.integer(process_counts))
process_counts <- process_counts[order(-process_counts$Count), ]

# Select the 20 most predominant biological processes
top_20_processes <- process_counts[1:20, ]

# Create a color palette for the bars
bar_colors <- rainbow(nrow(top_20_processes))

# Create a staggered vertical bar plot with different colors for each bar
barplot(top_20_processes$Count, names.arg = rep("", nrow(top_20_processes)), col = bar_colors,
        ylim = c(0, max(top_20_processes$Count) * 1.25),
        main = "Occurrences of the 20 Most Predominant Biological Processes", xlab = "Biological Process", ylab = "Count")


# Create a separate plot for the legend
png("../output/02-Apul-reference-annotation/GOlegend.png", width = 800, height = 600)
par(mar = c(0, 0, 0, 0))
plot.new()
legend("center", legend = top_20_processes$Biological_Process, fill = bar_colors, cex = 1, title = "Biological Processes")
dev.off()
knitr::include_graphics("../output/02-Apul-reference-annotation/GOlegend.png")
rm ../output/02-Apul-reference-annotation/GOlegend.png
Y2Vzc2VzCmRhdGFfc2VwYXJhdGVkIDwtIGdzdWIoIl5cXHMrfFxccyskIiwgIiIsIGRhdGFfc2VwYXJhdGVkKQoKIyBDb3VudCB0aGUgb2NjdXJyZW5jZXMgb2YgZWFjaCBiaW9sb2dpY2FsIHByb2Nlc3MKcHJvY2Vzc19jb3VudHMgPC0gdGFibGUoZGF0YV9zZXBhcmF0ZWQpCnByb2Nlc3NfY291bnRzIDwtIGRhdGEuZnJhbWUoQmlvbG9naWNhbF9Qcm9jZXNzID0gbmFtZXMocHJvY2Vzc19jb3VudHMpLCBDb3VudCA9IGFzLmludGVnZXIocHJvY2Vzc19jb3VudHMpKQpwcm9jZXNzX2NvdW50cyA8LSBwcm9jZXNzX2NvdW50c1tvcmRlcigtcHJvY2Vzc19jb3VudHMkQ291bnQpLCBdCgojIFNlbGVjdCB0aGUgMjAgbW9zdCBwcmVkb21pbmFudCBiaW9sb2dpY2FsIHByb2Nlc3Nlcwp0b3BfMjBfcHJvY2Vzc2VzIDwtIHByb2Nlc3NfY291bnRzWzE6MjAsIF0KCiMgQ3JlYXRlIGEgY29sb3IgcGFsZXR0ZSBmb3IgdGhlIGJhcnMKYmFyX2NvbG9ycyA8LSByYWluYm93KG5yb3codG9wXzIwX3Byb2Nlc3NlcykpCgojIENyZWF0ZSBhIHN0YWdnZXJlZCB2ZXJ0aWNhbCBiYXIgcGxvdCB3aXRoIGRpZmZlcmVudCBjb2xvcnMgZm9yIGVhY2ggYmFyCmJhcnBsb3QodG9wXzIwX3Byb2Nlc3NlcyRDb3VudCwgbmFtZXMuYXJnID0gcmVwKCIiLCBucm93KHRvcF8yMF9wcm9jZXNzZXMpKSwgY29sID0gYmFyX2NvbG9ycywKICAgICAgICB5bGltID0gYygwLCBtYXgodG9wXzIwX3Byb2Nlc3NlcyRDb3VudCkgKiAxLjI1KSwKICAgICAgICBtYWluID0gIk9jY3VycmVuY2VzIG9mIHRoZSAyMCBNb3N0IFByZWRvbWluYW50IEJpb2xvZ2ljYWwgUHJvY2Vzc2VzIiwgeGxhYiA9ICJCaW9sb2dpY2FsIFByb2Nlc3MiLCB5bGFiID0gIkNvdW50IikKCgojIENyZWF0ZSBhIHNlcGFyYXRlIHBsb3QgZm9yIHRoZSBsZWdlbmQKcG5nKCIuLi9vdXRwdXQvMDItQXB1bC1yZWZlcmVuY2UtYW5ub3RhdGlvbi9HT2xlZ2VuZC5wbmciLCB3aWR0aCA9IDgwMCwgaGVpZ2h0ID0gNjAwKQpwYXIobWFyID0gYygwLCAwLCAwLCAwKSkKcGxvdC5uZXcoKQpsZWdlbmQoImNlbnRlciIsIGxlZ2VuZCA9IHRvcF8yMF9wcm9jZXNzZXMkQmlvbG9naWNhbF9Qcm9jZXNzLCBmaWxsID0gYmFyX2NvbG9ycywgY2V4ID0gMSwgdGl0bGUgPSAiQmlvbG9naWNhbCBQcm9jZXNzZXMiKQpkZXYub2ZmKCkKYGBgCgpgYGB7ciBnZW5vbWUtZ28tbGVnZW5kLCBldmFsPVRSVUUsIGZpZy53aWR0aCA9IDEwMCAsZmlnLmhlaWdodCA9IDEwMH0Ka25pdHI6OmluY2x1ZGVfZ3JhcGhpY3MoIi4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0dPbGVnZW5kLnBuZyIpCmBgYAoKYGBge3IgZ2Vub21lLXJlbW92ZS1sZWdlbmQtZmlsZSwgZW5naW5lPSdiYXNoJywgZXZhbD1UUlVFfQpybSAuLi9vdXRwdXQvMDItQXB1bC1yZWZlcmVuY2UtYW5ub3RhdGlvbi9HT2xlZ2VuZC5wbmcKYGBgCgoKIyBUcmFuc2NyaXB0b21lCiMjIFJldHJpZXZlIHRyYW5zY3JpcHRvbWUgZmFzdGEgZmlsZQoKV2Ugd2ls
bCBsaWtlbHkgb25seSBtYWtlIHVzZSBvZiB0aGUgYW5ub3RhdGVkIGdlbm9tZSwgc2luY2Ugd2UgaGF2ZSBhbiBBLnB1bGNocmEgZ2Vub21lIG5vdyAoaW5zdGVhZCBvZiBBLm1pbGxlcG9yYSkuIElmIHdlIGRvIG5lZWQgdGhlIG1pbGxlcG9yYSB0cmFuc2NyaXB0b21lIHRob3VnaCwgSSBoYXZlIGNvZGUgYmVsb3cgZm9yIGFubm90YXRpb24KCldlJ2xsIGJlIHVzaW5nIHRoZSAqQS4gbWlsbGlwb3JhKiBbTkNCSV0oaHR0cHM6Ly93d3cubmNiaS5ubG0ubmloLmdvdi9kYXRhc2V0cy9nZW5vbWUvR0NGXzAxMzc1Mzg2NS4xLykgcm5hLmZuYSBmaWxlLCBzdG9yZWQgW2hlcmVdKGh0dHBzOi8vZ2FubmV0LmZpc2gud2FzaGluZ3Rvbi5lZHUvYWNyb3BvcmEvRTUtZGVlcC1kaXZlL1RyYW5zY3JpcHRzL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS5mbmEpIGFuZCBhY2Nlc3NpYmxlIG9uIHRoZSBgZGVlcC1kaXZlYCBbZ2Vub21pYyByZXNvdXJjZXMgcGFnZV0oaHR0cHM6Ly9naXRodWIuY29tL3Vyb2wtZTUvZGVlcC1kaXZlL3dpa2kvU3BlY2llcy1DaGFyYWN0ZXJpc3RpY3MtYW5kLUdlbm9taWMtUmVzb3VyY2VzI2dlbm9taWMtcmVzb3VyY2VzKQoKYGBge3IgZG93bmxvYWQtdHJhbnNjcmlwdG9tZSwgZW5naW5lPSdiYXNoJ30KY3VybCBodHRwczovL2dhbm5ldC5maXNoLndhc2hpbmd0b24uZWR1L2Fjcm9wb3JhL0U1LWRlZXAtZGl2ZS9UcmFuc2NyaXB0cy9BcHVsX0dDRl8wMTM3NTM4NjUuMV9ybmEuZm5hIFwKLWsgXAo+IC4uLy4uL2RhdGEvQXB1bF9HQ0ZfMDEzNzUzODY1LjFfcm5hLmZuYQpgYGAKCkxldCdzIGNoZWNrIHRoZSBmaWxlCgpgYGB7ciB0cmFuc2NyaXB0b21lLXZpZXctcXVlcnksIGVuZ2luZT0nYmFzaCd9CmVjaG8gIkZpcnN0IGZldyBsaW5lczoiCmhlYWQgLTMgLi4vLi4vZGF0YS9BcHVsX0dDRl8wMTM3NTM4NjUuMV9ybmEuZm5hCgplY2hvICIiCmVjaG8gIkhvdyBtYW55IHNlcXVlbmNlcyBhcmUgdGhlcmU/IgpncmVwIC1jICI+IiAuLi8uLi9kYXRhL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS5mbmEKYGBgCgpgYGB7ciB0cmFuc2NyaXB0b21lLXNlcWxlbmd0aC1oaXN0b2dyYW19CiMgUmVhZCBGQVNUQSBmaWxlCmZhc3RhX2ZpbGUgPC0gIi4uLy4uL2RhdGEvQXB1bF9HQ0ZfMDEzNzUzODY1LjFfcm5hLmZuYSIgICMgUmVwbGFjZSB3aXRoIHRoZSBuYW1lIG9mIHlvdXIgRkFTVEEgZmlsZQpzZXF1ZW5jZXMgPC0gcmVhZEROQVN0cmluZ1NldChmYXN0YV9maWxlKQoKIyBDYWxjdWxhdGUgc2VxdWVuY2UgbGVuZ3RocwpzZXF1ZW5jZV9sZW5ndGhzIDwtIHdpZHRoKHNlcXVlbmNlcykKCiMgQ3JlYXRlIGEgZGF0YSBmcmFtZQpzZXF1ZW5jZV9sZW5ndGhzX2RmIDwtIGRhdGEuZnJhbWUoTGVuZ3RoID0gc2VxdWVuY2VfbGVuZ3RocykKCiMgUGxvdCBoaXN0b2dyYW0gdXNpbmcgZ2dwbG90MgpnZ3Bsb3Qoc2VxdWVuY2VfbGVuZ3Roc19kZiwgYWVzKHggPSBMZW5ndGgpKSArCiAgZ2VvbV9oaXN0b2dyYW0oYmlud2lkdGggPSAx
LCBjb2xvciA9ICJibGFjayIsIGZpbGwgPSAiYmx1ZSIsIGFscGhhID0gMC43NSkgKwogIGxhYnModGl0bGUgPSAiSGlzdG9ncmFtIG9mIFNlcXVlbmNlIExlbmd0aHMiLAogICAgICAgeCA9ICJTZXF1ZW5jZSBMZW5ndGgiLAogICAgICAgeSA9ICJGcmVxdWVuY3kiKSArCiAgdGhlbWVfbWluaW1hbCgpCgpzdW1tYXJ5KHNlcXVlbmNlX2xlbmd0aHNfZGYpCmBgYAoKYGBge3IgdHJhbnNjcmlwdG9tZS1BQ0dULWNvbXBvc2l0aW9ufQoKIyBDYWxjdWxhdGUgYmFzZSBjb21wb3NpdGlvbgpiYXNlX2NvbXBvc2l0aW9uIDwtIGFscGhhYmV0RnJlcXVlbmN5KHNlcXVlbmNlcywgYmFzZU9ubHkgPSBUUlVFKQoKIyBDb252ZXJ0IHRvIGRhdGEgZnJhbWUgYW5kIHJlc2hhcGUgZm9yIGdncGxvdDIKYmFzZV9jb21wb3NpdGlvbl9kZiA8LSBhcy5kYXRhLmZyYW1lKGJhc2VfY29tcG9zaXRpb24pCmJhc2VfY29tcG9zaXRpb25fZGYkSUQgPC0gcm93bmFtZXMoYmFzZV9jb21wb3NpdGlvbl9kZikKYmFzZV9jb21wb3NpdGlvbl9tZWx0ZWQgPC0gcmVzaGFwZTI6Om1lbHQoYmFzZV9jb21wb3NpdGlvbl9kZiwgaWQudmFycyA9ICJJRCIsIHZhcmlhYmxlLm5hbWUgPSAiQmFzZSIsIHZhbHVlLm5hbWUgPSAiQ291bnQiKQoKIyBQbG90IGJhc2UgY29tcG9zaXRpb24gYmFyIGNoYXJ0IHVzaW5nIGdncGxvdDIKZ2dwbG90KGJhc2VfY29tcG9zaXRpb25fbWVsdGVkLCBhZXMoeCA9IEJhc2UsIHkgPSBDb3VudCwgZmlsbCA9IEJhc2UpKSArCiAgZ2VvbV9iYXIoc3RhdCA9ICJpZGVudGl0eSIsIHBvc2l0aW9uID0gImRvZGdlIiwgY29sb3IgPSAiYmxhY2siKSArCiAgbGFicyh0aXRsZSA9ICJCYXNlIENvbXBvc2l0aW9uIiwKICAgICAgIHggPSAiQmFzZSIsCiAgICAgICB5ID0gIkNvdW50IikgKwogIHRoZW1lX21pbmltYWwoKSArCiAgc2NhbGVfZmlsbF9tYW51YWwodmFsdWVzID0gYygiQSIgPSAiZ3JlZW4iLCAiQyIgPSAiYmx1ZSIsICJHIiA9ICJ5ZWxsb3ciLCAiVCIgPSAicmVkIikpCmBgYAoKCmBgYHtyIHRyYW5zY3JpcHRvbWUtY2ctbW90aWZzfQoKIyBDb3VudCBDRyBtb3RpZnMgaW4gZWFjaCBzZXF1ZW5jZQpjb3VudF9jZ19tb3RpZnMgPC0gZnVuY3Rpb24oc2VxdWVuY2UpIHsKICBjZ19tb3RpZiA8LSAiQ0ciCiAgcmV0dXJuKGxlbmd0aChncmVnZXhwcihjZ19tb3RpZiwgc2VxdWVuY2UsIGZpeGVkID0gVFJVRSlbWzFdXSkpCn0KCmNnX21vdGlmc19jb3VudHMgPC0gc2FwcGx5KHNlcXVlbmNlcywgY291bnRfY2dfbW90aWZzKQoKIyBDcmVhdGUgYSBkYXRhIGZyYW1lCmNnX21vdGlmc19jb3VudHNfZGYgPC0gZGF0YS5mcmFtZShDR19Db3VudCA9IGNnX21vdGlmc19jb3VudHMpCgojIFBsb3QgQ0cgbW90aWZzIGRpc3RyaWJ1dGlvbiB1c2luZyBnZ3Bsb3QyCmdncGxvdChjZ19tb3RpZnNfY291bnRzX2RmLCBhZXMoeCA9IENHX0NvdW50KSkgKwogIGdlb21faGlzdG9ncmFtKGJpbndpZHRoID0gMSwgY29sb3IgPSAiYmxhY2siLCBmaWxsID0gImJsdWUi
LCBhbHBoYSA9IDAuNzUpICsKICBsYWJzKHRpdGxlID0gIkRpc3RyaWJ1dGlvbiBvZiBDRyBNb3RpZnMiLAogICAgICAgeCA9ICJOdW1iZXIgb2YgQ0cgTW90aWZzIiwKICAgICAgIHkgPSAiRnJlcXVlbmN5IikgKwogIHRoZW1lX21pbmltYWwoKQpgYGAKCiMjIERhdGFiYXNlIENyZWF0aW9uCgojIyMgT2J0YWluIEZhc3RhIChVbmlQcm90L1N3aXNzLVByb3QpCgpgYGB7ciBkb3dubG9hZC1VbmlQU3dpc3NQLWRhdGEsIGVuZ2luZT0nYmFzaCd9CmNkIC4uLy4uL2RhdGEKY3VybCAtTyBodHRwczovL2Z0cC51bmlwcm90Lm9yZy9wdWIvZGF0YWJhc2VzL3VuaXByb3QvY3VycmVudF9yZWxlYXNlL2tub3dsZWRnZWJhc2UvY29tcGxldGUvdW5pcHJvdF9zcHJvdC5mYXN0YS5negptdiB1bmlwcm90X3Nwcm90LmZhc3RhLmd6IHVuaXByb3Rfc3Byb3RfcjIwMjRfMTEuZmFzdGEuZ3oKZ3VuemlwIC1rIHVuaXByb3Rfc3Byb3RfcjIwMjRfMTEuZmFzdGEuZ3oKYGBgCgojIyMgTWFraW5nIHRoZSBkYXRhYmFzZQoKYGBge3IgbWFrZS1VbmlQU3dpc3NQLWJsYXN0ZGIsIGVuZ2luZT0nYmFzaCd9Ci9ob21lL3NoYXJlZC9uY2JpLWJsYXN0LTIuMTEuMCsvYmluL21ha2VibGFzdGRiIFwKLWluIC4uLy4uL2RhdGEvdW5pcHJvdF9zcHJvdF9yMjAyNF8xMS5mYXN0YSBcCi1kYnR5cGUgcHJvdCBcCi1vdXQgLi4vLi4vZGF0YS9ibGFzdGRiL3VuaXByb3Rfc3Byb3RfcjIwMjRfMTEKYGBgCgoKIyMgUnVubmluZyBCbGFzdHgKCmBgYHtyIHRyYW5zY3JpcHRvbWUtYmxhc3R4LCBlbmdpbmU9J2Jhc2gnfQovaG9tZS9zaGFyZWQvbmNiaS1ibGFzdC0yLjExLjArL2Jpbi9ibGFzdHggXAotcXVlcnkgLi4vLi4vZGF0YS9BcHVsX0dDRl8wMTM3NTM4NjUuMV9ybmEuZm5hIFwKLWRiIC4uLy4uL2RhdGEvYmxhc3RkYi91bmlwcm90X3Nwcm90X3IyMDI0XzExIFwKLW91dCAuLi9vdXRwdXQvMDItQXB1bC1yZWZlcmVuY2UtYW5ub3RhdGlvbi9BcHVsX0dDRl8wMTM3NTM4NjUuMV9ybmEtdW5pcHJvdF9ibGFzdHgudGFiIFwKLWV2YWx1ZSAxRS0yMCBcCi1udW1fdGhyZWFkcyA0MCBcCi1tYXhfdGFyZ2V0X3NlcXMgMSBcCi1vdXRmbXQgNgpgYGAKCmBgYHtyIHRyYW5zY3JpcHRvbWUtYmxhc3QtbG9vaywgZW5naW5lPSdiYXNoJ30KZWNobyAiRmlyc3QgZmV3IGxpbmVzOiIKaGVhZCAtMiAuLi9vdXRwdXQvMDItQXB1bC1yZWZlcmVuY2UtYW5ub3RhdGlvbi9BcHVsX0dDRl8wMTM3NTM4NjUuMV9ybmEtdW5pcHJvdF9ibGFzdHgudGFiCgplY2hvICJOdW1iZXIgb2YgbGluZXMgaW4gb3V0cHV0OiIKd2MgLWwgLi4vb3V0cHV0LzAyLUFwdWwtcmVmZXJlbmNlLWFubm90YXRpb24vQXB1bF9HQ0ZfMDEzNzUzODY1LjFfcm5hLXVuaXByb3RfYmxhc3R4LnRhYgpgYGAKCgojIyBKb2luaW5nIEJsYXN0IHRhYmxlIHdpdGggYW5ub2F0aW9ucy4KCiMjIyBQcmVwcGluZyBCbGFzdCB0YWJsZSBmb3IgZWFzeSBqb2luCgpgYGB7ciB0cmFuc2NyaXB0b21lLXNlcGFyYXRlLCBlbmdpbmU9J2Jh
c2gnfQp0ciAnfCcgJ1x0JyA8IC4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS11bmlwcm90X2JsYXN0eC50YWIgXAo+IC4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS11bmlwcm90X2JsYXN0eF9zZXAudGFiCgpoZWFkIC0xIC4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS11bmlwcm90X2JsYXN0eF9zZXAudGFiCgpgYGAKCiMjIyBDb3VsZCBkbyBzb21lIGNvb2wgc3R1ZmYgaW4gUiBoZXJlIHJlYWRpbmcgaW4gdGFibGUKCmBgYHtyIHRyYW5zY3JpcHRvbWUtcmVhZC1kYXRhfQpibHRhYmwgPC0gcmVhZC5jc3YoIi4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS11bmlwcm90X2JsYXN0eF9zZXAudGFiIiwgc2VwID0gJ1x0JywgaGVhZGVyID0gRkFMU0UpCgpzcGdvIDwtIHJlYWQuY3N2KCJodHRwczovL2dhbm5ldC5maXNoLndhc2hpbmd0b24uZWR1L3NlYXNoZWxsL3NuYXBzL3VuaXByb3RfdGFibGVfcjIwMjNfMDEudGFiIiwgc2VwID0gJ1x0JywgaGVhZGVyID0gVFJVRSkKCmRhdGF0YWJsZShoZWFkKGJsdGFibCksIG9wdGlvbnMgPSBsaXN0KHNjcm9sbFggPSBUUlVFLCBzY3JvbGxZID0gIjQwMHB4Iiwgc2Nyb2xsQ29sbGFwc2UgPSBUUlVFLCBwYWdpbmcgPSBGQUxTRSkpCmBgYAoKYGBge3IgdHJhbnNjcmlwdG9tZS1zcGdvLXRhYmxlfQpkYXRhdGFibGUoaGVhZChzcGdvKSwgb3B0aW9ucyA9IGxpc3Qoc2Nyb2xsWCA9IFRSVUUsIHNjcm9sbFkgPSAiNDAwcHgiLCBzY3JvbGxDb2xsYXBzZSA9IFRSVUUsIHBhZ2luZyA9IEZBTFNFKSkKYGBgCgpgYGB7ciB0cmFuc2NyaXB0b21lLXNlZX0KZGF0YXRhYmxlKAogIGxlZnRfam9pbihibHRhYmwsIHNwZ28sICBieSA9IGMoIlYzIiA9ICJFbnRyeSIpKSAlPiUKICBzZWxlY3QoVjEsIFYzLCBWMTMsIFByb3RlaW4ubmFtZXMsIE9yZ2FuaXNtLCBHZW5lLk9udG9sb2d5Li5iaW9sb2dpY2FsLnByb2Nlc3MuLCBHZW5lLk9udG9sb2d5LklEcykgCiAjICU+JSBtdXRhdGUoVjEgPSBzdHJfcmVwbGFjZV9hbGwoVjEscGF0dGVybiA9ICJzb2xpZDAwNzhfMjAxMTA0MTJfRlJBR19CQ19XSElURV9XSElURV9GM19RVl9TRV90cmltbWVkIiwgcmVwbGFjZW1lbnQgPSAiQWIiKSkKKQpgYGAKCmBgYHtyIHRyYW5zY3JpcHRvbWUtam9pbn0KYW5ub3RfdGFiIDwtCiAgbGVmdF9qb2luKGJsdGFibCwgc3BnbywgIGJ5ID0gYygiVjMiID0gIkVudHJ5IikpICU+JQogIHNlbGVjdChWMSwgVjMsIFYxMywgUHJvdGVpbi5uYW1lcywgT3JnYW5pc20sIEdlbmUuT250b2xvZ3kuLmJpb2xvZ2ljYWwucHJvY2Vzcy4sIEdlbmUuT250b2xvZ3kuSURzKQoKd3JpdGUudGFibGUoYW5ub3RfdGFiLCBmaWxlID0gIi4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1h
bm5vdGF0aW9uL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS1JRG1hcHBpbmctMjAyNF8wOF8yMS50YWIiLCBzZXAgPSAiXHQiLAogICAgICAgICAgICByb3cubmFtZXMgPSBUUlVFLCBjb2wubmFtZXMgPSBOQSkKYGBgCgpgYGB7ciB0cmFuc2NyaXB0b21lLXZpZXctaGVhZGVycywgZW5naW5lPSdiYXNoJ30KaGVhZCAtbiAzIC4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0FwdWxfR0NGXzAxMzc1Mzg2NS4xX3JuYS1JRG1hcHBpbmctMjAyNF8wOF8yMS50YWIKYGBgCgpgYGB7ciB0cmFuc2NyaXB0b21lLXNwZWNpZXMtaGl0c30KIyBSZWFkIGRhdGFzZXQKI2RhdGFzZXQgPC0gcmVhZC5jc3YoIi4uL291dHB1dC9ibGFzdF9hbm5vdF9nby50YWIiLCBzZXAgPSAnXHQnKSAgIyBSZXBsYWNlIHdpdGggdGhlIHBhdGggdG8geW91ciBkYXRhc2V0CgojIFNlbGVjdCB0aGUgY29sdW1uIG9mIGludGVyZXN0CmNvbHVtbl9uYW1lIDwtICJPcmdhbmlzbSIgICMgUmVwbGFjZSB3aXRoIHRoZSBuYW1lIG9mIHRoZSBjb2x1bW4gb2YgaW50ZXJlc3QKY29sdW1uX2RhdGEgPC0gYW5ub3RfdGFiW1tjb2x1bW5fbmFtZV1dCgojIENvdW50IHRoZSBvY2N1cnJlbmNlcyBvZiB0aGUgc3RyaW5ncyBpbiB0aGUgY29sdW1uCnN0cmluZ19jb3VudHMgPC0gdGFibGUoY29sdW1uX2RhdGEpCgojIENvbnZlcnQgdG8gYSBkYXRhIGZyYW1lLCBzb3J0IGJ5IGNvdW50LCBhbmQgc2VsZWN0IHRoZSB0b3AgMTAKc3RyaW5nX2NvdW50c19kZiA8LSBhcy5kYXRhLmZyYW1lKHN0cmluZ19jb3VudHMpCmNvbG5hbWVzKHN0cmluZ19jb3VudHNfZGYpIDwtIGMoIlN0cmluZyIsICJDb3VudCIpCnN0cmluZ19jb3VudHNfZGYgPC0gc3RyaW5nX2NvdW50c19kZltvcmRlcihzdHJpbmdfY291bnRzX2RmJENvdW50LCBkZWNyZWFzaW5nID0gVFJVRSksIF0KdG9wXzEwX3N0cmluZ3MgPC0gaGVhZChzdHJpbmdfY291bnRzX2RmLCBuID0gMTApCgojIFBsb3QgdGhlIHRvcCAxMCBtb3N0IGNvbW1vbiBzdHJpbmdzIHVzaW5nIGdncGxvdDIKZ2dwbG90KHRvcF8xMF9zdHJpbmdzLCBhZXMoeCA9IHJlb3JkZXIoU3RyaW5nLCAtQ291bnQpLCB5ID0gQ291bnQsIGZpbGwgPSBTdHJpbmcpKSArCiAgZ2VvbV9iYXIoc3RhdCA9ICJpZGVudGl0eSIsIHBvc2l0aW9uID0gImRvZGdlIiwgY29sb3IgPSAiYmxhY2siKSArCiAgbGFicyh0aXRsZSA9ICJUb3AgMTAgU3BlY2llcyBoaXRzIiwKICAgICAgIHggPSBjb2x1bW5fbmFtZSwKICAgICAgIHkgPSAiQ291bnQiKSArCiAgdGhlbWVfbWluaW1hbCgpICsKICB0aGVtZShsZWdlbmQucG9zaXRpb24gPSAibm9uZSIpICsKICBjb29yZF9mbGlwKCkKYGBgCgpgYGB7ciB0cmFuc2NyaXB0b21lLXRvcC1nb30KCiNkYXRhIDwtIHJlYWQuY3N2KCIuLi9vdXRwdXQvYmxhc3RfYW5ub3RfZ28udGFiIiwgc2VwID0gJ1x0JykKCiMgUmVuYW1lIHRoZSBgR2VuZS5PbnRvbG9neS4uYmlvbG9naWNhbC5wcm9jZXNzLmAgY29sdW1uIHRvIGBCaW9s
b2dpY2FsX1Byb2Nlc3NgCmNvbG5hbWVzKGFubm90X3RhYilbY29sbmFtZXMoYW5ub3RfdGFiKSA9PSAiR2VuZS5PbnRvbG9neS4uYmlvbG9naWNhbC5wcm9jZXNzLiJdIDwtICJCaW9sb2dpY2FsX1Byb2Nlc3MiCgojIFNlcGFyYXRlIHRoZSBgQmlvbG9naWNhbF9Qcm9jZXNzYCBjb2x1bW4gaW50byBpbmRpdmlkdWFsIGJpb2xvZ2ljYWwgcHJvY2Vzc2VzCmRhdGFfc2VwYXJhdGVkIDwtIHVubGlzdChzdHJzcGxpdChhbm5vdF90YWIkQmlvbG9naWNhbF9Qcm9jZXNzLCBzcGxpdCA9ICI7IikpCgojIFRyaW0gd2hpdGVzcGFjZSBmcm9tIHRoZSBiaW9sb2dpY2FsIHByb2Nlc3NlcwpkYXRhX3NlcGFyYXRlZCA8LSBnc3ViKCJeXFxzK3xcXHMrJCIsICIiLCBkYXRhX3NlcGFyYXRlZCkKCiMgQ291bnQgdGhlIG9jY3VycmVuY2VzIG9mIGVhY2ggYmlvbG9naWNhbCBwcm9jZXNzCnByb2Nlc3NfY291bnRzIDwtIHRhYmxlKGRhdGFfc2VwYXJhdGVkKQpwcm9jZXNzX2NvdW50cyA8LSBkYXRhLmZyYW1lKEJpb2xvZ2ljYWxfUHJvY2VzcyA9IG5hbWVzKHByb2Nlc3NfY291bnRzKSwgQ291bnQgPSBhcy5pbnRlZ2VyKHByb2Nlc3NfY291bnRzKSkKcHJvY2Vzc19jb3VudHMgPC0gcHJvY2Vzc19jb3VudHNbb3JkZXIoLXByb2Nlc3NfY291bnRzJENvdW50KSwgXQoKIyBTZWxlY3QgdGhlIDIwIG1vc3QgcHJlZG9taW5hbnQgYmlvbG9naWNhbCBwcm9jZXNzZXMKdG9wXzIwX3Byb2Nlc3NlcyA8LSBwcm9jZXNzX2NvdW50c1sxOjIwLCBdCgojIENyZWF0ZSBhIGNvbG9yIHBhbGV0dGUgZm9yIHRoZSBiYXJzCmJhcl9jb2xvcnMgPC0gcmFpbmJvdyhucm93KHRvcF8yMF9wcm9jZXNzZXMpKQoKIyBDcmVhdGUgYSBzdGFnZ2VyZWQgdmVydGljYWwgYmFyIHBsb3Qgd2l0aCBkaWZmZXJlbnQgY29sb3JzIGZvciBlYWNoIGJhcgpiYXJwbG90KHRvcF8yMF9wcm9jZXNzZXMkQ291bnQsIG5hbWVzLmFyZyA9IHJlcCgiIiwgbnJvdyh0b3BfMjBfcHJvY2Vzc2VzKSksIGNvbCA9IGJhcl9jb2xvcnMsCiAgICAgICAgeWxpbSA9IGMoMCwgbWF4KHRvcF8yMF9wcm9jZXNzZXMkQ291bnQpICogMS4yNSksCiAgICAgICAgbWFpbiA9ICJPY2N1cnJlbmNlcyBvZiB0aGUgMjAgTW9zdCBQcmVkb21pbmFudCBCaW9sb2dpY2FsIFByb2Nlc3NlcyIsIHhsYWIgPSAiQmlvbG9naWNhbCBQcm9jZXNzIiwgeWxhYiA9ICJDb3VudCIpCgoKIyBDcmVhdGUgYSBzZXBhcmF0ZSBwbG90IGZvciB0aGUgbGVnZW5kCnBuZygiLi4vb3V0cHV0LzAyLUFwdWwtcmVmZXJlbmNlLWFubm90YXRpb24vR09sZWdlbmQucG5nIiwgd2lkdGggPSA4MDAsIGhlaWdodCA9IDYwMCkKcGFyKG1hciA9IGMoMCwgMCwgMCwgMCkpCnBsb3QubmV3KCkKbGVnZW5kKCJjZW50ZXIiLCBsZWdlbmQgPSB0b3BfMjBfcHJvY2Vzc2VzJEJpb2xvZ2ljYWxfUHJvY2VzcywgZmlsbCA9IGJhcl9jb2xvcnMsIGNleCA9IDEsIHRpdGxlID0gIkJpb2xvZ2ljYWwgUHJvY2Vzc2VzIikKZGV2Lm9mZigpCmBgYAoKYGBge3IgdHJhbnNj
cmlwdG9tZS1nby1sZWdlbmQsIGZpZy53aWR0aCA9IDEwMCAsZmlnLmhlaWdodCA9IDEwMH0Ka25pdHI6OmluY2x1ZGVfZ3JhcGhpY3MoIi4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0dPbGVnZW5kLnBuZyIpCmBgYAoKYGBge3IgdHJhbnNjcmlwdG9tZS1yZW1vdmUtbGVnZW5kLWZpbGUsIGVuZ2luZT0nYmFzaCd9CnJtIC4uL291dHB1dC8wMi1BcHVsLXJlZmVyZW5jZS1hbm5vdGF0aW9uL0dPbGVnZW5kLnBuZwpgYGAKCgoKCgoKCg==