# Step 1: Get Sequence Data from Azenta through `sftp`

Sarah Tanja
October 9, 2024

- [Goals](#goals)
- [Get sequences from Azenta server](#get-sequences-from-azenta-server)
- [Check file integrity with `md5sum`](#check-file-integrity-with-md5sum)
- [Summary](#summary)

# Goals

- Transfer project files to the Roberts Lab raven server as a working directory
- Transfer project files to the Roberts Lab gannet server for backup
- Verify checksums using `md5sum`

Environment:

- raven: RStudio Server hosted on a Unix OS
- gannet: Linux, command line

# Get sequences from Azenta server

1.  First, access the server that you want to download the sequences to by connecting to it with `ssh`.
2.  From within the server, navigate with `cd` to the directory that you want the files copied to.
3.  Once inside your target directory, connect to the Azenta server with `sftp username@azenta.genewiz.com`.
4.  Type in the password for the Azenta server (sent to you in an email from Azenta when they notified you that the sequences were generated).

# Check file integrity with `md5sum`

What is an MD5 checksum? An MD5 checksum is a string of 32 hexadecimal characters (a 128-bit value) that the MD5 hash algorithm computes from a file's contents. It's used to verify that a file is an accurate copy of the original and hasn't been modified or corrupted in transfer: two files with the same contents always produce the same checksum.
[Learn How to Generate and Verify Files with MD5 Checksum in Linux](https://www.tecmint.com/generate-verify-check-files-md5-checksum-linux/)
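To make the idea concrete, here is a minimal sketch that hashes a small throwaway file, changes one character, and hashes it again. The temporary directory and the `demo.txt` file name are assumptions for illustration only and are not part of the Azenta data.

``` bash
# Create a small throwaway file (hypothetical stand-in for a sequence file)
tmpdir=$(mktemp -d)
printf 'ACGT\n' > "$tmpdir/demo.txt"

# md5sum prints "<checksum>  <filename>"; keep only the checksum field
sum_before=$(md5sum "$tmpdir/demo.txt" | awk '{print $1}')

# Change a single character and hash the file again
printf 'ACGA\n' > "$tmpdir/demo.txt"
sum_after=$(md5sum "$tmpdir/demo.txt" | awk '{print $1}')

# The checksum is 32 characters long and differs after the edit
echo "before: $sum_before (${#sum_before} characters)"
echo "after:  $sum_after"
```

Even a one-character difference gives a different checksum, which is why comparing checksums is a quick way to detect a corrupted or incomplete transfer.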
`6C14_R1_001.fastq.gz.md5` is an MD5 checksum output file that Azenta generated. It looks like this:

``` bash
cd ../rawfastq/00_fastq/
less -S 6C14_R1_001.fastq.gz.md5
```

    986886738a844beca568362da97600c9  ./6C14_R1_001.fastq.gz

The `md5sum` command will **generate an MD5 checksum** for the file I downloaded from the Azenta server:

``` bash
md5sum ../rawfastq/00_fastq/6C14_R1_001.fastq.gz
```

    986886738a844beca568362da97600c9  ../rawfastq/00_fastq/6C14_R1_001.fastq.gz

Success! The checksums are the same. So now we can automate this process with `md5sum -c`.

``` bash
md5sum --help
```

    Usage: md5sum [OPTION]... [FILE]...
    Print or check MD5 (128-bit) checksums.

    With no FILE, or when FILE is -, read standard input.

      -b, --binary         read in binary mode
      -c, --check          read MD5 sums from the FILEs and check them
          --tag            create a BSD-style checksum
      -t, --text           read in text mode (default)

    The following five options are useful only when verifying checksums:
          --ignore-missing  don't fail or report status for missing files
          --quiet          don't print OK for each successfully verified file
          --status         don't output anything, status code shows success
          --strict         exit non-zero for improperly formatted checksum lines
      -w, --warn           warn about improperly formatted checksum lines

          --help     display this help and exit
          --version  output version information and exit

    The sums are computed as described in RFC 1321.  When checking, the input
    should be a former output of this program.  The default mode is to print a
    line with checksum, a space, a character indicating input mode ('*' for binary,
    ' ' for text or where binary is insignificant), and name for each FILE.

    GNU coreutils online help:
    Full documentation at:
    or available locally via: info '(coreutils) md5sum invocation'

The following command will:

- read the `.md5` file generated by Azenta
- generate the MD5 checksum for the `.fastq.gz` file that the `.md5` file points to
- compare the MD5 checksum from the Azenta-provided `.md5` file to the generated MD5 checksum

``` bash
cd ../rawfastq/00_fastq # move to the directory that has both fastq.gz and md5 files
md5sum -c 6C14_R1_001.fastq.gz.md5
```

    ./6C14_R1_001.fastq.gz: OK

> [!CAUTION]
>
> **Handling File Paths**: Ensure that the paths inside the `.md5` files
> correctly reference their associated files, whether relative or
> absolute. `md5sum -c` relies on these paths to locate the files. You
> should be running `md5sum -c` from within the directory that contains
> both the `.md5` and `fastq.gz` files!

Now let's do this for all of the transferred files.

Use the `md5sum -c` command to compare the downloaded files against all of Azenta's generated MD5 checksums (the files ending in `.md5`):

``` bash
cd ../rawfastq/00_fastq
md5sum -c *.md5
```

    ./101112C14_R1_001.fastq.gz: OK
    ./101112C14_R2_001.fastq.gz: OK
    ./101112C4_R1_001.fastq.gz: OK
    ./101112C4_R2_001.fastq.gz: OK
    ./101112C9_R1_001.fastq.gz: OK
    ./101112C9_R2_001.fastq.gz: OK
    ./101112H14_R1_001.fastq.gz: OK
    ./101112H14_R2_001.fastq.gz: OK
    ./101112H4_R1_001.fastq.gz: OK
    ./101112H4_R2_001.fastq.gz: OK
    ./101112H9_R1_001.fastq.gz: OK
    ./101112H9_R2_001.fastq.gz: OK
    ./101112L14_R1_001.fastq.gz: OK
    ./101112L14_R2_001.fastq.gz: OK
    ./101112L4_R1_001.fastq.gz: OK
    ./101112L4_R2_001.fastq.gz: OK
    ./101112L9_R1_001.fastq.gz: OK
    ./101112L9_R2_001.fastq.gz: OK
    ./101112M14_R1_001.fastq.gz: OK
    ./101112M14_R2_001.fastq.gz: OK
    ./101112M4_R1_001.fastq.gz: OK
    ./101112M4_R2_001.fastq.gz: OK
    ./101112M9_R1_001.fastq.gz: OK
    ./101112M9_R2_001.fastq.gz: OK
    ./123C14_R1_001.fastq.gz: OK
    ./123C14_R2_001.fastq.gz: OK
    ./123C4_R1_001.fastq.gz: OK
    ./123C4_R2_001.fastq.gz: OK
    ./123C9_R1_001.fastq.gz: OK
    ./123C9_R2_001.fastq.gz: OK
    ./123H14_R1_001.fastq.gz: OK
    ./123H14_R2_001.fastq.gz: OK
    ./123H4_R1_001.fastq.gz: OK
    ./123H4_R2_001.fastq.gz: OK
    ./123H9_R1_001.fastq.gz: OK
    ./123H9_R2_001.fastq.gz: OK
    ./123L14_R1_001.fastq.gz: OK
    ./123L14_R2_001.fastq.gz: OK
    ./123L4_R1_001.fastq.gz: OK
    ./123L4_R2_001.fastq.gz: OK
    ./123L9_R1_001.fastq.gz: OK
    ./123L9_R2_001.fastq.gz: OK
    ./123M4_R1_001.fastq.gz: OK
    ./123M4_R2_001.fastq.gz: OK
    ./123M9_R1_001.fastq.gz: OK
    ./123M9_R2_001.fastq.gz: OK
    ./131415C14_R1_001.fastq.gz: OK
    ./131415C14_R2_001.fastq.gz: OK
    ./131415C4_R1_001.fastq.gz: OK
    ./131415C4_R2_001.fastq.gz: OK
    ./131415C9_R1_001.fastq.gz: OK
    ./131415C9_R2_001.fastq.gz: OK
    ./131415H14_R1_001.fastq.gz: OK
    ./131415H14_R2_001.fastq.gz: OK
    ./131415H4_R1_001.fastq.gz: OK
    ./131415H4_R2_001.fastq.gz: OK
    ./131415H9_R1_001.fastq.gz: OK
    ./131415H9_R2_001.fastq.gz: OK
    ./131415L14_R1_001.fastq.gz: OK
    ./131415L14_R2_001.fastq.gz: OK
    ./131415L4_R1_001.fastq.gz: OK
    ./131415L4_R2_001.fastq.gz: OK
    ./131415L9_R1_001.fastq.gz: OK
    ./131415L9_R2_001.fastq.gz: OK
    ./131415M14_R1_001.fastq.gz: OK
    ./131415M14_R2_001.fastq.gz: OK
    ./131415M4_R1_001.fastq.gz: OK
    ./131415M4_R2_001.fastq.gz: OK
    ./131415M9_R1_001.fastq.gz: OK
    ./131415M9_R2_001.fastq.gz: OK
    ./13M14_R1_001.fastq.gz: OK
    ./13M14_R2_001.fastq.gz: OK
    ./456C4_R1_001.fastq.gz: OK
    ./456C4_R2_001.fastq.gz: OK
    ./456H4_R1_001.fastq.gz: OK
    ./456H4_R2_001.fastq.gz: OK
    ./456H9_R1_001.fastq.gz: OK
    ./456H9_R2_001.fastq.gz: OK
    ./456L14_R1_001.fastq.gz: OK
    ./456L14_R2_001.fastq.gz: OK
    ./456L4_R1_001.fastq.gz: OK
    ./456L4_R2_001.fastq.gz: OK
    ./456L9_R1_001.fastq.gz: OK
    ./456L9_R2_001.fastq.gz: OK
    ./456M14_R1_001.fastq.gz: OK
    ./456M14_R2_001.fastq.gz: OK
    ./456M4_R1_001.fastq.gz: OK
    ./456M4_R2_001.fastq.gz: OK
    ./456M9_R1_001.fastq.gz: OK
    ./456M9_R2_001.fastq.gz: OK
    ./45C14_R1_001.fastq.gz: OK
    ./45C14_R2_001.fastq.gz: OK
    ./45C9_R1_001.fastq.gz: OK
    ./45C9_R2_001.fastq.gz: OK
    ./45H14_R1_001.fastq.gz: OK
    ./45H14_R2_001.fastq.gz: OK
    ./67C9_R1_001.fastq.gz: OK
    ./67C9_R2_001.fastq.gz: OK
    ./6C14_R1_001.fastq.gz: OK
    ./6C14_R2_001.fastq.gz: OK
    ./789C4_R1_001.fastq.gz: OK
    ./789C4_R2_001.fastq.gz: OK
    ./789H14_R1_001.fastq.gz: OK
    ./789H14_R2_001.fastq.gz: OK
    ./789H4_R1_001.fastq.gz: OK
    ./789H4_R2_001.fastq.gz: OK
    ./789H9_R1_001.fastq.gz: OK
    ./789H9_R2_001.fastq.gz: OK
    ./789L14_R1_001.fastq.gz: OK
    ./789L14_R2_001.fastq.gz: OK
    ./789L4_R1_001.fastq.gz: OK
    ./789L4_R2_001.fastq.gz: OK
    ./789L9_R1_001.fastq.gz: OK
    ./789L9_R2_001.fastq.gz: OK
    ./789M14_R1_001.fastq.gz: OK
    ./789M14_R2_001.fastq.gz: OK
    ./789M4_R1_001.fastq.gz: OK
    ./789M4_R2_001.fastq.gz: OK
    ./7C14_R1_001.fastq.gz: OK
    ./7C14_R2_001.fastq.gz: OK
    ./89C14_R1_001.fastq.gz: OK
    ./89C14_R2_001.fastq.gz: OK
    ./89C9_R1_001.fastq.gz: OK
    ./89C9_R2_001.fastq.gz: OK
    ./89M9_R1_001.fastq.gz: OK
    ./89M9_R2_001.fastq.gz: OK

> [!TIP]
>
> Each file should show `OK`!

# Summary
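To recap the verification step, here is a minimal sketch of a helper that runs `md5sum -c` over every `.md5` file in a directory and reports only failures. The function name `verify_md5_dir` and the throwaway demo files are assumptions for illustration. The `--quiet` flag is the one listed in the `md5sum --help` output.

``` bash
# verify_md5_dir DIR (hypothetical helper name)
# Checks every .md5 file in DIR against the file it names.
# Prints only the files that fail and returns non-zero if any check fails.
verify_md5_dir() {
  (
    # Run from inside DIR so the relative paths in the .md5 files resolve
    cd "$1" || exit 1
    # --quiet: don't print OK for each successfully verified file
    md5sum -c --quiet ./*.md5
  )
}

# Demo on throwaway data (stand-in for the real fastq.gz and .md5 files)
demo=$(mktemp -d)
printf 'ACGT\n' > "$demo/sample_R1.fastq"
(cd "$demo" && md5sum ./sample_R1.fastq > sample_R1.fastq.md5)

if verify_md5_dir "$demo"; then
  echo "all checksums OK"
else
  echo "at least one file failed verification"
fi
```

For the real data, the same helper could be called as `verify_md5_dir ../rawfastq/00_fastq`. With over a hundred files, printing only the failures is easier to scan than a full list of `OK` lines, and the exit status can be used in a script.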