# Nucleotide_Essentials.jl

## Data Types

### Nucleotide_Essentials.FastqRecord

Components

• ID: The unique sequence identifier associated with that entry
• sequence: The nucleotide sequence of that entry
• quality: The quality scores of that entry
• filename: The original file name

### Nucleotide_Essentials.FastaRecord

Components

• ID: The unique sequence identifier associated with that entry
• sequence: The nucleotide sequence of that entry
• filename: The original file name

## Functions

### Nucleotide_Essentials.readFastq

Imports a .fastq file into Julia

readFastq(Path::String)

.fastq file => readFastq(Path) => FastqRecord(ID, sequence, quality, filename)

Supported keyword arguments include:

• Path::String: The full or relative path to a .fastq file

Example:

# Supply the path to a .fastq file that you would like to import
myfastq = readFastq("myfastq.fastq")
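
For orientation, the four-line record layout that a .fastq import relies on can be sketched in plain Julia. This is a minimal illustration under stated assumptions, not the package's implementation: `parse_fastq_sketch` and its NamedTuple return value are hypothetical, and records are assumed to be unwrapped (exactly four lines each).

```julia
# Hypothetical helper, not part of Nucleotide_Essentials: parse unwrapped
# four-line FASTQ records from any IO stream.
function parse_fastq_sketch(io::IO)
    ids, seqs, quals = String[], String[], String[]
    while !eof(io)
        header = readline(io)
        isempty(header) && continue              # skip blank lines between records
        startswith(header, "@") || error("expected '@' header, got: $header")
        push!(ids, header[2:end])                # ID without the leading '@'
        push!(seqs, readline(io))                # nucleotide sequence
        readline(io)                             # '+' separator line, ignored
        push!(quals, readline(io))               # quality string
    end
    return (ID = ids, sequence = seqs, quality = quals)
end

records = parse_fastq_sketch(IOBuffer("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII5I\n"));
```

Each field of `records` holds one element per read, so `records.sequence[2]` and `records.quality[2]` describe the same read.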

### Nucleotide_Essentials.readFasta

Imports a .fasta file into Julia

readFasta(Path::String)

.fasta file => readFasta(Path) => FastaRecord(ID, sequence, filename)

Supported keyword arguments include:

• Path::String: The full or relative path to a .fasta file

Example:

# Supply the path to a .fasta file that you would like to import. Ending the command with ; is recommended to prevent printing potentially large .fasta files in the REPL
myfasta = readFasta("myfasta.fasta");
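
As a rough picture of what importing a .fasta file involves, the sketch below collects IDs and sequences from FASTA text. It is an illustration only, not the package's implementation: `parse_fasta_sketch` and its NamedTuple return value are hypothetical, and sequences wrapped over several lines are assumed to belong to the preceding `>` header.

```julia
# Hypothetical helper, not part of Nucleotide_Essentials: collect IDs and
# sequences from FASTA text, joining sequences wrapped over several lines.
function parse_fasta_sketch(io::IO)
    ids, seqs = String[], String[]
    for raw in eachline(io)
        line = strip(raw)
        isempty(line) && continue
        if startswith(line, ">")
            push!(ids, line[2:end])              # header line starts a new entry
            push!(seqs, "")
        else
            isempty(seqs) && error("sequence data before the first '>' header")
            seqs[end] *= line                    # append wrapped sequence lines
        end
    end
    return (ID = ids, sequence = seqs)
end

fasta = parse_fasta_sketch(IOBuffer(">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"));
```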

### Nucleotide_Essentials.writeFasta

writeFasta(input_fasta::FastaRecord, out::String, compressed::Bool)

FastaRecord => writeFasta(input_fasta, out, compressed) => .fasta file/.fasta.gz file

Takes a single- or multiple-entry FastaRecord and writes either .fasta or compressed .fasta.gz files to the desired directory

Supported keyword arguments include:

• input_fasta::FastaRecord: A FastaRecord with either a single entry or multiple entries
• out::String: The full or relative path to the directory where files should be written to
• compressed::Bool: Whether to write the .fasta files as compressed files
  • If true, files will be written as .fasta.gz files
  • If false, files will be written as .fasta files

Example:

# .fasta files can be written from an already imported FastaRecord in Julia
writeFasta(input_fasta, "example/output/directory", false)

# .fasta files can be written as a .fasta.gz from an already imported FastaRecord in Julia
writeFasta(input_fasta, "example/output/directory", true)

# .fasta files with multiple sequences can be read and written as individual .fasta or .fasta.gz in the same step
writeFasta(readFasta("/myfasta.fasta"), "example/output/directory", true);
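
The idea of writing each entry to its own file can be sketched as follows. This is not the package's implementation: `write_fasta_sketch` is hypothetical, naming each file after the entry ID is an assumption, and only uncompressed output is shown (writing .fasta.gz would need a compression package such as CodecZlib.jl).

```julia
# Hypothetical helper, not part of Nucleotide_Essentials: write one
# uncompressed .fasta file per entry into the directory `out`.
function write_fasta_sketch(ids::Vector{String}, seqs::Vector{String}, out::String)
    mkpath(out)                                  # create the output directory if needed
    paths = String[]
    for (id, seq) in zip(ids, seqs)
        path = joinpath(out, id * ".fasta")      # file name derived from the ID (assumption)
        open(path, "w") do io
            println(io, ">", id)
            println(io, seq)
        end
        push!(paths, path)
    end
    return paths
end

outdir = mktempdir()
written = write_fasta_sketch(["seq1", "seq2"], ["ACGT", "GGCC"], outdir);
```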

### Nucleotide_Essentials.FastqtoFasta

Converts a FastqRecord to a FastaRecord. Can also input and convert a .fastq file to a FastaRecord in the same function.

FastqtoFasta(Fastq::Union{String, FastqRecord})

.fastq file => FastqtoFasta(Fastq) => FastaRecord(ID, sequence, filename)

FastqRecord(ID, sequence, quality, filename) => FastqtoFasta(Fastq) => FastaRecord(ID, sequence, filename)

Supported keyword arguments include:

• Fastq::Union{String, FastqRecord}: Either of
  • The full or relative path to a .fastq file
  • A FastqRecord

Example:

# Supply the path to a .fastq file that you would like to convert to a FastaRecord
myfasta = FastqtoFasta("myfastq.fastq");

# Alternatively, a FastqRecord can be used as the input
myfasta = FastqtoFasta(myFastqRecord);
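
Conceptually, the conversion keeps the ID, sequence, and filename and drops the quality scores. The sketch below shows that with stand-in types; `SketchFastqRecord`, `SketchFastaRecord`, and `to_fasta_sketch` are hypothetical, and the vector-valued fields are an assumption about how multi-entry records are stored.

```julia
# Hypothetical stand-ins for the package's record types, using the fields
# listed under "Data Types"; the real definitions may differ.
struct SketchFastqRecord
    ID::Vector{String}
    sequence::Vector{String}
    quality::Vector{String}
    filename::String
end

struct SketchFastaRecord
    ID::Vector{String}
    sequence::Vector{String}
    filename::String
end

# Converting FASTQ to FASTA keeps ID, sequence, and filename and drops quality.
to_fasta_sketch(fq::SketchFastqRecord) =
    SketchFastaRecord(fq.ID, fq.sequence, fq.filename)

fq = SketchFastqRecord(["read1"], ["ACGT"], ["IIII"], "myfastq.fastq")
fa = to_fasta_sketch(fq);
```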

### Nucleotide_Essentials.FilterQuality_se

Filters an input .fastq file based upon the encoded Phred+33 or Phred+64 quality scores. The encoding of the reads is automatically determined by looking for characters unique to one encoding: Phred+64 encoding is identified by searching for ^, a, ], and f.

Reads are filtered by their number of expected errors ($\mathrm{E}$), which is the sum of the per-base error probabilities $p_i$ derived from the quality scores $Q_i$:

$\mathrm{E} = \sum_i p_i = \sum_i 10^{\frac{-Q_i}{10}}$

Stringent filtering (maxEE = 1) is used by default but can be adjusted by the user.
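
The expected-error calculation can be tried out on a single quality string. The sketch below is illustrative only: `phred_offset`, `expected_errors`, and `passes_filter` are hypothetical names, the encoding check simply mirrors the rule described above, and treating maxEE as an inclusive threshold is an assumption.

```julia
# Phred+64 is assumed when any of the characters ^, a, ], or f appears,
# mirroring the detection rule described above; otherwise Phred+33.
phred_offset(quality::AbstractString) = any(in("^a]f"), quality) ? 64 : 33

# E = sum over bases of 10^(-Q/10), with Q decoded from the quality string.
function expected_errors(quality::AbstractString; offset::Int = phred_offset(quality))
    return sum(10.0^(-(Int(c) - offset) / 10) for c in quality)
end

# A read passes when its expected errors do not exceed maxEE (assumed inclusive).
passes_filter(quality::AbstractString; maxEE = 1) = expected_errors(quality) <= maxEE

ee_high_quality = expected_errors("IIII")    # four bases at Q40 under Phred+33
keep_high = passes_filter("IIII")
keep_low = passes_filter("!!!!")             # four bases at Q0, so E = 4
```

Four Q40 bases contribute an error probability of 10^-4 each, so the read passes comfortably, while four Q0 bases give E = 4 and fail the default maxEE = 1.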

Reads that pass the filtering parameters are output to a file ending in _FilteredReads.fastq in the user-determined directory, as indicated by out.

Supported keyword arguments include:

• read1::String: Path to the reads to undergo quality filtering
• out::String: Path to the directory where reads that pass the quality filtering should be written
• maxEE::Int64 (optional): The max number of expected errors a read can include as the filtering parameter (default: maxEE = 1)
• verbose::Bool (optional): Whether or not to show some intermediary feedback on the progress of the function (default = false)

Example:

FilterQuality_se("forward_R1.fastq", "/outdirectory")

### Nucleotide_Essentials.FilterQuality_pe

Filters paired forward and reverse .fastq files based upon the encoded Phred+33 or Phred+64 quality scores. The encoding of the reads is automatically determined by looking for characters unique to one encoding: Phred+64 encoding is identified by searching for ^, a, ], and f.

Reads are filtered by their number of expected errors ($\mathrm{E}$), which is the sum of the per-base error probabilities $p_i$ derived from the quality scores $Q_i$:

$\mathrm{E} = \sum_i p_i = \sum_i 10^{\frac{-Q_i}{10}}$

Stringent filtering (maxEE = 1) is used by default but can be adjusted by the user.

Output Files:

• If both the forward and reverse reads pass the filtering parameters:
  • Forward reads are output to a file ending in R1_Paired_filtered.fastq in the user-determined directory, as indicated by out
  • Reverse reads are output to a file ending in R2_Paired_filtered.fastq in the user-determined directory, as indicated by out
• If only the forward read passes the filtering parameters:
  • Forward reads are output to a file ending in R1_Unpaired_filtered.fastq in the user-determined directory, as indicated by out
  • Reverse reads are not written to a file
• If only the reverse read passes the filtering parameters:
  • Reverse reads are output to a file ending in R2_Unpaired_filtered.fastq in the user-determined directory, as indicated by out
  • Forward reads are not written to a file
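
The output rules above amount to a small decision table, sketched here. `paired_output_suffixes` is a hypothetical helper, not a package function, and the case where neither read passes (nothing is written) is an assumption, since the list above does not cover it.

```julia
# Hypothetical helper: given whether each mate passed filtering, return the
# (R1 suffix, R2 suffix) of the files the pair is written to, with `nothing`
# where a read is not written.
function paired_output_suffixes(r1_pass::Bool, r2_pass::Bool)
    if r1_pass && r2_pass
        return ("R1_Paired_filtered.fastq", "R2_Paired_filtered.fastq")
    elseif r1_pass
        return ("R1_Unpaired_filtered.fastq", nothing)
    elseif r2_pass
        return (nothing, "R2_Unpaired_filtered.fastq")
    else
        return (nothing, nothing)                # assumed: neither read is written
    end
end

both = paired_output_suffixes(true, true)
forward_only = paired_output_suffixes(true, false)
```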

Supported keyword arguments include:

• read1::String: Path to the forward reads to undergo quality filtering
• read2::String: Path to the reverse reads to undergo quality filtering
• out::String: Path to the directory where reads that pass the quality filtering should be written
• maxEE::Int64 (optional): The max number of expected errors a read can include as the filtering parameter (default: maxEE = 1)
• verbose::Bool (optional): Whether or not to show some intermediary feedback on the progress of the function (default = false)

Example:

FilterQuality_pe("forward_R1.fastq", "reverse_R2.fastq", "/outdirectory")

# changing the filtering parameters
FilterQuality_pe("forward_R1.fastq", "reverse_R2.fastq", "/outdirectory", 2, true)

### Nucleotide_Essentials.PlotQuality

Returns a plot of the quality profile of a .fastq or .fastq.gz file

This function plots a visual summary of the distribution of quality scores (automatically detects Phred+33 or Phred+64 encoding) as a function of sequence position for the input fastq file(s).

The plotted lines show summary statistics at each sequence position:

• green is the mean
• dashed red lines are the 25th and 75th quantiles
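
The per-position statistics behind such a plot can be computed with the Statistics standard library. The sketch below is illustrative only: `quality_profile_sketch` is a hypothetical helper, reads are assumed to have equal length, and Phred+33 is assumed unless another offset is passed.

```julia
using Statistics

# Hypothetical helper: per-position mean and 25th/75th quantiles of Phred
# scores for equal-length quality strings.
function quality_profile_sketch(quals::Vector{String}; offset::Int = 33)
    npos = length(first(quals))
    # Matrix of scores with one row per read and one column per position.
    scores = [Int(q[i]) - offset for q in quals, i in 1:npos]
    return (
        mean_q = vec(mean(scores, dims = 1)),
        q25 = [quantile(scores[:, i], 0.25) for i in 1:npos],
        q75 = [quantile(scores[:, i], 0.75) for i in 1:npos],
    )
end

profile = quality_profile_sketch(["IIII5", "II5I5", "I5555"]);
```

Here `profile.mean_q` would supply the green line and `profile.q25` and `profile.q75` the dashed red lines.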

Supported keyword arguments include:

• Input::FastqRecord: The name of a FastqRecord for plotting
• verbose::Bool (optional): Whether or not to show some intermediary feedback on the progress of the function (default = false)
• outputfigure::Bool (optional): Whether or not to output a .png file with the created QualityPlot (default = false)
• figurepath::String (optional): If outputting a .png figure to file, specify the path to a directory where the file should be written to (default = pwd())

Example:

# A quality profile can be created by supplying the path to a .fastq or .fastq.gz file
PlotQuality("path/to/my/file.fastq")

### Nucleotide_Essentials.potential_mismatches

Returns a Vector{Any} of potential barcodes with a single nucleotide change, including both deletions and substitutions

Supported keyword arguments include:

• Path::String: The full or relative path to a .fastq file
• mismatch::Int64: The number of altered nucleotides to include (only 1 is supported at this time)

Example:

potential_mismatches("GCGT", 1)
17-element Vector{Any}:
"GCGT"
"CCGT"
"ACGT"
"TCGT"
"GGGT"
"GAGT"
"GTGT"
"GCCT"
"GCAT"
"GCTT"
"GCGG"
"GCGC"
"GCGA"
"CGT"
"GGT"
"GCT"
"GCG"
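
The listing above can be reproduced with a short enumeration: the original barcode, then every single-base substitution, then every single-base deletion. This is a sketch that happens to match the output shown, not the package's implementation; `single_mismatch_variants` is a hypothetical name, the base order G, C, A, T is inferred from the listing, and it returns a Vector{String} rather than a Vector{Any}.

```julia
# Hypothetical helper: the barcode itself, then all single substitutions,
# then all single deletions.
function single_mismatch_variants(barcode::AbstractString)
    bases = ('G', 'C', 'A', 'T')                 # order inferred from the listing above
    chars = collect(barcode)
    variants = [String(barcode)]
    for i in eachindex(chars), b in bases
        b == chars[i] && continue                # skip the unchanged base
        altered = copy(chars)
        altered[i] = b
        push!(variants, join(altered))
    end
    for i in eachindex(chars)
        push!(variants, join(deleteat!(copy(chars), i)))
    end
    return variants
end

variants = single_mismatch_variants("GCGT");
```

A barcode of length n yields 1 + 3n + n entries, which is 17 for a four-base barcode.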

### Nucleotide_Essentials.reverse_complement

Takes a string of nucleotide bases and returns the reverse complement of that string. Accepts inputs of String and SubString{String} (input from a FastqRecord)

Supported keyword arguments include:

• sequence::Union{String, SubString{String}}: A string sequence of nucleotide bases or sequence entry from a FastqRecord

Example:

reverse_complement("ATCGT")
"ACGAT"
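
For readers who want to see the operation spelled out, a reverse complement reverses the sequence and swaps each base for its partner. The sketch below is not the package's implementation: `reverse_complement_sketch` and `SKETCH_COMPLEMENT` are hypothetical names, and only A, C, G, T, and N are handled.

```julia
# Hypothetical lookup table of complementary bases.
const SKETCH_COMPLEMENT = Dict('A' => 'T', 'T' => 'A', 'C' => 'G', 'G' => 'C', 'N' => 'N')

# Reverse the sequence, then replace each base with its complement.
reverse_complement_sketch(seq::AbstractString) =
    join(SKETCH_COMPLEMENT[base] for base in reverse(uppercase(seq)))

rc = reverse_complement_sketch("ATCGT")
```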

### Nucleotide_Essentials.demultiplex_se

Compares a list of provided barcodes with the provided multiplexed reads and separates the reads into individual .fastq files. If a barcode is found within the read, the barcode is removed from the sequence. The quality data of the reads is preserved and written to the output .fastq file. If a barcode is not found, the sequence and quality are written to the unassigned .fastq file unchanged.

The mapping file must be either a .csv or .txt file with two columns. The first column heading must be SampleID and the second column heading must be BarcodeSequence.

EXAMPLE MAPPING FILE:

| SampleID | BarcodeSequence |
|----------|-----------------|
| Sample1  | Barcode1        |
| Sample2  | Barcode2        |
| Sample3  | Barcode3        |
| Sample4  | Barcode4        |
| Sample5  | Barcode5        |
| Sample6  | Barcode6        |
| Sample7  | Barcode7        |
| Sample8  | Barcode8        |
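
A mapping file with these two columns can be read into a barcode lookup with a few lines of Julia. This is a sketch under assumptions, not the package's parser: `read_mapping_sketch` is hypothetical, and a comma delimiter is assumed (pass another `delim` for tab-delimited .txt files).

```julia
# Hypothetical helper: read a two-column mapping file into a Dict of
# barcode => sample ID. The comma delimiter is an assumption.
function read_mapping_sketch(io::IO; delim = ",")
    header = split(strip(readline(io)), delim)
    header == ["SampleID", "BarcodeSequence"] ||
        error("mapping file must have the columns SampleID and BarcodeSequence")
    barcodes = Dict{String,String}()
    for line in eachline(io)
        isempty(strip(line)) && continue         # ignore blank lines
        sample, barcode = split(strip(line), delim)
        barcodes[barcode] = sample
    end
    return barcodes
end

mapping = read_mapping_sketch(IOBuffer("SampleID,BarcodeSequence\nSample1,ACGT\nSample2,TTAG\n"));
```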

Supported keyword arguments include:

• R1::String: Path to multiplexed reads
• Map::String: Path to the mapping file
• mismatch::Int64=0 (optional): Number of allowed mismatches in the barcode, either 0 or 1. Allowing 1 mismatch significantly increases computation time. The default is 0 mismatches (exact matches only).
• debug::Bool=false (optional): If true, a log file will be created and debugging data will be printed while the function is running (default is false).

Example:

demultiplex_se("multiplexreads.fastq", "mapping_file.csv")
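
The per-read logic described above (find a barcode, remove it, keep sequence and quality aligned) can be sketched as follows. `assign_read_sketch` is a hypothetical helper, not the package's implementation; it assumes exact matches only (mismatch = 0), ASCII sequences, and that a barcode may occur anywhere in the read.

```julia
# Hypothetical helper: on the first barcode found in the read, return the
# sample ID plus the sequence and quality with the barcode positions removed.
# Reads without a barcode are returned unchanged as "unassigned".
function assign_read_sketch(seq::String, qual::String, barcodes::Dict{String,String})
    for (barcode, sample) in barcodes
        hit = findfirst(barcode, seq)            # index range of the barcode, or nothing
        hit === nothing && continue
        before, after = 1:first(hit)-1, last(hit)+1:lastindex(seq)
        return (sample, seq[before] * seq[after], qual[before] * qual[after])
    end
    return ("unassigned", seq, qual)
end

barcodes = Dict("ACGT" => "Sample1")
assigned = assign_read_sketch("ACGTTTGGA", "IIII55555", barcodes)
unassigned = assign_read_sketch("GGGGG", "IIIII", barcodes)
```

Trimming the quality string with the same index ranges keeps each remaining base paired with its original score.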

### Nucleotide_Essentials.demultiplex_pe

Compares a list of provided barcodes with the provided paired-end multiplexed reads and separates the reads into individual .fastq files. R1 and R2 reads are handled in the same way: if a barcode is found within a read, the barcode is removed from the sequence, and the quality data of the read is preserved and written to the output .fastq file. If a barcode is not found, the sequence and quality are written unchanged to the unassigned .fastq file for that read direction (R1 or R2).

Dual-indexed reads are not yet supported.

The mapping file must be either a .csv or .txt file with two columns. The first column heading must be SampleID and the second column heading must be BarcodeSequence.

EXAMPLE MAPPING FILE:

| SampleID | BarcodeSequence |
|----------|-----------------|
| Sample1  | Barcode1        |
| Sample2  | Barcode2        |
| Sample3  | Barcode3        |
| Sample4  | Barcode4        |
| Sample5  | Barcode5        |
| Sample6  | Barcode6        |
| Sample7  | Barcode7        |
| Sample8  | Barcode8        |

Supported keyword arguments include:

• R1::String: Path to forward multiplexed reads
• R2::String: Path to reverse multiplexed reads
• Map::String: Path to the mapping file
• mismatch::Int64=0 (optional): Number of allowed mismatches in the barcode, either 0 or 1. Allowing 1 mismatch significantly increases computation time. The default is 0 mismatches (exact matches only).
• debug::Bool=false (optional): If true, a log file will be created and debugging data will be printed while the function is running (default is false).

Example:

demultiplex_pe("forward_multiplexreads.fastq", "reverse_multiplexreads.fastq", "mapping_file.csv")

## Change Log

#### Nucleotide_Essentials v0.2.0

• Performance improvements in PlotQuality() and added support for exporting quality plots